CSTs for Terabyte-Sized Data.

Marco Oliva¹, Davide Cenzato², Massimiliano Rossi¹

¹Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL.

Proceedings. Data Compression Conference

May 30, 2024

Summary

This summary is machine-generated.

We present RePFP-CST, a scalable method for building compressed suffix trees (CSTs) from large pangenomic datasets. This approach efficiently constructs CSTs directly from VCF files, significantly reducing computational resources.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Pangenomic datasets are growing, but tools for handling them are limited, especially for non-specialists.
Building compressed suffix trees (CSTs) for pangenomic data is computationally challenging.
Efficient data structures are crucial for analyzing large-scale genomic information.

Purpose of the Study:

To develop a scalable method for constructing compressed suffix trees (CSTs) from pangenomic datasets.
To enable non-specialists to build and utilize CSTs for large genomic data.
To address the computational challenges associated with pangenomic data analysis.

Main Methods:

Introduced RePFP-CST, a novel method for building CSTs directly from Variant Call Format (VCF) files without prior decompression.
Implemented pruning strategies on the prefix-free parse (PFP) to reduce dictionary size and parse length.
Focused on optimizing time and space efficiency during CST construction.
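
For readers unfamiliar with the prefix-free parse (PFP) mentioned above, the following is a minimal sketch of plain, single-level PFP on an in-memory string. It is not the RePFP-CST implementation: it does not read VCF files and applies no pruning. The function names (`prefix_free_parse`, `reconstruct`), the sentinel characters, the CRC32 window hash, and the default values of `w` and `p` are all assumptions chosen for illustration.

```python
import zlib


def prefix_free_parse(text, w=4, p=11):
    """Illustrative single-level prefix-free parse (hypothetical helper).

    A length-w window is a "trigger" when its hash is 0 modulo p. The text is
    cut at every trigger, and consecutive phrases overlap by w characters.
    Returns (dictionary, parse): the sorted distinct phrases, and the text
    rewritten as a list of ranks into that dictionary.
    """
    if w < 1 or p < 1:
        raise ValueError("w and p must be positive")
    if "\x01" in text or "\x02" in text:
        raise ValueError("text must not contain the sentinel characters")

    # Assumed sentinels: one start marker and w end markers.
    padded = "\x01" + text + "\x02" * w

    phrases = []
    start = 0
    last = len(padded) - w  # start position of the final window
    for i in range(1, last + 1):
        window = padded[i:i + w]
        # CRC32 is only a stand-in for a rolling hash; it is deterministic.
        is_trigger = zlib.crc32(window.encode("utf-8")) % p == 0
        if is_trigger or i == last:
            # The phrase includes the trigger window; the next one starts at it.
            phrases.append(padded[start:i + w])
            start = i

    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def reconstruct(dictionary, parse, w=4):
    """Invert prefix_free_parse by undoing the w-character overlaps."""
    phrases = [dictionary[r] for r in parse]
    padded = phrases[0] + "".join(phrase[w:] for phrase in phrases[1:])
    # Drop the start sentinel and the w end sentinels.
    return padded[1:len(padded) - w]


if __name__ == "__main__":
    sample = "GATTACAGATTACAGATTACTGATTACA" * 20
    dictionary, parse = prefix_free_parse(sample, w=4, p=5)
    print(len(sample), len(dictionary), len(parse))
    assert reconstruct(dictionary, parse, w=4) == sample
```

On repetitive input such as a collection of similar genomes, the same phrases recur often, so the dictionary tends to stay small relative to the text. The pruning strategies listed above aim to reduce dictionary size and parse length further.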

Main Results:

Successfully built a CST for a terabyte of DNA data, a first in the literature.
Demonstrated significant reductions in the time and space required for CST construction.
Achieved a reduced memory footprint for the final CST, enhancing accessibility.

Conclusions:

RePFP-CST offers a scalable and efficient solution for building CSTs from large pangenomic datasets.
The method lowers the barrier for non-specialists to work with complex genomic data structures.
This advancement facilitates broader application of CSTs in genomic research and analysis.