
Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author:

  • Primer Design through Submodular Function Estimation. Bioinformatics (Oxford, England), 2026.
  • Accelerating String Comparison in RLZ Compressed Sequences via LCE Jumps. bioRxiv: The Preprint Server for Biology, 2026.
  • Phase 3 Trial of Oral Infigratinib in Children with Achondroplasia. The New England Journal of Medicine, 2026.
  • Nanomechanical Sensor Resolving Impulsive Forces below Its Zero-Point Fluctuations. Physical Review Letters, 2026.
  • Movi 2: Fast and Space-Efficient Queries on Pangenomes. Bioinformatics (Oxford, England), 2026.
  • Trametinib for multiple non-ossifying fibromas due to KRAS mosaic mutations: two case reports. Communications Medicine, 2026.

Same journal:

  • Faster Maximal Exact Matches with Lazy LCP Evaluation. Proceedings. Data Compression Conference, 2024.
  • Recursive Prefix-Free Parsing for Building Big BWTs. Proceedings. Data Compression Conference, 2024.
  • Computing matching statistics on Wheeler DFAs. Proceedings. Data Compression Conference, 2024.
  • Augmented Thresholds for MONI. Proceedings. Data Compression Conference, 2024.
  • PHONI: Streamed Matching Statistics with Multi-Genome References. Proceedings. Data Compression Conference, 2021.
  • Denoising of Quality Scores for Boosted Inference and Reduced Storage. Proceedings. Data Compression Conference, 2017.

Related Experiment Video

  • Micro-drive Array for Chronic in vivo Recording: Drive Fabrication (14:03). Published on: April 20, 2009.

CSTs for Terabyte-Sized Data.

Marco Oliva¹, Davide Cenzato², Massimiliano Rossi¹

  • ¹Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL.

Proceedings. Data Compression Conference | May 30, 2024
Summary
This summary is machine-generated.

We present RePFP-CST, a scalable method for building compressed suffix trees (CSTs) from large pangenomic datasets. The approach constructs CSTs directly from Variant Call Format (VCF) files and substantially reduces the time and memory needed for construction.
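The summary says RePFP-CST works directly from VCF files but does not say how variants become sequence. As background only, the minimal sketch below shows what a VCF encodes: variant records relative to a reference, from which each sample's haplotypes can be spelled out. It is not the authors' method. The tuple record layout and the function name `haplotypes_from_snps` are hypothetical, and the sketch assumes phased, SNP-only records; real VCF files also carry indels, multi-allelic sites, and unphased genotypes.

```python
from typing import List, Tuple

# Hypothetical, simplified record: (0-based position, REF base, ALT base,
# phased genotype of one diploid sample as allele indices, 0 = REF, 1 = ALT).
# This is NOT the VCF text format, which is 1-based and tab-separated.
Record = Tuple[int, str, str, Tuple[int, int]]


def haplotypes_from_snps(reference: str, records: List[Record]) -> List[str]:
    """Spell out the two haplotypes of one diploid sample from SNP records."""
    haps = [list(reference), list(reference)]
    for pos, ref, alt, genotype in records:
        if reference[pos] != ref:
            raise ValueError(f"REF allele mismatch at position {pos}")
        for h, allele in enumerate(genotype):
            if allele == 1:  # this haplotype carries the ALT base
                haps[h][pos] = alt
    return ["".join(h) for h in haps]


ref = "ACGTACGTAC"
recs = [(2, "G", "T", (0, 1)), (7, "T", "A", (1, 1))]
for hap in haplotypes_from_snps(ref, recs):
    print(hap)  # prints ACGTACGAAC, then ACTTACGAAC
```

Because haplotypes of one species differ at relatively few positions, a collection of them is highly repetitive, which is the kind of input that compressed indexes such as CSTs are designed to exploit.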

More Related Videos

  • Quasi-light Storage for Optical Data Packets (07:45). Published on: February 6, 2014.
  • Databases to Efficiently Manage Medium Sized, Low Velocity, Multidimensional Data in Tissue Engineering (09:43). Published on: November 22, 2019.


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Pangenomic datasets are growing, but tools for handling them are limited, especially for non-specialists.
  • Building compressed suffix trees (CSTs) for pangenomic data is computationally challenging.
  • Efficient data structures are crucial for analyzing large-scale genomic information.
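A compressed suffix tree typically represents the suffix tree implicitly, as a compressed suffix array together with longest-common-prefix (LCP) information and a representation of the tree topology. The toy sketch below builds the two arrays in plain, uncompressed form so that the information a CST encodes is visible; it shows none of the compression that makes terabyte-scale construction hard. The function names are illustrative, the suffix-array construction is a naive sort suitable only for short strings, and the LCP array uses Kasai's linear-time algorithm.

```python
from bisect import bisect_left, bisect_right
from typing import List


def suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction that sorts explicit suffix copies; toy inputs only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[k] is the longest common prefix of suffixes sa[k-1] and sa[k]."""
    n = len(text)
    rank = [0] * n
    for k, i in enumerate(sa):
        rank[i] = k
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # the next suffix in text order keeps at least h - 1
        else:
            h = 0
    return lcp


def count_occurrences(text: str, sa: List[int], pattern: str) -> int:
    """Count pattern occurrences by binary search over the sorted suffixes."""
    m = len(pattern)
    # Truncated suffixes stay sorted; materialized here only for clarity.
    prefixes = [text[i:i + m] for i in sa]
    return bisect_right(prefixes, pattern) - bisect_left(prefixes, pattern)


text = "banana$"
sa = suffix_array(text)
lcp = lcp_array(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(lcp)  # [0, 0, 1, 3, 0, 0, 2]
print(count_occurrences(text, sa, "ana"))  # 2
print(max(lcp))  # 3, the length of "ana", the longest repeated substring
```

Pattern counting needs only the suffix array, while the maximum LCP value gives the length of the longest repeated substring; a CST answers such queries from compressed versions of this information instead of full-size arrays.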

Purpose of the Study:

  • To develop a scalable method for constructing compressed suffix trees (CSTs) from pangenomic datasets.
  • To enable non-specialists to build and utilize CSTs for large genomic data.
  • To address the computational challenges associated with pangenomic data analysis.

Main Methods:

  • Introduced RePFP-CST, a novel method for building CSTs directly from Variant Call Format (VCF) files without prior decompression.
  • Implemented pruning strategies on the prefix-free parse (PFP) to reduce dictionary size and parse length.
  • Focused on optimizing time and space efficiency during CST construction.
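The methods mention pruning the prefix-free parse (PFP) to shrink its dictionary and parse, but the summary does not describe the pruning, so it is not shown here. As context, the sketch below implements a generic prefix-free parse: a window of w characters slides over the text, windows whose hash is 0 modulo p act as trigger strings, and the text is cut into phrases that overlap on those triggers. The output is a sorted dictionary of distinct phrases plus a parse listing phrase ranks; on large repetitive inputs the two together are typically much smaller than the text. The function names, the `#` sentinel, the hash, and the parameter values are assumptions chosen for illustration, not the RePFP-CST implementation.

```python
from typing import List, Tuple


def window_hash(window: str, base: int = 256, mod: int = 1_000_000_007) -> int:
    """Deterministic Karp-Rabin-style polynomial hash, recomputed per window."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(text: str, w: int, p: int,
                      sentinel: str = "#") -> Tuple[List[str], List[int]]:
    """Split `text` into phrases delimited by trigger strings.

    A length-w window is a trigger if its hash is 0 mod p, or if it is the run
    of w sentinels padding either end. Consecutive phrases overlap by exactly
    w characters. Returns (sorted dictionary of distinct phrases, parse).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    pad = sentinel * w
    padded = pad + text + pad
    triggers = [
        i for i in range(len(padded) - w + 1)
        if padded[i:i + w] == pad or window_hash(padded[i:i + w]) % p == 0
    ]
    # Each phrase runs from one trigger to the end of the next trigger.
    phrases = [padded[a:b + w] for a, b in zip(triggers, triggers[1:])]
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def unparse(dictionary: List[str], parse: List[int], w: int) -> str:
    """Rebuild the text by gluing phrases on their w-character overlaps."""
    phrases = [dictionary[r] for r in parse]
    padded = phrases[0] + "".join(ph[w:] for ph in phrases[1:])
    return padded[w:len(padded) - w]  # strip the sentinel padding


# A repetitive toy "pangenome": near-identical haplotypes joined together.
haps = ["ACGTACGAACGTTGCA", "ACTTACGAACGTTGCA", "ACGTACGAACGTTGCA"]
text = "".join(haps)
dictionary, parse = prefix_free_parse(text, w=3, p=4)
print(len(text), len(parse), sum(len(ph) for ph in dictionary))
assert unparse(dictionary, parse, w=3) == text
```

Recomputing the hash for every window keeps the sketch short; a practical parser would use a rolling hash so that each step costs constant time.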

Main Results:

  • Built a CST for a terabyte of DNA data, the first such construction reported in the literature.
  • Demonstrated significant reductions in the time and space required for CST construction.
  • Reduced the memory footprint of the final CST, making the index more accessible.

Conclusions:

  • RePFP-CST offers a scalable and efficient solution for building CSTs from large pangenomic datasets.
  • The method lowers the barrier for non-specialists to work with complex genomic data structures.
  • This advancement facilitates broader application of CSTs in genomic research and analysis.