Measuring genomic data with prefix-free parsing.

Simone Lucà¹, Francesco Masillo², Zsuzsanna Lipták¹

  • ¹University of Verona, Department of Computer Science, Strada le Grazie, 15, Verona, 37134, Italy.

Computational Biology and Chemistry | January 7, 2026
Summary
This summary is machine-generated.

Prefix-free parsing (PFP) provides an efficient measure (π) for biological text repetitiveness and pangenome openness. This novel approach is significantly faster and more space-efficient than existing methods.

Keywords:
Compression, Massive data, Pangenome openness, Prefix-free parsing, Repetitiveness measures, Text indexing

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Data Science

Background:

  • Prefix-free parsing (PFP) is a heuristic for indexing large biological datasets.
  • The PFP data structure, comprising a dictionary and a parse, accelerates index computation.
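
As a minimal sketch of how a dictionary and a parse arise, the Python below follows the commonly described PFP scheme: a window of length w slides over the text, a window whose hash is 0 modulo p acts as a trigger, and consecutive phrases overlap by w characters. The polynomial hash, the sentinel characters, all function names, and the way the two sizes are combined in `pfp_size` are assumptions made for illustration. They are not the authors' implementation, and the paper's exact definition of π may differ.

```python
def window_hash(s, base=256, mod=(1 << 61) - 1):
    """Deterministic polynomial hash of a short window (not a rolling hash)."""
    h = 0
    for ch in s:
        h = (h * base + ord(ch)) % mod
    return h


def prefix_free_parse(text, w=4, p=10):
    """Return (dictionary, parse) for `text`.

    Assumes the sentinel characters '\x01' and '\x02' do not occur in `text`.
    """
    padded = "\x01" + text + "\x02" * w
    last_start = len(padded) - w  # the final window is always a trigger
    phrases = []
    start = 0
    for i in range(1, last_start + 1):
        if i == last_start or window_hash(padded[i:i + w]) % p == 0:
            # A phrase runs from its starting position up to and
            # including the trigger window that ends it.
            phrases.append(padded[start:i + w])
            start = i
    dictionary = sorted(set(phrases))          # distinct phrases, sorted
    rank = {ph: k for k, ph in enumerate(dictionary)}
    parse = [rank[ph] for ph in phrases]       # text as a sequence of ranks
    return dictionary, parse


def reconstruct(dictionary, parse, w):
    """Invert the parse: drop the w-character overlap, then the sentinels."""
    phrases = [dictionary[k] for k in parse]
    padded = "".join(ph[:-w] for ph in phrases[:-1]) + phrases[-1]
    return padded[1:-w]


def pfp_size(dictionary, parse):
    """Hypothetical size measure: total dictionary length plus parse length."""
    return sum(len(ph) for ph in dictionary) + len(parse)
```

Calling `prefix_free_parse(text, w, p)` with several values of w and p and comparing `pfp_size` on the results is one simple way to see how the two parameters affect the size of the structure.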

Purpose of the Study:

  • To investigate the size of the prefix-free parse (π) as a standalone analytical tool.
  • To evaluate π as a measure of text repetitiveness and pangenome openness.
  • To compare the efficiency of π with existing measures.

Main Methods:

  • The study analyzes the size of the prefix-free parse (π).
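  • π is applied as a repetitiveness measure and compared with the established measures z (the number of phrases in the Lempel-Ziv parse), r (the number of runs in the Burrows-Wheeler transform), and δ (the substring complexity).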
  • π is utilized as a measure for pangenome openness.
  • A systematic study of PFP parameters (window size w, modulus p) was conducted.
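
For orientation, the three reference measures can be computed naively on short strings. The sketch below uses textbook definitions and deliberately simple quadratic-or-worse algorithms. The function names and the sentinel character are assumptions, the LZ variant shown (each phrase is either a single new character or the longest prefix with an earlier, possibly overlapping, occurrence) is only one of several in use, and none of this reflects the tools used in the study.

```python
def bwt_runs(text):
    """r: number of equal-letter runs in the BWT of text plus a sentinel.

    Assumes '\x00' does not occur in `text`. Sorts all rotations naively.
    """
    s = text + "\x00"
    n = len(s)
    order = sorted(range(n), key=lambda i: s[i:] + s[:i])
    bwt = "".join(s[i - 1] for i in order)  # last character of each rotation
    return 1 + sum(bwt[i] != bwt[i - 1] for i in range(1, n))


def substring_complexity(text):
    """delta: maximum over k of (number of distinct k-mers) / k."""
    n = len(text)
    return max(
        len({text[i:i + k] for i in range(n - k + 1)}) / k
        for k in range(1, n + 1)
    )


def lz_phrases(text):
    """z: number of phrases in a greedy Lempel-Ziv parse (overlaps allowed)."""
    i, z, n = 0, 0, len(text)
    while i < n:
        length = 0
        # Extend while text[i:i+length+1] also occurs starting before i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        i += max(1, length)  # a fresh character forms a phrase of length 1
        z += 1
    return z
```

On highly repetitive input all three values stay small relative to the text length, which is the behaviour a repetitiveness measure is expected to show.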

Main Results:

  • π demonstrates effectiveness as a repetitiveness measure and for assessing pangenome openness.
  • Results are comparable to those of existing measures but are obtained with significantly less time and space.
  • The study identified optimal parameter choices for PFP, revealing open research questions.

Conclusions:

  • The size of the prefix-free parse (π) is a powerful and efficient tool for analyzing biological data.
  • π offers a more efficient alternative to existing repetitiveness and pangenome openness measures.
  • Further research into PFP parameter optimization is warranted.