Measuring genomic data with prefix-free parsing.

Simone Lucà¹, Francesco Masillo², Zsuzsanna Lipták¹

¹University of Verona, Department of Computer Science, Strada le Grazie, 15, Verona, 37134, Italy.

Computational Biology and Chemistry

January 7, 2026

Summary

This summary is machine-generated.

The size of the prefix-free parse (π), obtained from prefix-free parsing (PFP), serves as an efficient measure of the repetitiveness of biological texts and of pangenome openness. It can be computed significantly faster and in less space than existing measures.

Keywords:

Compression; Massive data; Pangenome openness; Prefix-free parsing; Repetitiveness measures; Text indexing

Area of Science:

Bioinformatics
Computational Biology
Data Science

Background:

Prefix-free parsing (PFP) is a heuristic for indexing large biological datasets.
The PFP data structure, comprising a dictionary and a parse, accelerates index computation.
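
As a rough illustration of how a dictionary and a parse arise, the following Python sketch implements a simplified prefix-free parse. It is not the authors' implementation. The function names, the sentinel characters, and the non-rolling polynomial hash are assumptions made for readability, and the parameters w (window size) and p (modulus) follow the notation used in this abstract. A window of length w ends a phrase whenever its hash is 0 modulo p, and consecutive phrases overlap by w characters. Since this page does not state how π is derived from the structure, the sketch only reports the dictionary size and the parse length.

```python
def window_hash(window, base=256, prime=1_000_000_007):
    """Deterministic polynomial hash of a window (illustrative, not rolling)."""
    h = 0
    for ch in window:
        h = (h * base + ord(ch)) % prime
    return h


def prefix_free_parse(text, w=4, p=10):
    """Return (dictionary, parse) for text.

    Assumes the sentinel characters '\\x01' and '\\x00' do not occur in text.
    """
    s = "\x01" + text + "\x00" * w  # start sentinel, end sentinel repeated w times
    phrases = []
    start = 0
    last = len(s) - w  # position of the final window (the end sentinels)
    for i in range(1, last + 1):
        # A trigger window (hash divisible by p) or the final window closes a phrase.
        if i == last or window_hash(s[i:i + w]) % p == 0:
            phrases.append(s[start:i + w])
            start = i  # the next phrase starts at the trigger, so phrases overlap by w
    dictionary = sorted(set(phrases))           # distinct phrases, lexicographically sorted
    rank = {ph: k for k, ph in enumerate(dictionary)}
    parse = [rank[ph] for ph in phrases]        # the text as a sequence of phrase ranks
    return dictionary, parse


def reconstruct(dictionary, parse, w):
    """Rebuild the original text from the dictionary and the parse."""
    out = dictionary[parse[0]]
    for k in parse[1:]:
        out += dictionary[k][w:]  # skip the w characters shared with the previous phrase
    return out[1:-w]              # strip the sentinels


text = "GATTACAGATTACAGATTACAGATTACA"
dictionary, parse = prefix_free_parse(text, w=2, p=3)
print("distinct phrases:", len(dictionary))
print("total dictionary characters:", sum(len(ph) for ph in dictionary))
print("parse length:", len(parse))
```

On repetitive input, many phrases repeat, so the dictionary stays small relative to the text while the parse records the order of the phrases. Because the dictionary and the parse together determine the text, `reconstruct` recovers the input exactly.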

Purpose of the Study:

To investigate the size of the prefix-free parse (π) as a standalone analytical tool.
To evaluate π as a measure of text repetitiveness and pangenome openness.
To compare the efficiency of π with existing measures.

Main Methods:

The study analyzes the size of the prefix-free parse (π).
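π is applied as a repetitiveness measure and compared with z (the number of LZ77 phrases), r (the number of runs in the Burrows-Wheeler transform), and δ (substring complexity).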
π is utilized as a measure for pangenome openness.
A systematic study of PFP parameters (window size w, modulus p) was conducted.
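
For orientation, two of the reference measures named above can be computed naively from their standard definitions. The Python sketch below is illustrative only: it is far too slow for genomic data and is not how the study computes these values. It assumes that r counts the runs of equal characters in the Burrows-Wheeler transform of the text with an end marker appended, and that δ is the maximum over k of d_k / k, where d_k is the number of distinct substrings of length k. The function names are hypothetical, and z is omitted because its value depends on the LZ77 variant chosen.

```python
def bwt_runs(text):
    """Number of equal-character runs in the BWT of text (naive, via sorted rotations).

    Assumes '\\x00' does not occur in text; it acts as the end marker.
    """
    s = text + "\x00"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    bwt = "".join(rot[-1] for rot in rotations)
    # A new run starts at position 0 and wherever the character changes.
    return sum(1 for i, c in enumerate(bwt) if i == 0 or c != bwt[i - 1])


def delta(text):
    """Substring complexity: max over k of (distinct substrings of length k) / k.

    Assumes text is non-empty.
    """
    n = len(text)
    return max(
        len({text[i:i + k] for i in range(n - k + 1)}) / k
        for k in range(1, n + 1)
    )


print(bwt_runs("banana"))  # 5, since the BWT of "banana" plus end marker is "annb$aa"
print(delta("banana"))     # 3.0, reached at k = 1 with the three letters b, a, n
```

Both values stay small when the input consists largely of repeated material, which is the property that makes them, and π, usable as repetitiveness measures.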

Main Results:

π demonstrates effectiveness as a repetitiveness measure and for assessing pangenome openness.
π yields results comparable to those of existing measures while requiring significantly less time and space to compute.
The study identified optimal parameter choices for PFP, revealing open research questions.

Conclusions:

The size of the prefix-free parse (π) is a powerful and efficient tool for analyzing biological data.
π offers a more efficient alternative to existing repetitiveness and pangenome openness measures.
Further research into PFP parameter optimization is warranted.