

Building genomic data structures from compressed representations using prefix-free parsing.

Rahul Varki¹, Christina Boucher²

¹Department of Computer and Information Science and Engineering, Herbert Wertheim College of Engineering, University of Florida, Gainesville, Florida 32611, USA; rvarki@ufl.edu

Genome Research | May 15, 2026

Summary

This summary is machine-generated.

Prefix-free parsing (PFP) enables bioinformatics tools to handle massive genome datasets by compressing repetitive text. This allows essential data structures to be built from compressed data, overcoming memory limitations for large-scale pangenomics.


Area of Science:

Bioinformatics
Genomics
Computational Biology

Background:

High-throughput sequencing now produces pangenomic datasets that can exceed petabyte scale.
Traditional bioinformatics tools run into memory limits on datasets of this size.
Methods are needed that process data directly from compressed representations.

Purpose of the Study:

To survey prefix-free parsing (PFP) as a solution for handling large-scale genomic data.
To explain the core principles and applications of PFP.
To outline future research directions in PFP for bioinformatics.

Main Methods:

Prefix-free parsing (PFP) as a preprocessing technique.
Compression of repetitive text within large datasets.
Construction of data structures directly from compressed PFP output.
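
The methods listed above can be illustrated with a small sketch of the parsing step. This is a simplified illustration based on the general description of prefix-free parsing, not the authors' implementation: a window of length w slides over the text, any window whose hash is 0 modulo p is treated as a trigger string, and the text is cut into phrases that overlap by w characters. The names `window_hash`, `prefix_free_parse`, and `reconstruct`, the sentinel characters, and the parameter values are all assumptions made for this example. Production tools typically use a rolling Karp-Rabin hash; here the hash is recomputed for each window for clarity.

```python
from typing import List, Tuple

START = "\x01"  # assumed sentinel marking the text start (must not occur in the text)
END = "\x02"    # assumed sentinel padding the text end (must not occur in the text)


def window_hash(window: str) -> int:
    """Deterministic polynomial hash of one window (hypothetical helper)."""
    h = 0
    for ch in window:
        h = (h * 256 + ord(ch)) % 1_000_000_007
    return h


def prefix_free_parse(text: str, w: int = 4, p: int = 10) -> Tuple[List[str], List[int]]:
    """Split text into overlapping phrases; return (sorted dictionary, parse)."""
    t = START + text + END * w
    phrases = []
    start = 0
    last = len(t) - w  # position of the final window, which always ends a phrase
    for i in range(1, last + 1):
        if i == last or window_hash(t[i:i + w]) % p == 0:
            # The phrase ends with this trigger string; the next phrase starts with it.
            phrases.append(t[start:i + w])
            start = i
    dictionary = sorted(set(phrases))
    rank = {phrase: r for r, phrase in enumerate(dictionary)}
    parse = [rank[phrase] for phrase in phrases]
    return dictionary, parse


def reconstruct(dictionary: List[str], parse: List[int], w: int = 4) -> str:
    """Rebuild the original text from the dictionary and the parse."""
    pieces = [dictionary[parse[0]]]
    # Every later phrase repeats the last w characters of its predecessor.
    pieces.extend(dictionary[r][w:] for r in parse[1:])
    t = "".join(pieces)
    return t[1:len(t) - w]  # strip the start sentinel and the end padding


if __name__ == "__main__":
    genome = "GATTACAGATTTGCA" * 200  # toy repetitive input
    d, ps = prefix_free_parse(genome)
    print("text length:", len(genome))
    print("phrases in parse:", len(ps))
    print("distinct phrases:", len(d))
    print("dictionary characters:", sum(len(x) for x in d))
    print("lossless:", reconstruct(d, ps) == genome)
```

On repetitive input, the same phrases tend to recur, so the dictionary of distinct phrases can stay small while the parse records only their order. Downstream data structures are then built from the dictionary and the parse instead of from the full text.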

Main Results:

PFP compresses repetitive text efficiently.
PFP enables essential data structures to be built directly from compressed data.
This addresses the memory limitations that traditional bioinformatics tools face on large datasets.

Conclusions:

PFP is a crucial method for managing and analyzing large-scale pangenomic data.
It overcomes memory constraints by operating on compressed representations.
Further research can expand PFP's applications in bioinformatics.