Disk-based compression of data from genome sequencing.

Szymon Grabowski¹, Sebastian Deorowicz¹, Łukasz Roguski²

¹Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland and Centro Nacional de Análisis Genómico (CNAG), 08-028 Barcelona, Spain.

Bioinformatics (Oxford, England)

December 25, 2014

Summary

This summary is machine-generated.

We developed a new algorithm for compressing high-coverage sequencing data. This method significantly reduces storage space by achieving a compression ratio of 0.317 bits per base, making large DNA datasets more manageable.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

High-coverage sequencing data contains substantial redundancy, posing challenges for efficient compression.
Existing FASTQ compressors struggle to capture overlapping read redundancy within limited memory.
Disk-based methods, such as those built on the Burrows-Wheeler transform (BWT), compress better but leave room for further gains.

Purpose of the Study:

To develop a novel compression algorithm specifically for sequencing reads (DNA).
To efficiently handle the redundancy present in high-coverage genomic datasets.
To improve compression ratios beyond existing state-of-the-art methods.

Main Methods:

The proposed method utilizes the concept of minimizers for read compression.
The algorithm is designed to be conceptually simple and easily parallelizable.
It focuses on capturing redundancy between overlapping sequencing reads.
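The minimizer idea above can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it assumes a minimizer is the lexicographically smallest k-mer of a read, and the function names `minimizer` and `bucket_reads`, the value `k = 4`, and the example reads are all hypothetical. Overlapping reads tend to share a minimizer, so grouping by it places similar reads next to each other, where a compressor can exploit their redundancy.

```python
from collections import defaultdict


def minimizer(read: str, k: int = 4) -> str:
    """Return the lexicographically smallest k-mer of a read (simplified definition)."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_reads(reads, k: int = 4):
    """Group reads by their minimizer; each bucket could be processed independently."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)
    return dict(buckets)


# Hypothetical reads: the first three overlap each other by seven bases.
reads = ["TTACAGGC", "TACAGGCT", "ACAGGCTT", "GGCTTAAC"]
buckets = bucket_reads(reads, k=4)
for key, group in sorted(buckets.items()):
    print(key, group)
```

Here the three overlapping reads fall into the bucket keyed by `ACAG`, while the fourth read lands in its own bucket. Because buckets do not depend on one another, this kind of grouping is also straightforward to parallelize.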

Main Results:

Achieved a compression ratio of 0.317 bits per base.
Successfully compressed a 134.0 Gbp human genome dataset into 5.31 GB.
Demonstrated superior compression performance compared to previous methods.
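The two figures above are consistent with each other, as a quick calculation shows. The sketch below assumes that 1 Gbp means 10^9 bases and 1 GB means 10^9 bytes; the helper name `compressed_size_bytes` is hypothetical.

```python
def compressed_size_bytes(bases: float, bits_per_base: float) -> float:
    """Convert a bits-per-base ratio into a total compressed size in bytes."""
    return bases * bits_per_base / 8  # 8 bits per byte


# Values reported in the results: 134.0 Gbp at 0.317 bits per base.
size = compressed_size_bytes(134.0e9, 0.317)
print(f"{size / 1e9:.2f} GB")  # prints "5.31 GB"
```

For comparison, storing each base in 2 bits (the plain encoding for a four-letter alphabet) would take 33.5 GB for the same dataset under the same unit assumptions.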

Conclusions:

The overlapping reads compression with minimizers (ORCOM) algorithm is a significant advance in DNA data compression.
This method enables efficient storage and handling of large-scale genomic datasets.
The algorithm's parallelizable nature facilitates its application in large-scale bioinformatics pipelines.