



Disk-based compression of data from genome sequencing.

Szymon Grabowski1, Sebastian Deorowicz1, Łukasz Roguski2

  • 1Institute of Applied Computer Science, Lodz University of Technology, Al. Politechniki 11, 90-924 Łódź, Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland and Centro Nacional de Análisis Genómico (CNAG), 08-028 Barcelona, Spain.

Bioinformatics (Oxford, England) | December 25, 2014
Summary
This summary is machine-generated.

We developed a new algorithm for compressing high-coverage sequencing data. The method reaches a compression ratio of 0.317 bits per base, storing a 134.0 Gbp human genome dataset in 5.31 GB and making large DNA datasets more manageable.


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • High-coverage sequencing data contains substantial redundancy, posing challenges for efficient compression.
  • Existing FASTQ compressors struggle to capture overlapping read redundancy within limited memory.
  • Disk-based methods, such as those built on the Burrows-Wheeler transform (BWT), exploit this redundancy better but leave room for further optimization.

Purpose of the Study:

  • To develop a novel compression algorithm specifically for sequencing reads (DNA).
  • To efficiently handle the redundancy present in high-coverage genomic datasets.
  • To improve compression ratios beyond existing state-of-the-art methods.

Main Methods:

  • The proposed method uses minimizers, short representative substrings selected from each read, to organize reads for compression.
  • The algorithm is designed to be conceptually simple and easily parallelizable.
  • It focuses on capturing redundancy between overlapping sequencing reads.
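The grouping idea behind the methods above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the minimizer of a read is simply its lexicographically smallest substring of length k, and the names `minimizer`, `bucket_by_minimizer`, and the toy reads are hypothetical. Reads that overlap on the genome tend to contain the same smallest k-mer, so they tend to fall into the same bucket, where their shared content is easier to compress.

```python
from collections import defaultdict


def minimizer(read: str, k: int) -> str:
    """Return the lexicographically smallest k-mer of `read` (simplified definition)."""
    if len(read) < k:
        raise ValueError("read is shorter than k")
    return min(read[i:i + k] for i in range(len(read) - k + 1))


def bucket_by_minimizer(reads, k):
    """Group reads so that reads sharing a minimizer end up in one bucket."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)
    return dict(buckets)


# Toy data (hypothetical): the first three reads overlap on the same
# underlying sequence GGTACGTTAGCCTG; the fourth is unrelated.
reads = [
    "GGTACGTTAGC",
    "TACGTTAGCCT",
    "ACGTTAGCCTG",
    "TTTTGGGGTTT",
]

buckets = bucket_by_minimizer(reads, k=4)
for key, members in sorted(buckets.items()):
    print(key, members)
```

With these toy reads, the three overlapping reads share the minimizer `ACGT` and land in one bucket, while the unrelated read is placed in a bucket of its own.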

Main Results:

  • Achieved a compression ratio of 0.317 bits per base.
  • Successfully compressed a 134.0 Gbp human genome dataset into 5.31 GB.
  • Demonstrated superior compression performance compared to previous methods.
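The two headline figures are consistent with each other, which a short calculation can confirm. The sketch below assumes that "GB" means 10^9 bytes and "Gbp" means 10^9 bases; the function name `bits_per_base` is illustrative only.

```python
def bits_per_base(compressed_bytes: float, bases: float) -> float:
    """Average number of output bits spent on each input base."""
    return compressed_bytes * 8 / bases


# Figures from the results above: 134.0 Gbp compressed into 5.31 GB.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 Gbp = 1e9 bases).
ratio = bits_per_base(5.31e9, 134.0e9)
print(f"{ratio:.3f} bits per base")  # prints "0.317 bits per base"
```

For comparison, a plain encoding of the four-letter DNA alphabet needs 2 bits per base, so 0.317 bits per base is several times smaller than that baseline.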

Conclusions:

  • The Overlapping Reads COmpression with Minimizers (ORCOM) algorithm offers a significant advancement in DNA data compression.
  • This method enables efficient storage and handling of large-scale genomic datasets.
  • The algorithm's parallelizable nature facilitates its application in large-scale bioinformatics pipelines.
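One plausible reason a bucket-based scheme parallelizes well is that each group of reads can be compressed without reference to the others. The sketch below illustrates only that point, under stated assumptions: the buckets are a hypothetical dictionary of reads keyed by a shared substring, `zlib` stands in for the actual coder (it is not the method described in the paper), and reads are sorted within a bucket, so their original order is not preserved.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_bucket(reads):
    """Compress one bucket independently; zlib is only a stand-in coder."""
    payload = "\n".join(sorted(reads)).encode("ascii")
    return zlib.compress(payload, 9)


def compress_buckets(buckets, workers=4):
    """Compress every bucket in parallel; no bucket depends on another."""
    keys = sorted(buckets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        blobs = list(pool.map(compress_bucket, (buckets[key] for key in keys)))
    return dict(zip(keys, blobs))


# Hypothetical buckets of reads, keyed by a substring the reads share.
buckets = {
    "ACGT": ["GGTACGTTAGC", "TACGTTAGCCT", "ACGTTAGCCTG"],
    "GGGG": ["TTTTGGGGTTT"],
}

compressed = compress_buckets(buckets)

# Each bucket can also be decompressed on its own.
restored = {
    key: zlib.decompress(blob).decode("ascii").split("\n")
    for key, blob in compressed.items()
}
```

Because every bucket is a separate unit of work, the same structure would map onto processes or cluster jobs as readily as onto threads.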