Data structures and compression algorithms for high-throughput sequencing technologies.

Kenny Daily¹, Paul Rigor, Scott Christley

  • ¹ Department of Computer Science, University of California Irvine, Irvine, CA 92697 USA.

BMC Bioinformatics | October 16, 2010
Summary
This summary is machine-generated.

Compressing high-throughput sequencing (HTS) data is crucial for bioinformatics. The algorithms described here compress HTS data by 10x or more, depending on the dataset, outperform general-purpose tools such as gzip, bzip2, and 7zip, and enable efficient data management.

Area of Science:

  • Bioinformatics
  • Genomics
  • Computational Biology

Background:

  • High-throughput sequencing (HTS) generates vast amounts of data, posing significant bioinformatics challenges for storage and sharing.
  • The increasing scale of HTS experiments necessitates efficient data management solutions.

Purpose of the Study:

  • To develop advanced data structures and compression algorithms specifically for high-throughput sequencing data.
  • To address the growing need for efficient storage and sharing of large-scale genomic datasets.

Main Methods:

  • Developed data structures for HTS data and applied entropy coding schemes (e.g., Golomb, Elias gamma, Huffman) to them.
  • Implemented a processing stage to map short sequences to reference genomes or sequence tables.
  • Compressed sequence addresses, lengths, and substitutions using various entropy coding techniques.
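
To make the entropy coding step concrete, here is a minimal Python sketch of two of the codes named above, applied to the gaps between sorted mapped addresses. It is an illustration under stated assumptions, not the authors' implementation: the function names, the gap (delta) encoding of addresses, the use of the power-of-two special case of Golomb coding (the Rice code), and the example addresses are all hypothetical. Bits are kept as strings of '0' and '1' for readability rather than packed into bytes.

```python
from typing import List, Tuple


def elias_gamma_encode(n: int) -> str:
    """Elias gamma code for an integer n >= 1, as a string of '0'/'1'."""
    if n < 1:
        raise ValueError("Elias gamma is defined for integers >= 1")
    binary = bin(n)[2:]                      # binary digits, no leading zeros
    return "0" * (len(binary) - 1) + binary  # length prefix in unary, then value


def elias_gamma_decode(bits: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one value starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2), end


def rice_encode(n: int, k: int) -> str:
    """Golomb code with divisor 2**k (Rice code) for an integer n >= 0."""
    q, r = n >> k, n & ((1 << k) - 1)
    remainder = format(r, "b").zfill(k) if k else ""
    return "1" * q + "0" + remainder         # unary quotient, then k-bit remainder


def rice_decode(bits: str, k: int, pos: int = 0) -> Tuple[int, int]:
    """Decode one Rice-coded value starting at pos; return (value, next position)."""
    q = 0
    while bits[pos + q] == "1":
        q += 1
    start = pos + q + 1
    r = int(bits[start:start + k], 2) if k else 0
    return (q << k) | r, start + k


def gaps(sorted_addresses: List[int]) -> List[int]:
    """Differences between consecutive addresses (the first gap is from 0)."""
    out, prev = [], 0
    for a in sorted_addresses:
        out.append(a - prev)
        prev = a
    return out


def encode_addresses_gamma(addresses: List[int]) -> str:
    # Gaps can be 0 (two reads at the same address), so shift by 1 for gamma.
    return "".join(elias_gamma_encode(g + 1) for g in gaps(sorted(addresses)))


def decode_addresses_gamma(bits: str, count: int) -> List[int]:
    pos, total, out = 0, 0, []
    for _ in range(count):
        g, pos = elias_gamma_decode(bits, pos)
        total += g - 1
        out.append(total)
    return out


# Hypothetical mapped read addresses, for illustration only.
example_addresses = [1042, 1050, 1050, 1187, 2301]
encoded = encode_addresses_gamma(example_addresses)
print(len(encoded), "bits for", len(example_addresses), "addresses")
```

Encoding gaps rather than absolute addresses keeps the integers small when reads cover the reference densely, which is the situation in which short variable-length codes of this kind pay off.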

Main Results:

  • Achieved compression ratios of 10x or more for HTS data, varying with dataset properties.
  • The developed algorithms outperformed general-purpose compression tools such as gzip, bzip2, and 7zip.
  • The compression algorithms were consistently faster than the best general-purpose compression programs.
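
As a rough illustration of how a compression ratio is measured against general-purpose tools, the sketch below compares Python's standard-library compressors with a naive 2-bit-per-base packing on synthetic reads. Everything here is an assumption made for illustration: the reads are uniformly random (real reads are not), the 2-bit packing is a simple domain-specific baseline rather than the method of the paper, and the numbers it prints say nothing about the paper's benchmarks. `zlib` implements the DEFLATE algorithm used by gzip, and `lzma` implements the algorithm family that 7zip uses by default.

```python
import bz2
import lzma
import random
import zlib

BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def compression_ratio(raw: bytes, compressed: bytes) -> float:
    """Original size divided by compressed size (higher is better)."""
    return len(raw) / len(compressed)


def synthetic_reads(n_reads: int = 2000, read_len: int = 36, seed: int = 0) -> bytes:
    """Newline-separated random reads; a stand-in for real HTS output."""
    rng = random.Random(seed)
    lines = ["".join(rng.choice("ACGT") for _ in range(read_len))
             for _ in range(n_reads)]
    return "\n".join(lines).encode("ascii")


def pack_2bit(reads: bytes) -> bytes:
    """Pack bases at 2 bits each, dropping newlines (read length is fixed)."""
    bases = reads.decode("ascii").replace("\n", "")
    out = bytearray()
    for i in range(0, len(bases), 4):
        chunk = bases[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | BASE_CODE[base]
        byte <<= 2 * (4 - len(chunk))   # pad a short final chunk with zero bits
        out.append(byte)
    return bytes(out)


raw = synthetic_reads()
ratios = {
    "zlib": compression_ratio(raw, zlib.compress(raw, 9)),
    "bz2": compression_ratio(raw, bz2.compress(raw, 9)),
    "lzma": compression_ratio(raw, lzma.compress(raw)),
    "2bit": compression_ratio(raw, pack_2bit(raw)),
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The point of the comparison is only that a format-aware encoding can exploit structure (a four-letter alphabet, fixed read length) that a general-purpose compressor has to discover on its own.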

Conclusions:

  • No single encoding strategy is optimal for all HTS data; effectiveness depends on data distribution.
  • The proposed methodology and compression techniques, implemented in GenCompress, facilitate timely management and sharing of HTS data.
  • These advanced compression techniques are vital for researchers as sequence databases continue to grow exponentially.
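
The first conclusion can be checked with a small Python sketch that computes the encoded size of a list of integers under Elias gamma and under Golomb codes with power-of-two divisors (Rice codes), then picks the cheapest. The function names, the candidate parameter range, and the two sample distributions are hypothetical choices for illustration, not taken from the paper.

```python
from typing import List, Tuple


def gamma_bits(n: int) -> int:
    """Length in bits of the Elias gamma code for n >= 1."""
    return 2 * (n.bit_length() - 1) + 1


def rice_bits(n: int, k: int) -> int:
    """Length in bits of the Rice code (Golomb with divisor 2**k) for n >= 0."""
    return (n >> k) + 1 + k


def best_code(values: List[int], max_k: int = 20) -> Tuple[str, int]:
    """Return (code name, total bits) for the cheapest candidate code."""
    # Values may be 0, so gamma encodes value + 1.
    costs = {"elias_gamma": sum(gamma_bits(v + 1) for v in values)}
    for k in range(max_k + 1):
        costs[f"rice_k{k}"] = sum(rice_bits(v, k) for v in values)
    name = min(costs, key=costs.get)
    return name, costs[name]


# Evenly spaced gaps: a Rice code tuned to the typical gap size is cheapest.
print(best_code([100] * 50))

# Mostly zero gaps with one very large outlier: Elias gamma is cheapest.
print(best_code([0, 0, 0, 0, 1_000_000]))
```

A parameterized code wins when values cluster around a known scale, while a parameter-free code such as Elias gamma copes better with a skewed mix of tiny and huge values, so a choice of encoding has to follow the distribution of the data at hand.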