Reference-based compression of short-read sequences using path encoding.

Carl Kingsford1, Rob Patro1

  • 1Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400, USA.

Bioinformatics (Oxford, England) | February 5, 2015
Summary
This summary is machine-generated.

Path encoding is a new compression method for short-read sequence data that reduces the storage burden of next-generation sequencing. On RNA-seq reads, it compresses more than 34% better on average than competing approaches.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Next-generation sequencing (NGS) generates massive datasets, posing significant computational challenges for storage, transmission, and archiving.
  • Current compression techniques are insufficient for the scale of data produced by modern sequencing technologies.

Purpose of the Study:

  • To develop a novel compression method for short-read sequence data that alleviates the computational burden of managing large-scale sequencing data.
  • To bridge the gap between reference-based and reference-free compression methods, combining their respective advantages.

Main Methods:

  • Introduced 'path encoding,' a novel compression approach connecting de Bruijn graph paths with context-dependent arithmetic coding.
  • Developed an efficient system for compactly storing sets of k-mers, a component supporting the path encoding method.
  • Implemented the method in Go, providing freely available source code and binaries for Linux and Mac OS X.
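
As a rough illustration of the idea behind path encoding, and not the authors' implementation, the sketch below builds a de Bruijn-style model from reference sequences, in which each k-mer is a context with counts of the bases that follow it. It then reports the ideal code length, in bits, that an arithmetic coder driven by those counts would need to encode a read as a path through the graph. All names (`Model`, `NewModel`, `PathCost`), the add-one smoothing, and the static (non-adaptive) counts are assumptions made for this example. The actual arithmetic coder and the storage of each read's starting k-mer are omitted.

```go
package main

import (
	"fmt"
	"math"
)

// Model maps each k-mer (a node in the de Bruijn graph) to counts of the
// base that follows it in the reference. This is a hypothetical structure
// for illustration only.
type Model struct {
	k      int
	counts map[string]*[4]float64
}

// baseIndex maps A, C, G, T to 0..3 and returns -1 for anything else.
func baseIndex(b byte) int {
	switch b {
	case 'A':
		return 0
	case 'C':
		return 1
	case 'G':
		return 2
	case 'T':
		return 3
	}
	return -1
}

// NewModel counts, for every k-mer in the reference sequences, which base
// follows it. Counts start at 1 (add-one smoothing) so that edges missing
// from the reference keep a nonzero probability.
func NewModel(k int, reference []string) *Model {
	m := &Model{k: k, counts: make(map[string]*[4]float64)}
	for _, seq := range reference {
		for i := 0; i+k < len(seq); i++ {
			j := baseIndex(seq[i+k])
			if j < 0 {
				continue // skip non-ACGT bases in this sketch
			}
			kmer := seq[i : i+k]
			c, ok := m.counts[kmer]
			if !ok {
				c = &[4]float64{1, 1, 1, 1}
				m.counts[kmer] = c
			}
			c[j]++
		}
	}
	return m
}

// PathCost returns the ideal code length in bits for the bases of read that
// follow its first k-mer: the sum of -log2 p(next base | current k-mer).
// A k-mer never seen in the reference falls back to a uniform distribution.
func (m *Model) PathCost(read string) float64 {
	bits := 0.0
	for i := 0; i+m.k < len(read); i++ {
		j := baseIndex(read[i+m.k])
		if j < 0 {
			continue // non-ACGT bases are not costed in this sketch
		}
		c := [4]float64{1, 1, 1, 1}
		if seen, ok := m.counts[read[i:i+m.k]]; ok {
			c = *seen
		}
		total := c[0] + c[1] + c[2] + c[3]
		bits += -math.Log2(c[j] / total)
	}
	return bits
}

func main() {
	m := NewModel(3, []string{"ACGTACGTACGT"})
	fmt.Printf("read on a reference path: %.2f bits\n", m.PathCost("ACGTACG"))
	fmt.Printf("read off the reference:   %.2f bits\n", m.PathCost("ACGAACG"))
}
```

A read that follows a path present in the reference costs fewer bits than one that diverges from it, while the diverging read is still encodable because unseen edges keep a small probability. That is one way to picture how a method of this kind could remain effective with a poorly matched reference.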

Main Results:

  • Compressed RNA-seq reads to only 3-11% of the space of the raw FASTA files.
  • Demonstrated an average compression improvement of over 34% compared to competing compression approaches.
  • Showed that effective compression can be achieved even with a poorly matched reference genome.

Conclusions:

  • Path encoding offers a flexible and highly effective solution for compressing next-generation sequencing data.
  • The developed method substantially reduces storage requirements, addressing a critical bottleneck in genomic data management.
  • The approach provides a valuable tool for researchers dealing with large-scale sequencing datasets.