Reference-based compression of short-read sequences using path encoding.

Carl Kingsford¹, Rob Patro¹

¹Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA and Department of Computer Science, Stony Brook University, Stony Brook, NY 11794-4400, USA.

Bioinformatics (Oxford, England)

February 5, 2015

Summary

This summary is machine-generated.

Path encoding is a compression method for short-read sequence data that reduces the storage burden of next-generation sequencing. On RNA-seq reads it uses 3-11% of the space of raw FASTA files, an average improvement of over 34% on competing compression approaches.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Next-generation sequencing (NGS) generates massive datasets, posing significant computational challenges for storage, transmission, and archiving.
Current compression techniques are insufficient for the scale of data produced by modern sequencing technologies.

Purpose of the Study:

To develop a novel compression method for short-read sequence data that alleviates the computational burden of managing large-scale sequencing data.
To bridge the gap between reference-based and reference-free compression methods, combining their respective advantages.

Main Methods:

Introduced 'path encoding,' a novel compression approach connecting de Bruijn graph paths with context-dependent arithmetic coding.
Developed an efficient system for compactly storing sets of k-mers, a component supporting the path encoding method.
Implemented the method in Go, providing freely available source code and binaries for Linux and Mac OS X.
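
The link between de Bruijn graph paths and context-dependent coding can be illustrated with a minimal Go sketch. It is not the authors' implementation: the names `Model`, `NewModel`, `CostBits`, and `packKmer`, the add-one smoothing, the flat 2-bit charge for the first k bases, and the toy sequences are all assumptions made for illustration. Instead of running a real arithmetic coder, it computes the ideal code length (the sum of -log2 p over predicted bases) that such a coder would approach, where each k-mer context and the base that follows it corresponds to an edge in a de Bruijn graph seeded from a reference.

```go
package main

import (
	"fmt"
	"math"
)

// baseCode maps a nucleotide to a 2-bit code; ok is false for other symbols.
func baseCode(b byte) (code uint64, ok bool) {
	switch b {
	case 'A':
		return 0, true
	case 'C':
		return 1, true
	case 'G':
		return 2, true
	case 'T':
		return 3, true
	}
	return 0, false
}

// packKmer packs a k-mer (k <= 32) into a uint64, two bits per base.
// This is only a convenient map key, not the paper's k-mer set structure.
func packKmer(s string) (uint64, bool) {
	if len(s) > 32 {
		return 0, false
	}
	var v uint64
	for i := 0; i < len(s); i++ {
		c, ok := baseCode(s[i])
		if !ok {
			return 0, false
		}
		v = v<<2 | c
	}
	return v, true
}

// Model counts, for each k-mer context, how often each base followed it.
type Model struct {
	k      int
	counts map[uint64]*[4]uint32
}

// NewModel seeds the context counts from a reference sequence.
func NewModel(k int, reference string) *Model {
	m := &Model{k: k, counts: make(map[uint64]*[4]uint32)}
	for i := 0; i+k < len(reference); i++ {
		m.observe(reference[i:i+k], reference[i+k])
	}
	return m
}

// observe records that next followed context; non-ACGT input is skipped.
func (m *Model) observe(context string, next byte) {
	ctx, ok1 := packKmer(context)
	c, ok2 := baseCode(next)
	if !ok1 || !ok2 {
		return
	}
	if m.counts[ctx] == nil {
		m.counts[ctx] = new([4]uint32)
	}
	m.counts[ctx][c]++
}

// prob returns the add-one-smoothed probability that next follows context.
// Non-ACGT symbols get a flat 1/4 in this sketch (an assumed simplification).
func (m *Model) prob(context string, next byte) float64 {
	ctx, ok1 := packKmer(context)
	c, ok2 := baseCode(next)
	if !ok1 || !ok2 {
		return 0.25
	}
	var row [4]uint32
	if r := m.counts[ctx]; r != nil {
		row = *r
	}
	total := row[0] + row[1] + row[2] + row[3]
	return float64(row[c]+1) / float64(total+4)
}

// CostBits estimates the bits needed for a read: 2 bits per base for the
// first k bases, then -log2(p) for each base predicted from its k-mer context.
func (m *Model) CostBits(read string) float64 {
	if len(read) <= m.k {
		return 2 * float64(len(read))
	}
	bits := 2 * float64(m.k)
	for i := 0; i+m.k < len(read); i++ {
		ctx, next := read[i:i+m.k], read[i+m.k]
		bits -= math.Log2(m.prob(ctx, next))
		m.observe(ctx, next) // adaptive update after coding the base
	}
	return bits
}

func main() {
	ref := "ACGTACGTTGCAACGTAGCTAGCTAGGATCCA" // toy reference, not real data
	read := "CGTTGCAACGTAGC"
	withRef := NewModel(4, ref).CostBits(read)
	noRef := NewModel(4, "").CostBits(read)
	fmt.Printf("with reference: %.1f bits, without: %.1f bits\n", withRef, noRef)
}
```

In this toy setting, a read that follows a path already present in the reference-seeded counts is charged fewer bits than the same read coded against an empty model. Because the counts are also updated as reads are processed, the model can still adapt when the reference is a poor match.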

Main Results:

Achieved significant data reduction for RNA-seq reads, utilizing only 3-11% of the space of raw FASTA files.
Demonstrated an average compression improvement of over 34% compared to competing compression approaches.
Showed that good compression can still be achieved even when the reference is very poorly matched to the reads being encoded.
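
The two figures above are ratios of file sizes. The short Go sketch below shows one plausible way to compute them. The function names, the definition of "improvement" as the size reduction relative to a competitor's output, and the byte counts in `main` are all assumptions for illustration. None of the numbers are measurements from the study.

```go
package main

import "fmt"

// SpaceFraction returns the compressed size as a percentage of the raw size.
func SpaceFraction(rawBytes, compressedBytes int64) float64 {
	return 100 * float64(compressedBytes) / float64(rawBytes)
}

// RelativeImprovement returns how much smaller our output is than a
// competitor's, as a percentage of the competitor's size (assumed definition).
func RelativeImprovement(competitorBytes, oursBytes int64) float64 {
	return 100 * float64(competitorBytes-oursBytes) / float64(competitorBytes)
}

func main() {
	// Hypothetical sizes in bytes, chosen only to illustrate the arithmetic.
	var raw, competitor, ours int64 = 1000000000, 100000000, 66000000
	fmt.Printf("space used: %.1f%% of raw\n", SpaceFraction(raw, ours))
	fmt.Printf("improvement over competitor: %.1f%%\n", RelativeImprovement(competitor, ours))
}
```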

Conclusions:

Path encoding offers a flexible and highly effective solution for compressing next-generation sequencing data.
The developed method substantially reduces storage requirements, addressing a critical bottleneck in genomic data management.
The approach provides a valuable tool for researchers dealing with large-scale sequencing datasets.