



LSG: An External-Memory Tool to Compute String Graphs for Next-Generation Sequencing Data Assembly.

Paola Bonizzoni¹, Gianluca Della Vedova¹, Yuri Pirola¹

  • ¹ Dipartimento di Informatica Sistemistica e Comunicazione, Università degli Studi di Milano-Bicocca, Milan, Italy.

Journal of Computational Biology: A Journal of Computational Molecular Cell Biology | March 9, 2016
PubMed
Summary
This summary is machine-generated.

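Disk-based (external-memory) indexing is essential for handling large next-generation sequencing (NGS) datasets. The light string graph (LSG) algorithm builds the string graph used in genome assembly with far less main memory than the in-memory assembler SGA, at the cost of a moderate increase in running time, making large-scale analysis more accessible.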

Keywords:
Burrows-Wheeler transform; external-memory algorithms; string graphs


Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Next-generation sequencing (NGS) generates vast amounts of short-read data, necessitating efficient data handling for applications like metagenomics and cancer genomics.
  • Indexing and assembling these data are computationally demanding, above all in main-memory usage, which motivates disk-based (external-memory) approaches.

Purpose of the Study:

  • To develop a space-efficient, disk-based algorithm for computing string graphs using the Burrows-Wheeler Transform (BWT).
  • To create an external memory algorithm for de novo genome assembly from NGS data.
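
For readers unfamiliar with the Burrows-Wheeler Transform, the following is a minimal in-memory sketch of the transform itself, built by sorting all rotations of a string. It is a textbook illustration only and is not how LSG or BEETL compute the BWT of a read collection; the function name `bwt` and the `$` sentinel are assumptions made for this example.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Return the Burrows-Wheeler Transform of `text`.

    Naive construction: append a sentinel that sorts before every other
    character, sort all rotations, and read off the last column.
    This needs O(n^2) memory and is only suitable for tiny inputs.
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    print(bwt("banana"))  # annb$aa
```

The transform groups characters that precede similar contexts, which is what makes BWT-based indexes compact and searchable.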

Main Methods:

  • Development of the light string graph (LSG) algorithm, a disk-based approach utilizing a novel FM-index representation.
  • Integration of LSG into a genome assembly pipeline alongside SGA (a state-of-the-art string graph assembler) and BEETL (used for indexing).
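
To give a sense of what an FM-index does, here is a small sketch of the standard textbook backward search, which counts occurrences of a pattern using only the BWT and two auxiliary tables. This is the classic in-memory formulation, not the novel disk-based representation developed for LSG; the names `build_fm_index` and `count_occurrences` are hypothetical, and the full occurrence table used here trades memory for simplicity.

```python
from collections import Counter


def build_fm_index(bwt_str: str):
    """Build the C array and a full occurrence table from a BWT string."""
    counts = Counter(bwt_str)
    # C[c] = number of characters in the text that sort strictly before c
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    # occ[c][i] = number of occurrences of c in bwt_str[:i]
    occ = {c: [0] * (len(bwt_str) + 1) for c in counts}
    for i, ch in enumerate(bwt_str):
        for c in counts:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return C, occ


def count_occurrences(pattern: str, C, occ, n: int) -> int:
    """Count occurrences of `pattern` by backward search over n BWT rows."""
    lo, hi = 0, n  # half-open interval of matching suffix ranks
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    bwt_str = "annb$aa"  # BWT of "banana$"
    C, occ = build_fm_index(bwt_str)
    print(count_occurrences("ana", C, occ, len(bwt_str)))  # 2
```

Each pattern character narrows the interval with two table lookups, so the search cost depends on the pattern length rather than on the size of the indexed text.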

Main Results:

  • LSG successfully built a string graph for an 875-million-read dataset using only 1 GB of main memory, a 50-fold reduction compared with SGA.
  • The LSG algorithm required slightly more than twice the execution time of SGA.
  • The integrated pipeline demonstrated substantial memory reduction with only a moderate increase in running time.

Conclusions:

  • LSG offers a highly memory-efficient solution for constructing string graphs from large NGS datasets.
  • The LSG-integrated pipeline provides a practical approach for large-scale genome assembly with significantly reduced memory footprint.
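
As a rough illustration of what a string graph is, the sketch below builds an overlap graph from a handful of reads and then removes transitive edges, leaving only the irreducible overlaps. It is a naive, quadratic, in-memory toy and not the LSG algorithm: it assumes error-free reads from a single strand with no read contained in another, and the helper names and the `min_len` threshold are hypothetical.

```python
from itertools import permutations


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest proper suffix of a that is a prefix of b."""
    for k in range(min(len(a), len(b)) - 1, min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads, min_len: int):
    """Map each overlapping pair (a, b) to the extension b adds to a."""
    edges = {}
    for a, b in permutations(reads, 2):
        k = overlap(a, b, min_len)
        if k:
            edges[(a, b)] = b[k:]
    return edges


def transitive_reduction(edges):
    """Drop edge a->c when a->b->c spells exactly the same extension."""
    reduced = dict(edges)
    for (a, c), label_ac in edges.items():
        for (a2, b), label_ab in edges.items():
            if a2 != a or b == c:
                continue
            label_bc = edges.get((b, c))
            if label_bc is not None and label_ab + label_bc == label_ac:
                reduced.pop((a, c), None)
                break
    return reduced


if __name__ == "__main__":
    reads = ["GATTA", "ATTAC", "TTACA", "TACAG"]
    graph = transitive_reduction(overlap_graph(reads, min_len=3))
    for (a, b), label in graph.items():
        print(f"{a} -> {b} [{label}]")
```

On these four reads the overlap graph has five edges and the reduced graph has three, forming a single path whose labels spell the source sequence GATTACAG when appended to the first read.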