Reference-based genome compression using the longest matched substrings with parallelization consideration.

Zhiwen Lu¹, Lu Guo², Jianhua Chen³

  • ¹School of Information, Yunnan University, Kunming, China.

BMC Bioinformatics | September 30, 2023
Summary
This summary is machine-generated.

Efficient genome data compression is crucial given the volume of data produced by sequencing. The proposed algorithm, LMSRGC, uses reference genomes and parallel processing to achieve competitive compression ratios and compression times.
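The summary does not describe the output format LMSRGC writes. As a hedged illustration of the general reference-based idea only, the sketch below encodes a target sequence as matches `(reference position, length)` into the reference plus literal bases, and decodes it back. The naive search, the `MIN_MATCH` threshold and the token layout are assumptions chosen for readability, not the paper's method.

```python
from typing import List, Tuple, Union

# Hypothetical token layout: a match (ref_pos, length) into the reference, or a literal base.
Token = Union[Tuple[int, int], str]

MIN_MATCH = 4  # hypothetical threshold: shorter matches are stored as literals


def longest_match(ref: str, target: str, i: int) -> Tuple[int, int]:
    """Naive search for the longest prefix of target[i:] occurring anywhere in ref."""
    best_pos, best_len = -1, 0
    for p in range(len(ref)):
        n = 0
        while p + n < len(ref) and i + n < len(target) and ref[p + n] == target[i + n]:
            n += 1
        if n > best_len:  # strict '>' keeps the first (leftmost) longest match
            best_pos, best_len = p, n
    return best_pos, best_len


def encode(ref: str, target: str) -> List[Token]:
    """Greedy left-to-right encoding of target against ref."""
    tokens: List[Token] = []
    i = 0
    while i < len(target):
        pos, length = longest_match(ref, target, i)
        if length >= MIN_MATCH:
            tokens.append((pos, length))
            i += length
        else:
            tokens.append(target[i])
            i += 1
    return tokens


def decode(ref: str, tokens: List[Token]) -> str:
    """Rebuild the target by copying matched spans from ref and emitting literals."""
    out = []
    for t in tokens:
        if isinstance(t, tuple):
            pos, length = t
            out.append(ref[pos:pos + length])
        else:
            out.append(t)
    return "".join(out)


ref = "ACGTACGTTTGACCA"
target = "ACGTTTGACCAGACGTAC"
tokens = encode(ref, target)
print(tokens)  # [(4, 11), 'G', (0, 6)]
assert decode(ref, tokens) == target
```

The quadratic search is only there to keep the example short; the suffix-array approach named in the Main Methods exists precisely to find these longest matches without scanning every reference position.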

Keywords:
CUDA, Genome compression, Parallelization, Reference-based, Suffix array

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • High-throughput sequencing generates vast amounts of genome data, posing storage and transmission challenges.
  • Efficient genome data compression algorithms are needed to manage this data deluge.
  • Leveraging multi-core computing for parallel processing is key to improving algorithm efficiency.

Purpose of the Study:

  • To develop an efficient genome data compression algorithm.
  • To address the challenge of storing and transmitting large-scale genomic data.
  • To improve the speed of genome data compression using parallel processing.

Main Methods:

  • Proposed a novel algorithm (LMSRGC) utilizing reference genome sequences.
  • Employed suffix array (SA) and longest common prefix (LCP) array to identify longest matched substrings (LMS).
  • Utilized GPUs for parallel SA construction and multi-threading for LCP array creation and LMS filtering.
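The summary names the SA and LCP array but not how they are combined. The sketch below shows one standard way (not necessarily the paper's) to obtain, for every target position, the longest substring it shares with the reference. It concatenates reference, a separator and target, builds the SA by prefix doubling (a construction often favored for GPUs because each round is a sort, but run sequentially here), builds the LCP array with Kasai's algorithm, and then scans the SA in both directions, keeping the running minimum LCP since the nearest reference suffix. The separator `#`, the function names and the `(length, ref_pos)` output are assumptions.

```python
from typing import List, Tuple


def suffix_array(s: str) -> List[int]:
    """Prefix doubling: each round sorts suffixes by (rank[i], rank[i + k])."""
    n = len(s)
    if n == 0:
        return []
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int) -> Tuple[int, int]:
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)
        new_rank = [0] * n
        for j in range(1, n):
            new_rank[sa[j]] = new_rank[sa[j - 1]] + (key(sa[j]) != key(sa[j - 1]))
        rank = new_rank
        if rank[sa[-1]] == n - 1:  # all ranks distinct: SA is final
            return sa
        k *= 2


def lcp_array(s: str, sa: List[int]) -> List[int]:
    """Kasai's algorithm: lcp[r] = LCP of suffixes sa[r - 1] and sa[r]; lcp[0] = 0."""
    n = len(s)
    rank = [0] * n
    for r, p in enumerate(sa):
        rank[p] = r
    lcp = [0] * n
    h = 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and s[i + h] == s[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1
        else:
            h = 0
    return lcp


def longest_matches(ref: str, target: str, sep: str = "#") -> List[Tuple[int, int]]:
    """For each target position, return (length, ref_pos) of its longest match in ref."""
    assert sep not in ref and sep not in target, "separator must be unique"
    s = ref + sep + target
    m, n = len(ref), len(s)
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    best = [(0, -1)] * len(target)
    # Forward scan: cur = LCP between sa[k] and the nearest reference suffix above it.
    cur, last = 0, -1
    for k in range(n):
        if k > 0:
            cur = min(cur, lcp[k])
        p = sa[k]
        if p < m:  # reference suffix
            cur, last = n, p  # n acts as "infinity"
        elif p > m and last != -1 and cur > best[p - m - 1][0]:
            best[p - m - 1] = (cur, last)
    # Backward scan: same idea, using the nearest reference suffix below.
    cur, last = 0, -1
    for k in range(n - 1, -1, -1):
        if k < n - 1:
            cur = min(cur, lcp[k + 1])
        p = sa[k]
        if p < m:
            cur, last = n, p
        elif p > m and last != -1 and cur > best[p - m - 1][0]:
            best[p - m - 1] = (cur, last)
    return best


ref = "ACGTACGTTTGACCA"
target = "ACGTTTGACCAGACGTAC"
best = longest_matches(ref, target)
print(best[0], best[12])  # (11, 4) (6, 0)
```

Because the separator never occurs in the target, no reported match can run past the end of the reference. When several reference positions tie for the longest match, which one is returned depends on SA order.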

Main Results:

  • The LMSRGC algorithm effectively compresses genome data in FASTA format.
  • The algorithm exploits properties of the SA and LCP array to achieve effective compression.
  • Parallelization using GPUs and multi-threading significantly speeds up the compression process.
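The summary states that parallelism speeds up compression but gives no details beyond GPU SA construction and multi-threaded LCP creation and LMS filtering. The sketch below illustrates a different, simpler decomposition, stated plainly as a swap-in: the target is split into fixed-size chunks, each chunk is encoded independently against a shared, read-only k-mer index of the reference (a hash seed-and-extend matcher standing in for the paper's SA/LCP search), and the per-chunk token lists are concatenated in order. `K`, `chunk_size`, `workers` and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple, Union

Token = Union[Tuple[int, int], str]  # hypothetical: (ref_pos, length) match or literal base
K = 4  # hypothetical seed length, which is also the minimum match length


def build_index(ref: str) -> Dict[str, List[int]]:
    """Map every K-mer of the reference to its start positions (ascending)."""
    index: Dict[str, List[int]] = defaultdict(list)
    for p in range(len(ref) - K + 1):
        index[ref[p:p + K]].append(p)
    return dict(index)


def encode_chunk(ref: str, index: Dict[str, List[int]], chunk: str) -> List[Token]:
    """Seed-and-extend greedy encoding of one chunk; matches never cross chunk borders."""
    tokens: List[Token] = []
    i = 0
    while i < len(chunk):
        best_pos, best_len = -1, 0
        for p in index.get(chunk[i:i + K], []):
            n = K  # the K-mer seed already matches
            while p + n < len(ref) and i + n < len(chunk) and ref[p + n] == chunk[i + n]:
                n += 1
            if n > best_len:
                best_pos, best_len = p, n
        if best_len:
            tokens.append((best_pos, best_len))
            i += best_len
        else:
            tokens.append(chunk[i])
            i += 1
    return tokens


def encode_parallel(ref: str, target: str, chunk_size: int = 8, workers: int = 4) -> List[Token]:
    index = build_index(ref)  # shared read-only by all threads
    chunks = [target[s:s + chunk_size] for s in range(0, len(target), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda c: encode_chunk(ref, index, c), chunks))  # keeps order
    return [t for part in parts for t in part]


def decode(ref: str, tokens: List[Token]) -> str:
    return "".join(ref[t[0]:t[0] + t[1]] if isinstance(t, tuple) else t for t in tokens)


ref = "ACGTACGTTTGACCA"
target = "ACGTTTGACCAGACGTAC"
tokens = encode_parallel(ref, target)
print(tokens)  # [(4, 8), 'C', 'C', 'A', 'G', (0, 4), 'A', 'C']
assert decode(ref, tokens) == target
```

Splitting at chunk borders trades compression for parallelism: here the first chunk stops at a match of length 8, whereas a single pass would find a match of length 11. In CPython the global interpreter lock limits the speedup of this pure-Python loop, so the example only shows the decomposition, not the performance the paper reports for GPUs and native threads.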

Conclusions:

  • The developed algorithm demonstrates competitive performance against state-of-the-art methods.
  • LMSRGC achieves favorable compression ratios.
  • The algorithm offers efficient compression times, making it practical for large datasets.