Compressing DNA sequence databases with coil.

W Timothy J White¹, Michael D Hendy

  • ¹Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand. w.t.white@massey.ac.nz

BMC Bioinformatics | May 21, 2008
Summary
This summary is machine-generated.

New software called coil offers improved compression for DNA sequence databases. This method achieves better compression ratios than standard tools, especially for Expressed Sequence Tag (EST) data, aiding storage and distribution.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomic Data Management

Background:

  • Public DNA sequence databases are growing exponentially, posing significant storage and data communication challenges.
  • Current compression methods (e.g., gzip) applied to large sequence databases yield suboptimal compression ratios.
  • Limited research has focused on compressing entire DNA sequence databases, as opposed to individual sequences.
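
To make the notion of a compression ratio concrete, the following minimal sketch measures how general-purpose compressors behave on a collection of similar DNA records. It uses only Python's standard library (zlib implements the same DEFLATE algorithm as gzip). The helper names `make_est_like_records` and `compression_ratios` and the synthetic data are illustrative assumptions; they are not taken from the paper, and the numbers this produces say nothing about coil or about real GenBank files.

```python
import bz2
import lzma
import random
import zlib


def make_est_like_records(n_records=200, length=400, seed=0):
    """Build synthetic, similar DNA records (hypothetical test data, not GenBank)."""
    rng = random.Random(seed)
    base = [rng.choice("ACGT") for _ in range(length)]
    records = []
    for _ in range(n_records):
        seq = base[:]
        # Mutate a few positions so records are similar but not identical.
        for _ in range(8):
            seq[rng.randrange(length)] = rng.choice("ACGT")
        records.append("".join(seq))
    return "\n".join(records).encode("ascii")


def compression_ratios(data):
    """Return original_size / compressed_size for three stdlib compressors."""
    compressors = {
        "zlib": lambda d: zlib.compress(d, 9),   # DEFLATE, as used by gzip
        "bz2": lambda d: bz2.compress(d, 9),
        "lzma": lambda d: lzma.compress(d),
    }
    return {name: len(data) / len(fn(data)) for name, fn in compressors.items()}


if __name__ == "__main__":
    for name, ratio in compression_ratios(make_est_like_records()).items():
        print(f"{name}: {ratio:.2f}x")
```

A higher ratio means a smaller compressed file for the same input. Running the same measurement on a real sequence file would be the natural way to establish a baseline before comparing against a specialised tool.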

Purpose of the Study:

  • To introduce and evaluate a novel software package, coil, for efficient compression and decompression of DNA sequence databases.
  • To assess the performance of coil compared to existing general-purpose compression tools.
  • To explore coil's capability in handling incremental database updates.

Main Methods:

  • Development of a portable software package named coil.

  • Implementation of edit-tree coding as the core compression strategy.
  • Testing coil on a large GenBank database file containing Expressed Sequence Tag (EST) data.
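
The summary does not describe how edit-tree coding works internally. One plausible reading is that each sequence is stored as a list of edits against a similar "parent" sequence, so that the database forms a tree rooted at sequences stored in full. The sketch below illustrates that general idea only; it is not coil's algorithm. The function names, the tuple-based edit format, and the use of `difflib.SequenceMatcher` to find edits are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def encode_edits(parent, child):
    """Describe child as a list of edits against parent."""
    matcher = SequenceMatcher(None, parent, child, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared run: store only the parent positions, not the bases.
            edits.append(("copy", i1, i2))
        else:
            # replace / insert / delete: store the child's bases (empty for delete).
            edits.append(("literal", child[j1:j2]))
    return edits


def decode_edits(parent, edits):
    """Rebuild a child sequence from its parent and its edit list."""
    parts = []
    for edit in edits:
        if edit[0] == "copy":
            parts.append(parent[edit[1]:edit[2]])
        else:
            parts.append(edit[1])
    return "".join(parts)


def encode_tree(sequences, parents):
    """sequences: id -> str; parents: id -> parent id, or None for a root."""
    encoded = {}
    for seq_id, seq in sequences.items():
        parent_id = parents[seq_id]
        if parent_id is None:
            encoded[seq_id] = (None, [("literal", seq)])  # root stored in full
        else:
            encoded[seq_id] = (parent_id, encode_edits(sequences[parent_id], seq))
    return encoded


def decode_tree(encoded):
    """Rebuild every sequence by resolving parents before children."""
    decoded = {}

    def resolve(seq_id):
        if seq_id not in decoded:
            parent_id, edits = encoded[seq_id]
            parent = "" if parent_id is None else resolve(parent_id)
            decoded[seq_id] = decode_edits(parent, edits)
        return decoded[seq_id]

    for seq_id in encoded:
        resolve(seq_id)
    return decoded
```

Under this assumed scheme, adding a new sequence only requires encoding it against one existing node, which gives some intuition for why incremental additions could be cheap to encode. How coil actually chooses parents and encodes edits is not stated in this summary.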

Main Results:

  • coil achieves a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for EST data.
  • The software demonstrates efficient encoding of incremental additions to sequence databases.
  • While compression is computationally intensive, decompression is fast and requires minimal memory.

Conclusions:

  • coil provides a superior alternative for compressing and distributing DNA sequence databases with a narrow distribution of sequence lengths, such as EST data.
  • Future work may focus on enhancing compression levels for databases with a wide distribution of sequence lengths.