Clustering-Based Compression for Population DNA Sequences.

Kin-On Cheng, Ngai-Fong Law, Wan-Chi Siu

    IEEE/ACM Transactions on Computational Biology and Bioinformatics
    |October 14, 2017
    Summary
    This summary is machine-generated.

    A new algorithm, Reference-based Compression using Clustering (RCC), reduces the compressed size of genome data by up to 91% relative to state-of-the-art methods. It groups similar DNA sequences and compresses each group against a shared reference, at the cost of longer processing time.

    Area of Science:

    • Bioinformatics
    • Computational Biology
    • Genomics

    Background:

    • Exponential growth in sequenced individual genomes necessitates advanced data compression techniques.
    • Existing compression methods struggle with the scale of genomic data.

    Purpose of the Study:

    • To introduce a novel algorithm, Reference-based Compression using Clustering (RCC), for highly effective genome sequence compression.
    • To leverage population sequence substructures for improved compression ratios.

    Main Methods:

    • Utilized k-means clustering to partition population sequences into distinct clusters.
    • Built a reference sequence for each cluster so that the cluster's member sequences can be compressed against it.
    • Applied hierarchical referencing to compress the cluster reference sequences themselves.
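
The three steps above can be sketched on toy data. This is a minimal illustration, not the RCC implementation from the paper: it assumes equal-length, pre-aligned sequences, replaces k-means with a k-means-style loop over Hamming distance with per-position majority consensus (strictly, k-modes), and stores plain substitution lists instead of an entropy-coded stream. All function names (`cluster`, `compress`, `decompress`, and so on) are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    # Number of positions at which two equal-length sequences differ.
    return sum(x != y for x, y in zip(a, b))

def consensus(seqs):
    # Per-position majority base; ties go to the alphabetically first base.
    return "".join(
        min(Counter(col).items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for col in zip(*seqs)
    )

def cluster(seqs, k, iters=10):
    # Farthest-first initialisation (an assumption, chosen for determinism).
    refs = [seqs[0]]
    while len(refs) < k:
        refs.append(max(seqs, key=lambda s: min(hamming(s, r) for r in refs)))
    labels = []
    for _ in range(iters):
        # Assign each sequence to its nearest reference.
        labels = [min(range(k), key=lambda c: hamming(s, refs[c])) for s in seqs]
        # Recompute each reference as the consensus of its members.
        new_refs = []
        for c in range(k):
            members = [s for s, l in zip(seqs, labels) if l == c]
            new_refs.append(consensus(members) if members else refs[c])
        if new_refs == refs:
            break
        refs = new_refs
    return labels, refs

def diff(seq, ref):
    # Substitutions needed to turn ref into seq.
    return [(i, b) for i, (b, r) in enumerate(zip(seq, ref)) if b != r]

def patch(ref, edits):
    out = list(ref)
    for i, b in edits:
        out[i] = b
    return "".join(out)

def compress(seqs, k):
    labels, refs = cluster(seqs, k)
    root = consensus(refs)  # second level: a reference for the references
    return {
        "root": root,
        "ref_diffs": [diff(r, root) for r in refs],
        "seq_diffs": [(l, diff(s, refs[l])) for s, l in zip(seqs, labels)],
    }

def decompress(model):
    refs = [patch(model["root"], d) for d in model["ref_diffs"]]
    return [patch(refs[l], d) for l, d in model["seq_diffs"]]

# Made-up population with two visible subgroups.
population = ["ACGTACGT", "ACGTACGA", "ACGTACGT",
              "TTGTCCGT", "TTGTCCGA", "TTGTCCGT"]
model = compress(population, k=2)
assert decompress(model) == population  # the round trip is lossless
```

In this toy run, each subgroup ends up with its own reference, so most sequences are stored as an empty edit list, and only one of the two cluster references needs edits against the root. A real compressor would additionally handle insertions and deletions and encode the edit lists compactly.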

    Main Results:

    • Reduced compressed size by up to 91.0% compared with state-of-the-art methods.
    • Demonstrated the effectiveness of clustering and reference-based compression for genomic data.
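
To make the headline figure concrete, the following assumes that "reducing compressed size by up to 91.0%" means a relative saving in output size against a baseline compressor. The summary does not define the measure, so the symbols $S_{\text{baseline}}$ and $S_{\text{RCC}}$ (compressed sizes under the baseline and under RCC) are introduced here for illustration only.

```latex
\[
\text{reduction} = \frac{S_{\text{baseline}} - S_{\text{RCC}}}{S_{\text{baseline}}}
\qquad\Longrightarrow\qquad
S_{\text{RCC}} = (1 - 0.910)\, S_{\text{baseline}} = 0.090\, S_{\text{baseline}}
\]
```

Under that reading, the best reported case leaves the RCC output at about 9% of the baseline's size, or roughly 11 times smaller ($1/0.09 \approx 11.1$). "Up to" marks this as the best case, not the typical one.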

    Conclusions:

    • RCC offers a substantial improvement in genome data compression efficiency.
    • The smaller compressed size comes at the cost of longer processing time; further optimization is needed for speed.