A framework for space-efficient read clustering in metagenomic samples.

Jarno Alanko (1), Fabio Cunial (2), Djamal Belazzougui (3)

  • (1) Department of Computer Science, University of Helsinki, Gustaf Hällströmin katu 2b, Helsinki, 00560, Finland. jarno.alanko@cs.helsinki.fi

BMC Bioinformatics | April 1, 2017
Summary

This summary is machine-generated.

New algorithms for unsupervised metagenomic clustering need very little working space. This makes it practical to analyze large metagenomic samples without reference genomes, which supports studies of microbial communities.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Metagenomic samples comprise DNA fragments from diverse, often uncharacterized species.
  • Unsupervised metagenomic clustering aims to group these fragments into taxonomic units without reference genomes.
  • The increasing size of metagenomic data necessitates space-efficient clustering algorithms.

Purpose of the Study:

  • To develop a space-efficient algorithmic framework for unsupervised metagenomic clustering.
  • To address the need for scalable analysis of large and growing metagenomic datasets.

Main Methods:

  • Utilized a bidirectional Burrows-Wheeler index and a union-find data structure.
  • Designed algorithms for core primitives in metagenomic clustering.
  • Analyzed algorithmic complexity in terms of sample length, number of reads, read length, alphabet size, and redundancy.
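
The role of the union-find structure can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes, purely for illustration, that two reads belong to the same cluster when they share at least one k-mer, and it finds shared k-mers with an ordinary hash table instead of a bidirectional Burrows-Wheeler index, so it is not space-efficient. The names `UnionFind` and `cluster_reads` and the parameter `k` are hypothetical.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path halving and union by rank."""

    def __init__(self, size):
        self.parent = list(range(size))
        self.rank = [0] * size

    def find(self, x):
        # Path halving: point each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the shallower tree under the deeper one.
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1


def cluster_reads(reads, k):
    """Group read indices whose reads are linked by shared k-mers.

    Assumption for illustration only: sharing one k-mer is enough
    to merge two reads into the same cluster.
    """
    uf = UnionFind(len(reads))
    first_seen = {}  # k-mer -> index of the first read containing it
    for i, read in enumerate(reads):
        for j in range(len(read) - k + 1):
            kmer = read[j:j + k]
            if kmer in first_seen:
                uf.union(first_seen[kmer], i)
            else:
                first_seen[kmer] = i
    clusters = defaultdict(list)
    for i in range(len(reads)):
        clusters[uf.find(i)].append(i)
    return sorted(clusters.values())


reads = ["ACGTAC", "GTACGG", "TTTTTT", "CTTTTA"]
print(cluster_reads(reads, 4))
```

In this toy input, reads 0 and 1 share the 4-mer GTAC and reads 2 and 3 share TTTT, so two clusters result. The hash table is the part a text index would replace in a space-efficient setting.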

Keywords:

  • Burrows-Wheeler transform
  • Metagenomics
  • Read clustering
  • Right-maximal k-mer
  • Submaximal repeat
  • Suffix-link tree
  • Text indexing

Main Results:

  • Achieved O(n(t + log σ)) time complexity for clustering.
  • Required only 2n + o(n) + O(max{ℓσ log n, K log m}) bits of additional space.
  • Demonstrated practical performance and multi-core capability through parallel suffix-link tree traversal.
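
To get a feel for the space bound above, the following sketch evaluates only its leading 2n term. It assumes that n is the sample length in characters, which the summary does not state explicitly, and it ignores the o(n) and O(max{...}) terms because their constants are not given here. The function name `leading_term_bytes` is hypothetical.

```python
def leading_term_bytes(n):
    """Bytes needed for the 2n-bit leading term alone.

    Assumption: n is the sample length in characters. Lower-order
    terms of the bound are deliberately left out.
    """
    return (2 * n + 7) // 8  # round the bit count up to whole bytes


# A sample of one billion characters: 2 * 10**9 bits = 250,000,000 bytes.
print(leading_term_bytes(10**9))
```

Under these assumptions, the leading term for a sample of one billion characters comes to 250 MB, before the index and the union-find structure are counted.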

Conclusions:

  • The developed algorithms are practical and efficient for metagenomic data analysis.
  • The framework is competitive in both space and time complexity with existing state-of-the-art methods.
  • Enables effective taxonomic unit approximation from complex environmental DNA samples.