emMAW: computing minimal absent words in external memory.

Alice Héliou¹, Solon P Pissis², Simon J Puglisi³

  • ¹ LIX, École Polytechnique, CNRS, INRIA, Université Paris-Saclay, Palaiseau, France.

Bioinformatics (Oxford, England) | April 14, 2017
Summary
This summary is machine-generated.

We developed emMAW, an external-memory algorithm for finding minimal absent words in large genomes. This tool efficiently processes massive datasets, overcoming previous RAM limitations for genomic analysis.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Minimal absent words have been studied for their biological significance in genomes from all domains of life.
  • Existing algorithms for minimal absent words require substantial RAM, limiting their application to large datasets.
  • Previous methods struggle with the computational demands of analyzing extensive genomic sequences.
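
The summary does not define the term, so a small illustration may help. A word is absent from a sequence if it never occurs in it, and it is a minimal absent word if every proper substring of it does occur. The brute-force sketch below applies that definition directly to a tiny string. It is not the emMAW algorithm: the function name, the `max_len` cutoff, and the restriction to words of length 2 or more are assumptions made for illustration, and the approach only scales to toy inputs.

```python
from itertools import product


def minimal_absent_words(text: str, alphabet: str, max_len: int) -> set[str]:
    """Brute-force minimal absent words of length 2..max_len (toy inputs only)."""
    # Every substring (factor) of text that is at most max_len characters long.
    factors = {
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, min(i + max_len, len(text)) + 1)
    }
    maws = set()
    for length in range(2, max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            # The word is absent, but dropping its first or its last letter
            # gives a word that occurs. That is enough to guarantee that
            # every proper substring of the word occurs as well.
            if (
                word not in factors
                and word[1:] in factors
                and word[:-1] in factors
            ):
                maws.add(word)
    return maws


if __name__ == "__main__":
    print(sorted(minimal_absent_words("abaab", "ab", max_len=6)))
```

For the string `abaab` over the alphabet `ab`, this yields `aaa`, `aaba`, `bab`, and `bb`. Enumerating every candidate word is exponential in `max_len`, which is why practical tools rely on index structures instead.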

Purpose of the Study:

  • To introduce the first external-memory algorithm for computing minimal absent words.
  • To enable the analysis of significantly larger biological datasets than previously feasible.
  • To provide an efficient and accessible tool for researchers studying genomic sequences.

Main Methods:

  • Developed emMAW, an innovative external-memory algorithm.
  • Created a free, open-source implementation of the emMAW algorithm.

  • Utilized suffix arrays for efficient sequence analysis.
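
As a reminder of what a suffix array provides, here is a minimal sketch. The array lists the starting positions of all suffixes in lexicographic order, so a binary search can decide whether a word occurs in the text. The naive construction and the function names are assumptions for illustration only; the summary does not describe how emMAW builds or uses its suffix array, and a genome-scale tool would need a far more efficient construction.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix itself."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def occurs(text: str, suffix_array: list[int], pattern: str) -> bool:
    """Binary-search the suffix array for a suffix that starts with pattern."""
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        start = suffix_array[mid]
        # Compare only the first len(pattern) characters of this suffix.
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    if lo == len(suffix_array):
        return False
    start = suffix_array[lo]
    return text[start:start + len(pattern)] == pattern


if __name__ == "__main__":
    text = "abaab"
    sa = build_suffix_array(text)
    print(sa)
    print(occurs(text, sa, "aab"), occurs(text, sa, "bab"))
```

For `abaab` the suffix array is `[2, 3, 0, 4, 1]`. The query for `aab` succeeds, while `bab` is reported as absent.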

Main Results:

  • The emMAW algorithm successfully computes minimal absent words on large datasets, including the full human genome.
  • The implementation requires minimal RAM (1 GB) and processes the human genome in under 3 hours on a standard workstation.
  • Despite using external memory, the algorithm maintains competitive speed, performing comparably to internal-memory methods on smaller datasets.
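
To make the idea of working within a small, fixed RAM budget concrete, the sketch below shows a generic external-memory pattern: sort data in bounded chunks, write each sorted run to disk, then merge the runs while streaming. This is a textbook external merge sort, not emMAW's actual procedure, which the summary does not describe. The function name and the `max_items_in_ram` parameter are assumptions for illustration, and items are assumed to contain no newline characters.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator


def external_sort(items: Iterable[str], max_items_in_ram: int) -> Iterator[str]:
    """Yield items in sorted order while buffering at most max_items_in_ram."""
    run_paths: list[str] = []
    chunk: list[str] = []

    def flush() -> None:
        # Sort the in-memory chunk and write it to disk as one sorted run.
        if not chunk:
            return
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run")
        with os.fdopen(fd, "w") as run_file:
            run_file.writelines(item + "\n" for item in chunk)
        run_paths.append(path)
        chunk.clear()

    for item in items:
        chunk.append(item)
        if len(chunk) >= max_items_in_ram:
            flush()
    flush()

    files = [open(path) for path in run_paths]
    try:
        # Stream each run line by line; heapq.merge keeps only one pending
        # item per run in memory during the k-way merge.
        streams = [(line.rstrip("\n") for line in f) for f in files]
        yield from heapq.merge(*streams)
    finally:
        for f in files:
            f.close()
        for path in run_paths:
            os.remove(path)


if __name__ == "__main__":
    words = ["GATTA", "AC", "TTG", "AC", "CGA"]
    print(list(external_sort(words, max_items_in_ram=2)))
```

Disk capacity, rather than RAM, bounds the input size in this pattern, at the cost of extra sequential reads and writes.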

Conclusions:

  • emMAW significantly advances the computational feasibility of analyzing minimal absent words in large-scale genomic data.
  • The open-source availability of emMAW democratizes access to powerful genomic analysis tools.
  • This work overcomes previous memory constraints, paving the way for deeper insights into genomic structures and functions.