Mumemto: efficient maximal matching across pangenomes
Summary
This summary is machine-generated. Mumemto efficiently computes multi-sequence maximal unique matches (multi-MUMs) for large-scale pangenome analysis. The tool aids genome alignment, reveals structural variation, and highlights conserved regions across many assemblies.
Area Of Science
- Genomics
- Bioinformatics
- Computational Biology
Background
- Pangenome analysis requires aligning genomes to common coordinates, a computationally intensive process.
- Multi-sequence maximal unique matches (multi-MUMs) serve as crucial anchors for core genome alignments.
- Existing methods face challenges in scalability for large-scale pangenome construction and analysis.
Purpose Of The Study
- To introduce Mumemto, a novel computational tool for efficient multi-MUM computation in large pangenomes.
- To enable visualization of genomic synteny and identification of structural variations within pangenomes.
- To facilitate the analysis of pangenome conservation and aberrant genome assemblies.
Main Methods
- Development of Mumemto using C++ and Python.
- Computation of multi-MUMs and other match types across extensive genome datasets.
- Benchmarking Mumemto's performance on large human and fungal pangenome datasets.
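To make the notion of a multi-MUM concrete, here is a minimal brute-force sketch in Python. It is not Mumemto's algorithm, which this summary does not describe, and it would not scale beyond toy inputs. The function names `count_occurrences` and `naive_multi_mums` are hypothetical, and the sketch assumes forward-strand matching only. A substring is reported if it occurs exactly once in every sequence and cannot be extended to the left or right while still matching in all of them.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def naive_multi_mums(seqs, min_len=1):
    """Return {match: [start position in each sequence]} for every multi-MUM.

    Brute force for illustration only: every substring of the first
    sequence is tested against all sequences.
    """
    ref = seqs[0]
    mums = {}
    for i in range(len(ref)):
        for j in range(i + min_len, len(ref) + 1):
            cand = ref[i:j]
            # Uniqueness: exactly one occurrence in every sequence.
            if any(count_occurrences(s, cand) != 1 for s in seqs):
                continue
            pos = [s.find(cand) for s in seqs]
            # Flanking characters; None marks a sequence boundary.
            left = {s[p - 1] if p > 0 else None for s, p in zip(seqs, pos)}
            right = {
                s[p + len(cand)] if p + len(cand) < len(s) else None
                for s, p in zip(seqs, pos)
            }
            # Maximality: the match cannot grow on either side, because the
            # flanking characters disagree or a sequence ends there.
            left_maximal = None in left or len(left) > 1
            right_maximal = None in right or len(right) > 1
            if left_maximal and right_maximal:
                mums[cand] = pos
    return mums


if __name__ == "__main__":
    toy = ["GACGTT", "ACGTA", "TACGTC"]
    print(naive_multi_mums(toy))  # {'ACGT': [1, 0, 1]}
```

In the toy input, shorter shared substrings such as `CG` are also unique in every sequence, but they are rejected because all three occurrences can be extended by the same flanking character.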
Main Results
- Mumemto computed multi-MUMs across 320 human genome assemblies (960 GB) in 25.7 hours using under 800 GB of memory.
- The tool processed hundreds of fungal genome assemblies in minutes, demonstrating high scalability.
- Mumemto enabled visualization of synteny, identification of assembly errors, and highlighted structural variations and conservation.
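For a rough sense of scale, the human benchmark figures reported above can be turned into average rates. The short Python sketch below uses only the numbers stated in the results. The variable names are illustrative, and the derived values are simple averages, not measurements reported by the authors.

```python
# Figures taken from the reported human pangenome benchmark.
assemblies = 320
input_gb = 960
runtime_hours = 25.7
memory_bound_gb = 800  # reported as an upper bound ("under 800 GB")

# Average input processed per hour of runtime.
gb_per_hour = input_gb / runtime_hours

# Average wall-clock minutes per assembly.
minutes_per_assembly = runtime_hours * 60 / assemblies

# Upper bound on peak memory relative to input size.
memory_to_input_bound = memory_bound_gb / input_gb

print(f"{gb_per_hour:.1f} GB/hour")
print(f"{minutes_per_assembly:.1f} minutes/assembly")
print(f"memory below {memory_to_input_bound:.2f}x input size")
```

Averaged this way, the run works out to roughly 37 GB of input per hour and just under 5 minutes per assembly, with peak memory below the size of the input itself.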
Conclusions
- Mumemto provides an efficient and scalable solution for multi-MUM computation in large pangenomes.
- The tool aids in understanding genome structure, variation, and conservation across diverse species.
- Mumemto is an open-source resource that can advance pangenome analysis and comparative genomics.