KeBaB: k-mer based breaking for finding long MEMs
Summary
This summary is machine-generated. KeBaB is a k-mer filtration method that uses a Bloom filter to accelerate the search for long maximal exact matches (MEMs) in genomic data. By filtering the input before MEM-finding, it speeds up sequence alignment and metagenomic classification.
Area Of Science
- Bioinformatics
- Computational Biology
- Genomics
Background
- Maximal exact matches (MEMs) are crucial for genomics tasks like read classification and sequence alignment.
- Existing tools like ropebwt3 efficiently find MEMs by skipping redundant matching steps.
- Further optimization is needed to accelerate these computationally intensive processes.
Purpose Of The Study
- To introduce KeBaB, a k-mer filtration method designed to enhance the speed and efficiency of MEM-finding algorithms.
- To reduce the computational load on tools such as ropebwt3 by pre-filtering input data.
- To improve the performance of metagenomic classification without compromising accuracy.
Main Methods
- Development of KeBaB, a k-mer filtration step utilizing a Bloom filter.
- Integration of KeBaB with existing MEM-finders like ropebwt3.
- Breaking down input sequences into 'pseudo-MEMs' to guarantee the containment of all long MEMs.
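The filtration idea above can be illustrated with a minimal Python sketch. It is not KeBaB's implementation: the names `BloomFilter`, `build_filter`, and `pseudo_mems`, the BLAKE2b-based hashing, and the filter size are all hypothetical choices made for illustration. The sketch also assumes that a pseudo-MEM is a maximal stretch of the input in which every k-mer passes the Bloom filter, and that k is no larger than the minimum MEM length of interest. Because a Bloom filter has no false negatives, any exact match of at least that length then lies entirely inside one reported interval; false positives can only make intervals longer than necessary.

```python
import hashlib


class BloomFilter:
    """Tiny illustrative Bloom filter: no false negatives, rare false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting one hash function (an assumption).
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def build_filter(reference, k):
    """Insert every k-mer of the reference into a Bloom filter."""
    bloom = BloomFilter()
    for i in range(len(reference) - k + 1):
        bloom.add(reference[i:i + k])
    return bloom


def pseudo_mems(pattern, bloom, k, min_len):
    """Return half-open (start, end) intervals of `pattern`.

    Each interval is a maximal run in which every k-mer passes the filter,
    kept only if it is at least `min_len` characters long.
    """
    intervals = []
    run_start = None
    for i in range(len(pattern) - k + 1):
        if pattern[i:i + k] in bloom:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            end = i - 1 + k  # the last passing k-mer started at i - 1
            if end - run_start >= min_len:
                intervals.append((run_start, end))
            run_start = None
    if run_start is not None and len(pattern) - run_start >= min_len:
        intervals.append((run_start, len(pattern)))
    return intervals
```

In this sketch, a downstream MEM-finder would be run only on `pattern[start:end]` for each returned interval, which is how the filter lets it skip the rest of the input.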
Main Results
- KeBaB significantly accelerates MEM-finding algorithms by enabling them to ignore larger portions of input data.
- Experimental results demonstrate KeBaB's ability to speed up metagenomic classification.
- The method achieves acceleration without a significant reduction in classification accuracy.
Conclusions
- KeBaB offers a fast and space-efficient solution for k-mer filtration in genomics.
- The proposed method effectively enhances the performance of MEM-finding tools and downstream applications like metagenomic classification.
- KeBaB provides a flexible approach, allowing for either complete MEM identification or targeted identification within the longest pseudo-MEMs.
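The choice between the two modes in the last point can be sketched as a small selection step. This is an illustration under assumptions, not KeBaB's interface: the function name `longest_intervals` and the parameter `top` are hypothetical, and pseudo-MEMs are assumed to be given as half-open `(start, end)` intervals on the input sequence.

```python
import heapq


def longest_intervals(intervals, top=None):
    """Select which pseudo-MEM intervals to pass to the MEM-finder.

    With top=None every interval is kept (complete mode). Otherwise only
    the `top` longest intervals are kept (targeted mode), returned in
    order of their start position.
    """
    if top is None:
        return sorted(intervals)
    kept = heapq.nlargest(top, intervals, key=lambda iv: iv[1] - iv[0])
    return sorted(kept)
```

Keeping every interval preserves all long MEMs, while keeping only the longest few trades completeness for less work in the MEM-finder.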

