Updated: May 9, 2026


An Efficient Data Structure and Algorithm for Long-Match Query in Run-Length Compressed BWT.

Ahsan Sanaullah¹, Degui Zhi², Shaojie Zhang¹

¹Department of Computer Science, University of Central Florida, Orlando, FL, USA.

Algorithms in Bioinformatics : ... International Workshop, WABI ..., Proceedings. WABI (Workshop)

May 8, 2026

Summary

This summary is machine-generated.

This study introduces locally maximal exact matches (LEMs) to find informative substring matches missed by traditional methods. An efficient algorithm finds long LEMs in large genomic datasets using compressed string indexes.

Keywords:

BWT; LEM; Long LEM; MEM; Move Data Structure; Pangenome; Run Length Compressed BWT; Theory of computation → Data compression; Theory of computation → Pattern matching

Area of Science:

Bioinformatics
Computational Biology
Stringology

Background:

Traditional string matching in bioinformatics focuses on exact substring matches, most often maximal exact matches (MEMs).
MEMs may miss informative, long-enough matches that are not maximal within the query.
Large, repetitive biological sequence datasets like genomes require efficient indexing and querying methods.

Purpose of the Study:

To introduce locally maximal exact matches (LEMs) and to efficiently compute those that meet a specified length threshold (long LEMs).
To capture informative substring matches missed by MEMs, particularly in large, repetitive biological datasets.
To develop an algorithm for finding long LEMs using compressed string indexes.
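
The difference between MEMs and long LEMs can be illustrated with a brute-force sketch. It is not the paper's algorithm, and it rests on an assumed definition that this summary does not spell out: a LEM is taken to be a match between one query position and one text position that cannot be extended to the left or right at those positions, while a MEM is a query substring that occurs somewhere in the text and cannot be extended while still occurring anywhere. The function names `long_lems` and `mems` and the toy strings are hypothetical.

```python
def long_lems(text, query, min_len):
    """Brute force: (query_start, text_start, length) of every pairwise match
    that cannot be extended left or right and has length >= min_len.
    Assumed definition of a long LEM; quadratic time, for illustration only."""
    out = []
    for i in range(len(query)):
        for j in range(len(text)):
            if query[i] != text[j]:
                continue
            # Not locally maximal if this pair can be extended to the left.
            if i > 0 and j > 0 and query[i - 1] == text[j - 1]:
                continue
            length = 0
            while (i + length < len(query) and j + length < len(text)
                   and query[i + length] == text[j + length]):
                length += 1
            if length >= min_len:
                out.append((i, j, length))
    return out


def mems(text, query, min_len):
    """Brute force: (query_start, length) of every query substring that occurs
    in the text and cannot be extended while still occurring anywhere in it.
    Assumes min_len >= 1."""
    out = []
    m = len(query)
    for i in range(m):
        for end in range(i + min_len, m + 1):
            if query[i:end] not in text:
                break  # longer substrings starting at i cannot occur either
            left_ext = i > 0 and query[i - 1:end] in text
            right_ext = end < m and query[i:end + 1] in text
            if not left_ext and not right_ext:
                out.append((i, end - i))
    return out


text = "GATTACA#CATTAG"   # toy "panel": two sequences joined by a separator
query = "GATTACA"
print(mems(text, query, 3))       # [(0, 7)]
print(long_lems(text, query, 3))  # [(0, 0, 7), (1, 9, 4)]
```

In this toy case the only MEM is the full query, whereas the long LEMs also report the length-4 match `ATTA` against the second sequence, which is not maximal within the query.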

Main Methods:

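Developed an algorithm for finding long LEMs using run-length compressed BWT string indexes.
Used a string index occupying O(r) words of space, where r is the number of runs in the BWT, adapted from the move data structure.
Achieved O(m + occ) expected time for outputting all long LEMs, where m is the query length and occ is the number of matches reported.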
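
To make the quantity r concrete, the following minimal sketch computes the Burrows-Wheeler transform of a short string by sorting rotations and counts its runs. It is a textbook construction for illustration, not the index described above; the helper names `bwt` and `run_length_encode` and the `$` sentinel are assumptions.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Naive BWT: sort all rotations of text + sentinel, take the last column.
    Assumes the sentinel does not occur in text and sorts before every symbol."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def run_length_encode(s):
    """Collapse s into (symbol, run_length) pairs; the number of pairs is r."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


for t in ("banana", "abababab"):
    b = bwt(t)
    runs = run_length_encode(b)
    print(t, b, "r =", len(runs))
# banana annb$aa r = 5
# abababab bbbb$aaaa r = 3
```

The repetitive string yields fewer runs than its length would suggest, which is the property that run-length compressed indexes exploit on repetitive collections.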

Main Results:

An efficient algorithm for computing long LEMs with O(m + occ) expected time was developed.
The algorithm leverages a compressed string index supporting constant-time PLCP queries.
The approach effectively identifies significant matches in large, repetitive datasets.
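
For readers unfamiliar with PLCP queries, the sketch below computes the permuted longest-common-prefix array naively: entry i holds the length of the longest common prefix between the suffix starting at text position i and the suffix that precedes it in lexicographic order. This only illustrates what a PLCP query returns; it is not the constant-time structure mentioned above, and the function names are hypothetical.

```python
def suffix_array(s):
    """Naive suffix array: text positions sorted by the suffix starting there."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def plcp(s):
    """PLCP[i] = LCP of suffix i with its predecessor in suffix-array order.
    The lexicographically smallest suffix has no predecessor and gets 0."""
    sa = suffix_array(s)
    out = [0] * len(s)
    for rank in range(1, len(sa)):
        i, prev = sa[rank], sa[rank - 1]
        out[i] = common_prefix_len(s[i:], s[prev:])
    return out


print(plcp("banana$"))  # [0, 3, 2, 1, 0, 0, 0]
```

Because the array is indexed by text position, consecutive entries can drop by at most one, as in the 3, 2, 1, 0 stretch in the output.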

Conclusions:

Long LEMs provide valuable similarity information often overlooked by MEMs.
This method is particularly useful for analyzing pangenome and large-scale haplotype panel data.
Efficient computation of long LEMs is crucial for modern large-scale bioinformatics.