An efficient algorithm for identifying matches with errors in multiple long molecular sequences.

M Y Leung¹, B E Blaisdell, C Burge

¹Division of Mathematics, Computer Science and Statistics, University of Texas, San Antonio 78249-0664.

Journal of Molecular Biology

October 20, 1991

Summary

This summary is machine-generated.

This study introduces an efficient algorithm for identifying matches and repeats, allowing for errors, in multiple long molecular sequences. The method hashes fixed-size words and connects all occurrences of identical words in a linked list; average memory and run time scale almost linearly with total sequence length.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Analyzing large molecular sequence datasets is computationally intensive.
Identifying sequence repeats and similarities is crucial for understanding biological function.

Purpose of the Study:

To develop an efficient algorithm for detecting word relations, including matches and repeats, in long molecular sequences.
To accommodate errors within the sequence data during analysis.

Main Methods:

The algorithm employs hashing on fixed-size words.
A linked list structure connects all occurrences of identical words.
The approach is designed for large-scale data analysis.
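
The word-indexing idea in the methods above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_word_index` and `occurrences`, the word size `k`, and the example sequence are all hypothetical, and a Python dict stands in for whatever hash function the paper uses. The summary does not say how errors are handled, so the sketch covers exact word matches only.

```python
from typing import Dict, List, Tuple


def build_word_index(seq: str, k: int) -> Tuple[Dict[str, int], List[int]]:
    """Index every k-letter word of seq in a single left-to-right pass.

    head[word] holds the most recent start position of that word.
    prev[i] holds the previous start position of the word at i, or -1.
    Together they form one linked list per distinct word.
    """
    n_words = max(len(seq) - k + 1, 0)
    head: Dict[str, int] = {}
    prev: List[int] = [-1] * n_words
    for i in range(n_words):
        word = seq[i:i + k]
        prev[i] = head.get(word, -1)  # link to the earlier occurrence
        head[word] = i                # this position becomes the list head
    return head, prev


def occurrences(head: Dict[str, int], prev: List[int], word: str) -> List[int]:
    """Walk the linked list for word and return its positions in ascending order."""
    positions: List[int] = []
    i = head.get(word, -1)
    while i != -1:
        positions.append(i)
        i = prev[i]
    positions.reverse()
    return positions


if __name__ == "__main__":
    head, prev = build_word_index("ACGTACGTTACG", 4)
    print(occurrences(head, prev, "ACGT"))
    print(occurrences(head, prev, "TACG"))
```

Each position is visited once while building the index, and the only storage is one table entry per distinct word plus one link per position, which is in keeping with the near-linear memory and run time reported for the method.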

Main Results:

The algorithm demonstrates efficiency in finding sequence patterns and repeats.
Average memory and run time scale almost linearly with total sequence length.
Performance was evaluated on an Escherichia coli DNA sequence database.

Conclusions:

The developed algorithm provides an efficient solution for sequence analysis.
The near-linear scaling makes it suitable for very large genomic datasets.
This method facilitates the study of molecular sequence relationships.