
Compression of strings with approximate repeats

L Allison, T Edgoose, T I Dix

  • School of Computer Science and Software Engineering, Monash University, Australia. lloyd,time,trevor@cs.monash.edu.au

Proceedings. International Conference on Intelligent Systems for Molecular Biology
October 23, 1998
PubMed
Summary
This summary is machine-generated.


This study introduces a string model, loosely based on Lempel-Ziv, in which a repeated substring may be an approximate rather than an exact copy of an earlier substring, as is common in DNA. The model's parameters are estimated from the string itself with an expectation-maximization (EM) algorithm.

Area of Science:

  • Computational Biology
  • Bioinformatics
  • Genomics

Background:

  • String models such as Lempel-Ziv describe repeats as exact copies of earlier substrings and do not account for approximate matches.
  • Biological sequences, such as DNA, often contain variations and approximate repeats.

Purpose of the Study:

  • To develop a probabilistic model for strings that accommodates approximate substring matches.
  • To estimate the model's parameters from a given biological sequence.
  • To extend the model for analyzing DNA-specific features like reverse complementary repeats.
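To make the term concrete, here is a minimal Python sketch of what a reverse complementary repeat looks like at the string level. The function names are hypothetical and are not taken from the paper. The mismatch count is only a crude stand-in for "approximate": it ignores insertions and deletions and plays no part in the paper's probabilistic model.

```python
# Illustrative helpers only; names are assumptions, not the authors' code.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def mismatches_to_reverse_complement(source, candidate):
    """Count positions where candidate differs from reverse_complement(source).

    Both strings must have the same length; insertions and deletions
    are deliberately not handled in this sketch.
    """
    if len(source) != len(candidate):
        raise ValueError("strings must have equal length")
    target = reverse_complement(source)
    return sum(1 for x, y in zip(target, candidate) if x != y)
```

For example, `reverse_complement("AAGCT")` gives `"AGCTT"`, so `"AGCTA"` would count as an approximate reverse complementary repeat of `"AAGCT"` with one mismatch.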

Main Methods:

  • A string model loosely based on Lempel-Ziv in which a repeat may match its source substring only approximately.
  • An expectation-maximization (EM) algorithm for parameter estimation.
  • An O(n^2) algorithm, where n is the string length, together with a faster approximation algorithm.
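The following Python sketch shows one way an O(n^2) dynamic program can sum a string's probability over all explanations. It is a deliberately simplified toy, not the authors' model: repeats may differ from their source only by substitutions (no insertions or deletions), base characters are uniform, and the parameter names `p_start`, `p_sub` and `p_end` are assumptions made for illustration.

```python
ALPHABET = "ACGT"


def string_probability(s, p_start=0.1, p_sub=0.05, p_end=0.1):
    """Probability of s under a toy repeat model, summed over all explanations.

    Toy assumptions (not from the paper): at each step the machine either
    emits a uniform random character, or starts copying from a uniformly
    chosen earlier position; each copied character is substituted with
    probability p_sub; a repeat stops after each character with
    probability p_end.
    """
    n = len(s)
    k = len(ALPHABET)
    # base[i]: probability that s[:i] has been generated and no repeat is active
    base = [0.0] * (n + 1)
    base[0] = 1.0
    # rep[j]: probability that s[:i] has been generated and an active repeat
    # will copy source position j next
    rep = [0.0] * (n + 1)
    for i in range(n):
        start = p_start if i > 0 else 0.0  # nothing to copy before position 0
        new_rep = [0.0] * (n + 1)
        # Option 1: emit s[i] as a fresh character from the base state.
        base[i + 1] += base[i] * (1.0 - start) / k
        # Option 2: copy s[i] from source position j, either continuing an
        # active repeat or starting a new one at j.
        for j in range(i):
            mass = rep[j] + base[i] * start / i
            if mass == 0.0:
                continue
            copy = (1.0 - p_sub) if s[j] == s[i] else p_sub / (k - 1)
            out = mass * copy
            base[i + 1] += out * p_end          # repeat ends here
            new_rep[j + 1] = out * (1.0 - p_end)  # repeat continues
        rep = new_rep
    return base[n] + sum(rep)
```

Under these assumptions, a repetitive string such as `"ACGTACGTACGTACGT"` receives a much higher probability than the uniform baseline `0.25 ** 16`. Since an ideal code spends about `-log2(P)` bits on a string of probability `P`, a higher probability corresponds to a shorter encoding, which is the link to compression in the title. For long strings a practical version would work in log space to avoid underflow.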

Main Results:

  • Probabilities are summed over all explanations of a string, which gives the probability of the data under the model rather than that of a single best explanation.
  • The EM algorithm estimates the model parameters in a few iterations.
  • The extended model also handles approximate reverse complementary repeats in DNA.
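In symbols, the first two results can be written as below. The notation is an assumption made here for illustration: s is the string, e ranges over the explanations E(s) of s, and θ collects the model parameters. The update shown is the generic textbook form of EM, not the paper's specific re-estimation formulas.

```latex
% Probability of the data: sum over all explanations e of the string s
P(s \mid \theta) = \sum_{e \in E(s)} P(s, e \mid \theta)

% Generic EM iteration for the parameters \theta
Q(\theta \mid \theta^{(t)}) = \sum_{e \in E(s)} P(e \mid s, \theta^{(t)}) \, \log P(s, e \mid \theta),
\qquad
\theta^{(t+1)} = \arg\max_{\theta} \, Q(\theta \mid \theta^{(t)})
```

Each EM iteration cannot decrease P(s | θ), which is consistent with a few iterations being enough in practice.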

Conclusions:

  • The proposed model provides a framework for analyzing biological sequences that contain approximate repeats.
  • The EM algorithm estimates the model's parameters directly from the sequence being analyzed.
  • Extensions such as approximate reverse complementary repeats make the model applicable to DNA analysis.