Fast probabilistic analysis of sequence function using scoring matrices.

T D Wu¹, C G Nevill-Manning, D L Brutlag

¹Department of Biochemistry, Stanford University School of Medicine, Stanford, CA, USA. twu@gene.com

Bioinformatics (Oxford, England)

June 27, 2000

Summary

This summary is machine-generated.

We developed new techniques to speed up sequence analysis using scoring matrices by calculating quantile functions and allowing users to set probability (p) thresholds. These methods significantly increase analysis speed for large-scale sequencing projects.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Sequence analysis relies on scoring matrices, but speed limitations hinder large-scale applications.
Calculating the quantile function for scoring matrices provides a probability (p) value for segmental scores.
User-defined p thresholds enable a balance between sensitivity and speed in sequence analysis.
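As a rough illustration of the quantile idea above, the sketch below computes the exact score distribution of a small integer-valued scoring matrix, assuming residues at each position are drawn independently from a background distribution, and then reads off the score threshold that matches a given p threshold. The function names, the dictionary-based matrix layout, and the two-letter toy alphabet are assumptions made for this example; none of them are taken from EMATRIX.

```python
from collections import defaultdict


def score_distribution(matrix, background):
    """Exact distribution of segment scores for an integer scoring matrix.

    matrix: list of dicts, one per position, mapping residue -> integer score.
    background: dict mapping residue -> probability (assumed independent
    across positions; this is an assumption of the sketch).
    """
    dist = {0: 1.0}  # before any position is scored, the score is 0
    for column in matrix:
        nxt = defaultdict(float)
        for partial, prob in dist.items():
            for residue, score in column.items():
                nxt[partial + score] += prob * background[residue]
        dist = dict(nxt)
    return dist


def score_threshold(dist, p_threshold):
    """Smallest score s with P(score >= s) <= p_threshold, or None if none exists."""
    tail = 0.0
    best = None
    for s in sorted(dist, reverse=True):  # walk from the highest score down
        tail += dist[s]
        if tail > p_threshold:
            break
        best = s
    return best


# Toy two-residue alphabet and a two-position matrix (hypothetical values).
toy_background = {"A": 0.25, "B": 0.75}
toy_matrix = [{"A": 2, "B": -1}, {"A": 1, "B": 0}]
toy_dist = score_distribution(toy_matrix, toy_background)
print(score_threshold(toy_dist, 0.1))
```

A stricter p threshold maps to a higher score threshold, so fewer segments pass; that is the sensitivity versus speed trade-off the user controls.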

Purpose of the Study:

To present novel techniques for accelerating sequence analysis using scoring matrices.
To enable wider application of scoring matrices in large-scale sequencing and annotation.
To offer a tunable trade-off between analysis speed and sensitivity.

Main Methods:

Developed three speed-enhancing techniques: probability filtering, lookahead scoring, and permuted lookahead scoring.
Probability filtering converts the user's p threshold into a score threshold and uses it to limit the number of segments retained.
Lookahead scoring tests intermediate scores to determine whether a segment can still reach the score threshold; permuted lookahead scoring scores each segment's positions in an order chosen to favor early termination.
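The lookahead idea can be sketched as follows: while scoring a segment, stop as soon as the partial score plus the best score still attainable from the unscored positions falls below the score threshold. The optional `order` argument mimics permuted scoring. The `spread_order` heuristic (largest score spread first) is a hypothetical choice for this example and is not claimed to be the ordering used in the paper; all names and the toy matrix are likewise assumptions.

```python
def lookahead_score(segment, matrix, threshold, order=None):
    """Score a segment with early termination.

    Returns (score, residues_examined); score is None when the segment is
    abandoned because it can no longer reach the threshold.
    """
    n = len(matrix)
    order = list(range(n)) if order is None else list(order)
    # max_rest[k] = best score still attainable from positions order[k:].
    # Recomputed per call here for clarity; it depends only on the matrix
    # and the order, so a real scanner would compute it once.
    max_rest = [0] * (n + 1)
    for k in range(n - 1, -1, -1):
        max_rest[k] = max_rest[k + 1] + max(matrix[order[k]].values())
    partial = 0
    for k, pos in enumerate(order):
        partial += matrix[pos][segment[pos]]
        if partial + max_rest[k + 1] < threshold:
            return None, k + 1  # cannot reach the threshold: stop early
    return partial, n


def spread_order(matrix):
    """Hypothetical permutation: positions with the widest score range first."""
    return sorted(
        range(len(matrix)),
        key=lambda i: max(matrix[i].values()) - min(matrix[i].values()),
        reverse=True,
    )


# Toy three-position matrix over a two-residue alphabet (hypothetical values).
toy_matrix = [{"A": 1, "B": 0}, {"A": 5, "B": -5}, {"A": 2, "B": 0}]
print(lookahead_score("ABA", toy_matrix, 6))
print(lookahead_score("ABA", toy_matrix, 6, spread_order(toy_matrix)))
```

In this toy case, scoring the most discriminating position first lets the failing segment be rejected after fewer residues, while a segment that does reach the threshold still receives its full score.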

Main Results:

Reduced the fraction of residues that must be examined to between 62% and 6%, depending on the p threshold.
Demonstrated sequence analysis speeds several times faster than existing programs, reaching 225 residues/s (p=10^-6) and 541 residues/s (p=10^-20).
Evaluated the impact of independence and Markov assumptions on p-value calculations, with Markov assumptions generally increasing p-values.

Conclusions:

The developed techniques substantially increase sequence analysis speed with scoring matrices.
These methods facilitate the broader use of scoring matrices in large-scale bioinformatics.
The EMATRIX software package implements these techniques and is available for academic and commercial use.