Calibrating E-values for hidden Markov models using reverse-sequence null models.

Kevin Karplus¹, Rachel Karchin, George Shackelford

¹Department of Biomolecular Engineering, University of California, Santa Cruz, 95064, USA. karplus@soe.ucsc.edu

Bioinformatics (Oxford, England)

August 27, 2005

Summary

This summary is machine-generated.

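Scoring hidden Markov models (HMMs) against a reverse-sequence null model reduces false positives in database searches, but the resulting scores do not follow the usual Gumbel distribution. A theoretical distribution derived for these scores gives more accurate significance (E-value) estimates and therefore more reliable search results.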

Area of Science:

Bioinformatics
Computational Biology
Statistical Modeling

Background:

Hidden Markov models (HMMs) calculate the probability that a given model generated a sequence.
Log-odds scoring provides context by comparing probabilities against a null hypothesis.
Reverse-sequence null models reduce biases from sequence length and composition, decreasing false positives in database searches.
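The scoring idea above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two-state model, its parameters, and the function names are all hypothetical, and the sign convention (higher score is a better match) is an assumption. The score of a sequence is its log-probability under the model minus the log-probability of the reversed sequence under the same model, so length and composition effects cancel.

```python
import math

def _logsumexp(values):
    """Numerically stable ln(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_forward(seq, start, trans, emit):
    """ln P(seq | model) via the forward algorithm for a small discrete HMM."""
    states = range(len(start))
    # alpha[k] = ln P(symbols so far, current state = k)
    alpha = [math.log(start[k]) + math.log(emit[k][seq[0]]) for k in states]
    for symbol in seq[1:]:
        alpha = [
            _logsumexp([alpha[j] + math.log(trans[j][k]) for j in states])
            + math.log(emit[k][symbol])
            for k in states
        ]
    return _logsumexp(alpha)

def reverse_null_score(seq, start, trans, emit):
    """Log-odds of seq against its own reversal under the same model."""
    forward = log_forward(seq, start, trans, emit)
    reverse = log_forward(seq[::-1], start, trans, emit)
    return forward - reverse

# Hypothetical two-state model over the alphabet {A, B}:
# state 0 prefers A, state 1 prefers B, and the model usually starts in state 0.
START = [0.9, 0.1]
TRANS = [[0.7, 0.3], [0.2, 0.8]]
EMIT = [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}]

score = reverse_null_score("AAABBB", START, TRANS, EMIT)
print(f"reverse-null log-odds for AAABBB: {score:.3f}")
```

Because the null model is the same HMM applied to the reversed sequence, a palindromic sequence always scores exactly zero, and a sequence and its reversal receive scores of equal magnitude and opposite sign.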

Purpose of the Study:

To address the challenge of accurately computing significance for reverse-sequence null model scores, which do not fit the standard Gumbel distribution.
To derive and evaluate new theoretical distributions for improved significance estimation of HMM scores.
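One way to see why a Gumbel fit fails for these scores is a small simulation. The sketch below assumes that the raw score of a sequence and the raw score of its reversal are independent draws from the same Gumbel distribution; that independence is an idealization introduced here for illustration, and the parameter values are arbitrary. Under that assumption the difference of the two is symmetric about zero and follows a logistic distribution with rate 1/beta, which is a standard property of Gumbel variates.

```python
import math
import random

def sample_gumbel(rng, mu, beta):
    """Draw one Gumbel(mu, beta) variate as mu - beta * ln(E), with E ~ Exp(1)."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against ln(0)
    return mu - beta * math.log(e)

def logistic_tail(t, lam):
    """P(S > t) for a zero-centred logistic distribution with rate lam."""
    z = lam * t
    if z >= 0:
        ez = math.exp(-z)          # stable for large positive z
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))

def empirical_tail(samples, t):
    """Fraction of samples strictly greater than t."""
    return sum(1 for s in samples if s > t) / len(samples)

# Arbitrary illustrative parameters (not taken from the paper).
MU, BETA, N = 5.0, 2.0, 100_000
rng = random.Random(0)
diffs = [sample_gumbel(rng, MU, BETA) - sample_gumbel(rng, MU, BETA)
         for _ in range(N)]

for t in (0.0, 2.0, 4.0, 8.0):
    print(f"t={t:4.1f}  empirical={empirical_tail(diffs, t):.4f}  "
          f"logistic={logistic_tail(t, 1.0 / BETA):.4f}")
```

The location parameter cancels in the difference, so only a single rate parameter remains to be estimated, and the symmetric shape is visibly different from the skewed Gumbel curve.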

Main Methods:

Derived a theoretical distribution for reverse-sequence null model scores, based on the assumption that raw HMM scores follow a Gumbel distribution.
Derived parameter-estimation methods for the candidate distributions, based on maximum likelihood and on moment matching (a least-squares fit in the case of Student's t-distribution).

Evaluated how well each fitted distribution matched the tail of the observed score distribution on held-out data, and assessed the effect of the resulting E-values on HMM-based fold-recognition methods.
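The two estimation strategies named above can be contrasted on a zero-centred logistic distribution with a single rate parameter. This is a sketch under stated assumptions: the logistic family stands in for the paper's theoretical distribution, the moment used is the second moment (the paper's choice of moments may differ), and the function names and synthetic data are hypothetical.

```python
import math
import random

def fit_lambda_moments(scores):
    """Moment matching: E[S^2] = pi^2 / (3 * lam^2) for a zero-centred logistic."""
    m2 = sum(s * s for s in scores) / len(scores)
    return math.pi / math.sqrt(3.0 * m2)

def fit_lambda_ml(scores, tol=1e-10):
    """Maximum likelihood: solve n/lam = sum(s * tanh(lam * s / 2)) by bisection."""
    if not any(scores):
        raise ValueError("need at least one non-zero score")
    n = len(scores)

    def gradient(lam):
        # Derivative of the log-likelihood with respect to lam; decreasing in lam.
        return n / lam - sum(s * math.tanh(0.5 * lam * s) for s in scores)

    lo, hi = 1e-9, 1.0
    while gradient(hi) > 0:        # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if gradient(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sample_logistic(rng, lam):
    """Inverse-CDF draw from a zero-centred logistic with rate lam."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return math.log(u / (1.0 - u)) / lam

# Synthetic training scores with a known rate, for illustration only.
TRUE_LAMBDA = 0.5
rng = random.Random(1)
train = [sample_logistic(rng, TRUE_LAMBDA) for _ in range(20_000)]

lam_mm = fit_lambda_moments(train)
lam_ml = fit_lambda_ml(train)
print(f"moment matching: {lam_mm:.4f}   maximum likelihood: {lam_ml:.4f}")
```

On data that really are logistic the two estimates agree closely. They diverge when the observed scores deviate from the assumed family, which is when the choice of estimator affects how well the tail is fitted on held-out data.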

Main Results:

The derived theoretical distribution fit the tail of the score distribution better and produced fewer false positives in the fold-recognition test than standard methods.
An ad hoc distribution with a stretched exponential tail performed even better.
Moment-matching methods provided better tail fits than maximum-likelihood methods for distribution parameter estimation.
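To make the link between tail shape and false positives concrete, the sketch below converts a score into an E-value, the expected number of database sequences scoring at least as well by chance. The logistic tail and the `tau` exponent are illustrative assumptions: this summary does not give the paper's exact parameterization of the stretched exponential tail, so the form used here is hypothetical, and `tau = 1` reduces it to the plain logistic case. Higher scores are assumed to be better.

```python
import math

def tail_probability(score, lam, tau=1.0):
    """P(S > score) for a symmetric distribution with a stretched logistic tail.

    Hypothetical form: the exponent lam * |score| is replaced by
    (lam * |score|) ** tau, so tau < 1 makes the tail heavier.
    """
    z = (lam * abs(score)) ** tau
    ez = math.exp(-z)
    upper = ez / (1.0 + ez)        # P(S > |score|), stable for large z
    return upper if score >= 0 else 1.0 - upper

def e_value(score, lam, database_size, tau=1.0):
    """Expected number of chance hits scoring above `score` in the database."""
    return database_size * tail_probability(score, lam, tau)

# Illustrative numbers only.
LAM, DB_SIZE = 1.0, 1000
for s in (0.0, 5.0, 10.0):
    print(f"score={s:5.1f}  E(logistic)={e_value(s, LAM, DB_SIZE):.4g}  "
          f"E(tau=0.8)={e_value(s, LAM, DB_SIZE, tau=0.8):.4g}")
```

A heavier tail assigns larger E-values to the same high score, so fewer chance matches are reported as significant. This is the mechanism by which a better tail fit can reduce false positives.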

Conclusions:

A novel theoretical distribution improves the accuracy of significance estimates for HMM scores derived from reverse-sequence null models.
This improvement leads to more reliable results in HMM-based sequence analysis and fold-recognition tasks.
The study highlights the importance of appropriate statistical distributions for accurate interpretation of HMM search results.