Statistics of large-scale sequence searching

R Spang¹, M Vingron

  • ¹Deutsches Krebsforschungszentrum (DKFZ), Theoretische Bioinformatik, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany.

Bioinformatics (Oxford, England) | June 6, 1998
Summary
This summary is machine-generated.

Estimates of the statistical significance of database search scores improve when database properties are taken into account. A new semi-random model with an "effective database size" parameter corrects discrepancies in p-value computations for sequence similarity searches.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Standard sequence alignment tools like BLAST and FASTA rely on the statistical significance of similarity scores.
  • Accurate p-value computation is challenging in database searches due to multiple comparisons and database characteristics.
  • Existing models often assume purely random data, failing to capture real-world database complexities.
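
The multiple-comparison issue mentioned above can be made concrete with a minimal sketch. It assumes every query-versus-entry comparison is independent and has the same chance probability, which is the purely random idealization and not the model proposed in the paper. The function names `database_pvalue` and `expected_chance_hits` are hypothetical and chosen only for illustration.

```python
import math


def database_pvalue(p_single: float, n_comparisons: float) -> float:
    """Probability that at least one of n independent comparisons
    reaches a given score purely by chance: 1 - (1 - p)^n."""
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must lie in [0, 1]")
    if n_comparisons < 0:
        raise ValueError("n_comparisons must be non-negative")
    if p_single == 1.0:
        # log1p(-1) is undefined, so handle the degenerate case directly.
        return 1.0 if n_comparisons > 0 else 0.0
    # log1p/expm1 keep the result accurate when p_single is very small.
    return -math.expm1(n_comparisons * math.log1p(-p_single))


def expected_chance_hits(p_single: float, n_comparisons: float) -> float:
    """Expected number of chance hits (often called an E-value): n * p."""
    return n_comparisons * p_single
```

Under these assumptions, a per-comparison p-value of 1e-6 searched against 1,000,000 entries yields a database-level p-value of roughly 0.63, which shows why a score that looks significant in a single comparison may not be significant in a database search.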

Purpose of the Study:

  • To address the limitations of current statistical models for sequence database searches.
  • To improve the accuracy of p-value calculations for similarity scores in large biological databases.
  • To introduce a more realistic statistical framework for evaluating search results.

Main Methods:

  • Extensive simulations of database searches were performed on the SWISS-PROT protein database (Release 31.0).
  • A novel semi-random statistical model was developed to better represent real databases.
  • The model incorporates an "effective database size" parameter to account for database-specific statistical properties.

Main Results:

  • A discrepancy was observed between theoretical predictions and empirical distributions of similarity scores.
  • The proposed semi-random model demonstrated improved accuracy in p-value computation compared to purely random models.
  • The "effective database size" parameter effectively captures database-specific statistical properties.

Conclusions:

  • The developed semi-random model provides a more accurate assessment of statistical significance for database search results.
  • Accounting for database properties like sequence length distribution and repeated patterns is crucial for reliable p-value estimation.
  • This approach enhances the credibility of findings from large-scale sequence similarity searches.
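
The idea of an "effective database size" can be illustrated with a small sketch. This is not the authors' fitting procedure, which the summary does not describe. It simply assumes that database-level significance follows 1 - (1 - p)^N and asks which N would reproduce a fraction of chance hits observed in simulation. The function name `effective_database_size` and its parameters are hypothetical.

```python
import math


def effective_database_size(p_single: float, observed_fraction: float) -> float:
    """Solve observed_fraction = 1 - (1 - p_single)**N for N.

    p_single:          chance probability of reaching the score in one comparison
    observed_fraction: fraction of simulated searches whose best hit reached
                       that score (an empirical database-level p-value)
    """
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    if not 0.0 <= observed_fraction < 1.0:
        raise ValueError("observed_fraction must lie in [0, 1)")
    # Rearranged: N = log(1 - observed_fraction) / log(1 - p_single)
    return math.log1p(-observed_fraction) / math.log1p(-p_single)
```

If the value returned differs from the nominal number of database entries, the purely random model and the simulated searches disagree, which is the kind of discrepancy an effective size parameter is meant to absorb.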