StratoMod: predicting sequencing and variant calling errors with interpretable machine learning.

Nathan Dwarshuis¹, Peter Tonner², Nathan D Olson²

¹Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA. njd2@nist.gov.

Communications Biology | October 13, 2024

Summary

This summary is machine-generated.

StratoMod predicts germline variant calling errors using machine learning, aiding pipeline design. It identifies challenging genomic regions and missed clinically relevant variants, improving variant calling accuracy.

Area of Science:

Genomics
Bioinformatics
Machine Learning

Background:

No single variant calling pipeline is optimal for the entire human genome.
Assessing pipeline tradeoffs currently relies on intuition rather than data.
Developers, clinicians, and researchers need better tools for pipeline design.

Purpose of the Study:

To present StratoMod, an interpretable machine-learning classifier to predict germline variant calling errors.
To provide a data-driven method for assessing tradeoffs in variant calling pipelines.
To identify genomic regions and factors contributing to variant calling errors.

Main Methods:

Developed StratoMod, an interpretable machine-learning classifier.
Utilized a draft benchmark based on the Q100 HG002 assembly for difficult regions.
Assessed the impact of mapping strategies (linear vs. graph-based references) on variant calling.
Quantified contributions of difficult-to-map and homopolymer regions to errors.
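To make the idea of an interpretable error classifier concrete, here is a minimal sketch. It is not StratoMod itself: this summary does not state which model family StratoMod uses, so plain logistic regression stands in because its fitted weights can be read directly as per-feature contributions. The feature names (`homopolymer_len`, `hard_to_map`), the synthetic data generator, and its coefficients are all assumptions made up for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_synthetic_variants(n, seed=0):
    """Generate toy (features, called) rows. All numbers here are invented."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        homopolymer_len = rng.randint(0, 20)      # hypothetical feature
        hard_to_map = rng.random() < 0.3          # hypothetical feature
        # Assumed ground truth: both features lower the chance of a correct call.
        p_called = sigmoid(3.0 - 0.2 * homopolymer_len - 2.0 * hard_to_map)
        called = 1 if rng.random() < p_called else 0
        # Scale homopolymer length to [0, 1] so both features share a range.
        rows.append(((homopolymer_len / 20.0, float(hard_to_map)), called))
    return rows


def predict(weights, bias, features):
    """Predicted probability that a true variant is called correctly."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def fit_logistic(rows, epochs=800, lr=0.5):
    """Full-batch gradient descent on the logistic log-loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in rows:
            err = predict(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


rows = make_synthetic_variants(500)
weights, bias = fit_logistic(rows)
print("homopolymer weight:", round(weights[0], 2))
print("hard-to-map weight:", round(weights[1], 2))
```

Negative weights indicate features that reduce the predicted probability of a correct call, which is the kind of per-feature reading that makes such a model useful for reasoning about pipeline tradeoffs.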

Main Results:

StratoMod accurately predicts recall for different sequencing platforms (HiFi, Illumina).
Identified specific difficult-to-map regions where graph-based methods show significant improvement.
Quantified the impact of mismapping on predicted recall.
Demonstrated StratoMod's ability to predict missed clinically relevant variants.
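Recall, the quantity these results refer to, is the fraction of true benchmark variants that a pipeline actually calls. The sketch below shows one way to compute it per genomic stratum. The stratum labels and the toy records are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def recall_by_stratum(variants):
    """Compute recall per stratum.

    variants: iterable of (stratum, was_called) pairs, one per true
    benchmark variant. Recall = called variants / all true variants.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [called, total]
    for stratum, was_called in variants:
        counts[stratum][1] += 1
        if was_called:
            counts[stratum][0] += 1
    return {s: called / total for s, (called, total) in counts.items()}


# Toy records only (hypothetical labels and outcomes).
toy_variants = [
    ("homopolymer", True),
    ("homopolymer", False),
    ("difficult_to_map", True),
    ("difficult_to_map", True),
    ("difficult_to_map", False),
    ("other", True),
]
toy_recall = recall_by_stratum(toy_variants)
for stratum, value in sorted(toy_recall.items()):
    print(stratum, round(value, 3))
```

Running the same function on the output of two pipelines, for example one using a linear reference and one using a graph-based reference, would give a per-stratum comparison of the kind the results describe.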

Conclusions:

StratoMod offers a data-driven approach to optimize variant calling pipelines.
Its interpretability allows for precise risk-reward analyses in pipeline design.
StratoMod improves upon existing methods by predicting missed variants, not just filtering false positives.