


Inferring feature importance with uncertainties with application to large genotype data.

Pål Vegard Johnsen¹˒², Inga Strümke³˒⁴, Mette Langaas²

¹SINTEF Digital, Oslo, Norway.

PLOS Computational Biology | March 14, 2023

Summary

This summary is machine-generated.

We introduce Sub-SAGE, a Shapley-value-based method for estimating feature importance and its uncertainty. The approach identifies the features that matter most in the underlying data-generating process and can be computed efficiently for tree-based models.
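Sub-SAGE and SAGE both build on Shapley values. As a rough illustration of that shared building block (not of the Sub-SAGE estimator itself), the sketch below computes exact Shapley values for a small set function. It assumes the value function, for example the reduction in expected loss when a given set of features is known, is already available as a Python callable. The names `shapley_values` and `toy_value` are hypothetical, and the toy numbers are invented for illustration rather than taken from the paper.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List


def shapley_values(features: List[str],
                   value: Callable[[FrozenSet[str]], float]) -> Dict[str, float]:
    """Exact Shapley value of each feature for the set function `value`.

    Enumerates every subset of the remaining features, so the cost grows as
    2**(d - 1) evaluations per feature; only practical for small d.
    """
    d = len(features)
    phi: Dict[str, float] = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(d):  # |S| runs from 0 to d - 1
            # Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of j when added to coalition S
                total += weight * (value(s | {j}) - value(s))
        phi[j] = total
    return phi


# Hypothetical value function: "reduction in loss" when a feature set is known.
# Numbers are invented; "a" and "b" interact, "c" is irrelevant.
def toy_value(s: FrozenSet[str]) -> float:
    v = 0.0
    if "a" in s:
        v += 0.5
    if "b" in s:
        v += 0.3
    if "a" in s and "b" in s:
        v += 0.2  # interaction term, shared between "a" and "b"
    return v


if __name__ == "__main__":
    print(shapley_values(["a", "b", "c"], toy_value))
    # approximately {'a': 0.6, 'b': 0.4, 'c': 0.0}
```

The interaction term is split equally between the two interacting features, and the values sum to `toy_value` of the full set minus `toy_value` of the empty set (the efficiency property). Exhaustive enumeration like this is exactly what becomes infeasible for genotype-scale feature sets.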


Area of Science:

Machine Learning
Statistical Modeling
Bioinformatics

Background:

Estimating feature importance is crucial for understanding data-driven models and the underlying data-generating process.
Existing methods such as Shapley additive global importance (SAGE) can be computationally intensive.
There is a need for efficient, uncertainty-aware feature importance estimation.
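For reference, the Shapley decomposition underlying SAGE can be written as below. The notation is ours and follows the usual statement of SAGE rather than the paper: f is a trained model, ℓ a loss function, X = (X_1, ..., X_d) the features, Y the response and D = {1, ..., d}. Each importance φ_j sums over all 2^(d−1) subsets of the other features, which shows why exact computation grows expensive as d increases. The paper's Sub-SAGE estimator is not reproduced here.

```latex
v_f(S) = \mathbb{E}\big[\ell(f_\emptyset, Y)\big] - \mathbb{E}\big[\ell(f_S(X_S), Y)\big],
\qquad f_S(x_S) = \mathbb{E}\big[f(X) \mid X_S = x_S\big], \quad f_\emptyset = \mathbb{E}\big[f(X)\big]

\phi_j(v_f) = \sum_{S \subseteq D \setminus \{j\}}
  \frac{|S|!\,(d - |S| - 1)!}{d!}\,\big[v_f(S \cup \{j\}) - v_f(S)\big],
\qquad \sum_{j=1}^{d} \phi_j(v_f) = v_f(D) - v_f(\emptyset) = v_f(D)
```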

Purpose of the Study:

To present a Shapley-value-based framework, Sub-SAGE, for inferring individual feature importance with uncertainty.
To develop a computationally efficient method for tree-based models, avoiding resampling.
To demonstrate the framework's applicability on synthetic and large-scale genotype data.

Main Methods:

Developed Sub-SAGE, a novel feature importance estimator building on SAGE.
Used bootstrapping to estimate the uncertainty of the Sub-SAGE estimator across model types.
Applied the framework to tree ensemble methods and large genotype datasets.
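The summary does not spell out the bootstrap procedure, so the sketch below only illustrates the general idea on a simpler quantity: a percentile bootstrap interval for the mean per-observation increase in loss when a feature is left out of a model. It assumes paired per-observation losses from a "full" and a "reduced" model are already available. This is a generic technique, not the authors' procedure, and the function name `bootstrap_importance_ci` is hypothetical.

```python
import random
from typing import Sequence, Tuple


def bootstrap_importance_ci(loss_reduced: Sequence[float],
                            loss_full: Sequence[float],
                            n_boot: int = 2000,
                            alpha: float = 0.05,
                            seed: int = 0) -> Tuple[float, float, float]:
    """Point estimate and percentile bootstrap interval for the mean increase in
    per-observation loss when a feature is left out of the model.

    loss_reduced[i] and loss_full[i] must refer to the same observation i.
    """
    if len(loss_reduced) != len(loss_full) or not loss_full:
        raise ValueError("need two non-empty, equally long sequences of paired losses")
    diffs = [r - f for r, f in zip(loss_reduced, loss_full)]
    n = len(diffs)
    rng = random.Random(seed)  # fixed seed for reproducibility

    boot_means = []
    for _ in range(n_boot):
        # Resample observations with replacement; pairing is preserved because
        # each drawn index contributes its own loss difference.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        boot_means.append(sum(sample) / n)
    boot_means.sort()

    # Empirical alpha/2 and 1 - alpha/2 quantiles of the bootstrap distribution
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return sum(diffs) / n, boot_means[lo_idx], boot_means[hi_idx]
```

A call such as `est, lo, hi = bootstrap_importance_ci(reduced_losses, full_losses)` (with hypothetical loss lists) returns the estimate and an approximate 95% interval. The percentile interval is the simplest bootstrap interval; bias-corrected variants may behave better when the bootstrap distribution is skewed.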

Main Results:

Sub-SAGE can be estimated efficiently for tree-based models without resampling.
Bootstrapping effectively estimates the uncertainty of Sub-SAGE across various model types.
Demonstrated a successful application to genotype data, estimating feature importance for obesity prediction.

Conclusions:

Sub-SAGE offers a robust and computationally efficient method for feature importance and uncertainty estimation.
The framework is valuable for interpreting complex models and understanding biological data.
This approach enhances the explainability of machine learning models in scientific research.