Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection

Erika Cantor¹, Sandra Guauque-Olarte², Roberto León³

¹Department of clinical epidemiology and biostatistics, Pontificia Universidad Javeriana, Bogotá, 110221, Colombia. erika.cantor@javeriana.edu.co.

September 10, 2024

Summary

This summary is machine-generated.

We developed a knowledge-slanted random forest (RF) to improve gene selection in high-dimensional genomics data. This method integrates biological networks, enhancing prediction accuracy and explainability, especially with small sample sizes.

Keywords:

Explainability, Feature selection, Gene selection, High-dimensional, Prior knowledge, Protein-protein interaction, RNA-Seq, Random forest

Area of Science:

Computational Biology
Machine Learning
Genomics

Background:

High-dimensional genetic and genomic data pose challenges such as the curse of dimensionality.
Conventional random forest (RF) models can exhibit poor accuracy in high-dimensional settings, particularly with limited sample sizes.
Integrating prior biological knowledge is a promising strategy to enhance machine learning model performance.

Purpose of the Study:

To propose a novel knowledge-slanted random forest (RF) model.
To improve the performance and explainability of gene selection in high-dimensional genomics data.
To address limitations of conventional RF in scenarios with small sample sizes.

Main Methods:

The knowledge-slanted RF integrates biological networks (e.g., protein-protein interaction networks) as prior knowledge.
A random walk with restart algorithm determines gene relevance based on network topology.
Gene relevance scores adjust the probability with which each gene is drawn as a candidate feature in the RF algorithm, and a modified Boruta algorithm is used for feature selection.
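
The first two steps can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a common textbook formulation of random walk with restart, and the toy five-gene network, the restart probability of 0.3, the small probability floor `eps`, and the function names `random_walk_with_restart` and `sample_candidate_features` are all assumptions made for illustration. The modified Boruta step is not shown.

```python
import numpy as np

def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * W @ p + r * p0 until the update is below tol.

    adj     : symmetric adjacency matrix of the gene network (hypothetical input)
    seeds   : indices of genes the walk restarts from
    restart : restart probability r (0.3 is an illustrative choice)
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0        # isolated genes: avoid division by zero
    W = adj / col_sums                   # column-normalised transition matrix
    p0 = np.zeros(adj.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # walk starts uniformly on the seed genes
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

def sample_candidate_features(weights, mtry, rng, eps=1e-12):
    """Draw mtry distinct candidate features with probability proportional to weights.

    eps is an assumed floor so that zero-weight genes can still be drawn.
    """
    probs = np.asarray(weights, dtype=float) + eps
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=mtry, replace=False, p=probs)

# Toy network: five genes on a path 0-1-2-3-4, with gene 0 as the seed.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])
scores = random_walk_with_restart(adjacency, seeds=[0])

# Genes closer to the seed receive higher relevance and are sampled more often.
rng = np.random.default_rng(0)
picked = sample_candidate_features(scores, mtry=2, rng=rng)
```

In a conventional RF the candidate features at each split are drawn uniformly; here the draw is weighted by network-derived relevance, which is the sense in which the forest is slanted toward prior knowledge.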

Main Results:

Knowledge-slanted RF demonstrated improved precision in outcome prediction compared to conventional RF and logistic lasso regression on simulated datasets.
The method effectively identified more biologically relevant genes.
Enhanced explainability was observed compared to standard RF approaches.

Conclusions:

Knowledge-slanted RF offers a robust approach to high-dimensional genomics data that helps mitigate the curse of dimensionality.
The integration of prior biological network knowledge significantly boosts model performance and interpretability.
This method shows promise for identifying relevant genes in complex diseases, as validated in a case study of calcific aortic valve stenosis.