



Knowledge-slanted random forest method for high-dimensional data and small sample size with a feature selection

Erika Cantor¹, Sandra Guauque-Olarte², Roberto León³

  • ¹ Department of Clinical Epidemiology and Biostatistics, Pontificia Universidad Javeriana, Bogotá, 110221, Colombia. erika.cantor@javeriana.edu.co

BioData Mining | September 10, 2024
Summary
This summary is machine-generated.

We developed a knowledge-slanted random forest (RF) to improve gene selection in high-dimensional genomics data. This method integrates biological networks, enhancing prediction accuracy and explainability, especially with small sample sizes.

Keywords:
Explainability, Feature selection, Gene selection, High-dimensional, Prior knowledge, Protein-protein interaction, RNA-Seq, Random forest


Area of Science:

  • Computational Biology
  • Machine Learning
  • Genomics

Background:

  • High-dimensional genetic and genomics data present challenges like the curse of dimensionality.
  • Conventional random forest (RF) models can exhibit poor accuracy in high-dimensional settings, particularly with limited sample sizes.
  • Integrating prior biological knowledge is a promising strategy to enhance machine learning model performance.

Purpose of the Study:

  • To propose a novel knowledge-slanted random forest (RF) model.
  • To improve the performance and explainability of gene selection in high-dimensional genomics data.
  • To address limitations of conventional RF in scenarios with small sample sizes.

Main Methods:

  • The knowledge-slanted RF integrates biological networks (e.g., protein-protein interaction networks) as prior knowledge.
  • A random walk with restart algorithm determines gene relevance based on network topology.
  • Gene relevance scores modify feature selection probabilities in the RF algorithm, enhanced by a modified Boruta algorithm.
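
The first and third steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the restart probability of 0.3, the five-gene toy network, and the choice of seed gene are all assumptions made for illustration, and the modified Boruta step is not reproduced. The sketch runs a random walk with restart on an adjacency matrix to obtain one relevance score per gene, then uses those scores as sampling probabilities when drawing candidate features, which is one plausible reading of "modify feature selection probabilities".

```python
import numpy as np


def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities of a walk that restarts at the seeds.

    Hypothetical helper: iterates p <- (1 - restart) * W p + restart * p0,
    where W is the column-normalised adjacency matrix.
    """
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated nodes: avoid division by zero
    W = A / col_sums                   # each column sums to 1 (or 0 if isolated)
    p0 = np.zeros(A.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds) # restart distribution, uniform over seeds
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def sample_candidate_features(relevance, n_candidates, rng):
    """Draw candidate feature indices with probability proportional to relevance."""
    probs = np.asarray(relevance, dtype=float)
    probs = probs / probs.sum()
    return rng.choice(len(probs), size=n_candidates, replace=False, p=probs)


# Toy network (assumption): five genes connected in a chain 0-1-2-3-4.
adjacency = np.array([
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

# Gene 0 is treated as the seed, i.e. a gene assumed to be known as relevant.
scores = random_walk_with_restart(adjacency, seeds=[0])

rng = np.random.default_rng(0)
candidates = sample_candidate_features(scores, n_candidates=2, rng=rng)

print("relevance scores:", np.round(scores, 3))
print("candidate features for one split:", candidates)
```

In this toy chain the scores fall off with distance from the seed, so genes close to the seed in the network are drawn as candidates more often than distant ones. In a full random forest the weighted draw would be repeated wherever the algorithm samples candidate variables; the sketch shows a single draw only.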

Main Results:

  • Knowledge-slanted RF demonstrated improved precision in outcome prediction compared to conventional RF and logistic lasso regression on simulated datasets.
  • The method effectively identified more biologically relevant genes.
  • Enhanced explainability was observed compared to standard RF approaches.

Conclusions:

  • Knowledge-slanted RF offers a robust approach to high-dimensional genomics data that helps mitigate the curse of dimensionality.
  • The integration of prior biological network knowledge significantly boosts model performance and interpretability.
  • This method shows promise for identifying relevant genes in complex diseases, as validated in a case study of calcific aortic valve stenosis.