



Matched Forest: supervised learning for high-dimensional matched case-control studies.

Nooshin Shomal Zadeh (1), Sangdi Lin (2), George C Runger (1)

  • (1) School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ 85281, USA.

Bioinformatics (Oxford, England) | October 18, 2019
Summary
This summary is machine-generated.

Matched Forest (MF) offers a novel approach for variable selection in high-dimensional matched case-control studies. This method effectively identifies key exposure variables and their interactions, improving upon existing techniques.


Area of Science:

  • Biostatistics
  • Epidemiology
  • Computational Biology

Background:

  • Matched case-control studies are essential in biomedical research for identifying exposures associated with health conditions.
  • Traditional variable selection methods struggle with high-dimensional data and complex variable interactions.

Purpose of the Study:

  • To introduce a flexible and effective method for variable selection in high-dimensional matched case-control data.
  • To address the limitations of existing methods in detecting interaction effects.

Main Methods:

  • The study presents Matched Forest (MF), a novel algorithm based on the potential outcome model.
  • MF transforms matched case-control data by incorporating counterfactuals.
  • Variable importance is assessed using a modified score from a supervised learner.
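
The summary above does not spell out the transformation or the scoring rule, so the following is only a rough sketch of the general idea, not the published MF algorithm. It assumes each matched pair is turned into two rows: the case-minus-control difference (label 1) and its mirrored, counterfactual-style swap (label 0). A toy sign-based stump then stands in for the supervised learner. The names `mirror_pairs` and `stump_importance` are hypothetical and exist only for this illustration.

```python
from typing import List, Sequence, Tuple

# One matched pair: (case exposures, control exposures). Hypothetical layout.
Pair = Tuple[Sequence[float], Sequence[float]]


def mirror_pairs(pairs: List[Pair]) -> Tuple[List[List[float]], List[int]]:
    """Turn each matched pair into two labeled rows (an assumed transformation).

    Row 1: case minus control, label 1 (the observed ordering).
    Row 2: control minus case, label 0 (the swapped, counterfactual-style ordering).
    """
    rows: List[List[float]] = []
    labels: List[int] = []
    for case, control in pairs:
        diff = [c - k for c, k in zip(case, control)]
        rows.append(diff)
        labels.append(1)
        rows.append([-d for d in diff])
        labels.append(0)
    return rows, labels


def stump_importance(rows: List[List[float]], labels: List[int]) -> List[float]:
    """Score each variable with a sign-based stump: predict label 1 when the
    difference is positive. 0.0 means chance level, 1.0 means perfect separation.
    This toy score is a stand-in for a real learner's importance measure."""
    n_features = len(rows[0])
    scores: List[float] = []
    for j in range(n_features):
        correct = sum(1 for r, y in zip(rows, labels) if (r[j] > 0) == (y == 1))
        accuracy = correct / len(rows)
        scores.append(abs(accuracy - 0.5) * 2)
    return scores


# Made-up illustrative data: variable 0 is always higher in the case,
# variable 1 has no consistent direction across pairs.
pairs = [
    ([1.0, 0.2], [0.0, 0.5]),
    ([2.0, 0.9], [1.0, 0.1]),
    ([3.0, 0.4], [1.5, 0.6]),
    ([0.5, 0.7], [0.1, 0.3]),
]
rows, labels = mirror_pairs(pairs)
scores = stump_importance(rows, labels)
print(scores)
```

In this toy data the first variable separates observed from mirrored rows perfectly while the second does no better than chance. A per-variable stump like this cannot capture interactions; detecting those would need a learner that splits on several variables jointly, such as a tree ensemble.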

Main Results:

  • Simulation studies demonstrate MF's efficacy in identifying significant variables.
  • The algorithm successfully detects interaction effects among variables.
  • MF is applied to biomedical data, showing competitive performance against alternative methods.

Conclusions:

  • Matched Forest provides a robust and adaptable solution for variable selection in complex epidemiological studies.
  • The method's ability to handle high-dimensional data and interactions enhances its utility in biomedical research.
  • MF is accessible through readily available software tools.