Improving the explainability of Random Forest classifier - user centered approach.

Dragutin Petkovic¹, Russ Altman, Mike Wong

¹Computer Science Department, San Francisco State University (SFSU), 1600 Holloway Ave., San Francisco, CA 94132, USA
²SFSU Center for Computing for Life Sciences, 1600 Holloway Ave., San Francisco, CA 94132, USA
Petkovic@sfsu.edu

Pacific Symposium on Biocomputing

December 9, 2017

Summary

This summary is machine-generated.

This study introduces RFEX, a tool that simplifies complex Random Forest (RF) models for better understanding in healthcare. RFEX generates easy-to-read reports, boosting user confidence and accuracy in machine learning applications.

Area of Science:

Computational Biology
Machine Learning in Medicine
Data Science

Background:

Machine Learning (ML) methods are increasingly vital in healthcare but face adoption barriers due to their complexity and lack of explainability.
Understanding ML model decisions is crucial for validation and trust, especially for non-expert users in clinical settings.

Purpose of the Study:

To enhance the explainability of Random Forest (RF) classifiers using a novel method called RFEX.
To develop user-friendly, interpretable summary reports for trained RF models to improve adoption and validation.

Main Methods:

RFEX was developed using a user-centered approach, incorporating feedback from practitioners on explainability requirements.
The method was implemented and tested on the Stanford FEATURE dataset, using RF to predict functional sites in 3D molecules.
Formal usability testing was conducted with 13 expert and non-expert users to assess RFEX's utility.

Main Results:

RFEX significantly increased both the explainability of RF classification on the FEATURE dataset and users' confidence in it.
Analysis revealed that a small number of top-ranked features (2-6) is sufficient to achieve 90% or more of the accuracy obtained with all 480 features.
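
The trade-off between the number of top-ranked features and classification accuracy can be explored with a short experiment. The following is a minimal sketch in Python, assuming scikit-learn is available; it is not the RFEX implementation. The data set is synthetic (it is not the FEATURE data), the ranking uses scikit-learn's impurity-based feature importances (which may differ from the ranking RFEX uses), and the helper names `top_k_tradeoff` and `smallest_k` and the 0.9 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def top_k_tradeoff(X, y, max_k=10, seed=0):
    """Train an RF on all features, then on the top-k ranked features for k = 1..max_k.

    Returns (ranking, full_accuracy, {k: accuracy}).
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Baseline: RF trained on every feature.
    full = RandomForestClassifier(n_estimators=100, random_state=seed)
    full.fit(X_tr, y_tr)
    full_acc = full.score(X_te, y_te)

    # Rank features by impurity-based importance, most important first.
    # Assumption: this ranking stands in for whatever ranking RFEX reports.
    ranking = np.argsort(full.feature_importances_)[::-1]

    accs = {}
    for k in range(1, max_k + 1):
        cols = ranking[:k]
        rf = RandomForestClassifier(n_estimators=100, random_state=seed)
        rf.fit(X_tr[:, cols], y_tr)
        accs[k] = rf.score(X_te[:, cols], y_te)
    return ranking, full_acc, accs


def smallest_k(full_acc, accs, fraction=0.9):
    """Smallest k whose accuracy reaches `fraction` of the full-feature accuracy, or None."""
    for k in sorted(accs):
        if accs[k] >= fraction * full_acc:
            return k
    return None


# Synthetic stand-in with 480 features, only a few of them informative.
X, y = make_classification(
    n_samples=600, n_features=480, n_informative=6, n_redundant=0, random_state=0
)
ranking, full_acc, accs = top_k_tradeoff(X, y)
k = smallest_k(full_acc, accs)
print(f"accuracy with all features: {full_acc:.3f}")
print(f"smallest k reaching 90% of it: {k}")
```

Because the ranking here is computed on the same training split that is used to fit the reduced models, the result is only a rough indication; a more careful study would repeat the procedure under cross-validation and use a metric suited to class imbalance.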

Conclusions:

RFEX provides a valuable tool for demystifying complex ML models, particularly RF classifiers, for a broader range of users.
The findings highlight the potential for simplified feature sets to achieve high accuracy, aiding in the interpretation and application of ML in molecular biology and medicine.