Improving the explainability of Random Forest classifier - user centered approach.

Dragutin Petkovic¹, Russ Altman, Mike Wong

  • ¹Computer Science Department, San Francisco State University (SFSU), 1600 Holloway Ave., San Francisco, CA 94132, USA; ²SFSU Center for Computing for Life Sciences, 1600 Holloway Ave., San Francisco, CA 94132, USA. Petkovic@sfsu.edu

Pacific Symposium on Biocomputing, December 9, 2017
Summary
This summary is machine-generated.

This study introduces RFEX, a tool that makes trained Random Forest (RF) models easier to understand, with a focus on healthcare users. RFEX generates easy-to-read summary reports, which in usability testing increased the explainability of RF classification and users' confidence in it.

Area of Science:

  • Computational Biology
  • Machine Learning in Medicine
  • Data Science

Background:

  • Machine Learning (ML) methods are increasingly vital in healthcare but face adoption barriers due to their complexity and lack of explainability.
  • Understanding ML model decisions is crucial for validation and trust, especially for non-expert users in clinical settings.

Purpose of the Study:

  • To enhance the explainability of Random Forest (RF) classifiers using a novel method called RFEX.
  • To develop user-friendly, interpretable summary reports for trained RF models to improve adoption and validation.

Main Methods:

  • RFEX was developed using a user-centered approach, incorporating feedback from practitioners on explainability requirements.
  • The method was implemented and tested on the Stanford FEATURE dataset, using RF to predict functional sites in 3D molecules.
  • Formal usability testing was conducted with 13 expert and non-expert users to assess RFEX's utility.
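RFEX's summary reports are built around a ranking of features by importance. The abstract does not say which importance measure is used, so the sketch below assumes mean decrease in accuracy (permutation importance), a common choice for Random Forests. It is written model-agnostically so it runs with the standard library alone; the function names and the toy threshold model are hypothetical illustrations, not part of RFEX.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]
Predict = Callable[[Row], int]


def accuracy(predict: Predict, X: List[Row], y: List[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Predict, X: List[Row], y: List[int],
                           n_repeats: int = 5, seed: int = 0) -> List[float]:
    """Mean decrease in accuracy when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:])
                      for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def rank_features(importances: List[float],
                  names: List[str]) -> List[Tuple[str, float]]:
    """(name, importance) pairs, most important first."""
    return sorted(zip(names, importances), key=lambda p: p[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    X = [[rng.random() for _ in range(3)] for _ in range(200)]
    y = [int(row[0] > 0.5) for row in X]

    def toy_model(row: Row) -> int:
        # Hypothetical stand-in for a trained Random Forest: uses feature 0 only.
        return int(row[0] > 0.5)

    for name, score in rank_features(permutation_importance(toy_model, X, y),
                                     ["f0", "f1", "f2"]):
        print(f"{name}: {score:.3f}")
```

Because the toy model ignores f1 and f2, shuffling them leaves accuracy unchanged and their scores are exactly 0, so f0 ranks first. With a real forest, `predict` would wrap the trained model's prediction call.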

Main Results:

  • RFEX significantly increased the explainability and user confidence in RF classification for the FEATURE dataset.
  • Analysis revealed that a small number of top-ranked features (2-6) is sufficient to achieve 90% or more of the accuracy obtained when using all 480 features.
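The feature-subset result compares accuracy on only the top-ranked features against accuracy on all features. The sketch below shows one way such a check could be run: retrain on the top-k features of a given ranking and report the smallest k whose test accuracy reaches 90% of the all-feature accuracy. It is a hypothetical illustration, not the authors' procedure; a nearest-centroid classifier stands in for Random Forest so the example needs only the standard library, and all function names are our own.

```python
import random
from typing import Dict, List, Optional, Sequence

Row = Sequence[float]


def fit_centroids(X: List[Row], y: List[int],
                  features: List[int]) -> Dict[int, List[float]]:
    """Per-class mean of the selected feature columns (stand-in classifier)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for row, label in zip(X, y):
        total = sums.setdefault(label, [0.0] * len(features))
        for i, j in enumerate(features):
            total[i] += row[j]
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_nearest(centroids: Dict[int, List[float]], row: Row,
                    features: List[int]) -> int:
    """Label of the nearest centroid by squared Euclidean distance."""
    return min(centroids, key=lambda c: sum(
        (row[j] - m) ** 2 for j, m in zip(features, centroids[c])))


def subset_accuracy(X_train: List[Row], y_train: List[int],
                    X_test: List[Row], y_test: List[int],
                    features: List[int]) -> float:
    """Train on the given feature columns only, then score on the test set."""
    centroids = fit_centroids(X_train, y_train, features)
    hits = sum(predict_nearest(centroids, row, features) == label
               for row, label in zip(X_test, y_test))
    return hits / len(y_test)


def smallest_sufficient_k(X_train: List[Row], y_train: List[int],
                          X_test: List[Row], y_test: List[int],
                          ranked: List[int],
                          fraction: float = 0.9) -> Optional[int]:
    """Smallest k whose top-k ranked features reach `fraction` of the
    all-feature accuracy, or None if no prefix of `ranked` does."""
    all_features = list(range(len(X_train[0])))
    target = fraction * subset_accuracy(X_train, y_train, X_test, y_test,
                                        all_features)
    for k in range(1, len(ranked) + 1):
        if subset_accuracy(X_train, y_train, X_test, y_test,
                           ranked[:k]) >= target:
            return k
    return None


if __name__ == "__main__":
    rng = random.Random(7)

    def make_rows(n: int):
        # Synthetic data: only feature 0 carries signal; features 1-3 are noise.
        X = [[rng.random() for _ in range(4)] for _ in range(n)]
        return X, [int(row[0] > 0.5) for row in X]

    X_train, y_train = make_rows(300)
    X_test, y_test = make_rows(300)
    print(smallest_sufficient_k(X_train, y_train, X_test, y_test,
                                ranked=[0, 1, 2, 3]))
```

Since only feature 0 is informative in this synthetic data, the top-1 subset already meets the 90% target and the call reports k = 1. On real data, `ranked` would come from the trained forest's importance ranking.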

Conclusions:

  • RFEX provides a valuable tool for demystifying complex ML models, particularly RF classifiers, for a broader range of users.
  • The findings highlight the potential for simplified feature sets to achieve high accuracy, aiding in the interpretation and application of ML in molecular biology and medicine.