CaliForest: Calibrated Random Forest for Health Data.

Yubin Park¹, Joyce C Ho²

¹Emory University Bonsai Research, LLC.

Proceedings of the ACM Conference on Health, Inference, and Learning

July 26, 2021

Summary

This summary is machine-generated.

CaliForest improves the calibration of random forest risk prediction models without requiring a separate calibration set, because it calibrates on out-of-bag samples. Better-calibrated risk estimates support more reliable healthcare predictions for personalized medicine.

Keywords:

Applied computing→Health informatics; Bagging; Computing methodologies→Classification and regression trees; General and reference→Empirical studies; calibration; healthcare; python; random forest

Area of Science:

Healthcare Analytics
Machine Learning in Medicine
Biostatistics

Background:

Predictive models in healthcare must be evaluated on both discrimination (how well the model ranks higher-risk patients above lower-risk ones) and calibration (how closely predicted risks match observed event rates).
Calibration is often neglected in favor of discrimination.
Accurate calibration is vital for personalized medicine and clinical decision-making.
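
The difference between the two properties can be illustrated with a small sketch. The labels and probabilities below are made up for illustration, and the Brier score is used only as one common calibration-sensitive measure; it is an assumption here, not necessarily one of the metrics used in the paper. A monotone distortion of the predicted risks leaves the ranking (and therefore AUROC) unchanged while making the risk estimates much less accurate.

```python
def auroc(labels, scores):
    """Chance that a random positive case outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, probs):
    """Mean squared difference between predicted risk and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


# Hypothetical outcomes and predicted risks, for illustration only.
labels = [0, 0, 0, 1, 0, 1, 1, 1]
probs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]

# A strictly increasing distortion: same ordering, inflated risk estimates.
overconfident = [p ** 0.25 for p in probs]

print("AUROC original: ", auroc(labels, probs))
print("AUROC distorted:", auroc(labels, overconfident))
print("Brier original: ", brier_score(labels, probs))
print("Brier distorted:", brier_score(labels, overconfident))
```

Both sets of predictions discriminate equally well, but the distorted ones give a worse Brier score, which is why discrimination alone cannot confirm that risk estimates are trustworthy.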

Purpose of the Study:

To introduce CaliForest, a novel calibrated random forest algorithm.
To address the common neglect of calibration in healthcare predictive modeling.
To provide a method that avoids explicit calibration sets by using out-of-bag samples.

Main Methods:

Developed CaliForest, a random forest algorithm incorporating calibration.
Utilized out-of-bag samples within the random forest framework for calibration.
Evaluated CaliForest on two binary risk prediction tasks using the MIMIC-III database.
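
The general idea of calibrating on out-of-bag samples can be sketched as follows. This is a simplified illustration assuming scikit-learn, not the CaliForest implementation: isotonic regression is a swapped-in choice of calibrator, the helper name fit_oob_calibrated_forest is hypothetical, and synthetic data stands in for MIMIC-III.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.isotonic import IsotonicRegression
from sklearn.model_selection import train_test_split


def fit_oob_calibrated_forest(X, y, n_estimators=200, seed=0):
    """Fit a random forest, then fit a calibrator on its out-of-bag predictions."""
    forest = RandomForestClassifier(
        n_estimators=n_estimators, oob_score=True, random_state=seed
    )
    forest.fit(X, y)

    # For each training row, the class-1 probability averaged over only the
    # trees whose bootstrap sample did not include that row.
    oob_scores = forest.oob_decision_function_[:, 1]
    seen_oob = ~np.isnan(oob_scores)  # rows that were never out-of-bag are NaN

    # Assumption: isotonic regression as the calibrator (illustrative choice).
    calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    calibrator.fit(oob_scores[seen_oob], y[seen_oob])
    return forest, calibrator


# Synthetic binary task, for illustration only.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest, calibrator = fit_oob_calibrated_forest(X_train, y_train)
raw_risk = forest.predict_proba(X_test)[:, 1]
calibrated_risk = calibrator.predict(raw_risk)
```

Because the out-of-bag predictions come from trees that never saw the corresponding rows, they behave like held-out predictions, so no part of the training data has to be set aside purely for calibration.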

Main Results:

CaliForest achieved comparable discrimination to standard random forest.
CaliForest demonstrated superior model calibration across six different metrics.
The proposed method effectively integrated calibration into random forest models.
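
The six calibration metrics are not named in this summary. As a hedged illustration of what a calibration metric measures, the sketch below computes a binned expected calibration error on made-up predictions; the function name, the bin count, and the data are assumptions, and this metric is not claimed to be one of the six.

```python
def expected_calibration_error(labels, probs, n_bins=10):
    """Weighted average gap between mean predicted risk and observed rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))

    total = len(labels)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(y for y, _ in members) / len(members)
        predicted = sum(p for _, p in members) / len(members)
        ece += len(members) / total * abs(observed - predicted)
    return ece


# Hypothetical cohort: 10 low-risk cases (2 events) and 10 high-risk cases (8 events).
labels = [1, 1] + [0] * 8 + [1] * 8 + [0, 0]

well_calibrated = [0.2] * 10 + [0.8] * 10   # predicted risks match observed rates
miscalibrated = [0.5] * 10 + [0.99] * 10    # same ordering of groups, inflated risks

print("ECE, well calibrated:", expected_calibration_error(labels, well_calibrated))
print("ECE, miscalibrated:  ", expected_calibration_error(labels, miscalibrated))
```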

Conclusions:

CaliForest improves the calibration of random forest models in healthcare without requiring a separate calibration set.
The method enhances the reliability of risk predictions for personalized medicine.
Open-source availability facilitates adoption and further research in calibrated machine learning for health.