
Related Concept Videos

Distributions to Estimate Population Parameter (01:26, 5.6K views)

The true values of population parameters such as the population proportion, population mean, and population standard deviation (or variance) are usually unknown. These are fixed values that can only be estimated from data collected from samples. The corresponding estimates are the sample proportion, the sample mean, and the sample standard deviation (or variance). To obtain the values of these sample statistics, data are required that have a particular distribution and central...
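As a minimal sketch, the sample statistics named above can be computed with Python's standard `statistics` module. The helper name `point_estimates` and the data values are hypothetical, not taken from the video. Note that `statistics.stdev` and `statistics.variance` divide by n - 1, which is the usual sample version.

```python
import statistics
from typing import Dict, Sequence


def point_estimates(measurements: Sequence[float],
                    outcomes: Sequence[bool]) -> Dict[str, float]:
    """Hypothetical helper: sample statistics used as point estimates."""
    return {
        # sample proportion p-hat = successes / n, estimates the population proportion p
        "proportion": sum(outcomes) / len(outcomes),
        # sample mean x-bar, estimates the population mean mu
        "mean": statistics.mean(measurements),
        # sample standard deviation s (n - 1 denominator), estimates sigma
        "std_dev": statistics.stdev(measurements),
        # sample variance s^2 (n - 1 denominator), estimates sigma^2
        "variance": statistics.variance(measurements),
    }


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up measurements
    flags = [True, False, True, True]                # made-up yes/no outcomes
    print(point_estimates(data, flags))
```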

Random Error (01:04, 10.0K views)

Random or indeterminate errors originate from various uncontrollable variables, such as variations in environmental conditions, instrument imperfections, or the inherent variability of the phenomena being measured. Usually, these errors cannot be predicted, estimated, or characterized because their magnitude and direction vary, even between consecutive measurements. As a result, they are difficult to eliminate. However, the aggregate effect of these errors can be...

Prediction Intervals (01:03, 3.5K views)

The interval estimate of a variable's value, such as a future observation, is known as a prediction interval. It helps decide whether a point estimate is dependable.
A point estimate is most likely not the exact value of the population parameter, only close to it. Therefore, after calculating point estimates, we construct interval estimates, called confidence intervals or prediction intervals. Unlike a point estimate, a prediction interval comprises a range of values, and it is a better predictor of the observed sample value, y.
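The snippet does not give a formula. Assuming a simple linear regression of y on x with normally distributed errors (an assumption; the video may use a different setting), a prediction interval for a new y at x0 is ŷ ± t·s_e·√(1 + 1/n + (x0 - x̄)²/Sxx), with n - 2 degrees of freedom. The sketch below computes it. The t critical value is passed in from a table rather than computed, and the function name and data are illustrative only.

```python
import math
import statistics
from typing import Sequence, Tuple


def prediction_interval(xs: Sequence[float], ys: Sequence[float],
                        x0: float, t_crit: float) -> Tuple[float, float, float]:
    """Return (lower, y_hat, upper) for a new observation of y at x = x0.

    Assumes simple linear regression with normal errors; t_crit is the
    two-sided t critical value with n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    y_hat = intercept + slope * x0  # point estimate of y at x0
    # residual standard error s_e with n - 2 degrees of freedom
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))
    # the leading "1 +" adds the variability of a single new observation
    margin = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - margin, y_hat, y_hat + margin


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # made-up predictor values
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # made-up responses
    # 3.182: two-sided 95% t value for n - 2 = 3 degrees of freedom (table)
    print(prediction_interval(xs, ys, x0=6.0, t_crit=3.182))
```

The interval is centred on the point estimate ŷ, which is why it conveys how far a single new y is likely to fall from that estimate.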

Estimating Population Standard Deviation (01:26, 3.5K views)

When the population standard deviation is unknown and the sample size is large, the sample standard deviation s is commonly used as a point estimate of σ. However, it can under- or overestimate the population standard deviation. To account for this, a confidence interval is constructed for the population standard deviation rather than relying on the point estimate alone. This method, however, applies only to random samples from normally distributed populations. Knowing the sample mean and...
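A common way to build such an interval for normal data, sketched here as an assumption about what the video covers, is the chi-square interval √((n - 1)s²/χ²_upper) ≤ σ ≤ √((n - 1)s²/χ²_lower) with n - 1 degrees of freedom. The chi-square quantiles are read from a table (2.700 and 19.023 for 9 degrees of freedom at 95% confidence); the function name and sample are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sigma_confidence_interval(sample: Sequence[float], chi2_lower: float,
                              chi2_upper: float) -> Tuple[float, float]:
    """Chi-square confidence interval for a normal population's sigma.

    chi2_lower, chi2_upper: chi-square quantiles at alpha/2 and 1 - alpha/2
    with n - 1 degrees of freedom, read from a table.
    """
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance, n - 1 denominator
    # dividing by the larger quantile gives the lower bound, and vice versa
    lower = math.sqrt((n - 1) * s2 / chi2_upper)
    upper = math.sqrt((n - 1) * s2 / chi2_lower)
    return lower, upper


if __name__ == "__main__":
    sample = [9.8, 10.2, 10.4, 9.9, 10.0, 10.1, 9.7, 10.3, 10.0, 9.6]  # made up
    # 95% interval, 9 degrees of freedom: quantiles 2.700 and 19.023 (table)
    print(sigma_confidence_interval(sample, 2.700, 19.023))
```

The interval is not symmetric around s, because the chi-square distribution is skewed.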

Estimating Population Mean with Unknown Standard Deviation (01:22, 9.0K views)

In practice, we rarely know the population standard deviation. In the past, when the sample size was large, this did not present a problem to statisticians. They used the sample standard deviation s as an estimate for σ and proceeded as before to calculate a confidence interval, with results that were close enough. However, statisticians ran into problems when the sample size was small: a small sample size caused inaccuracies in the confidence interval.
William S. Gosset (1876–1937) of the...
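Gosset's work led to the Student's t distribution, and the resulting interval is x̄ ± t·s/√n with n - 1 degrees of freedom. A minimal sketch follows; the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a t table, and the sample and function name are made up for illustration.

```python
import math
import statistics
from typing import Sequence, Tuple


def t_confidence_interval(sample: Sequence[float],
                          t_crit: float) -> Tuple[float, float]:
    """Student's t confidence interval for mu when sigma is unknown.

    t_crit: two-sided critical value with n - 1 degrees of freedom (from a table).
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)  # point estimate of mu
    s = statistics.stdev(sample)      # s stands in for the unknown sigma
    ebm = t_crit * s / math.sqrt(n)   # error bound for the mean
    return x_bar - ebm, x_bar + ebm


if __name__ == "__main__":
    sample = [8.5, 9.1, 7.8, 8.9, 9.4, 8.2, 8.8, 9.0, 8.1, 8.6]  # made up, n = 10
    # 2.262: two-sided 95% t value for 9 degrees of freedom (table)
    print(t_confidence_interval(sample, 2.262))
```

Because t critical values exceed the matching z values for small n, the interval is wider than one that treats s as if it were σ, which addresses the small-sample inaccuracy described above.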

Estimating Population Mean with Known Standard Deviation (01:16, 9.8K views)

To construct a confidence interval for a single unknown population mean μ when the population standard deviation is known, we need the sample mean as an estimate of μ, and we need the margin of error. Here, the margin of error is called the error bound for a population mean (abbreviated EBM). The sample mean is the point estimate of the unknown population mean μ.
The confidence interval estimate has the following form:
(point estimate - error bound, point estimate +...
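With σ known, the error bound is EBM = z·σ/√n, where z is the standard normal quantile for the chosen confidence level. The sketch below uses `statistics.NormalDist` from the Python standard library to obtain z; the function name and the example values (x̄ = 68, σ = 3, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Tuple


def z_confidence_interval(sample_mean: float, sigma: float, n: int,
                          confidence: float = 0.95) -> Tuple[float, float]:
    """Confidence interval for mu when the population sigma is known."""
    # z critical value: standard normal quantile leaving (1 - CL)/2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ebm = z * sigma / math.sqrt(n)  # error bound for a population mean
    return sample_mean - ebm, sample_mean + ebm


if __name__ == "__main__":
    print(z_confidence_interval(sample_mean=68.0, sigma=3.0, n=36))
```

With these values, z ≈ 1.96 and EBM ≈ 0.98, giving roughly (67.02, 68.98).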

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

In Reply.

Deutsches Arzteblatt international·2026
Same author

Interpretation of Pharmacometabolomics Results: Fingerprint of Drug Exposure or Confounder Effects? Insights from a Urinary Metabolomics Study with Voriconazole in Healthy Participants.

International journal of molecular sciences·2026
Same author

The phenotypic spectrum and genetic determinants of severe spinal muscular atrophy in individuals with a single SMN2 copy: an international retrospective observational study.

EClinicalMedicine·2026
Same author

Urinary Metabolomics Predict Acute Kidney Injury in Very-Low-Birth-Weight Infants with Patent Ductus Arteriosus.

Biomolecules·2026
Same author

Confidence Intervals for Comparing Two Independent Folded Normals: A Case Study in Bunion Surgery.

Statistics in medicine·2026
Same author

Emulated Effects of Glucagon-Like Peptide 1 Receptor Agonist Therapy in the General Population.

Journal of the American College of Cardiology·2026
Same journal

A Mixture of Distributed Lag Non-Linear Models to Account for Spatially Heterogeneous Exposure-Lag-Response Associations.

Statistics in medicine·2026
Same journal

Practical Considerations for Gaussian Process Modeling for Causal Inference in Quasi-Experimental Studies With Panel Data.

Statistics in medicine·2026
Same journal

Covariate Adjustment for Wilcoxon Two Sample Statistic and Test.

Statistics in medicine·2026
Same journal

Beyond Fixed Thresholds: Optimizing Summaries of Wearable Device Data via Piecewise Linearization of Quantile Functions.

Statistics in medicine·2026
Same journal

A Causal Framework for Evaluating the Total Effect of Strategies Aiming to Expand Screening and to Improve Outcomes.

Statistics in medicine·2026
Same journal

Causal Effects on Nonterminal Event Time With Application to Antibiotic Usage and Future Resistance.

Statistics in medicine·2026

Related Experiment Video

Updated: Mar 22, 2026

Comparison of Predictive Performance of Three Lymph Node Staging Systems in Colorectal Signet Ring Cell Carcinoma Based on Machine Learning Model (07:13, 846 views)
Published on: April 18, 2025

Calibrating random forests for probability estimation.

Theresa Dankowski (1), Andreas Ziegler (1, 2, 3, 4)

  • (1) Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Lübeck, Germany.

Statistics in Medicine | April 15, 2016
Summary
This summary is machine-generated.

This study presents two methods for updating random forests to improve probability estimation. A new logistic regression-based approach for random forests outperformed a general-purpose updating method when that method's assumptions were not met.
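The summary does not give the paper's formulas. As a hedged illustration of what logistic regression-based re-calibration means in general, the sketch below fits the classic two-parameter model logit(p_new) = a + b·logit(p̂) on outcomes from the new setting, by Newton-Raphson. This is a simpler, swapped-in technique: the paper's method instead builds logistic models from the forest's terminal nodes, which is not reproduced here. All function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float, eps: float = 1e-6) -> float:
    p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
    return math.log(p / (1 - p))


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def fit_logistic_recalibration(p_hat: Sequence[float], y: Sequence[int],
                               iterations: int = 25) -> Tuple[float, float]:
    """Fit P(y = 1) = sigmoid(a + b * logit(p_hat)) by Newton-Raphson."""
    xs = [logit(p) for p in p_hat]
    a, b = 0.0, 1.0  # a = 0, b = 1 leaves the original probabilities unchanged
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, yi in zip(xs, y):
            mu = sigmoid(a + b * x)
            w = mu * (1 - mu)
            g_a += yi - mu          # score for a
            g_b += (yi - mu) * x    # score for b
            h_aa += w               # Fisher information entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: (a, b) += inverse(information) @ score
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def recalibrate(p_hat: Sequence[float], a: float, b: float) -> List[float]:
    return [sigmoid(a + b * logit(p)) for p in p_hat]


if __name__ == "__main__":
    # made-up forest probabilities and outcomes observed in the new setting
    p_hat = [0.2, 0.4, 0.6, 0.8, 0.2, 0.4, 0.6, 0.8]
    y = [0, 0, 0, 1, 0, 1, 0, 1]
    a, b = fit_logistic_recalibration(p_hat, y)
    print(a, b, recalibrate(p_hat, a, b))
```

At the fitted values, the recalibrated probabilities sum to the observed number of events, a property of logistic regression with an intercept.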

Keywords:
calibration; logistic regression; probability estimation; probability machine; random forests; updating

More Related Videos

An R-Based Landscape Validation of a Competing Risk Model (05:37, 2.7K views)
Published on: September 16, 2022

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments (08:12, 3.0K views)
Published on: March 1, 2022

Area of Science:

  • Machine Learning
  • Statistics
  • Medical Informatics

Background:

  • Random forests can serve as consistent probability estimators (probability machines).
  • Updating random forests for new data or settings remains a challenge.
  • Existing methods for updating probability machines may have limitations.

Purpose of the Study:

  • To present and compare two strategies for updating random forests for probability estimation: a newly developed one and Elkan's existing general approach.
  • To evaluate the performance of these updating methods in simulation and real-world data.
  • To identify which updating approach is preferable, depending on whether specific assumptions are met.

Main Methods:

  • Developed a new logistic regression-based re-calibration strategy specifically for random forests.
  • Translated random forests to logistic regression models using their terminal nodes, which represent conditional probabilities.
  • Compared the new method with Elkan's general approach for updating probability machines.
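The summary does not state Elkan's formula. Assuming the method meant here is the base-rate (prevalence) adjustment from Elkan's 2001 paper on cost-sensitive learning, which can be applied to the output of any probability machine, it can be sketched as follows. That adjustment assumes only the base rate changes between settings while the class-conditional feature distributions stay the same; whether this is exactly the variant used in the paper is not confirmed by the summary. Function names and numbers are illustrative.

```python
def odds(p: float) -> float:
    return p / (1 - p)


def elkan_adjust(p: float, base_old: float, base_new: float) -> float:
    """Shift probability p from base rate base_old to base rate base_new.

    Base-rate adjustment as given by Elkan (2001); assumes only the
    prevalence changes between the training and the new setting.
    """
    numerator = base_new * (p - p * base_old)
    denominator = base_old - p * base_old + base_new * p - base_old * base_new
    return numerator / denominator


if __name__ == "__main__":
    # hypothetical: forest trained where prevalence was 50%, applied where it is 10%
    for p in (0.2, 0.5, 0.8):
        print(p, round(elkan_adjust(p, base_old=0.5, base_new=0.1), 4))
```

Equivalently, the adjustment multiplies the odds of p by [b'/(1 - b')]/[b/(1 - b)], where b and b' are the old and new base rates; the `odds` helper makes that easy to check.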

Main Results:

  • Both updating strategies demonstrated improvements in probability estimation.
  • The logistic regression-based approach outperformed Elkan's method when the strict assumptions of Elkan's method were not met.
  • The new method also performed better than Elkan's method on data from the German Stroke Study Collaboration.

Conclusions:

  • The logistic regression-based approach is preferable for updating random forests for probability estimation, especially when the assumptions of Elkan's method are violated.
  • Elkan's method is generally applicable but may be less effective when its assumptions do not hold.
  • The developed method provides a robust alternative for updating probability estimates from random forests.