



Effective sample size for individual risk predictions: quantifying uncertainty in machine learning models.

Doranne Thomassen¹, Toby Hackmann¹, Jelle Goeman¹

  • ¹ Department of Biomedical Data Sciences, Leiden University Medical Center, Leiden, Netherlands.

The Lancet. Digital Health | November 29, 2025
Summary
This summary is machine-generated.

Clinical prediction models can be more certain for some patients than for others, which raises fairness concerns. The authors developed a method to estimate the effective sample size behind an individual prediction. Applied to a large clinical dataset, it revealed substantial uncertainty in individual predictions, which makes effective sample size a useful quantity for communicating risk.


Area of Science:

  • Clinical prediction modeling
  • Machine learning in healthcare
  • Statistical uncertainty quantification

Background:

  • Standard performance metrics for clinical prediction models do not adequately capture individual prediction uncertainty.
  • This lack of uncertainty assessment raises concerns about fairness, as models may be more certain for some patients than others.
  • Effective sample size has been proposed as a metric to quantify sampling uncertainty.
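
One way to read effective sample size as an uncertainty metric is through a binomial analogy. The sketch below assumes, since this summary does not state the paper's exact definition, that the effective sample size of a predicted risk is the size of a hypothetical group of identical patients whose observed event proportion would have the same sampling variance as the prediction. The function name `effective_sample_size` and its arguments are illustrative.

```python
def effective_sample_size(risk, variance):
    """Binomial-analogy effective sample size (an assumed definition).

    A proportion p observed in n identical patients has variance
    p * (1 - p) / n, so solving for n gives n = p * (1 - p) / variance.
    """
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")
    if variance <= 0.0:
        raise ValueError("variance must be positive")
    return risk * (1.0 - risk) / variance


# Sanity check: a proportion of 0.2 observed in 100 patients has variance
# 0.2 * 0.8 / 100, so the function should recover a value close to 100.
n_back = effective_sample_size(0.2, 0.2 * 0.8 / 100)

# A prediction with four times the variance corresponds to a quarter of
# the effective sample size.
n_noisier = effective_sample_size(0.2, 4 * 0.2 * 0.8 / 100)
```

Under this reading, a model trained on tens of thousands of patients can still give a particular patient a prediction that is only as reliable as a proportion observed in a much smaller group.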

Purpose of the Study:

  • To develop and illustrate a computational method for estimating effective sample sizes across diverse prediction models.
  • To assess the utility of effective sample size in understanding individual prediction uncertainty in a large clinical dataset.
  • To explore the implications of effective sample size for communicating risk prediction uncertainty.

Main Methods:

  • A computational method was developed to estimate effective sample sizes for various prediction models, including logistic regression, elastic net, XGBoost, neural network, and random forest.
  • The method was applied to a clinical dataset comprising 23,034 individuals.
  • Simulations were conducted to evaluate the accuracy of the effective sample size estimates for different model types.
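
The summary does not describe the computational method itself, so the following is only a stand-in that conveys the general idea on simulated data. It assumes the binomial-analogy definition (risk × (1 − risk) / variance) and estimates the variance of one patient's predicted risk by refitting a plain logistic regression on bootstrap resamples. All names (`fit_logistic`, `bootstrap_effective_sample_size`, `typical`, `unusual`) and the simulated data are hypothetical and are not taken from the paper.

```python
import numpy as np


def fit_logistic(X, y, n_iter=25, ridge=1e-8):
    """Fit logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible in awkward resamples.
        hessian = X.T @ (X * w[:, None]) + ridge * np.eye(X.shape[1])
        beta = beta + np.linalg.solve(hessian, X.T @ (y - p))
    return beta


def predict_risk(beta, x):
    """Predicted event probability for one covariate vector x."""
    return 1.0 / (1.0 + np.exp(-x @ beta))


def bootstrap_effective_sample_size(X, y, x_new, n_boot=200, seed=0):
    """Return (predicted risk, effective sample size) for patient x_new."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    risk = predict_risk(fit_logistic(X, y), x_new)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        boot[b] = predict_risk(fit_logistic(X[idx], y[idx]), x_new)
    variance = boot.var(ddof=1)
    return risk, risk * (1.0 - risk) / variance


# Simulated training data (purely illustrative).
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-1.0, 0.8, -0.5])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

typical = np.array([1.0, 0.0, 0.0])   # patient near the centre of the data
unusual = np.array([1.0, 3.0, -3.0])  # patient far from most training cases

risk_typ, ess_typ = bootstrap_effective_sample_size(X, y, typical)
risk_unu, ess_unu = bootstrap_effective_sample_size(X, y, unusual)
print(f"typical: risk={risk_typ:.3f}, effective n={ess_typ:.0f}")
print(f"unusual: risk={risk_unu:.3f}, effective n={ess_unu:.0f}")
```

In this setup the patient far from the bulk of the training data is expected to get a smaller effective sample size than the typical patient, even though both predictions come from the same model and the same 2,000 training cases. Bootstrapping is used here because it applies to any model type; it requires refitting the model many times, which can be costly for large models.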

Main Results:

  • The developed method accurately estimated effective sample sizes for logistic regression and elastic net models, with minor deviations for XGBoost, neural network, and random forest.
  • Despite similar overall model performance metrics, substantial variations in effective sample sizes and patient-specific risk predictions were observed.
  • Individual prediction uncertainty was found to be substantial, even when models were trained on large sample sizes.

Conclusions:

  • Individual prediction uncertainty in clinical models can be substantial, irrespective of the dataset size.
  • Effective sample size is a valuable measure for quantifying and communicating the uncertainty associated with individual risk predictions.
  • This approach holds promise for improving the transparency and fairness of machine learning-based prediction models in clinical practice.