Confidence intervals for performance estimates in brain MRI segmentation.

Rosana El Jurdi¹, Gaël Varoquaux², Olivier Colliot¹

¹Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié-Salpêtrière, F-75013, Paris, France.

Medical Image Analysis

May 14, 2025

Summary

This summary is machine-generated.

Evaluating medical image segmentation models requires understanding confidence intervals. This study shows that fewer test samples are needed for segmentation than classification to achieve precise performance estimates.

Keywords:

Confidence interval Performance measure Segmentation Standard error Statistical analysis Validation

Area of Science:

Medical image analysis
Machine learning in healthcare
Radiology and neuroimaging

Background:

Empirical evaluation of medical segmentation models is inherently noisy because test sets contain a limited number of images.
Reporting confidence intervals is crucial for reliable evaluation but is often omitted in medical image segmentation research.
The test set size required for accurate confidence intervals depends on the spread (standard deviation) of the performance metric, which differs between classification and segmentation tasks.

Purpose of the Study:

To investigate confidence interval estimation for 3D brain MRI segmentation.
To determine the necessary test set sizes for achieving desired precision in segmentation performance metrics.
To compare the sample size requirements for segmentation versus classification tasks.

Main Methods:

Experiments were conducted using the nnU-Net framework on two brain MRI datasets from the Medical Segmentation Decathlon (hippocampus and brain tumor segmentation).
The Dice Similarity Coefficient and Hausdorff distance were used as performance measures.
Parametric confidence intervals were compared against bootstrap estimates across varying test set sizes and performance metric spreads.
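
The comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the per-image Dice scores are synthetic (drawn from a hypothetical normal distribution with mean 0.89 and standard deviation 0.03), the function names are made up for illustration, and the bootstrap shown is the plain percentile variant, which may differ from the one used in the study.

```python
import math
import random
import statistics


def parametric_ci(scores, z=1.96):
    """Normal-approximation 95% CI for the mean: mean +/- z * standard error."""
    n = len(scores)
    mean = statistics.fmean(scores)
    sem = statistics.stdev(scores) / math.sqrt(n)
    return mean - z * sem, mean + z * sem


def bootstrap_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean (resampling images with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Synthetic per-image Dice scores (an assumption, not data from the study).
data_rng = random.Random(42)
dice_scores = [min(1.0, max(0.0, data_rng.gauss(0.89, 0.03))) for _ in range(100)]

p_lo, p_hi = parametric_ci(dice_scores)
b_lo, b_hi = bootstrap_ci(dice_scores)
print(f"parametric: [{p_lo:.4f}, {p_hi:.4f}]  width={p_hi - p_lo:.4f}")
print(f"bootstrap:  [{b_lo:.4f}, {b_hi:.4f}]  width={b_hi - b_lo:.4f}")
```

On roughly symmetric, low-spread scores like these, the two intervals should be close to each other, which is the kind of agreement the study reports for segmentation metrics.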

Main Results:

Parametric confidence intervals provide reasonable approximations to bootstrap estimates for segmentation tasks.
The test set size required for precise segmentation evaluation is often significantly smaller than for classification tasks.
Achieving a confidence interval width of 1% typically requires 100-200 samples for low-spread metrics (standard deviation around 3%), while more complex tasks may need over 1000 samples.
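
The sample sizes above can be checked against the standard normal-approximation formula. The sketch below assumes that "width" means the full width of a 95% interval, so width = 2 × 1.96 × std / √n; that reading is an assumption, chosen because it reproduces the 100-200 range quoted here. The classification case assumes a hypothetical model with 90% accuracy, where each test sample is a correct/incorrect outcome with standard deviation √(p(1 − p)). The function name is illustrative.

```python
import math


def required_test_size(std, width, z=1.96):
    """Smallest n for which the CI full width 2*z*std/sqrt(n) is at most `width`.

    `std` and `width` must be in the same units (here: percentage points).
    """
    return math.ceil((2 * z * std / width) ** 2)


# Segmentation: metric spread of about 3 percentage points, target width 1 point.
n_seg = required_test_size(std=3.0, width=1.0)

# Classification (hypothetical 90% accuracy): per-sample std is sqrt(p * (1 - p)).
p = 0.9
std_cls = 100 * math.sqrt(p * (1 - p))  # about 30 percentage points
n_cls = required_test_size(std=std_cls, width=1.0)

print(n_seg)  # 139
print(n_cls)  # 13830
```

Because the required size grows with the square of the spread, a tenfold larger per-sample standard deviation translates into a hundredfold larger test set, which is why the segmentation case needs far fewer samples in this illustration.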

Conclusions:

Confidence intervals are essential for robust evaluation of medical image segmentation models.
The sample size needed for reliable segmentation evaluation is generally lower than previously assumed, especially compared to classification.
This research provides practical insights into sample size determination for validating 3D brain MRI segmentation models.