A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data.

Ethan Harvey¹, Wansu Chen², David M Kent³

¹Department of Computer Science, Tufts University, Medford, MA, USA.

Proceedings of Machine Learning Research | February 17, 2025

Summary

This summary is machine-generated.

This study introduces a Gaussian process model that predicts how classifier accuracy improves as the dataset grows. The model produces probabilistic extrapolations with an assessment of their uncertainty, which supports resource allocation and planning when deciding whether to collect more data.

Keywords:

Gaussian process, Learning curve

Area of Science:

Machine Learning
Computational Statistics

Background:

Classifier development often begins with limited data, with plans for future expansion.
Accurate prediction of performance gains from increased dataset size is essential for resource allocation and project planning.

Purpose of the Study:

To propose a novel method for probabilistic extrapolation of classifier performance metrics as dataset size grows.
To address the limitations of existing deterministic extrapolation methods by incorporating uncertainty assessment.

Main Methods:

Development of a Gaussian process model to predict performance metrics (e.g., accuracy) as a function of dataset size.
Evaluation of the model's performance using error, likelihood, and coverage metrics across six diverse datasets.
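
The idea of treating accuracy as a function of dataset size can be sketched in a few lines. The following is a minimal illustration only, not the authors' implementation: it assumes a squared-exponential kernel over the logarithm of dataset size, a constant prior mean, and fixed hyperparameters, all of which are hypothetical choices made here for brevity. The function names (`rbf`, `gp_predict`) and the pilot numbers are likewise invented for the example.

```python
import math

def rbf(a, b, length=1.0, var=0.01):
    """Squared-exponential kernel on scalars (hyperparameters are assumptions)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L x = b."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_lower_transposed(L, b):
    """Back substitution: solve L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(L[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / L[i][i]
    return x

def gp_predict(sizes, accs, new_size, prior_mean, noise=1e-4):
    """Posterior mean and standard deviation of accuracy at new_size."""
    xs = [math.log(n) for n in sizes]          # model accuracy on a log-size axis
    x_new = math.log(new_size)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    resid = [y - prior_mean for y in accs]     # observations relative to prior mean
    alpha = solve_lower_transposed(L, solve_lower(L, resid))
    k_star = [rbf(x_new, x) for x in xs]
    mean = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) - sum(t * t for t in v)
    return mean, math.sqrt(max(var, 0.0))

if __name__ == "__main__":
    # Hypothetical pilot results: accuracy measured at four small dataset sizes.
    pilot_sizes = [60, 120, 250, 500]
    pilot_accs = [0.62, 0.68, 0.73, 0.77]
    for n in (500, 1000, 5000):
        mean, std = gp_predict(pilot_sizes, pilot_accs, n, prior_mean=0.80)
        print(f"n={n}: predicted accuracy {mean:.3f} +/- {2 * std:.3f}")
```

With a constant prior mean, the prediction drifts back toward that mean far from the pilot data while the predictive standard deviation widens. A model intended for real extrapolation would need a mean function and priors that encode how learning curves typically behave.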

Main Results:

The proposed Gaussian process model provides reliable probabilistic extrapolations of classifier performance.
The model effectively quantifies the uncertainty associated with accuracy predictions at different dataset sizes.
Empirical evaluation across six datasets demonstrates the model's robustness and generalizability.
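
Coverage, one of the evaluation metrics named in the methods, asks how often the accuracy later observed at a larger dataset size falls inside the interval the model predicted. A minimal sketch of that check follows; the function name `interval_coverage` and the numbers are hypothetical and are not results from the study.

```python
def interval_coverage(lowers, uppers, observed):
    """Fraction of observed values that fall inside their predicted intervals."""
    if not (len(lowers) == len(uppers) == len(observed)) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(lo <= y <= hi for lo, hi, y in zip(lowers, uppers, observed))
    return hits / len(observed)

if __name__ == "__main__":
    # Hypothetical predicted intervals and the accuracies later measured.
    lowers = [0.70, 0.74, 0.78, 0.80]
    uppers = [0.80, 0.84, 0.86, 0.88]
    observed = [0.75, 0.85, 0.80, 0.81]
    print(interval_coverage(lowers, uppers, observed))
```

If the intervals are nominal 95% intervals, a well-calibrated model should show empirical coverage close to 0.95 over many such predictions.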

Conclusions:

Gaussian process modeling offers a better approach to extrapolating classifier performance than existing deterministic methods, which do not assess uncertainty.
Incorporating uncertainty in performance predictions is critical for practitioners managing data-intensive machine learning projects.
Because the approach is open source, it can be applied across a wide range of classification tasks and data modalities.