A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data.

Ethan Harvey1, Wansu Chen2, David M Kent3

  • 1Department of Computer Science, Tufts University, Medford, MA, USA.

Proceedings of Machine Learning Research | February 17, 2025
PubMed
Summary
This summary is machine-generated.

This study introduces a Gaussian process model for predicting classifier accuracy improvements with increased data size. The model provides probabilistic extrapolations and uncertainty assessments, crucial for data-driven projects.

Keywords:
Gaussian process, Learning curve

Area of Science:

  • Machine Learning
  • Computational Statistics

Background:

  • Classifier development often begins with limited data, with plans for future expansion.
  • Accurate prediction of performance gains from increased dataset size is essential for resource allocation and project planning.

Purpose of the Study:

  • To propose a novel method for probabilistic extrapolation of classifier performance metrics as dataset size grows.
  • To address the limitations of existing deterministic extrapolation methods by incorporating uncertainty assessment.

Main Methods:

  • Development of a Gaussian process model to predict performance metrics (e.g., accuracy) as a function of dataset size.
  • Evaluation of the model's performance using error, likelihood, and coverage metrics across six diverse datasets.
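
To make the idea of a Gaussian process over dataset size concrete, here is a minimal sketch in plain NumPy. It is not the authors' model: the squared-exponential kernel, the constant prior mean, the use of log10 dataset size as the input, and all hyperparameter values are assumptions chosen only for illustration, and the names `rbf_kernel` and `gp_extrapolate` are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_extrapolate(sizes, accs, new_sizes,
                   noise=1e-4, length_scale=1.0, variance=0.01):
    """Posterior mean and std of accuracy at new_sizes.

    Illustrative assumptions (not taken from the paper): inputs are
    log10(dataset size), the prior mean is the average pilot accuracy,
    and the hyperparameters are fixed rather than learned.
    """
    x = np.log10(np.asarray(sizes, dtype=float))
    xs = np.log10(np.asarray(new_sizes, dtype=float))
    y = np.asarray(accs, dtype=float)
    prior_mean = y.mean()

    # Covariance of the pilot observations, with observation noise added.
    K = rbf_kernel(x, x, length_scale, variance) + noise * np.eye(len(x))
    Ks = rbf_kernel(x, xs, length_scale, variance)
    Kss = rbf_kernel(xs, xs, length_scale, variance)

    # Standard GP regression equations via a Cholesky factorization.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - prior_mean))
    mean = prior_mean + Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    cov = Kss - v.T @ v
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return mean, std

# Made-up pilot results: accuracy measured at four small dataset sizes.
pilot_sizes = [50, 100, 200, 400]
pilot_accs = [0.60, 0.66, 0.71, 0.74]
mean, std = gp_extrapolate(pilot_sizes, pilot_accs, [400, 10000])
```

In this sketch the predictive standard deviation grows as the query size moves away from the pilot sizes, which is the kind of uncertainty assessment a deterministic curve fit does not provide. A zero-mean-reverting kernel like this one will not capture the saturating shape of real learning curves far from the data, so it should be read as a demonstration of the mechanics only.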

Main Results:

  • The proposed Gaussian process model provides reliable probabilistic extrapolations of classifier performance.
  • The model effectively quantifies the uncertainty associated with accuracy predictions at different dataset sizes.
  • Empirical evaluation across six datasets demonstrates the model's robustness and generalizability.
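
The error, likelihood, and coverage metrics named under Main Methods can be illustrated for Gaussian predictions as follows. This is a sketch under assumptions: the summary does not give the paper's exact metric definitions, so root-mean-square error, mean Gaussian log-likelihood, and coverage of a central 95% interval are used here as common choices, and the function name `evaluate_predictions` is hypothetical.

```python
import numpy as np

def evaluate_predictions(y_true, mean, std, z=1.96):
    """Score Gaussian predictions (mean, std) against observed values.

    Returns RMSE, mean Gaussian log-likelihood, and the fraction of
    observations inside mean +/- z*std (about a 95% interval for z=1.96).
    These definitions are illustrative assumptions, not the paper's.
    """
    y_true = np.asarray(y_true, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)

    rmse = float(np.sqrt(np.mean((y_true - mean) ** 2)))
    log_lik = float(np.mean(
        -0.5 * np.log(2.0 * np.pi * std ** 2)
        - 0.5 * ((y_true - mean) / std) ** 2
    ))
    inside = (y_true >= mean - z * std) & (y_true <= mean + z * std)
    coverage = float(np.mean(inside))
    return rmse, log_lik, coverage

# Made-up example: two held-out accuracies and their predictions.
rmse, log_lik, coverage = evaluate_predictions(
    y_true=[0.80, 0.90], mean=[0.80, 0.80], std=[0.01, 0.01]
)
```

Coverage close to the nominal level (here 95%) over many held-out points suggests the predicted uncertainty is well calibrated; much lower coverage indicates overconfident intervals.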

Conclusions:

  • Gaussian process modeling offers a superior approach to extrapolating classifier performance compared with traditional deterministic methods, which provide no uncertainty assessment.
  • Incorporating uncertainty in performance predictions is critical for practitioners managing data-intensive machine learning projects.
  • The open-source nature of this approach allows for broad applicability across various classification tasks and data modalities.