Predicting sample size required for classification performance.

Rosa L Figueroa¹, Qing Zeng-Treitler, Sasikiran Kandula

¹Dep. Ing. Eléctrica, Facultad de Ingeniería, Universidad de Concepción, Concepción, Chile.

BMC Medical Informatics and Decision Making

February 17, 2012

Summary

This summary is machine-generated.

Estimating how many annotated samples are needed is crucial for supervised learning. This study introduces a weighted curve-fitting method that predicts classifier performance at larger sample sizes and outperforms un-weighted fitting.

Area of Science:

Machine Learning
Data Science
Computational Biology

Background:

Supervised learning requires annotated data, which is scarce and costly.
Accurate estimation of required annotated sample size is needed for both passive and active learning strategies.

Purpose of the Study:

To develop and evaluate a novel method for predicting the necessary sample size for supervised machine learning models.
To improve the efficiency of model development by accurately estimating annotation needs.

Main Methods:

An inverse power law model was fitted to learning curves using nonlinear weighted least squares optimization.
The fitted model predicted classifier performance and confidence intervals for larger sample sizes.
The method was evaluated on clinical text and waveform classification tasks, comparing weighted and un-weighted fitting.
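
The fitting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the parameterization score(n) = a - b * n**(-c), the weights proportional to sample size, and the function names are all assumptions made for illustration, and the optimizer is a simple stand-in (closed-form weighted least squares for a and b at each candidate c, plus a grid search over c) rather than the nonlinear optimizer used in the study. The data points are synthetic and noise-free, generated from the model itself.

```python
def fit_inverse_power_law(sizes, scores, weights, c_grid=None):
    """Weighted least-squares fit of score(n) = a - b * n**(-c).

    Assumed parameterization (hypothetical, for illustration only).
    For a fixed decay rate c the model is linear in (a, b), so a and b
    have a closed-form weighted solution; c is chosen by grid search.
    """
    if c_grid is None:
        c_grid = [k / 100 for k in range(1, 301)]  # c in 0.01 .. 3.00
    total_w = sum(weights)
    best = None
    for c in c_grid:
        z = [n ** (-c) for n in sizes]  # regressor for this candidate c
        z_bar = sum(w * zi for w, zi in zip(weights, z)) / total_w
        y_bar = sum(w * yi for w, yi in zip(weights, scores)) / total_w
        s_zz = sum(w * (zi - z_bar) ** 2 for w, zi in zip(weights, z))
        if s_zz == 0:
            continue
        s_zy = sum(w * (zi - z_bar) * (yi - y_bar)
                   for w, zi, yi in zip(weights, z, scores))
        slope = s_zy / s_zz            # score = intercept + slope * z
        a = y_bar - slope * z_bar      # plateau performance
        b = -slope                     # learning-rate term
        sse = sum(w * (yi - (a - b * zi)) ** 2
                  for w, zi, yi in zip(weights, z, scores))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(n, a, b, c):
    """Predicted performance at sample size n under the fitted curve."""
    return a - b * n ** (-c)


# Synthetic learning curve (not data from the study).
sizes = list(range(20, 201, 20))
scores = [0.9 - 1.2 * n ** (-0.5) for n in sizes]
# Assumed weighting: later, larger-sample points count more.
weights = [n / max(sizes) for n in sizes]

a, b, c = fit_inverse_power_law(sizes, scores, weights)
print("fitted a, b, c:", a, b, c)
print("predicted score at n=1000:", predict(1000, a, b, c))
```

Weighting the later points more heavily reflects the idea that performance measured on larger training sets is usually less noisy; setting every weight to 1 gives the un-weighted baseline for comparison. The sketch does not compute confidence intervals.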

Main Results:

The weighted fitting method accurately predicted model performance across various datasets and sampling strategies.
Between 80 and 560 annotated samples were sufficient to achieve low prediction error (MSE < 0.01).
The weighted fitting approach demonstrated statistically significant improvement over the un-weighted method (p < 0.05).

Conclusions:

A simple, effective algorithm for sample size prediction in supervised machine learning was developed.
The weighted fitting algorithm offers superior performance compared to un-weighted methods.
This tool aids researchers in determining optimal annotation sample sizes, enhancing resource allocation.
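
As a rough illustration of how a fitted learning curve could guide annotation planning, the sketch below inverts an inverse power law to estimate the sample size needed for a target performance. The curve form score(n) = a - b * n**(-c), the function name, and the parameter values are assumptions for illustration, not values reported in the study.

```python
import math


def required_sample_size(target, a, b, c):
    """Smallest integer n with a - b * n**(-c) >= target.

    Returns None when the target is at or above the plateau a,
    since the assumed curve never reaches it.
    """
    if b <= 0 or c <= 0:
        raise ValueError("b and c must be positive")
    if target >= a:
        return None
    # Solve a - b * n**(-c) = target for n, then round up.
    n = (b / (a - target)) ** (1.0 / c)
    return max(1, math.ceil(n))


# Hypothetical fitted parameters, chosen only for this example.
n_needed = required_sample_size(target=0.83, a=0.9, b=1.0, c=0.5)
print("annotated samples needed:", n_needed)
print("unreachable target:", required_sample_size(0.95, 0.9, 1.0, 0.5))
```

Because the estimate grows very quickly as the target approaches the plateau, small errors in the fitted parameters can change the answer substantially, so the result is best read as a planning estimate rather than an exact requirement.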