



Machine learning algorithm validation with a limited sample size.

Andrius Vabalas (1), Emma Gowen (2), Ellen Poliakoff (2)

  • (1) Materials, Devices and Systems Division, School of Electrical and Electronic Engineering, The University of Manchester, Manchester, England, United Kingdom.

PLOS ONE | November 8, 2019
Summary
This summary is machine-generated.

Machine learning (ML) performance estimates can be biased when sample sizes are small. K-fold cross-validation (CV) produces biased estimates, especially when feature selection is done on pooled data, whereas nested CV and train/test splits provide unbiased estimates.


Area of Science:

  • Machine learning
  • Biostatistics
  • Neuroscience

Background:

  • High-dimensional datasets with few samples are common in human participant studies.
  • Small sample sizes can lead to biased machine learning performance estimates.
  • In previous autism prediction studies, smaller sample sizes are associated with higher reported classification accuracy.

Purpose of the Study:

  • Investigate bias in machine learning performance estimates due to small sample sizes.
  • Evaluate the impact of different validation methods on bias.
  • Identify robust methodologies for small dataset analysis.

Main Methods:

  • Simulations were used to assess bias in machine learning validation methods.
  • K-fold Cross-Validation (CV), Nested CV, and train/test split were compared.
  • The influence of feature selection, data dimensionality, and hyper-parameter space was explored.
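
The kind of simulation described above can be sketched on pure-noise data, where the true accuracy of any classifier is 0.5, so anything clearly above that is bias. This is only an illustration under stated assumptions: the mean-difference feature filter, the nearest-centroid classifier, the sample and feature counts, and all function names are hypothetical stand-ins, not the study's actual models or settings.

```python
import random

def make_noise_data(n_samples, n_features, rng):
    # Features are Gaussian noise and labels are arbitrary,
    # so no classifier can truly beat 0.5 accuracy.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]
    return X, y

def class_mean(X, y, rows, j, label):
    vals = [X[i][j] for i in rows if y[i] == label]
    return sum(vals) / len(vals)

def select_features(X, y, rows, k_feat):
    # Hypothetical filter: rank features by absolute difference in class means.
    scores = [(abs(class_mean(X, y, rows, j, 1) - class_mean(X, y, rows, j, 0)), j)
              for j in range(len(X[0]))]
    return [j for _, j in sorted(scores, reverse=True)[:k_feat]]

def count_hits(X, y, train, test, cols):
    # Nearest-centroid classifier (a stand-in model) fitted on `train`, scored on `test`.
    c0 = [class_mean(X, y, train, j, 0) for j in cols]
    c1 = [class_mean(X, y, train, j, 1) for j in cols]
    hits = 0
    for i in test:
        d0 = sum((X[i][j] - a) ** 2 for j, a in zip(cols, c0))
        d1 = sum((X[i][j] - b) ** 2 for j, b in zip(cols, c1))
        hits += int((0 if d0 <= d1 else 1) == y[i])
    return hits

def kfold_accuracy(X, y, folds, k_feat, pooled):
    rows = [i for fold in folds for i in fold]
    pooled_cols = select_features(X, y, rows, k_feat)  # leaky: sees every test row
    hits = 0
    for test in folds:
        train = [i for i in rows if i not in test]
        cols = pooled_cols if pooled else select_features(X, y, train, k_feat)
        hits += count_hits(X, y, train, test, cols)
    return hits / len(rows)

rng = random.Random(0)
X, y = make_noise_data(n_samples=40, n_features=500, rng=rng)
idx = list(range(len(y)))
rng.shuffle(idx)
folds = [idx[f::5] for f in range(5)]  # 5 folds of 8 samples each
acc_pooled = kfold_accuracy(X, y, folds, k_feat=10, pooled=True)
acc_infold = kfold_accuracy(X, y, folds, k_feat=10, pooled=False)
print(f"feature selection on pooled data: {acc_pooled:.2f}")
print(f"feature selection inside folds:   {acc_infold:.2f}")
```

The only difference between the two runs is which rows the feature filter is allowed to see, which isolates the contribution of pooled feature selection from everything else in the pipeline.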

Main Results:

  • K-fold CV yields biased performance estimates at small sample sizes, and the bias is still evident at a sample size of 1000.
  • Nested CV and train/test split offer unbiased estimates irrespective of sample size.
  • Feature selection performed on pooled training and testing data contributes considerably more to bias than hyper-parameter tuning.
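
A minimal sketch of nested CV may help show why it avoids this leak: the inner loop tunes a hyper-parameter using only the outer-training rows, and each outer test fold is used once, for scoring. The tuned parameter (number of selected features), the grid, the nearest-centroid classifier, and the function names are assumptions made for illustration, not the configuration used in the study.

```python
import random

def class_mean(X, y, rows, j, label):
    vals = [X[i][j] for i in rows if y[i] == label]
    return sum(vals) / len(vals)

def fit_and_score(X, y, train, test, k_feat):
    # All fitting (feature ranking and class centroids) uses `train` rows only.
    ranked = sorted(
        range(len(X[0])), reverse=True,
        key=lambda j: abs(class_mean(X, y, train, j, 1) - class_mean(X, y, train, j, 0)))
    cols = ranked[:k_feat]
    c0 = [class_mean(X, y, train, j, 0) for j in cols]
    c1 = [class_mean(X, y, train, j, 1) for j in cols]
    hits = 0
    for i in test:
        d0 = sum((X[i][j] - a) ** 2 for j, a in zip(cols, c0))
        d1 = sum((X[i][j] - b) ** 2 for j, b in zip(cols, c1))
        hits += int((0 if d0 <= d1 else 1) == y[i])
    return hits

def split_folds(rows, k):
    return [rows[f::k] for f in range(k)]

def tune_k_feat(X, y, train, grid, n_inner):
    # Inner loop: pick the hyper-parameter by CV restricted to outer-training rows.
    def inner_hits(k_feat):
        return sum(
            fit_and_score(X, y, [i for i in train if i not in val], val, k_feat)
            for val in split_folds(train, n_inner))
    return max(grid, key=inner_hits)

def nested_cv_accuracy(X, y, grid, n_outer, n_inner, rng):
    rows = list(range(len(y)))
    rng.shuffle(rows)
    hits = 0
    for test in split_folds(rows, n_outer):
        train = [i for i in rows if i not in test]
        best = tune_k_feat(X, y, train, grid, n_inner)
        hits += fit_and_score(X, y, train, test, best)  # test rows only scored, never fitted
    return hits / len(rows)

# Pure-noise data with arbitrary labels: the true accuracy is 0.5.
rng = random.Random(1)
X = [[rng.gauss(0.0, 1.0) for _ in range(200)] for _ in range(40)]
y = [i % 2 for i in range(40)]
acc_nested = nested_cv_accuracy(X, y, grid=[5, 10, 20], n_outer=5, n_inner=4, rng=rng)
print(f"nested CV accuracy on noise: {acc_nested:.2f}")
```

Because selection and tuning are repeated from scratch inside every outer fold, the reported accuracy reflects the whole model-building procedure rather than one tuned model.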

Conclusions:

  • Standard K-fold CV is unreliable for small, high-dimensional datasets.
  • Nested CV and train/test splits are recommended for robust machine learning validation.
  • Careful consideration of validation and feature selection methods is crucial for accurate results.
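
The train/test split recommended above can be sketched in the same spirit: feature selection and model fitting see only the training rows, and the held-out rows are scored once. The 25% test fraction, the noise data, the nearest-centroid classifier, and the function names are illustrative assumptions; repeating the split on fresh noise data is only meant to show that the average estimate stays near chance.

```python
import random

def class_mean(X, y, rows, j, label):
    vals = [X[i][j] for i in rows if y[i] == label]
    return sum(vals) / len(vals)

def holdout_accuracy(n_samples, n_features, k_feat, test_fraction, rng):
    # Fresh pure-noise dataset with arbitrary labels: true accuracy is 0.5.
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    y = [i % 2 for i in range(n_samples)]

    # Split first, before any data-dependent step.
    rows = list(range(n_samples))
    rng.shuffle(rows)
    n_test = int(n_samples * test_fraction)
    test, train = rows[:n_test], rows[n_test:]

    # Feature selection (hypothetical mean-difference filter) on training rows only.
    ranked = sorted(
        range(n_features), reverse=True,
        key=lambda j: abs(class_mean(X, y, train, j, 1) - class_mean(X, y, train, j, 0)))
    cols = ranked[:k_feat]

    # Nearest-centroid classifier (a stand-in model) fitted on training rows only.
    c0 = [class_mean(X, y, train, j, 0) for j in cols]
    c1 = [class_mean(X, y, train, j, 1) for j in cols]

    hits = 0
    for i in test:  # held-out rows are scored once
        d0 = sum((X[i][j] - a) ** 2 for j, a in zip(cols, c0))
        d1 = sum((X[i][j] - b) ** 2 for j, b in zip(cols, c1))
        hits += int((0 if d0 <= d1 else 1) == y[i])
    return hits / n_test

rng = random.Random(2)
accs = [holdout_accuracy(40, 200, 10, 0.25, rng) for _ in range(20)]
mean_acc = sum(accs) / len(accs)
print(f"mean held-out accuracy over {len(accs)} noise datasets: {mean_acc:.2f}")
```

A single split on a small dataset gives a noisy estimate, since only a handful of rows are scored, which is why the sketch averages over repeated simulations instead of reading anything into one run.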