Updated: Jun 12, 2025

Cross-validation: what does it estimate and how well does it do it?

Stephen Bates¹, Trevor Hastie², Robert Tibshirani³

¹Depts. of Statistics and EECS, Univ. of California, Berkeley.

Journal of the American Statistical Association

September 23, 2024

Summary

This summary is machine-generated.

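Cross-validation estimates the average prediction error of models fit on different training sets, not the error of the specific model fit on the current data. A nested cross-validation scheme gives more reliable confidence intervals for prediction accuracy, and intervals built from simple data splitting hold only if the model is not refit on the combined data.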

Area of Science:

Statistics
Machine Learning
Computational Statistics

Background:

Cross-validation is a standard method for estimating prediction error in machine learning.
Its precise behavior and interpretation, particularly for linear models, are not fully understood.
Existing methods may provide misleading estimates of prediction error and confidence intervals.

Purpose of the Study:

To clarify what prediction error cross-validation truly estimates.
To investigate the accuracy of confidence intervals derived from cross-validation.
To propose improved methods for reliable prediction error estimation.
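
The first question above turns on two different targets that "prediction error" could refer to. They can be written as follows. The notation is an assumption introduced here for illustration and is not taken from this summary: D is the training set, \hat{f}_D is the model fit on D, (X_0, Y_0) is a new observation independent of D, and \ell is the loss function.

```latex
% Error of the specific model fit on the data in hand (conditional on D):
\mathrm{Err}_{D} \;=\; \mathbb{E}\!\left[\,\ell\big(\hat{f}_{D}(X_0),\, Y_0\big) \,\middle|\, D\,\right]

% Average error over training sets of the same size from the same population:
\mathrm{Err} \;=\; \mathbb{E}_{D}\!\left[\,\mathrm{Err}_{D}\,\right]
```

The first quantity depends on the particular training set, while the second averages that dependence away. The study asks which of the two the cross-validation estimate actually tracks.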

Main Methods:

Theoretical analysis of cross-validation for linear models.
Empirical evaluation of various prediction error estimation techniques, including data splitting and bootstrapping.
Development and testing of a nested cross-validation scheme.
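
A simplified sketch of how a nested scheme can estimate the mean squared error of a cross-validation estimate is given below. It is not the authors' exact algorithm: the fold layout, the helper names (`fit_line`, `sq_losses`, `nested_cv`), the one-feature least-squares model, and the synthetic data are all assumptions made for illustration. Any bias correction or rescaling needed to turn the output into a confidence interval is left out.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def sq_losses(train, test, xs, ys):
    """Fit on the `train` indices, return squared-error losses on `test`."""
    a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
    return [(ys[i] - (a + b * xs[i])) ** 2 for i in test]

def nested_cv(xs, ys, k=5, reps=5, seed=0):
    """Return (error estimate, estimated MSE of that estimate)."""
    rng = random.Random(seed)
    inner_all, a_terms, b_terms = [], [], []
    for _ in range(reps):
        idx = list(range(len(xs)))
        rng.shuffle(idx)
        folds = [idx[j::k] for j in range(k)]
        for out in range(k):
            rest = [f for j, f in enumerate(folds) if j != out]
            # Inner loop: ordinary cross-validation on the k-1 remaining folds.
            e_in = []
            for j, test in enumerate(rest):
                train = [i for m, f in enumerate(rest) if m != j for i in f]
                e_in += sq_losses(train, test, xs, ys)
            # Outer step: fit on all k-1 folds, evaluate on the held-out fold.
            e_out = sq_losses([i for f in rest for i in f], folds[out], xs, ys)
            # Squared gap between inner estimate and held-out error ...
            a_terms.append((statistics.fmean(e_in) - statistics.fmean(e_out)) ** 2)
            # ... minus the sampling noise of the held-out mean itself.
            b_terms.append(statistics.variance(e_out) / len(e_out))
            inner_all += e_in
    err = statistics.fmean(inner_all)
    mse = statistics.fmean(a_terms) - statistics.fmean(b_terms)
    return err, mse

# Synthetic data (assumed): y = 1 + 2x + standard normal noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
err, mse = nested_cv(xs, ys)
print(f"nested CV error {err:.3f}, estimated MSE of the estimate {mse:.4f}")
```

The design choice worth noting is that the held-out outer fold is never seen by the inner cross-validation, so the gap between the two gives a direct reading of how far the inner estimate can be from out-of-sample error. In small samples the MSE estimate can come out negative and would need to be truncated before use.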

Main Results:

Cross-validation estimates the average prediction error of models fit on different training sets, not the prediction error of the specific model trained on the current data.
Standard confidence intervals derived from cross-validation often have coverage below the nominal level.
Nested cross-validation provides more accurate variance estimates and reliable confidence intervals.
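
For concreteness, the kind of standard interval referred to above can be sketched as follows: run K-fold cross-validation, treat the per-observation losses as if they were independent, and form a normal-theory interval around their mean. The model (one-feature least squares), the helper names, and the synthetic data are assumptions for illustration. The independence assumption is the weak point, because every observation is used for both training and testing, so the losses are correlated and the standard error tends to be too small.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def kfold_losses(xs, ys, k, seed=0):
    """One squared-error loss per observation, each from a model
    that was fit without that observation's fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    losses = [0.0] * len(xs)
    for j in range(k):
        test = idx[j::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            losses[i] = (ys[i] - (a + b * xs[i])) ** 2
    return losses

def naive_cv_interval(losses, level=0.95):
    """Mean loss with a normal-theory interval that ignores the
    correlation between losses (the naive construction)."""
    est = statistics.fmean(losses)
    se = statistics.stdev(losses) / math.sqrt(len(losses))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return est, (est - z * se, est + z * se)

# Synthetic data (assumed): y = 1 + 2x + standard normal noise.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]
losses = kfold_losses(xs, ys, k=10)
cv_error, (lo, hi) = naive_cv_interval(losses)
print(f"CV error {cv_error:.3f}, naive 95% interval ({lo:.3f}, {hi:.3f})")
```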

Conclusions:

The interpretation of cross-validation estimates needs careful consideration.
Nested cross-validation is a more robust approach for assessing prediction error and constructing confidence intervals.
When confidence intervals for prediction accuracy come from simple data splitting, refitting the model on the combined data invalidates them.
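
The last conclusion can be illustrated with a minimal data-splitting sketch. The model, the split sizes, and the synthetic data are assumptions for illustration. The interval is computed from held-out losses of the model fit on the training half, so it describes that model only.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

# Synthetic data (assumed): y = 1 + 2x + standard normal noise.
rng = random.Random(2)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# Split once: first half for fitting, second half for evaluation.
x_train, y_train = xs[:100], ys[:100]
x_test, y_test = xs[100:], ys[100:]

a, b = fit_line(x_train, y_train)
test_losses = [(y - (a + b * x)) ** 2 for x, y in zip(x_test, y_test)]

# Held-out losses are independent given the fitted model, so a
# normal-theory interval around their mean is a reasonable construction.
est = statistics.fmean(test_losses)
se = statistics.stdev(test_losses) / math.sqrt(len(test_losses))
z = statistics.NormalDist().inv_cdf(0.975)
ci = (est - z * se, est + z * se)
print(f"held-out error {est:.3f}, 95% interval ({ci[0]:.3f}, {ci[1]:.3f})")

# The interval refers to the model (a, b) fit on the training half.
# Refitting on all 200 points yields a different model, and this
# interval should not be reported as that refit model's error.
```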