Updated: May 9, 2025

Generalized data thinning using sufficient statistics.

Ameer Dharamshi¹, Anna Neufeld², Keshav Motwani¹

¹Department of Biostatistics, University of Washington.

Journal of the American Statistical Association

April 30, 2025

Summary

This summary is machine-generated.

This study introduces a generalized data thinning strategy that decomposes a random variable into independent random variables. The method expands the range of distribution families that can be thinned and unifies data thinning with sample splitting under the principle of sufficiency.

Keywords:

cross-validation; exponential families; model validation; sample splitting; selective inference

Area of Science:

Statistics
Probability Theory
Statistical Inference

Background:

Traditional sample splitting can fail in certain model validation and inference tasks.
Prior work demonstrated data thinning for specific natural exponential families, but required that the independent pieces sum to the original random variable.
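
As an illustration of the summation-based thinning described above, here is a minimal sketch for the Poisson family, a standard textbook case. The function name `poisson_thin`, the split fraction `eps`, and the simulated values are assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def poisson_thin(x, eps, rng):
    """Split Poisson counts x into two parts that sum back to x.

    Draws x1 | x ~ Binomial(x, eps) and sets x2 = x - x1.
    If x ~ Poisson(lam), then x1 ~ Poisson(eps * lam) and
    x2 ~ Poisson((1 - eps) * lam), and x1 and x2 are independent.
    Note that lam is never used: the split needs only x and eps.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)
    x2 = x - x1
    return x1, x2


# Illustrative values only (assumptions, not from the source).
rng = np.random.default_rng(0)
lam, eps = 10.0, 0.3
x = rng.poisson(lam, size=200_000)
x1, x2 = poisson_thin(x, eps, rng)

print("mean of x1 (expect about eps * lam):", x1.mean())
print("mean of x2 (expect about (1 - eps) * lam):", x2.mean())
print("sample correlation of x1 and x2:", np.corrcoef(x1, x2)[0, 1])
```

The two pieces can then play the roles of a training set and a validation set even when there is only a single observation per unit, which is a setting where splitting the sample itself is not an option.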

Purpose of the Study:

To develop a general strategy for decomposing a random variable into independent random variables.
To relax the summation requirement of previous thinning methods.
To unify data thinning and sample splitting under the principle of sufficiency.
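
Stated in symbols, the goal can be sketched as follows. The notation ($P_\theta$, $T$, $K$, superscripted pieces) is an assumption chosen for this summary rather than quoted from the paper.

```latex
% Earlier (summation-based) thinning: the pieces must add up to X.
X \sim P_\theta, \qquad
X = \sum_{k=1}^{K} X^{(k)}, \qquad
X^{(1)}, \dots, X^{(K)} \ \text{mutually independent}.

% Generalized thinning: any known function T may reconstruct X.
X \sim P_\theta, \qquad
X = T\bigl(X^{(1)}, \dots, X^{(K)}\bigr), \qquad
X^{(1)}, \dots, X^{(K)} \ \text{mutually independent}.

% In both cases the conditional law of the pieces given X = x
% must not depend on the unknown parameter \theta, so that the
% pieces can be sampled from the observed x alone.
```

The requirement that the conditional law be free of $\theta$ is the statement that $T(X^{(1)}, \dots, X^{(K)})$ is a sufficient statistic for $\theta$ in the joint model of the pieces.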

Main Methods:

Generalizing the procedure of thinning random variables.
Relaxing the summation constraint to a functional reconstruction.
Applying generalized thinning to diverse statistical families.
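
To make the idea of functional reconstruction concrete, the sketch below thins a variable so that the maximum of the pieces, rather than their sum, recovers the original. It relies on a standard fact: the maximum of K independent Uniform(0, theta) draws is sufficient for theta and is distributed as theta times a Beta(K, 1) variable. The function name `max_thin` and all numeric values are assumptions for this example, and this is offered as an instance of the general idea, not as the authors' own code.

```python
import numpy as np


def max_thin(x, K, rng):
    """Thin each x into K values whose maximum is exactly x.

    Assumes x ~ theta * Beta(K, 1), i.e. the law of the maximum of
    K iid Uniform(0, theta) draws. Given that the maximum equals x,
    one uniformly chosen coordinate equals x and the remaining
    coordinates are iid Uniform(0, x). Theta is never needed.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # Fill every coordinate with a Uniform(0, x) draw ...
    parts = rng.uniform(0.0, 1.0, size=(n, K)) * x[:, None]
    # ... then overwrite one randomly chosen coordinate with x itself.
    idx = rng.integers(0, K, size=n)
    parts[np.arange(n), idx] = x
    return parts


# Illustrative values only (assumptions, not from the source).
rng = np.random.default_rng(1)
theta, K = 2.0, 3
x = theta * rng.beta(K, 1, size=100_000)
parts = max_thin(x, K, rng)

print("reconstruction by max is exact:", np.array_equal(parts.max(axis=1), x))
print("column means (expect about theta / 2):", parts.mean(axis=0))
print("correlation of first two columns:", np.corrcoef(parts[:, 0], parts[:, 1])[0, 1])
```

Here the reconstruction function is T = max, so the pieces do not sum to the original value, yet each column behaves like an independent Uniform(0, theta) sample.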

Main Results:

Expanded the range of distributions amenable to thinning.
Demonstrated that data thinning and sample splitting are unified applications of sufficiency.
Developed a general strategy applicable to a wider array of statistical families.

Conclusions:

The generalized thinning procedure offers a more flexible approach to random variable decomposition.
Sufficiency is identified as the unifying principle behind data thinning and sample splitting.
The method enhances capabilities for model validation and inference.