



Generalized data thinning using sufficient statistics.

Ameer Dharamshi¹, Anna Neufeld², Keshav Motwani¹

  • ¹Department of Biostatistics, University of Washington.

Journal of the American Statistical Association | April 30, 2025
Summary
This summary is machine-generated.

This study introduces a generalized data thinning strategy to decompose random variables into independent ones. This method expands applicability and unifies thinning with sample splitting through sufficiency.

Keywords:
cross-validation, exponential families, model validation, sample splitting, selective inference


Area of Science:

  • Statistics
  • Probability Theory
  • Statistical Inference

Background:

  • Traditional methods for decomposing random variables can fail in certain inference tasks.
  • Prior work demonstrated data thinning for specific natural exponential families, requiring a summation constraint.
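
As an illustration of the summation-based thinning mentioned above, the following minimal sketch uses the Poisson case, a standard textbook instance. If X ~ Poisson(λ) and X1 | X ~ Binomial(X, ε), then X1 ~ Poisson(ελ) and X2 = X − X1 ~ Poisson((1 − ε)λ) are independent and sum back to X. The function name `thin_poisson` and the values of `lam` and `eps` are assumptions chosen for this example, not taken from the paper.

```python
import numpy as np


def thin_poisson(x, eps, rng):
    """Split Poisson counts x into two parts that sum back to x.

    Hypothetical helper for illustration: draws X1 | X = x ~ Binomial(x, eps)
    and sets X2 = x - X1.
    """
    x = np.asarray(x)
    x1 = rng.binomial(x, eps)  # binomial thinning of each count
    x2 = x - x1                # remainder, so that x1 + x2 == x exactly
    return x1, x2


rng = np.random.default_rng(0)
lam, eps = 10.0, 0.3           # assumed rate and thinning fraction
x = rng.poisson(lam, size=100_000)
x1, x2 = thin_poisson(x, eps, rng)

# Empirical check: means near eps*lam and (1-eps)*lam, correlation near zero.
print(x1.mean(), x2.mean(), np.corrcoef(x1, x2)[0, 1])
```

Here the reconstruction is a plain sum, which is the constraint that the generalized procedure relaxes.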

Purpose of the Study:

  • To develop a general strategy for decomposing a random variable into independent random variables.
  • To relax the summation requirement of previous thinning methods.
  • To unify data thinning and sample splitting under the principle of sufficiency.

Main Methods:

  • Generalizing the procedure of thinning random variables.
  • Relaxing the summation constraint to a functional reconstruction.
  • Applying generalized thinning to diverse statistical families.
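
To make "functional reconstruction" concrete, here is a hedged sketch of one case where the parts recombine through a function other than a sum. It relies on the Box-Muller identity: if X ~ σ²χ²₂ (an exponential variable with mean 2σ²) and φ is an independent uniform angle, then X1 = √X cos φ and X2 = √X sin φ are independent N(0, σ²) variables, and X = X1² + X2². The helper name `thin_scaled_chisq2` and the value of `sigma` are assumptions for this example; it is meant as an illustration of the idea, not as the paper's own construction.

```python
import numpy as np


def thin_scaled_chisq2(x, rng):
    """Split x ~ sigma^2 * chi^2_2 into two independent N(0, sigma^2) draws.

    Hypothetical helper for illustration. The split never uses sigma:
    it only draws a uniform angle and takes polar coordinates of sqrt(x).
    """
    x = np.asarray(x, dtype=float)
    phi = rng.uniform(0.0, 2.0 * np.pi, size=x.shape)  # uniform angle
    r = np.sqrt(x)                                     # radius
    return r * np.cos(phi), r * np.sin(phi)


rng = np.random.default_rng(1)
sigma = 1.5                                            # assumed scale
# sigma^2 * chi^2_2 is exponential with mean 2 * sigma^2.
x = rng.exponential(scale=2.0 * sigma**2, size=100_000)
x1, x2 = thin_scaled_chisq2(x, rng)

# Reconstruction is T(x1, x2) = x1^2 + x2^2 rather than x1 + x2.
print(np.allclose(x1**2 + x2**2, x), x1.var(), x2.var())
```

The split does not require knowing σ, which is what sufficiency of X1² + X2² for σ² in the pair (X1, X2) provides.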

Main Results:

  • Expanded the range of distributions amenable to thinning.
  • Demonstrated that data thinning and sample splitting are unified applications of sufficiency.
  • Developed a general strategy applicable to a wider array of statistical families.
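
The role of sufficiency in these results can be sketched in symbols. The notation below (the families P_θ and Q_θ, the reconstruction function T, and the conditional law G_x) is assumed for illustration and is not taken from the summary; the block is a paraphrase under those assumptions rather than the paper's formal statement. It requires the amsmath package.

```latex
% Assumed notation: X ~ P_theta is the observed variable, X^{(1)} and X^{(2)}
% are the thinned parts, and T is the reconstruction function.
\begin{align*}
&\text{Choose independent } X^{(1)} \sim Q^{(1)}_\theta,\; X^{(2)} \sim Q^{(2)}_\theta
  \text{ and } T \text{ with } T(X^{(1)}, X^{(2)}) \sim P_\theta
  \text{ sufficient for } \theta. \\
&\text{Let } G_x \text{ denote the law of } (X^{(1)}, X^{(2)})
  \text{ given } T(X^{(1)}, X^{(2)}) = x,
  \text{ which is free of } \theta \text{ by sufficiency.} \\
&\text{Given the observed } X = x, \text{ draw } (X^{(1)}, X^{(2)}) \sim G_x;
  \text{ then } X^{(1)} \text{ and } X^{(2)} \text{ are independent and }
  T(X^{(1)}, X^{(2)}) = X. \\[4pt]
&\text{Summation-based thinning: } T(x^{(1)}, x^{(2)}) = x^{(1)} + x^{(2)}. \\
&\text{Sample splitting: } X = (Y_1, \dots, Y_n) \text{ i.i.d.},\;
  T \text{ concatenates the two subsamples, and } G_x \text{ is degenerate.}
\end{align*}
```

Under this reading, both procedures draw from a conditional distribution that does not involve the unknown parameter, which is why the split can be carried out in practice.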

Conclusions:

  • The generalized thinning procedure offers a more flexible approach to random variable decomposition.
  • Sufficiency is identified as the unifying principle behind data thinning and sample splitting.
  • The method enhances capabilities for model validation and inference.