
Related Concept Videos

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
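The procedure above can be sketched in Python as one-stage cluster sampling. The sketch assumes the population is already grouped into labelled clusters. The function name `cluster_sample` and the department data are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """One-stage cluster sampling: pick whole clusters at random, keep every member.

    `clusters` maps a cluster label (e.g. a department) to its list of members.
    """
    rng = random.Random(seed)
    # Simple random sample of cluster labels (sorted first so a seed is reproducible).
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Every member of each chosen cluster enters the sample.
    sample = [member for label in chosen for member in clusters[label]]
    return chosen, sample

# Hypothetical population: employees grouped by department.
departments = {
    "Accounting": ["Ana", "Ben"],
    "Design": ["Cho", "Dev", "Eli"],
    "Legal": ["Fay"],
    "Sales": ["Gus", "Hal", "Ivy"],
    "Support": ["Jo", "Kai"],
    "IT": ["Lee", "Max", "Nia"],
}
chosen, sample = cluster_sample(departments, n_clusters=4, seed=1)
print(chosen, sample)
```

Changing `seed` changes which departments are drawn. Every chosen department still enters the sample in full.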

Sampling Plans

Sampling is a crucial step in analytical chemistry, allowing researchers to collect representative data from a large population. Common sampling methods include random, judgmental, systematic, stratified, and cluster sampling.
Random sampling is a method where each member of the population has an equal chance of being selected for the sample. It involves selecting individuals randomly, often using random number generators or lottery-type methods. For example, when analyzing the properties of a...
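Simple random sampling can be sketched with Python's standard `random` module. `random.Random.sample` draws distinct members, so each member of a population of size N has the same chance, n / N, of being selected. The function name and the numbered containers below are hypothetical illustrations, not part of the original example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member has the same chance, n / N, of selection."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical example: pick 5 of 50 numbered containers for analysis,
# the software counterpart of drawing numbers from a hat.
containers = list(range(1, 51))
picked = simple_random_sample(containers, 5, seed=42)
print(sorted(picked))
```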

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether that observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the sample mean. Then divide this difference by the sample standard deviation. This...
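The two steps above can be written as a short Python sketch that computes the Grubbs statistic G. The sketch makes two assumptions: the suspected outlier is the value farthest from the mean, and the sample standard deviation uses n - 1 in the denominator. G must still be compared against a critical value from a Grubbs table for the chosen significance level and n. The function name and the data are hypothetical.

```python
import statistics

def grubbs_statistic(data):
    """Two-sided Grubbs statistic G = max|x_i - mean| / s.

    Returns (G, suspect), where suspect is the value farthest from the mean.
    G is then compared with a tabulated critical value (not computed here).
    """
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    suspect = max(data, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / s, suspect

# Hypothetical replicate measurements with one suspicious value.
g, suspect = grubbs_statistic([10.2, 10.4, 10.1, 10.3, 10.2, 11.6])
print(suspect, round(g, 3))
```

For these numbers G is about 2.01. Whether 11.6 is rejected depends on the tabulated critical value for n = 6.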

One-Compartment Open Model: Wagner-Nelson and Loo-Riegelman Method for ka Estimation

This lesson introduces two critical methods in pharmacokinetics, the Wagner-Nelson and Loo-Riegelman methods, used to estimate the absorption rate constant (ka) of drugs administered by non-intravenous routes. The Wagner-Nelson method computes the fraction of drug absorbed over time from plasma concentrations. It then obtains ka from the slope of a semilog plot of percent unabsorbed versus time. However, it is limited to drugs with one-compartment kinetics, and its estimates can be affected by factors such as gastrointestinal motility or enzymatic degradation.
On...
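A minimal Python sketch of the Wagner-Nelson calculation follows. It uses the standard Wagner-Nelson relation for the fraction absorbed, F(t) = (C(t) + ke * AUC_0^t) / (ke * AUC_0^inf), and makes these assumptions:

- one-compartment kinetics;
- an elimination rate constant ke that is already known (for example, from the terminal log-linear phase);
- linear-trapezoidal AUC;
- tail extrapolation as C_last / ke.

The function name, the 10% cut-off for late points, and the simulated concentrations are hypothetical.

```python
import math

def wagner_nelson_ka(times, conc, ke, min_unabsorbed=0.1):
    """Estimate ka (one-compartment model) by the Wagner-Nelson method.

    Fraction absorbed: F(t) = (C(t) + ke * AUC_0^t) / (ke * AUC_0^inf).
    ka is minus the slope of ln(1 - F) versus t (semilog plot of fraction unabsorbed).
    """
    auc = [0.0]  # cumulative AUC_0^t, linear trapezoidal rule
    for i in range(1, len(times)):
        auc.append(auc[-1] + 0.5 * (conc[i] + conc[i - 1]) * (times[i] - times[i - 1]))
    auc_inf = auc[-1] + conc[-1] / ke  # tail extrapolated as C_last / ke
    points = []
    for t, c, a in zip(times, conc, auc):
        unabsorbed = 1.0 - (c + ke * a) / (ke * auc_inf)
        if unabsorbed > min_unabsorbed:  # skip late points where 1 - F is tiny and error-prone
            points.append((t, math.log(unabsorbed)))
    # Ordinary least-squares slope of ln(fraction unabsorbed) on time.
    n = len(points)
    t_bar = sum(t for t, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in points)
    sxx = sum((t - t_bar) ** 2 for t, _ in points)
    return -sxy / sxx

# Hypothetical noise-free oral-dose curve: C(t) = 10 (e^(-ke t) - e^(-ka t)), t in hours.
ka_true, ke_true = 1.5, 0.2
times = [0.1 * i for i in range(241)]  # 0 to 24 h
conc = [10.0 * (math.exp(-ke_true * t) - math.exp(-ka_true * t)) for t in times]
ka_hat = wagner_nelson_ka(times, conc, ke_true)
print(round(ka_hat, 3))
```

With these noise-free simulated data, the recovered value should be close to the ka = 1.5 per hour used to generate the curve. Real data would need more care in choosing the ke estimate and the cut-off.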

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations are more involved when the sample sizes are not all the same. So, when performing ANOVA with unequal sample sizes, the following equation is used:
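A minimal Python sketch of the standard one-way ANOVA F statistic when group sizes differ follows. The between-group sum of squares weights each group mean by its own size n_i, and the degrees of freedom are k - 1 and N - k. The function name and the three small groups are hypothetical.

```python
def one_way_anova_f(groups):
    """One-way ANOVA F statistic for k groups that may have different sizes n_i.

    SS_between = sum_i n_i (mean_i - grand_mean)^2   (df = k - 1)
    SS_within  = sum_i sum_j (x_ij - mean_i)^2       (df = N - k)
    F = (SS_between / (k - 1)) / (SS_within / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # The grand mean weights every observation equally, not every group.
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within

f, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 6], [6, 8, 8, 10]])
print(f, df1, df2)
```

For the example groups (sizes 3, 2 and 4), SS_between = 62 and SS_within = 12. So F = (62/2)/(12/6) = 15.5 with 2 and 6 degrees of freedom.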

Hypothesis Test for Test of Independence

The test of independence is a chi-square-based hypothesis test used to determine whether two variables (factors) are independent or dependent. The two variables are cross-classified in a contingency table; for example, they may be the responses to two qualitative survey questions or the factors of an experiment. The goal is to see whether the two variables are unrelated (independent) or related (dependent). The null and alternative hypotheses for this test are:
H0: The two variables (factors)...
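The test statistic can be sketched in Python. Expected counts under H0 come from the row and column totals. The statistic is then compared with a chi-square distribution on (r - 1)(c - 1) degrees of freedom. The function name and the 2 x 2 counts are hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square statistic for an r x c contingency table of counts.

    Under H0 (independence) the expected count in cell (i, j) is
    row_total_i * col_total_j / grand_total; df = (r - 1)(c - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical survey: rows = answer to question A, columns = answer to question B.
chi2, df = chi_square_independence([[10, 20], [20, 10]])
print(round(chi2, 3), df)
```

Here every expected count is 15, giving a statistic of 20/3 (about 6.67) on 1 degree of freedom. This exceeds the usual 5% critical value of 3.841, so H0 would be rejected.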

You might also read

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Large language models for full-text methods assessment: a case study on mediation analysis.

Journal of the American Medical Informatics Association : JAMIA·2026

Same author

Prefrontal to ventral tegmental area dynamics drive contingency degradation.

Nature·2026

Same author

Relation Between Executive Function Test Performance and Treatment Outcomes During Brief Psychotherapies for Later-Life Depression.

The American journal of geriatric psychiatry. Open science, education, and practice·2025

Same author

Long-read transcriptome analysis using IsoRanker for identifying pathogenic variants in Mendelian conditions.

medRxiv : the preprint server for health sciences·2025

Same author

A haplotype-resolved view of human gene regulation.

bioRxiv : the preprint server for biology·2025

Same author

Microbes with higher metabolic independence are enriched in human gut microbiomes under stress.

eLife·2025

Same journal

Classification Under Local Differential Privacy with Model Reversal and Model Averaging.

Journal of machine learning research : JMLR·2026

Same journal

Sparse Semiparametric Discriminant Analysis for High-dimensional Zero-inflated Data.

Journal of machine learning research : JMLR·2026

Same journal

Heterogeneity-aware Clustered Distributed Learning for Multi-source Data Analysis.

Journal of machine learning research : JMLR·2026

Same journal

Unsupervised Tree Boosting for Learning Probability Distributions.

Journal of machine learning research : JMLR·2026

Same journal

A Two-Stage Penalized Least Squares Method for Constructing Large Systems of Structural Equations.

Journal of machine learning research : JMLR·2026

Same journal

Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes.

Journal of machine learning research : JMLR·2026

Related Experiment Video

Updated: Jul 5, 2025

Large-scale Reconstructions and Independent, Unbiased Clustering Based on Morphological Metrics to Classify Neurons in Selective Populations

Published on: February 15, 2017

Selective inference for k-means clustering.

Yiqun T Chen¹, Daniela M Witten²

¹Data Science Institute and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.

Journal of Machine Learning Research : JMLR

January 24, 2024

Summary

This summary is machine-generated.

This study introduces a new p-value for k-means clustering to accurately test for mean differences between clusters. The method controls Type I errors, improving statistical reliability in data analysis.

Keywords:

Hypothesis testing Post-selection inference RNA-sequencing Type I error Unsupervised learning

More Related Videos

Determination of Aggregate Surface Morphology at the Interfacial Transition Zone ITZ

Published on: December 16, 2019

ExCYT: A Graphical User Interface for Streamlining Analysis of High-Dimensional Cytometry Data

Published on: January 16, 2019

Area of Science:

Statistics
Machine Learning
Data Mining

Background:

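Classical hypothesis tests are unreliable after k-means clustering: when the same data are used both to define the clusters and to test for differences between them, the Type I error rate is inflated.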
Existing methods, like those for hierarchical clustering, are not applicable to k-means.
Accurate statistical inference is crucial for interpreting cluster analysis results.
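The Type I error inflation can be illustrated with a small Python simulation built on assumptions of our own:

- one-dimensional data drawn from a single N(0, 1) population, so there are no true clusters;
- k-means with K = 2, started from the minimum and maximum;
- a classical two-sample z-test with known sigma = 1, standing in for the classical test.

The function names are hypothetical.

```python
import math
import random

def kmeans_two_1d(x, max_iter=100):
    """Lloyd's k-means with K = 2 on 1-D data, started from min(x) and max(x)."""
    centroids = [min(x), max(x)]
    labels = None
    for _ in range(max_iter):
        new = [0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1 for v in x]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for k in (0, 1):
            members = [v for v, lab in zip(x, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return labels

def naive_rejection_rate(n_sims=500, n=50, alpha=0.05, seed=0):
    """Fraction of null datasets in which a naive z-test 'finds' a cluster difference."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0 holds: no real clusters
        labels = kmeans_two_1d(x)
        g0 = [v for v, lab in zip(x, labels) if lab == 0]
        g1 = [v for v, lab in zip(x, labels) if lab == 1]
        # Classical two-sample z statistic with known sigma = 1, ignoring that
        # the groups were chosen by k-means from these same data.
        z = (sum(g0) / len(g0) - sum(g1) / len(g1)) / math.sqrt(1 / len(g0) + 1 / len(g1))
        p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
        rejections += p < alpha
    return rejections / n_sims

print(naive_rejection_rate())
```

k-means always splits the sample into a low group and a high group. As a result, the naive test rejects in nearly every simulated dataset even though H0 is true, far above the nominal 5% rate.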

Purpose of the Study:

To develop a statistically sound method for testing differences in means between clusters identified by k-means.
To address the limitations of classical hypothesis tests in the context of k-means clustering.
To provide a computationally efficient p-value that controls Type I error.

Main Methods:

Proposed a novel p-value that conditions on all intermediate assignments within the k-means algorithm.
Demonstrated theoretical control of the selective Type I error rate.
Developed an efficient computation method for the proposed p-value.
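For reference, the conditional p-value described above can be written out roughly as follows. This is our transcription, not text from the paper, and it assumes the setting used in this line of work: an n x q data matrix X ~ MN(mu, I_n, sigma^2 I_q) with sigma known. The notation is as follows:

- Ĉ_1 and Ĉ_2 are the two estimated clusters being compared.
- c_i^(t)(x) is the cluster of observation i at iteration t of k-means run on x.
- dir(u) = u / ||u||_2.
- F(.; c, S) is the CDF of a c * chi_q random variable truncated to the set S.

The symbol names are our own shorthand.

```latex
\begin{align*}
\nu_i &= \frac{\mathbf{1}\{i \in \hat{C}_1\}}{|\hat{C}_1|} - \frac{\mathbf{1}\{i \in \hat{C}_2\}}{|\hat{C}_2|},
  \qquad x^\top \nu = \bar{x}_{\hat{C}_1} - \bar{x}_{\hat{C}_2},
  \qquad H_0:\ \bar{\mu}_{\hat{C}_1} = \bar{\mu}_{\hat{C}_2}, \\
p &= \Pr_{H_0}\Big( \lVert X^\top \nu \rVert_2 \ge \lVert x^\top \nu \rVert_2 \;\Big|\;
  \textstyle\bigcap_{t=0}^{T} \bigcap_{i=1}^{n} \{ c_i^{(t)}(X) = c_i^{(t)}(x) \},\
  \Pi_\nu^{\perp} X = \Pi_\nu^{\perp} x,\
  \operatorname{dir}(X^\top \nu) = \operatorname{dir}(x^\top \nu) \Big),
  \qquad \Pi_\nu^{\perp} = I_n - \frac{\nu \nu^\top}{\lVert \nu \rVert_2^2}, \\
p &= 1 - \mathbb{F}\big( \lVert x^\top \nu \rVert_2;\ \sigma \lVert \nu \rVert_2,\ \mathcal{S} \big),
  \qquad \mathcal{S} = \big\{ \phi \ge 0 : c_i^{(t)}(x'(\phi)) = c_i^{(t)}(x)\ \text{for all } t, i \big\}, \\
x'(\phi) &= x + \frac{\phi - \lVert x^\top \nu \rVert_2}{\lVert \nu \rVert_2^2}\, \nu\, \operatorname{dir}(x^\top \nu)^\top .
\end{align*}
```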

Main Results:

The proposed p-value effectively controls the selective Type I error in finite samples.
The method applies to comparing cluster means after k-means clustering.
The p-value can be computed efficiently.
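The sketch below illustrates what conditioning on every intermediate k-means assignment means, for one-dimensional data (q = 1) with known sigma. It does not reproduce the paper's efficient computation of the truncation set S. Instead, it swaps in a brute-force approximation:

1. Rebuild the perturbed data x'(phi) on a grid of phi values, changing only the component of x along nu, so that |x'(phi)^T nu| = phi.
2. Rerun k-means from the same initial observations.
3. Keep the phi values whose whole assignment history matches the original.
4. Evaluate the tail probability of sigma * ||nu|| * chi_1 restricted to that set.

All function names, the grid size, the starting indices, and the simulated data are hypothetical. Treat this as an illustration, not the authors' implementation.

```python
import math
import random

def kmeans_history(x, init_idx, max_iter=100):
    """Lloyd's k-means on 1-D data, started from the observations x[i], i in init_idx.

    Returns the assignment vectors c^(0), ..., c^(T) from every iteration.
    """
    centroids = [x[i] for i in init_idx]
    history = []
    for _ in range(max_iter):
        # Nearest-centroid assignment; ties go to the lower cluster index.
        labels = [min(range(len(centroids)), key=lambda k: (v - centroids[k]) ** 2) for v in x]
        if history and labels == history[-1]:
            break  # assignments unchanged: converged
        history.append(labels)
        for k in range(len(centroids)):
            members = [v for v, lab in zip(x, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = sum(members) / len(members)
    return history


def upper_tail(z):
    """P(Z >= z) for standard normal Z; erfc avoids cancellation for large z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def selective_pvalue_grid(x, init_idx, k1, k2, sigma=1.0, n_grid=500):
    """Grid approximation to a selective p-value for H0: clusters k1 and k2 have equal means."""
    history = kmeans_history(x, init_idx)
    final = history[-1]
    n1 = sum(lab == k1 for lab in final)
    n2 = sum(lab == k2 for lab in final)
    nu = [1.0 / n1 if lab == k1 else -1.0 / n2 if lab == k2 else 0.0 for lab in final]
    nu_sq = 1.0 / n1 + 1.0 / n2               # ||nu||^2
    diff = sum(w * v for w, v in zip(nu, x))   # x^T nu = mean(k1) - mean(k2)
    stat = abs(diff)
    sign = 1.0 if diff >= 0 else -1.0
    scale = sigma * math.sqrt(nu_sq)           # sd of X^T nu under H0

    def in_s(phi):
        # x'(phi) changes only the component of x along nu, so |x'(phi)^T nu| = phi.
        shift = (phi - stat) * sign / nu_sq
        x_phi = [v + shift * w for v, w in zip(x, nu)]
        return kmeans_history(x_phi, init_idx) == history

    # Grid nodes on [0, stat] and [stat, stat + 10 sd]; stat itself is always in S.
    # Mass more than 10 sd above stat is ignored.
    nodes = [stat * j / n_grid for j in range(n_grid)]
    nodes += [stat + 10.0 * scale * j / n_grid for j in range(n_grid + 1)]
    inside = [in_s(phi) for phi in nodes]
    num = den = 0.0
    for j in range(len(nodes) - 1):
        a, b = nodes[j], nodes[j + 1]
        # Mass of scale * chi_1 on [a, b] is 2 * (Q(a/scale) - Q(b/scale)); the 2 cancels.
        # Trapezoid rule on the indicator of S gives each cell weight 0, 0.5 or 1.
        mass = 0.5 * (inside[j] + inside[j + 1]) * (upper_tail(a / scale) - upper_tail(b / scale))
        den += mass
        if a >= stat:
            num += mass
    naive = 2.0 * upper_tail(stat / scale)     # ignores that k-means chose the clusters
    return num / den, naive


rng = random.Random(1)
# Hypothetical data: 40 points from one N(0, 1) population, so there are no true clusters.
x_null = [rng.gauss(0.0, 1.0) for _ in range(40)]
# Hypothetical data: 20 points around -4 followed by 20 points around +4.
x_sep = [rng.gauss(-4.0, 1.0) for _ in range(20)] + [rng.gauss(4.0, 1.0) for _ in range(20)]
init = [0, 39]  # start k-means from the first and last observations (arbitrary choice)

p_null, p_null_naive = selective_pvalue_grid(x_null, init, 0, 1)
p_sep, p_sep_naive = selective_pvalue_grid(x_sep, init, 0, 1)
print(f"null data:      selective p = {p_null:.3f}, naive p = {p_null_naive:.2e}")
print(f"separated data: selective p = {p_sep:.2e}, naive p = {p_sep_naive:.2e}")
```

On the null data, the naive p-value is tiny. The selective one is not forced toward zero, because it accounts for how k-means chose the clusters. On the well-separated data, both are very small. Each grid point requires a full k-means run, so this brute-force version scales poorly, in contrast to the efficient computation reported in the paper.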

Conclusions:

The developed p-value offers a reliable solution for hypothesis testing after k-means clustering.
This method enhances the statistical validity of findings from k-means analyses.
The approach was successfully applied to real-world datasets, including handwritten digits and single-cell RNA-sequencing data.