
Related Concept Videos

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...
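The cluster-sampling procedure just described (randomly choose whole clusters, then keep every member of each) can be sketched with Python's standard random module. This is a minimal illustration, not part of the lesson: the department names and sizes are hypothetical, and the function name cluster_sample is our own.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters whole clusters and keep every member of each.

    clusters: dict mapping a cluster name to the list of its members.
    """
    rng = random.Random(seed)
    # sorted() fixes the order of the names so a given seed is reproducible.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


if __name__ == "__main__":
    # Hypothetical population: 10 departments with 5 employees each.
    departments = {f"dept_{d}": [f"dept_{d}_emp_{e}" for e in range(5)] for d in range(10)}
    sample = cluster_sample(departments, 4, seed=7)
    print(sorted(sample))
```

Randomness enters only at the cluster level: once a department is drawn, all of its members are included, which is what distinguishes a cluster sample from a stratified one.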

Sampling Plans

Sampling is a crucial step in analytical chemistry, allowing researchers to collect representative data from a large population. Common sampling methods include random, judgmental, systematic, stratified, and cluster sampling.
Random sampling is a method where each member of the population has an equal chance of being selected for the sample. It involves selecting individuals randomly, often using random number generators or lottery-type methods. For example, when analyzing the properties of a...
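Three of the procedural methods listed above (simple random, systematic, and stratified sampling) can be sketched with Python's random module. This is an illustrative sketch under simple assumptions: the population is a plain list, strata are given as a dict, and the sample size does not exceed the population size. Judgmental sampling is left out because it depends on expert choice rather than a procedure.

```python
import random


def simple_random_sample(population, n, rng):
    # Every member has the same chance of selection (sampling without replacement).
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    # Random start within the first interval, then every k-th member (k = N // n).
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def stratified_sample(strata, n_per_stratum, rng):
    # A simple random sample drawn independently within each stratum.
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}


if __name__ == "__main__":
    rng = random.Random(42)
    population = list(range(100))  # hypothetical numbered units
    print(simple_random_sample(population, 10, rng))
    print(systematic_sample(population, 10, rng))
    print(stratified_sample({"A": list(range(20)), "B": list(range(100, 130))}, 5, rng))
```

Passing a seeded random.Random instance, rather than using the module-level functions, keeps each sampling plan reproducible.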

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded observation that deviates greatly from the rest of the data. Assuming the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether that observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the mean. Then calculate the ratio of this difference to the sample standard deviation. This...
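The two steps above (absolute deviation from the mean, then division by the sample standard deviation) give the Grubbs statistic G. A minimal sketch with Python's statistics module follows. It assumes the caller supplies the critical value from a Grubbs table instead of computing it; the 1.887 used in the example is the commonly tabulated two-sided critical value for N = 6 at alpha = 0.05. The replicate measurements are hypothetical.

```python
import statistics


def grubbs_statistic(data):
    """Return the most extreme value and G = |value - mean| / s (two-sided Grubbs)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    suspect = max(data, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / s


def grubbs_two_sided(data, g_critical):
    """Flag the most extreme value as an outlier when G exceeds the critical value."""
    suspect, g = grubbs_statistic(data)
    return suspect, g, g > g_critical


if __name__ == "__main__":
    replicates = [10.1, 10.2, 9.9, 10.0, 10.3, 14.0]  # hypothetical measurements
    # 1.887: tabulated two-sided critical G for N = 6 at alpha = 0.05
    print(grubbs_two_sided(replicates, g_critical=1.887))
```

Here the mean is 10.75 and s is about 1.598, so G for 14.0 is about 2.03. That exceeds 1.887, so 14.0 would be rejected as an outlier at the 5% level.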

One-Compartment Open Model: Wagner-Nelson and Loo-Riegelman Methods for ka Estimation

This lesson introduces two key methods in pharmacokinetics, the Wagner-Nelson and Loo-Riegelman methods, which are used to estimate the absorption rate constant (ka) of drugs administered by non-intravenous routes. The Wagner-Nelson method uses plasma concentrations to compute the fraction of the dose absorbed over time. Ka is then obtained from the slope of a semilog plot of percent unabsorbed versus time. However, the method is limited to drugs that follow one-compartment kinetics, and its estimates can be affected by factors such as gastrointestinal motility or enzymatic degradation.
On...
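The Wagner-Nelson calculation can be sketched in a few lines. For a one-compartment model, the fraction absorbed at time t is (C(t) + k*AUC from 0 to t) / (k*AUC from 0 to infinity). The natural log of the fraction unabsorbed then falls linearly with slope -ka, which matches a slope of -ka/2.303 on a log10 percent-unabsorbed plot. This sketch makes several assumptions: the elimination rate constant k is taken as known, AUC is computed with the trapezoidal rule, the tail is extrapolated as C_last/k, and the noise-free concentrations are hypothetical. The Loo-Riegelman method is not shown.

```python
import math


def wagner_nelson_ka(times, conc, k_el, fit_until):
    """Estimate ka by the Wagner-Nelson method (one-compartment model).

    times, conc : sampling times and plasma concentrations after an oral dose
    k_el        : elimination rate constant (assumed known in this sketch)
    fit_until   : last time point used in the log-linear fit
    """
    # Cumulative AUC from 0 to each sampling time (trapezoidal rule).
    auc = [0.0]
    for i in range(1, len(times)):
        auc.append(auc[-1] + 0.5 * (conc[i] + conc[i - 1]) * (times[i] - times[i - 1]))
    # AUC to infinity: add the extrapolated tail C_last / k_el.
    auc_inf = auc[-1] + conc[-1] / k_el

    xs, ys = [], []
    for t, c, a in zip(times, conc, auc):
        fraction_absorbed = (c + k_el * a) / (k_el * auc_inf)
        unabsorbed = 1.0 - fraction_absorbed
        if 0 < t <= fit_until and unabsorbed > 0:
            xs.append(t)
            ys.append(math.log(unabsorbed))

    # Least-squares slope of ln(fraction unabsorbed) versus time equals -ka.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return -slope


if __name__ == "__main__":
    # Hypothetical data from C(t) = 10 * (exp(-k t) - exp(-ka t)) with ka = 1.5/h, k = 0.2/h.
    ka_true, k_el = 1.5, 0.2
    times = [0.1 * i for i in range(481)]  # 0 to 48 h
    conc = [10 * (math.exp(-k_el * t) - math.exp(-ka_true * t)) for t in times]
    print(round(wagner_nelson_ka(times, conc, k_el, fit_until=2.0), 3))
```

With real data, k would be estimated from the terminal log-linear phase. The fit would also be restricted to the absorption phase, before the fraction unabsorbed approaches zero and small errors in AUC dominate the logarithm.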

One-Way ANOVA: Unequal Sample Sizes

One-way ANOVA can be performed on three or more samples of unequal sizes. However, the calculations become more complicated when the sample sizes differ. So, when performing ANOVA with unequal sample sizes, the following equation is used:
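As a minimal sketch, the standard one-way ANOVA quantities can be computed for groups of different sizes. The sketch uses the textbook definitions: each group's squared deviation from the grand mean is weighted by its own size n_i, the within-group sum of squares is pooled across groups, and the degrees of freedom are k - 1 and N - k. The group data are hypothetical, and the p-value lookup for F is omitted.

```python
def one_way_anova(groups):
    """One-way ANOVA for groups that may have different sizes.

    Returns (ss_between, ss_within, df_between, df_within, F).
    """
    values = [v for g in groups for v in g]
    n_total, k = len(values), len(groups)
    grand_mean = sum(values) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Each group's squared distance from the grand mean is weighted by its own size n_i.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Spread of the observations around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_stat


if __name__ == "__main__":
    # Hypothetical groups of sizes 3, 4 and 2.
    print(one_way_anova([[1, 2, 3], [4, 5, 6, 7], [7, 8]]))
```

For these groups, SS_between is about 40.06 and SS_within is 7.5, with 2 and 6 degrees of freedom, so F = (40.06/2)/(7.5/6), or about 16.02.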

Hypothesis Test for Test of Independence

The test of independence is a chi-square-based hypothesis test used to determine whether two variables (factors) are independent or dependent. Responses to two qualitative survey questions, or the outcomes of an experiment, are arranged in a contingency table according to the two variables. The goal is to see whether the two variables are unrelated (independent) or related (dependent). The null and alternative hypotheses for this test are:
H0: The two variables (factors)...
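The chi-square statistic for a contingency table can be sketched directly from its definition. Under the null hypothesis of independence, the expected count in each cell is (row total x column total) / grand total. The statistic is the sum of (observed - expected)^2 / expected over all cells, with (r - 1)(c - 1) degrees of freedom. The 2 x 2 table below is hypothetical, and the function name is our own.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


if __name__ == "__main__":
    print(chi_square_independence([[10, 20], [30, 40]]))
```

For this table the expected counts are 12, 18, 28 and 42, giving a chi-square of 50/63, or about 0.794, with 1 degree of freedom. This is below the 5% critical value of 3.841, so H0 would not be rejected.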

Selective inference for k-means clustering.

Yiqun T Chen¹, Daniela M Witten²

  • ¹Data Science Institute and Department of Biomedical Data Science, Stanford University, Stanford, CA 94305, USA.

Journal of Machine Learning Research : JMLR | January 24, 2024
Summary
This summary is machine-generated.

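This study introduces a p-value for testing whether the means of two clusters estimated by k-means differ. Classical tests use the same data both to define the clusters and to test them. Unlike those tests, the proposed method controls the selective Type I error rate, making conclusions drawn from k-means analyses more reliable.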

Keywords:
Hypothesis testing; Post-selection inference; RNA-sequencing; Type I error; Unsupervised learning

Area of Science:

  • Statistics
  • Machine Learning
  • Data Mining

Background:

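  • Classical hypothesis tests for a difference in means between clusters estimated by k-means have inflated Type I error rates, because the same data are used both to define the clusters and to test them.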
  • Existing methods, like those for hierarchical clustering, are not applicable to k-means.
  • Accurate statistical inference is crucial for interpreting cluster analysis results.
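The Type I error inflation described in the Background can be reproduced with a small simulation. This is a minimal sketch of our own, not code from the study. It assumes one-dimensional Gaussian data with known variance sigma^2 = 1 and uses a simple two-cluster Lloyd's algorithm initialized at the minimum and maximum. It then applies a classical two-sided z-test to the difference in the resulting cluster means. Because every data set is drawn from a single Gaussian, the null hypothesis of equal means holds, so a valid test should reject about 5% of the time at alpha = 0.05.

```python
import math
import random


def kmeans_1d(x, max_iter=100):
    """Two-cluster Lloyd's algorithm in 1D, initialized at the minimum and maximum."""
    centers = [min(x), max(x)]
    labels = None
    for _ in range(max_iter):
        new = [0 if abs(xi - centers[0]) <= abs(xi - centers[1]) else 1 for xi in x]
        if new == labels:  # assignments stopped changing
            break
        labels = new
        for k in (0, 1):
            members = [xi for xi, lab in zip(x, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels


def naive_z_pvalue(x, labels, sigma=1.0):
    """Classical two-sided z-test for equal cluster means, ignoring that the
    clusters were estimated from the same data."""
    a = [xi for xi, lab in zip(x, labels) if lab == 0]
    b = [xi for xi, lab in zip(x, labels) if lab == 1]
    z = abs(sum(a) / len(a) - sum(b) / len(b)) / (sigma * math.sqrt(1 / len(a) + 1 / len(b)))
    return math.erfc(z / math.sqrt(2))  # P(|Z| > z) for Z ~ N(0, 1)


def naive_type1_rate(n=50, reps=200, alpha=0.05, seed=0):
    """Fraction of null data sets on which the naive test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # one Gaussian: no true clusters
        if naive_z_pvalue(x, kmeans_1d(x)) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    print("naive Type I error rate:", naive_type1_rate())
```

Because k-means splits the data into a lower and an upper group, the difference between their means is large even when no clusters exist. The observed rejection rate therefore sits far above the nominal 0.05.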

Purpose of the Study:

  • To develop a statistically sound method for testing differences in means between clusters identified by k-means.
  • To address the limitations of classical hypothesis tests in the context of k-means clustering.
  • To provide a computationally efficient p-value that controls Type I error.

Main Methods:

  • Proposed a novel p-value that conditions on all intermediate assignments within the k-means algorithm.
  • Demonstrated theoretical control of the selective Type I error rate.
  • Developed an efficient computation method for the proposed p-value.
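A simplified, one-dimensional sketch of the conditioning idea in the Main Methods follows. It is not the authors' implementation. The study computes the conditioning set exactly and efficiently, whereas this sketch approximates it on a grid. For each grid value phi, it re-runs k-means on perturbed data x'(phi), in which the gap between the two cluster means is set to phi while the part of x orthogonal to the contrast vector is left unchanged. A value of phi is kept only if every intermediate assignment matches the original run. The sketch assumes a known noise level sigma, 1D data, and a fixed initialization given by observation indices (here, hypothetically, the indices of the minimum and maximum). Under H0, the mean gap is then distributed as sigma*sqrt(1/n1 + 1/n2) times a chi variable with 1 degree of freedom (half-normal), truncated to the kept set. All function names are hypothetical.

```python
import math
import random


def assign(x, centers):
    """Label each point with the index of its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda k: (xi - centers[k]) ** 2) for xi in x]


def kmeans_1d_history(x, init_idx, n_steps=None, max_steps=100):
    """Lloyd's k-means in 1D, recording the assignment vector at every iteration.

    Centers start at the observations indexed by init_idx. With n_steps=None the run
    stops when two consecutive assignments agree; otherwise exactly n_steps
    assignment steps are run (used to replay the algorithm on perturbed data).
    """
    centers = [x[i] for i in init_idx]
    history = []
    for _ in range(max_steps if n_steps is None else n_steps):
        labels = assign(x, centers)
        history.append(labels)
        if n_steps is None and len(history) > 1 and history[-1] == history[-2]:
            break
        for k in range(len(centers)):
            members = [xi for xi, lab in zip(x, labels) if lab == k]
            if members:  # keep the old center if a cluster empties
                centers[k] = sum(members) / len(members)
    return history


def mean_gap(x, g1, g2):
    return sum(x[i] for i in g1) / len(g1) - sum(x[i] for i in g2) / len(g2)


def perturb(x, g1, g2, phi):
    """x'(phi): move the two clusters so |mean(g1) - mean(g2)| == phi, keeping the sign
    of the gap and the component of x orthogonal to the contrast vector."""
    d = mean_gap(x, g1, g2)
    nu_sq = 1 / len(g1) + 1 / len(g2)  # squared norm of the contrast vector
    shift = (phi - abs(d)) / nu_sq * (1.0 if d >= 0 else -1.0)
    xp = list(x)
    for i in g1:
        xp[i] += shift / len(g1)
    for i in g2:
        xp[i] -= shift / len(g2)
    return xp


def selective_pvalue_1d(x, init_idx, a, b, sigma=1.0, grid_size=1000):
    """Grid approximation of a selective p-value for H0: equal means of clusters a and b."""
    history = kmeans_1d_history(x, init_idx)
    g1 = [i for i, lab in enumerate(history[-1]) if lab == a]
    g2 = [i for i, lab in enumerate(history[-1]) if lab == b]
    d_obs = abs(mean_gap(x, g1, g2))
    scale = sigma * math.sqrt(1 / len(g1) + 1 / len(g2))  # sd of the mean gap under H0

    def tail(t):
        # P(scale * |Z| > t) for Z ~ N(0, 1); erfc keeps precision far in the tail.
        return math.erfc(t / (scale * math.sqrt(2)))

    h = (d_obs + 10 * scale) / grid_size
    num = den = 0.0
    for j in range(grid_size):
        lo, hi = j * h, (j + 1) * h
        replay = kmeans_1d_history(perturb(x, g1, g2, (lo + hi) / 2), init_idx, n_steps=len(history))
        if replay == history:  # all intermediate assignments unchanged: cell kept
            den += tail(lo) - tail(hi)
            if hi > d_obs:
                num += tail(max(lo, d_obs)) - tail(hi)
    return num / den if den > 0 else float("nan")


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(40)]  # H0 holds: a single Gaussian
    init = [x.index(min(x)), x.index(max(x))]      # hypothetical fixed initialization
    print("selective p-value:", selective_pvalue_1d(x, init, 0, 1))
```

The design choice mirrors the idea summarized above: the test conditions on the entire path of k-means assignments, not only the final clusters. The p-value is therefore a ratio of half-normal probabilities over the kept set, rather than an unconditional tail probability.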

Main Results:

  • The proposed p-value controls the selective Type I error rate in finite samples.
  • The method can be used to compare cluster means after k-means clustering.
  • The p-value can be computed efficiently.

Conclusions:

  • The developed p-value offers a reliable solution for hypothesis testing after k-means clustering.
  • This method enhances the statistical validity of findings from k-means analyses.
  • The approach was successfully applied to real-world datasets, including handwritten digits and single-cell RNA-sequencing data.