Selective Inference for Hierarchical Clustering.

Lucy L Gao¹, Jacob Bien², Daniela Witten³

  • ¹Department of Statistics, University of British Columbia.

Journal of the American Statistical Association | April 25, 2024
PubMed
Summary
This summary is machine-generated.

When groups are identified by clustering, classical tests for a difference in means between those groups have an inflated type I error rate. This study introduces a selective inference method that tests for a difference in means between two clusters while accounting for the fact that the hypothesis was chosen using the data.

Keywords:
difference in means, hypothesis testing, post-selection inference, type I error

Area of Science:

  • Statistical methodology
  • Bioinformatics
  • Data science

Background:

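  • Classical statistical tests assume that the groups being compared are defined a priori, before the data are examined.
  • When the groups are instead defined by clustering the data, classical tests for a difference in means have an inflated Type I error rate.
  • The problem persists even when separate, independent datasets are used to define the groups and to test for a difference in means.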
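
The inflation described above can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes one-dimensional standard normal data with no true groups, a known standard deviation of 1, average linkage cut at two clusters, and a naive two-sided z-test. The function names, sample size, and number of repetitions are hypothetical choices made for illustration. A split fixed in advance (first half versus second half of the sample) serves as a control.

```python
import random
from statistics import NormalDist, fmean


def average_linkage_two_clusters(points):
    """Agglomerative clustering with average linkage, stopped at two clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > 2:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Average linkage: mean distance over all cross-cluster pairs.
                d = sum(abs(x - y) for x in a for y in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def naive_z_pvalue(group1, group2, sigma=1.0):
    """Two-sided z-test for equal means that treats the groups as fixed in advance."""
    se = sigma * (1 / len(group1) + 1 / len(group2)) ** 0.5
    z = (fmean(group1) - fmean(group2)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def rejection_rates(n=20, reps=300, alpha=0.05, seed=0):
    """Fraction of null datasets rejected: clustered groups vs. a priori groups."""
    rng = random.Random(seed)
    naive = control = 0
    for _ in range(reps):
        x = [rng.gauss(0, 1) for _ in range(n)]  # no true clusters exist
        c1, c2 = average_linkage_two_clusters(x)
        naive += naive_z_pvalue(c1, c2) < alpha
        control += naive_z_pvalue(x[: n // 2], x[n // 2:]) < alpha
    return naive / reps, control / reps


naive_rate, control_rate = rejection_rates()
print(f"groups from clustering: {naive_rate:.2f}, groups fixed a priori: {control_rate:.2f}")
```

Under these assumptions, the rejection rate for the a priori split should sit near the nominal 0.05, while the rate for the clustered groups should be far higher, because clustering places low values in one group and high values in the other before the test is run.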

Purpose of the Study:

  • To develop a selective inference approach for testing mean differences between two clusters.
  • To control the Type I error rate when hypotheses are data-driven.
  • To provide an efficient method for computing exact p-values for hierarchical clustering.
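
The goals listed above can be stated more formally. The notation below is an assumption introduced for illustration and does not appear in this summary: $\mu_i$ is the mean of observation $i$, $\mathcal{C}(X)$ is the set of clusters obtained by clustering the data $X$, and $C_1, C_2$ are the two clusters being compared. A sketch of the null hypothesis and of the selective Type I error guarantee:

```latex
% Null hypothesis: the two clusters have equal mean
H_0 : \bar{\mu}_{C_1} = \bar{\mu}_{C_2},
\qquad
\bar{\mu}_{C} = \frac{1}{|C|} \sum_{i \in C} \mu_i .

% Selective Type I error control at level \alpha:
% the error rate is controlled conditional on the clustering
% having produced the two clusters under comparison.
\mathbb{P}_{H_0}\!\left(
  \text{reject } H_0 \text{ at level } \alpha
  \;\middle|\;
  C_1, C_2 \in \mathcal{C}(X)
\right) \le \alpha .
```

The conditioning event is what separates this guarantee from the classical one: the error rate is assessed only over datasets for which the clustering step would have led to testing this particular pair of clusters.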

Main Methods:

  • Proposed a selective inference procedure to address inflated Type I errors.
  • Developed efficient computation of exact p-values for agglomerative hierarchical clustering.
  • Validated the method using simulated and single-cell RNA-sequencing data.

Main Results:

  • The proposed selective inference method effectively controls the Type I error rate.
  • Demonstrated accurate p-value computation for data-driven cluster comparisons.
  • Successfully applied the method to real-world single-cell RNA-sequencing data.

Conclusions:

  • Selective inference is crucial for valid hypothesis testing after data-driven clustering.
  • The developed method offers a statistically sound approach for comparing means between clusters.
  • This work has implications for fields utilizing clustering, such as single-cell genomics.