Related Concept Videos

Cluster Sampling Method

Appropriate sampling methods ensure that samples are drawn without bias and accurately represent the population. Because measuring the entire population in a study is not practical, researchers use samples to represent the population of interest.
To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly sample four departments from your...

Test for Homogeneity

The goodness-of-fit test can be used to decide whether a population fits a given distribution, but it will not suffice to decide whether two populations follow the same unknown distribution. A different test, called the test for homogeneity, can be used to conclude whether two populations have the same distribution. To calculate the test statistic for a test for homogeneity, follow the same procedure as with the test of independence. The hypotheses for the test for homogeneity can...

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
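The ratio described above can be sketched in a few lines of Python. This is a minimal illustration, not a full Grubbs test: the function name `grubbs_statistic` and the `readings` values are hypothetical, the ratio is conventionally denoted G, and the comparison against a critical value is left out because the excerpt is truncated before that step.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, suspect), where suspect is the value farthest from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # The candidate outlier is the observation with the largest absolute deviation.
    suspect = max(values, key=lambda v: abs(v - mean))
    # G = |suspect - mean| / sample standard deviation
    return abs(suspect - mean) / sd, suspect

# Hypothetical measurements, invented for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]
g, suspect = grubbs_statistic(readings)
print(f"suspect={suspect}, G={g:.3f}")
```

Whether the suspect value is rejected depends on comparing G with a critical value for the chosen significance level and sample size, which this sketch does not attempt.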

Hypothesis Test for Test of Independence

The test of independence is a chi-square-based test used to determine whether two variables or factors are independent or dependent. This hypothesis test is used to examine the independence of the variables. One can construct two qualitative survey questions or experiments based on the variables in a contingency table. The goal is to see if the two variables are unrelated (independent) or related (dependent). The null and alternative hypotheses for this test are:
H0: The two variables (factors)...

Expected Frequencies in Goodness-of-Fit Tests

A goodness-of-fit test is conducted to determine whether the observed frequencies are statistically consistent with the frequencies expected for the dataset. Suppose the expected frequencies for a dataset are all equal, as when predicting how often each number appears when casting a fair die. In that case, the expected frequency of each category is the ratio of the total number of observations (n) to the number of categories (k).
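As a small worked example of the n / k rule, the Python sketch below computes the equal expected frequency for a set of die-roll counts and then the usual chi-square goodness-of-fit statistic, the sum of (observed - expected)² / expected over categories. The function names and the `rolls` counts are hypothetical and chosen only for illustration.

```python
def expected_uniform(observed):
    """Expected count per category when all categories are equally likely: n / k."""
    n = sum(observed)   # total number of observations
    k = len(observed)   # number of categories
    return n / k

def chi_square_statistic(observed):
    """Chi-square goodness-of-fit statistic against equal expected frequencies."""
    e = expected_uniform(observed)
    return sum((o - e) ** 2 / e for o in observed)

# Hypothetical counts for faces 1 to 6 of a die rolled 60 times.
rolls = [8, 12, 9, 11, 10, 10]
expected = expected_uniform(rolls)
stat = chi_square_statistic(rolls)
print(f"expected per face = {expected}, chi-square = {stat}")
```

With 60 rolls and 6 faces, each face is expected 10 times; the statistic would then be compared with a chi-square distribution on k - 1 = 5 degrees of freedom.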

Kruskal-Wallis Test

The Kruskal-Wallis test, also known as the Kruskal-Wallis H test, is a nonparametric alternative to the one-way ANOVA for analyzing differences across three or more independent groups on a single ordinal dependent variable. This statistical test is particularly valuable when the data do not meet the normality assumption required by its parametric counterparts. The Kruskal-Wallis test is typically designed to handle ordinal data or...

Related Experiment Video

Updated: Aug 4, 2025

Large-scale Reconstructions and Independent, Unbiased Clustering Based on Morphological Metrics to Classify Neurons in Selective Populations

Published on: February 15, 2017

Sparse clusterability: testing for cluster structure in high dimensions.

Jose Laborde¹, Paul A Stewart²,³, Zhihua Chen²

¹Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA. jose.laborde@moffitt.org.

BMC Bioinformatics

April 2, 2023

Summary

This summary is machine-generated.

This study introduces new clusterability testing methods for high-dimensional data using sparse principal component analysis. The methods show good performance on various datasets, assessing if data naturally forms distinct groups.

Keywords:

Big data Cluster analysis Cluster tendency Clustering Dimension reduction Distance metrics Multimodality testing Principal component analysis Sparsity

More Related Videos

ExCYT: A Graphical User Interface for Streamlining Analysis of High-Dimensional Cytometry Data

Published on: January 16, 2019

A Psychophysics Paradigm for the Collection and Analysis of Similarity Judgments

Published on: March 1, 2022

Area of Science:

Statistics
Data Mining
Bioinformatics

Background:

Cluster analysis is widely used to group data, assuming underlying cluster structures.
Clusterability testing checks this assumption, which is crucial for a valid analysis.
High-dimensional data presents unique challenges for traditional clustering methods.

Purpose of the Study:

To develop and evaluate clusterability testing methods for high-dimensional data.
To assess the performance of these methods using simulated and real-world datasets.
To compare the proposed methods against existing techniques.

Main Methods:

Utilizing sparse principal component analysis for clusterability testing.
Evaluating Type I error and statistical power with simulated high-dimensional data.
Applying methods to gene expression, microarray, and proteomics datasets.
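The workflow in this list can be illustrated with a toy, standard-library-only Python sketch. It is not the authors' implementation and rests on several assumptions: the sparse component comes from a simple soft-thresholded power iteration (one of many sparse PCA variants), and a Monte Carlo test on sample kurtosis stands in for a proper multimodality test, since low kurtosis only signals roughly balanced two-group structure. All function names, the threshold `lam`, and the simulated data settings are hypothetical.

```python
import random

def soft_threshold(vec, lam):
    """Shrink entries toward zero by lam; entries within lam of zero become 0."""
    return [(abs(x) - lam) * (1.0 if x > 0 else -1.0) if abs(x) > lam else 0.0
            for x in vec]

def sparse_scores(rows, lam=0.2, iters=20):
    """Sparse leading loading vector and projected scores (toy sparse PCA)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    def project(v):
        return [sum(c[j] * v[j] for j in range(p)) for c in centered]

    v = [p ** -0.5] * p
    for _ in range(iters):
        s = project(v)
        # Proportional to (sample covariance matrix) times v.
        w = [sum(centered[i][j] * s[i] for i in range(n)) for j in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        w = soft_threshold([x / norm for x in w], lam)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # threshold removed everything; keep the previous vector
        v = [x / norm for x in w]
    return v, project(v)

def kurtosis(xs):
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2)

def null_kurtoses(n, draws, rng):
    """Reference kurtosis values for unimodal (Gaussian) samples of size n."""
    return [kurtosis([rng.gauss(0, 1) for _ in range(n)]) for _ in range(draws)]

def p_value(scores, reference):
    """One-sided Monte Carlo p-value: small kurtosis suggests two groups."""
    obs = kurtosis(scores)
    return (1 + sum(k <= obs for k in reference)) / (len(reference) + 1)

def make_data(rng, n=60, p=10, shift=0.0, informative=3):
    """Two equal groups separated on the first `informative` variables."""
    rows = []
    for i in range(n):
        centre = -shift if i < n // 2 else shift
        rows.append([rng.gauss(centre if j < informative else 0.0, 1.0)
                     for j in range(p)])
    return rows

def rejection_rate(rng, shift, reference, reps=20, alpha=0.05):
    """Fraction of simulated datasets in which unimodality is rejected."""
    hits = 0
    for _ in range(reps):
        _, scores = sparse_scores(make_data(rng, shift=shift))
        hits += p_value(scores, reference) <= alpha
    return hits / reps

rng = random.Random(0)
reference = null_kurtoses(60, 200, rng)
loadings, scores = sparse_scores(make_data(rng, shift=2.0))
type_one = rejection_rate(rng, 0.0, reference)  # no cluster structure
power = rejection_rate(rng, 2.0, reference)     # two well-separated clusters
print("loadings:", [round(w, 2) for w in loadings])
print("estimated Type I error:", type_one, "estimated power:", power)
```

Setting `shift=0.0` simulates data without cluster structure, so the rejection rate estimates Type I error; a nonzero shift estimates power. Shrinking the shift toward zero gives a feel for why closely spaced clusters may go undetected.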

Main Results:

The proposed methods showed reasonably low Type I error and maintained power across diverse datasets.
Effectiveness varied with data structure and dimensionality; in some datasets, clusters that lay close together were not detected.
This is the first comprehensive analysis of clusterability testing in high-dimensional settings.

Conclusions:

The developed methods offer a viable approach for clusterability testing in high-dimensional data.
The study highlights the importance of considering data structure and dimensionality.
This work provides a foundation for future research in high-dimensional cluster analysis.