Sparse clusterability: testing for cluster structure in high dimensions.

Jose Laborde1, Paul A Stewart2,3, Zhihua Chen2

  • 1Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL, USA. jose.laborde@moffitt.org.

BMC Bioinformatics | April 2, 2023
Summary
This summary is machine-generated.

This study introduces clusterability tests for high-dimensional data based on sparse principal component analysis. The tests assess whether data naturally form distinct groups, and they show reasonably low Type I error and maintained power across a range of datasets.

Keywords:
Big data, Cluster analysis, Cluster tendency, Clustering, Dimension reduction, Distance metrics, Multimodality testing, Principal component analysis, Sparsity

Area of Science:

  • Statistics
  • Data Mining
  • Bioinformatics

Background:

  • Cluster analysis is widely used to group data, on the assumption that the data contain an underlying cluster structure.
  • Clusterability testing checks this assumption, which is crucial for a valid analysis.
  • High-dimensional data presents unique challenges for traditional clustering methods.

Purpose of the Study:

  • To develop and evaluate clusterability testing methods for high-dimensional data.
  • To assess the performance of these methods using simulated and real-world datasets.
  • To compare the proposed methods against existing techniques.

Main Methods:

  • Utilizing sparse principal component analysis for clusterability testing.
  • Evaluating Type I error and statistical power with simulated high-dimensional data.
  • Applying methods to gene expression, microarray, and proteomics datasets.
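
As an illustrative sketch of the ideas in the list above (not the authors' implementation), the Python below reduces a simulated high-dimensional dataset to one sparse component and then tests the projected scores for cluster structure. Everything here is an assumption made for illustration: the soft-thresholded power iteration stands in for sparse PCA, the two-group sum-of-squares ratio with a Gaussian Monte Carlo reference stands in for whatever multimodality test the study applies, and the names `sparse_scores`, `two_split_ratio`, `null_ratios`, `mc_pvalue`, `make_data`, and `rejection_rate` are hypothetical. The Gaussian reference also ignores that the projection direction is chosen from the same data, so the Type I error estimate is only approximate.

```python
import math
import random


def sparse_scores(X, threshold=0.2, iters=30):
    """Project rows of X onto one sparse loading vector found by
    soft-thresholded power iteration (a simple stand-in for sparse PCA)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered data
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        s = [sum(a * b for a, b in zip(row, v)) for row in Xc]          # X v
        w = [sum(Xc[i][j] * s[i] for i in range(n)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        # Soft-threshold the normalized vector: small loadings become zero.
        w = [math.copysign(max(abs(x) / norm - threshold, 0.0), x) for x in w]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # threshold removed everything; keep the previous v
        v = [x / norm for x in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in Xc]
    return v, scores


def two_split_ratio(values):
    """Smallest within-group sum of squares over all two-way splits of the
    sorted values, divided by the total sum of squares (small = clustered)."""
    xs = sorted(values)
    n = len(xs)
    total, total_sq = sum(xs), sum(x * x for x in xs)
    tss = total_sq - total * total / n
    if tss <= 0.0:
        return 1.0
    best, left_sum, left_sq = tss, 0.0, 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]
        left_sq += xs[k - 1] ** 2
        right_sum, right_sq = total - left_sum, total_sq - left_sq
        wss = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
        best = min(best, wss)
    return best / tss


def null_ratios(n, n_sim, rng):
    """Statistic values for samples of size n from a single Gaussian."""
    return [two_split_ratio([rng.gauss(0.0, 1.0) for _ in range(n)])
            for _ in range(n_sim)]


def mc_pvalue(stat, nulls):
    """Monte Carlo p-value; small ratios count as evidence of clusters."""
    return (1 + sum(s <= stat for s in nulls)) / (1 + len(nulls))


def make_data(n, p, shift, rng):
    """Gaussian noise in p features; the two halves of the rows differ by
    2 * shift in the first 3 features only (shift=0 means no clusters)."""
    return [[rng.gauss(0.0, 1.0)
             + ((shift if i < n // 2 else -shift) if j < 3 else 0.0)
             for j in range(p)] for i in range(n)]


def rejection_rate(shift, nulls, rng, reps=20, n=60, p=20, alpha=0.05):
    """Share of simulated datasets flagged as clusterable at level alpha."""
    rejections = 0
    for _ in range(reps):
        _, scores = sparse_scores(make_data(n, p, shift, rng))
        rejections += mc_pvalue(two_split_ratio(scores), nulls) <= alpha
    return rejections / reps


rng = random.Random(1)
nulls = null_ratios(60, 499, rng)
loadings, scores = sparse_scores(make_data(60, 20, 2.0, rng))
p_value = mc_pvalue(two_split_ratio(scores), nulls)
type1 = rejection_rate(0.0, nulls, rng)   # no clusters: Type I error estimate
power = rejection_rate(2.0, nulls, rng)   # two clusters: power estimate
print(f"p-value: {p_value:.3f}, Type I error: {type1:.2f}, power: {power:.2f}")
```

Calling `rejection_rate` with `shift=0.0` simulates data without clusters and so estimates Type I error, while a positive `shift` estimates power. The replicate counts are kept small so the pure-Python code runs quickly; a real study would use many more.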

Main Results:

  • The proposed methods showed reasonably low Type I error and maintained power across diverse datasets.
  • Effectiveness varied with data structure and dimensionality; in some datasets whose clusters lie close together, the cluster structure was not detected.
  • This is the first comprehensive analysis of clusterability testing in high-dimensional settings.

Conclusions:

  • The developed methods offer a viable approach for clusterability testing in high-dimensional data.
  • The study highlights the importance of considering data structure and dimensionality.
  • This work provides a foundation for future research in high-dimensional cluster analysis.