How many clusters? An information-theoretic perspective.

Susanne Still¹, William Bialek

  • ¹Department of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA. susanne@princeton.edu

Neural Computation | November 2, 2004
Summary
This summary is machine-generated.

This study introduces a novel method for determining the optimal number of clusters in data analysis. By correcting for sampling errors, it identifies the maximum meaningful structure without external validation.

Area of Science:

  • Data Science
  • Statistical Mechanics
  • Information Theory

Background:

  • Clustering is crucial for uncovering patterns in large datasets across various scientific fields.
  • Determining the optimal number of clusters is a persistent challenge in data analysis.
  • Existing methods often rely on assumed cluster shapes or separate criteria for assignment and validation.

Purpose of the Study:

  • To develop a data-driven approach for identifying the optimal number of clusters.
  • To address the limitations of traditional clustering methods in handling finite datasets.
  • To find a clustering solution that captures maximal meaningful structure by correcting for sampling bias.

Main Methods:

  • Utilizing a statistical mechanics framework where clustering is viewed as a balance between energy and entropy.
  • Applying an information-theoretic approach to account for the finite size of datasets.
  • Introducing a method to correct clustering criteria for sampling error bias.
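
The energy/entropy balance mentioned above can be made concrete with a small sketch. This is not the authors' implementation. It assumes one-dimensional points, squared distance as the "energy", and Boltzmann-weighted soft assignments p(c|x) ∝ exp(−d(x, c)/T); the function names `soft_assignments` and `assignment_entropy` are hypothetical. It shows only how the temperature T moves assignments between nearly hard (low entropy) and nearly uniform (high entropy).

```python
import math

def soft_assignments(points, centers, temperature):
    """Return p(c|x) for each point, with p(c|x) proportional to exp(-d(x, c) / T).

    Assumption for illustration: d is the squared distance in one dimension.
    """
    result = []
    for x in points:
        energies = [(x - c) ** 2 for c in centers]
        e_min = min(energies)  # shift by the minimum for numerical stability
        weights = [math.exp(-(e - e_min) / temperature) for e in energies]
        total = sum(weights)
        result.append([w / total for w in weights])
    return result

def assignment_entropy(probs):
    """Mean Shannon entropy (in bits) of the per-point assignment distributions."""
    h = 0.0
    for row in probs:
        h -= sum(p * math.log2(p) for p in row if p > 0)
    return h / len(probs)

# Toy data (made up): two loose groups around 0 and 1.
points = [0.0, 0.2, 0.9, 1.1]
centers = [0.0, 1.0]

for T in (0.01, 0.1, 1.0, 10.0):
    probs = soft_assignments(points, centers, T)
    print(f"T={T}: mean assignment entropy = {assignment_entropy(probs):.3f} bits")
```

At low T each point is assigned almost entirely to its nearest center (the hard clustering limit); at high T the assignments approach a uniform distribution over centers.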

Main Results:

  • Demonstrated that dataset size inherently determines an optimal temperature for clustering.
  • Developed a method to find the maximal number of resolvable clusters in the hard clustering limit.
  • Showcased how correcting for sampling bias leads to optimal clustering solutions.

Conclusions:

  • The proposed method allows for the determination of the optimal number of clusters without external criteria.
  • This approach effectively balances capturing meaningful data structure with avoiding sampling noise.
  • Offers a robust framework for cluster analysis in the presence of finite data limitations.
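
To illustrate how a sampling-bias correction can select a number of clusters, here is a minimal sketch. It is not necessarily the paper's exact correction. It assumes the standard first-order bias of the plug-in mutual information estimate, roughly (K_c − 1)(K_y − 1) / (2 N ln 2) bits for K_c clusters, K_y values of the second variable, and N samples. The empirical information values in `toy_curve` are made up for illustration, and the function names are hypothetical.

```python
import math

def sampling_bias_bits(n_clusters, n_y, n_samples):
    """First-order bias (bits) of the plug-in mutual information estimate.

    Assumption: the standard (K_c - 1)(K_y - 1) / (2 N ln 2) approximation.
    """
    return (n_clusters - 1) * (n_y - 1) / (2.0 * n_samples * math.log(2))

def corrected_curve(empirical_bits, n_y, n_samples):
    """Subtract the estimated bias from each empirical information value."""
    return {
        k: info - sampling_bias_bits(k, n_y, n_samples)
        for k, info in empirical_bits.items()
    }

def best_cluster_count(empirical_bits, n_y, n_samples):
    """Number of clusters that maximizes the bias-corrected information."""
    corrected = corrected_curve(empirical_bits, n_y, n_samples)
    return max(corrected, key=corrected.get)

# Hypothetical empirical information (bits) versus number of clusters.
# The uncorrected values keep rising with k, so they alone pick no optimum.
toy_curve = {1: 0.00, 2: 0.60, 3: 0.90, 4: 0.95, 5: 0.97, 6: 0.98}

for n in (50, 200, 2000):
    print(f"N={n}: selected number of clusters = {best_cluster_count(toy_curve, 10, n)}")
```

With these made-up numbers the corrected curve peaks at a finite number of clusters, and the peak moves to larger values as N grows (3, 4, and 6 clusters for N = 50, 200, and 2000), consistent with the idea that more data can resolve more structure.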