Combining Mixture Components for Clustering.

Jean-Patrick Baudry¹, Adrian E Raftery, Gilles Celeux

¹Université Paris-Sud XI.

Journal of Computational and Graphical Statistics : a Joint Publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America

October 19, 2010

Summary

This summary is machine-generated.

This study introduces a hierarchical method for determining the number of clusters in model-based clustering. It reduces the overestimation that BIC can produce for non-Gaussian clusters by combining Gaussian mixture components according to an entropy criterion.

Area of Science:

Computational statistics
Data mining
Machine learning

Background:

Model-based clustering typically fits mixtures of multivariate normal distributions and uses the Bayesian Information Criterion (BIC) to choose the number of clusters.
When a cluster is not Gaussian, BIC often selects several Gaussian components to represent it, so the number of clusters is overestimated and the resulting partition of the data is inaccurate.
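The BIC step can be illustrated with a short Python sketch. It uses the "larger is better" convention BIC = 2 log L - ν log n, where ν is the number of free parameters. The parameter count assumes an unconstrained Gaussian mixture with a full covariance matrix per component, and the maximized log-likelihoods are assumed to come from EM fits carried out elsewhere. The function names and the log-likelihood values in the usage lines are hypothetical.

```python
import math


def gmm_param_count(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional Gaussian mixture with
    an unconstrained (full, component-specific) covariance matrix per component."""
    weights = k - 1                      # mixing proportions sum to one
    means = k * d                        # one d-vector per component
    covariances = k * d * (d + 1) // 2   # one symmetric d x d matrix per component
    return weights + means + covariances


def bic(loglik: float, n_params: int, n: int) -> float:
    """BIC in the 'larger is better' form: 2 log L - n_params * log n."""
    return 2.0 * loglik - n_params * math.log(n)


def select_k(logliks: dict[int, float], d: int, n: int) -> int:
    """Return the number of components whose fitted model has the largest BIC."""
    return max(logliks, key=lambda k: bic(logliks[k], gmm_param_count(k, d), n))


if __name__ == "__main__":
    # Hypothetical maximized log-likelihoods for n = 500 points in d = 2 dimensions.
    logliks = {1: -2100.0, 2: -1950.0, 3: -1900.0, 4: -1895.0}
    print(select_k(logliks, d=2, n=500))  # prints 3
```

The penalty term grows with every added component, so BIC only favors an extra component when the likelihood gain outweighs it; with a curved or skewed cluster, that gain can be large enough to justify several Gaussians for one group.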

Purpose of the Study:

To propose a new method for determining the number of clusters in model-based clustering.
To address the overestimation of the number of clusters when the data contain non-Gaussian clusters.

Main Methods:

A two-stage approach is proposed. First, the total number of Gaussian mixture components (K) is chosen using BIC.
Second, these components are combined hierarchically according to an entropy criterion, yielding a soft clustering at each step.
An automatic method for selecting the final number of clusters is also described; it fits a piecewise linear regression to a rescaled entropy plot.
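The combining stage can be sketched as a greedy loop over the matrix of posterior membership probabilities t_ik (observations in rows, components in columns), which is assumed to come from the BIC-selected mixture. In this sketch, merging two clusters adds their probability columns, and at each step the pair whose merger gives the lowest entropy -Σ_i Σ_k t_ik log t_ik is combined. This is our reading of the entropy criterion rather than the authors' code, and the function names and toy matrix are hypothetical.

```python
import math
from itertools import combinations

Matrix = list[list[float]]  # rows: observations, columns: clusters


def entropy(t: Matrix) -> float:
    """Entropy -sum_i sum_k t_ik log t_ik of a soft clustering (0 log 0 = 0)."""
    return sum(-p * math.log(p) for row in t for p in row if p > 0.0)


def merge(t: Matrix, a: int, b: int) -> Matrix:
    """Combine clusters a < b by adding column b into column a and dropping b."""
    merged = []
    for row in t:
        new_row = [p for j, p in enumerate(row) if j != b]
        new_row[a] = row[a] + row[b]  # index a is unchanged because a < b
        merged.append(new_row)
    return merged


def combine_hierarchy(t: Matrix) -> list[tuple[Matrix, float]]:
    """Greedily merge the pair of clusters giving the lowest entropy.

    Returns (soft clustering, entropy) for K, K-1, ..., 1 clusters.
    """
    levels = [(t, entropy(t))]
    while len(t[0]) > 1:
        pairs = combinations(range(len(t[0])), 2)  # all (a, b) with a < b
        best = min(pairs, key=lambda ab: entropy(merge(t, *ab)))
        t = merge(t, *best)
        levels.append((t, entropy(t)))
    return levels


if __name__ == "__main__":
    # Hypothetical posteriors for 4 observations and K = 3 components;
    # components 1 and 2 overlap heavily.
    t = [
        [1.0, 0.0, 0.0],
        [0.0, 0.5, 0.5],
        [0.0, 0.4, 0.6],
        [0.9, 0.1, 0.0],
    ]
    for clustering, ent in combine_hierarchy(t):
        print(len(clustering[0]), round(ent, 4))
```

Adding two columns can only lower (or keep) this entropy, so the entropies along the hierarchy are non-increasing. A hard partition at any level can be read off by assigning each observation to its highest-probability cluster.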

Main Results:

The method yields a unique soft clustering for each number of clusters up to K.
Its effectiveness is demonstrated on simulated data and on a real flow cytometry dataset.
The approach mitigates the overestimation of the number of clusters that standard BIC-based selection produces for non-Gaussian data.

Conclusions:

The hierarchical combining approach determines the number of clusters more accurately than traditional BIC-based selection.
It makes model-based clustering more reliable, particularly for datasets with complex cluster structures.
Because it produces a clustering for every number of clusters, it also provides a flexible framework for comparing solutions with different numbers of clusters.