
Clustering compositional data using Dirichlet mixture model.

Samyajoy Pal¹, Christian Heumann¹

  • ¹Department of Statistics, LMU Munich, Munich, Bayern, Germany.

PLOS ONE | May 18, 2022
Summary
This summary is machine-generated.

This study introduces a novel Dirichlet mixture model for clustering compositional data directly, without first transforming the data. In simulations and real-world applications, the method outperformed existing techniques such as KMeans, Gaussian Mixture Models (GMM), and Partitioning Around Medoids (PAM).

Area of Science:

  • Statistics
  • Data Mining
  • Machine Learning

Background:

  • Compositional data analysis (CoDa) often requires data transformations for standard clustering methods.
  • Existing clustering algorithms may struggle with the unique constraints of compositional data, such as the unit sum property.
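For context, the following sketch shows the kind of preprocessing the background refers to. Raw parts are first closed to the unit sum. The centered log-ratio (clr) transform then maps them into ordinary Euclidean space, where methods such as KMeans or GMM can be applied. The clr is only one common choice and is an assumption here, since the source does not name a specific transform. The function names `closure` and `clr` are illustrative.

```python
import math

def closure(parts):
    """Rescale nonnegative parts so that they sum to 1 (the unit-sum constraint)."""
    total = sum(parts)
    if total <= 0 or any(p < 0 for p in parts):
        raise ValueError("parts must be nonnegative with a positive sum")
    return [p / total for p in parts]

def clr(composition):
    """Centered log-ratio: log of each part relative to the geometric mean of all parts.

    Requires strictly positive parts, so zeros must be replaced beforehand.
    """
    if any(p <= 0 for p in composition):
        raise ValueError("clr requires strictly positive parts")
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]

# Illustrative raw amounts for three parts (made-up numbers, not data from the paper).
x = closure([20, 30, 50])  # a composition: the parts sum to 1
z = clr(x)                 # unconstrained coordinates that sum to 0
```

Clustering `z` with a Euclidean method means the grouping is done in the transformed space. The Dirichlet mixture studied here is meant to work on `x` itself.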

Purpose of the Study:

  • To propose and evaluate a model-based clustering method specifically designed for compositional data.
  • To address the limitations of existing methods by directly handling the unit sum constraint without transformations.
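The proposed model is built from the Dirichlet distribution, whose density is defined directly on compositions: positive parts that sum to 1. The minimal sketch below evaluates that log-density with no prior transformation. It also draws compositions by normalising independent gamma variates, a standard construction for sampling from a Dirichlet. The helper names are hypothetical and are not taken from the paper.

```python
import math
import random

def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a composition x (positive parts summing to 1)."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        raise ValueError("x must be a strictly positive composition summing to 1")
    # log Gamma(sum alpha) - sum log Gamma(alpha_j) + sum (alpha_j - 1) log x_j
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

def dirichlet_sample(alpha, rng=random):
    """Draw one composition by normalising independent Gamma(alpha_j, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]

# Usage: sample a composition and score it under the same parameters.
rng = random.Random(42)
sample = dirichlet_sample([2.0, 3.0, 5.0], rng)
log_density = dirichlet_logpdf(sample, [2.0, 3.0, 5.0])
```

Every sample already satisfies the unit-sum constraint. The density is evaluated on the composition as given, which is the property a Dirichlet-based mixture exploits.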

Main Methods:

  • Development of a mixture model utilizing the Dirichlet distribution to accommodate compositional data.
  • Implementation of a modified hard Expectation-Maximization (EM) algorithm to prevent empty clusters and ensure convergence.
  • Rigorous simulation studies across various dimensions, cluster numbers, and overlap levels.
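These methods combine a Dirichlet mixture with hard EM. The sketch below is a generic hard (classification) EM for such a mixture in pure Python. It rests on several assumptions, all of which are illustrations rather than the authors' method:

- The M-step uses a method-of-moments estimate of each component's Dirichlet parameters instead of a full maximum-likelihood update.
- The starting partition comes from farthest-point seeding.
- Any empty cluster is refilled with the worst-fitting point from a cluster that has more than one member.

All function names are hypothetical. The authors' specific modification of hard EM is not described in this summary.

```python
import math
import random
from collections import Counter

def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at a strictly positive composition x."""
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    return log_norm + sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))

def fit_alpha_moments(points, eps=1e-6):
    """Method-of-moments Dirichlet estimate (a simplification, not a full MLE)."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    variances = [sum((p[j] - means[j]) ** 2 for p in points) / n for j in range(d)]
    # Var[x_j] = m_j (1 - m_j) / (alpha_0 + 1): solve for alpha_0 per part, then average.
    estimates = [m * (1.0 - m) / max(v, eps) - 1.0 for m, v in zip(means, variances)]
    precision = max(sum(estimates) / d, 1e-3)
    return [max(m * precision, 1e-3) for m in means]

def fill_empty(labels, k, fit):
    """Hypothetical guard: give each empty cluster the worst-fitting point
    (lowest fit) taken from a cluster that has more than one member."""
    labels = list(labels)
    for c in range(k):
        counts = Counter(labels)
        if counts[c] == 0:
            donors = [i for i, lab in enumerate(labels) if counts[lab] > 1]
            labels[min(donors, key=lambda i: fit[i])] = c
    return labels

def farthest_point_init(data, k, rng):
    """Assumed starting partition: farthest-point seeds, nearest-seed assignment."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    seeds = [data[rng.randrange(len(data))]]
    while len(seeds) < k:
        seeds.append(max(data, key=lambda x: min(dist2(x, s) for s in seeds)))
    nearest = [min(range(k), key=lambda c: dist2(x, seeds[c])) for x in data]
    fit = [-dist2(x, seeds[c]) for x, c in zip(data, nearest)]
    return fill_empty(nearest, k, fit)

def hard_em_dirichlet(data, k, max_iter=100, seed=0):
    """Hard (classification) EM for a k-component Dirichlet mixture."""
    if not 1 <= k <= len(data) or max_iter < 1:
        raise ValueError("need 1 <= k <= len(data) and max_iter >= 1")
    labels = farthest_point_init(data, k, random.Random(seed))
    for _ in range(max_iter):
        # M-step: mixing weights and Dirichlet parameters from the current members.
        members = [[x for x, lab in zip(data, labels) if lab == c] for c in range(k)]
        weights = [len(m) / len(data) for m in members]
        alphas = [fit_alpha_moments(m) for m in members]
        # Hard E-step: each point goes to its highest-scoring component.
        scores = [[math.log(w) + dirichlet_logpdf(x, a) for w, a in zip(weights, alphas)]
                  for x in data]
        new_labels = [max(range(k), key=row.__getitem__) for row in scores]
        fit = [row[lab] for row, lab in zip(scores, new_labels)]
        new_labels = fill_empty(new_labels, k, fit)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels, weights, alphas

# Usage on synthetic data (two well-separated groups, not the paper's simulations).
rng = random.Random(1)

def sample_dirichlet(alpha):
    """Draw a composition by normalising independent Gamma(alpha_j, 1) variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [gi / total for gi in g]

data = [sample_dirichlet([30, 5, 5]) for _ in range(40)]
data += [sample_dirichlet([5, 5, 30]) for _ in range(40)]
labels, weights, alphas = hard_em_dirichlet(data, k=2)
```

Hard EM assigns each observation to exactly one component rather than weighting it by posterior probabilities, which is what distinguishes it from standard (soft) EM. With the moment-based M-step and the empty-cluster guard, the sketch has no monotonicity guarantee, so `max_iter` bounds the loop.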

Main Results:

  • The proposed Dirichlet mixture model demonstrated robust performance in clustering simulated compositional data.
  • In comparative analyses, the new method outperformed popular algorithms such as KMeans, Gaussian Mixture Models (GMM), and Partitioning Around Medoids (PAM).
  • Successful application to real-world datasets from business/marketing and physical sciences, highlighting its practical utility.
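This summary does not say which agreement measure underlies the comparison with KMeans, GMM, and PAM. One common way to score a clustering of simulated data against the known group labels is the adjusted Rand index (ARI), sketched below as an assumption. ARI ignores how clusters are named, so outputs from different algorithms can be compared on the same footing.

```python
from collections import Counter
from math import comb

def pair_count(counts):
    """Number of unordered pairs that can be formed within each group, summed."""
    return sum(comb(c, 2) for c in counts)

def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index between two partitions of the same observations.

    1.0 means identical partitions (up to relabelling); values near 0 mean
    chance-level agreement, and negative values mean worse than chance.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    if n < 2:
        raise ValueError("need at least two observations")
    sum_joint = pair_count(Counter(zip(true_labels, pred_labels)).values())
    sum_true = pair_count(Counter(true_labels).values())
    sum_pred = pair_count(Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)
    maximum = (sum_true + sum_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions put everything in one cluster
    return (sum_joint - expected) / (maximum - expected)

# Cluster labels are arbitrary names, so a pure relabelling still scores 1.0.
score = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
```

scikit-learn offers the same measure as `sklearn.metrics.adjusted_rand_score` if a library implementation is preferred.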

Conclusions:

  • The Dirichlet mixture model offers a powerful and effective approach for clustering compositional data.
  • The modified hard EM algorithm successfully overcomes convergence issues, making the method reliable.
  • This approach provides a valuable alternative for analyzing compositional data without prior transformations.