Sparse Count Data Clustering Using an Exponential Approximation to Generalized Dirichlet Multinomial Distributions.

Nuha Zamzami, Nizar Bouguila

IEEE Transactions on Neural Networks and Learning Systems, October 20, 2020

Summary

This summary is machine-generated.

This study introduces an efficient exponential-family approximation to Generalized Dirichlet multinomial (GDM) distributions for clustering high-dimensional count data. The new model, EGDM, significantly speeds up parameter estimation and improves clustering performance across various data types.

Area of Science:

Computational statistics
Machine learning
Data mining

Background:

Clustering high-dimensional, sparse count data is computationally challenging.
Generalized Dirichlet multinomial (GDM) distributions offer accuracy but suffer from slow parameter estimation.
Exponential-family approximations provide efficient training without dimensionality reduction.
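
For orientation, the GDM log-probability can be sketched from the textbook stick-breaking construction of the generalized Dirichlet, in which each stick fraction has a Beta(alpha_d, beta_d) prior that is integrated out against the multinomial. This is an illustrative sketch, not the paper's implementation: the function name `gdm_log_pmf` and the parameter layout are assumptions, and the paper's own parameterization may differ. The many log-gamma evaluations per sample and per dimension hint at why direct estimation is slow.

```python
from math import lgamma, exp

def gdm_log_pmf(counts, alpha, beta):
    """Log-probability of a count vector under a generalized Dirichlet multinomial.

    counts has D + 1 non-negative entries; alpha and beta each have D positive
    entries (hypothetical layout chosen for this sketch).
    """
    if len(alpha) != len(counts) - 1 or len(beta) != len(counts) - 1:
        raise ValueError("alpha and beta need len(counts) - 1 entries")
    n = sum(counts)
    # Multinomial coefficient: n! / prod(x_d!)
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    tail = n  # z_d = x_d + ... + x_{D+1}
    for x, a, b in zip(counts, alpha, beta):
        rest = tail - x  # z_{d+1}
        # Ratio of Beta functions B(a + x, b + rest) / B(a, b)
        log_p += lgamma(a + b) - lgamma(a) - lgamma(b)
        log_p += lgamma(a + x) + lgamma(b + rest) - lgamma(a + b + tail)
        tail = rest
    return log_p

# Sanity check: probabilities of all 3-category count vectors with total 4.
alpha, beta = [0.5, 2.0], [1.5, 0.7]
total = sum(exp(gdm_log_pmf((i, j, 4 - i - j), alpha, beta))
            for i in range(5) for j in range(5 - i))
print(total)  # expected to be 1 up to floating-point error
```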

Purpose of the Study:

To develop an efficient exponential-family approximation to GDM distributions, termed EGDM.
To create a novel clustering algorithm for count data using an EGDM mixture model.
To introduce a method for determining the optimal number of EGDM components using the Minimum Message Length (MML) criterion.

Main Methods:

Derivation of an exponential-family approximation to GDM distributions (EGDM).
Development of a mixture model based on EGDM.
Parameter learning using the deterministic annealing expectation-maximization (DAEM) approach.
Optimal component selection via the Minimum Message Length (MML) criterion.
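
The DAEM step listed above can be illustrated with a minimal sketch. Because this summary does not give the EGDM density, the example swaps in a plain multinomial mixture as a stand-in; only the annealing mechanics carry over. The function name, the inverse-temperature schedule `betas`, the fixed iteration count per temperature, and the smoothing constants are all assumptions made for illustration.

```python
import math
import random

def daem_multinomial_mixture(data, k, betas=(0.2, 0.5, 1.0), iters=50, seed=0):
    """Fit a k-component multinomial mixture with deterministic annealing EM.

    data: list of equal-length count vectors. betas: increasing inverse
    temperatures ending at 1.0, where the E-step is the ordinary EM one.
    """
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    weights = [1.0 / k] * k
    # Near-uniform random start for each component's category probabilities.
    theta = []
    for _ in range(k):
        raw = [1.0 + 0.1 * rng.random() for _ in range(dim)]
        theta.append([r / sum(raw) for r in raw])

    def log_joint(x, j):
        # log(pi_j) + log-likelihood; the multinomial coefficient is dropped
        # because it is identical for every component.
        return math.log(weights[j]) + sum(
            c * math.log(theta[j][d]) for d, c in enumerate(x))

    resp = []
    for beta in betas:
        for _ in range(iters):
            # Tempered E-step: r_ij proportional to (pi_j * p(x_i | theta_j)) ** beta
            resp = []
            for x in data:
                logs = [beta * log_joint(x, j) for j in range(k)]
                top = max(logs)
                unnorm = [math.exp(v - top) for v in logs]
                z = sum(unnorm)
                resp.append([u / z for u in unnorm])
            # M-step: responsibility-weighted counts, lightly smoothed
            # (the smoothing constants are an assumption to avoid log(0)).
            for j in range(k):
                nj = sum(r[j] for r in resp)
                weights[j] = (nj + 1e-6) / (n + k * 1e-6)
                acc = [1e-3 + sum(r[j] * x[d] for r, x in zip(resp, data))
                       for d in range(dim)]
                total = sum(acc)
                theta[j] = [a / total for a in acc]
    return weights, theta, resp
```

At small `beta` the responsibilities are nearly uniform, so the components stay close together; raising `beta` toward 1 lets them separate gradually, which is what makes annealing less sensitive to initialization than plain EM.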

Main Results:

The proposed EGDM mixture model demonstrates superior clustering performance on text, image, and video data.
The EGDM approach significantly reduces computation time compared to standard GDM methods.
Empirical experiments validate the effectiveness and efficiency of the EGDM clustering algorithm.

Conclusions:

The EGDM model offers an efficient and accurate solution for clustering high-dimensional count data.
The DAEM algorithm effectively learns parameters for the EGDM mixture model.
The MML criterion provides a reliable method for selecting the optimal number of clusters.
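
This summary does not state the message-length expression used for EGDM mixtures. As a point of reference only, MML-based mixture selection is commonly built on the Wallace-Freeman approximation below, where h is a prior over the parameters, F is the expected Fisher information matrix, N_p is the number of free parameters, and kappa is the optimal quantization lattice constant. The specific prior and Fisher approximations for EGDM are not given here and are left unspecified.

```latex
\mathrm{MessLen}(\Theta, \mathcal{X}) \approx
  -\log h(\Theta)
  - \log p(\mathcal{X} \mid \Theta)
  + \frac{1}{2} \log \lvert F(\Theta) \rvert
  + \frac{N_p}{2} \left( 1 + \log \kappa_{N_p} \right)
```

Under this form, candidate models with different numbers of components are fitted and the one with the smallest message length is kept.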