



Bayesian Sparse Gaussian Mixture Model for Clustering in High Dimensions.

Dapeng Yao¹, Fangzheng Xie², Yanxun Xu¹

  • ¹ Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, Maryland, U.S.A.

Journal of Machine Learning Research : JMLR | March 9, 2026
Summary
This summary is machine-generated.

This study introduces a computationally tractable Bayesian approach for sparse high-dimensional Gaussian mixture models. The method achieves minimax optimal estimation rates and adaptively estimates the number of clusters, outperforming traditional methods.

Keywords:
Clustering, High dimensions, Minimax estimation, Posterior contraction, Single-cell sequencing


Area of Science:

  • Statistics
  • Machine Learning
  • Computational Biology

Background:

  • High-dimensional Gaussian mixture models are widely used for clustering.
  • Traditional methods face computational intractability and require pre-specifying the number of clusters.
  • When the number of clusters grows with the sample size, parameter estimation becomes more challenging.
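
To make the setting concrete, the following is a minimal sketch of data drawn from a sparse high-dimensional Gaussian mixture. It is an illustration only, not the authors' simulation design: the function name `sample_sparse_gmm`, the identity covariance, the `signal` strength, and the choice to place the nonzero entries in the first `s` coordinates are all assumptions made here.

```python
import random


def sample_sparse_gmm(n, p, k, s, signal=3.0, seed=0):
    """Draw n points in p dimensions from a k-component Gaussian mixture.

    Assumptions (hypothetical, for illustration): identity covariance,
    uniform mixing weights, and cluster centers that are nonzero only in
    the first s coordinates.
    """
    rng = random.Random(seed)
    # Sparse centers: +/- signal in the first s coordinates, zero elsewhere.
    centers = [
        [rng.choice((-signal, signal)) if j < s else 0.0 for j in range(p)]
        for _ in range(k)
    ]
    # Latent cluster labels, one per observation.
    labels = [rng.randrange(k) for _ in range(n)]
    # Observation = center of its cluster + standard Gaussian noise.
    data = [
        [centers[z][j] + rng.gauss(0.0, 1.0) for j in range(p)]
        for z in labels
    ]
    return data, labels, centers


data, labels, centers = sample_sparse_gmm(n=20, p=50, k=3, s=5)
```

With `p` much larger than `s`, most coordinates carry only noise, which is the regime in which sparsity assumptions on the cluster centers become useful.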

Purpose of the Study:

  • To develop a computationally tractable Bayesian approach for sparse high-dimensional Gaussian mixture models.
  • To establish minimax lower bounds for parameter estimation in this setting.
  • To demonstrate adaptive estimation of the number of clusters.

Main Methods:

  • Proposed a Bayesian approach using a continuous spike-and-slab prior for sparse cluster centers.
  • Established minimax lower bounds for parameter estimation.
  • Proved posterior contraction rates and derived mis-clustering rates using matrix perturbation theory.
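
The summary does not state the exact form of the continuous spike-and-slab prior. As a rough sketch of the general idea, the code below draws coordinates from a two-component mixture of zero-mean Gaussians: a narrow "spike" that keeps most entries near zero and a wide "slab" that allows a few large entries. The Gaussian components, the parameter names (`theta`, `spike_sd`, `slab_sd`), and their default values are assumptions chosen for illustration and may differ from the prior used in the paper.

```python
import random


def sample_spike_slab(p, theta=0.1, spike_sd=0.01, slab_sd=3.0, seed=0):
    """Draw p coordinates from a continuous spike-and-slab mixture.

    Each coordinate comes from the slab N(0, slab_sd^2) with probability
    theta and from the spike N(0, spike_sd^2) otherwise. Both components
    are continuous, so no coordinate is exactly zero.
    """
    rng = random.Random(seed)
    draws, from_slab = [], []
    for _ in range(p):
        is_slab = rng.random() < theta
        sd = slab_sd if is_slab else spike_sd
        draws.append(rng.gauss(0.0, sd))
        from_slab.append(is_slab)
    return draws, from_slab


center, active = sample_spike_slab(p=100)
```

Because both components are continuous densities, a prior of this kind can be handled with standard continuous-parameter computation, in contrast to a point-mass spike at zero.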

Main Results:

  • The proposed Bayesian method achieves minimax optimal posterior contraction rates.
  • The method adaptively estimates the number of clusters without pre-specification.
  • Simulations and an analysis of single-cell RNA sequencing data demonstrate the method's validity and usefulness.
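
The summary does not define the mis-clustering rate precisely. A common convention, assumed here, is the fraction of points assigned to the wrong cluster after choosing the best relabeling of the estimated clusters, since cluster labels are only identified up to permutation. The function name `misclustering_rate` is hypothetical, and the brute-force search over permutations is practical only for a small number of clusters.

```python
from itertools import permutations


def misclustering_rate(true_labels, est_labels):
    """Fraction of mislabeled points, minimized over label permutations."""
    if len(true_labels) != len(est_labels):
        raise ValueError("label sequences must have the same length")
    # Use the union of labels so the two sides may differ in cluster count.
    labels = sorted(set(true_labels) | set(est_labels))
    n = len(true_labels)
    best = n
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        errors = sum(
            1 for t, e in zip(true_labels, est_labels) if mapping[e] != t
        )
        best = min(best, errors)
    return best / n


rate = misclustering_rate([0, 0, 1, 1], [1, 1, 0, 0])  # labels swapped only
```

In this usage example the estimated labels match the truth up to swapping the two cluster names, so the rate is zero.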

Conclusions:

  • The proposed Bayesian sparse Gaussian mixture model offers a computationally tractable and statistically robust solution.
  • The method effectively handles high-dimensional data and adaptively determines cluster numbers.
  • The method is applicable to real-world problems, including single-cell RNA sequencing analysis.