Variable selection for clustering with Gaussian mixture models.

Cathy Maugis¹, Gilles Celeux, Marie-Laure Martin-Magniette

¹Department of Mathematics, University Paris-Sud 11, Orsay, France. Cathy.Maugis@math.u-psud.fr

February 13, 2009

Summary

This summary is machine-generated.

This study introduces a new method for variable selection in model-based cluster analysis. The approach generalizes the model of Raftery and Dean and compares candidate models with the Bayesian information criterion (BIC) to determine the role each variable plays in the clustering.

Area of Science:

Statistics
Machine Learning
Bioinformatics

Background:

Variable selection is crucial for effective cluster analysis, particularly in model-based approaches.
Existing methods often require assumptions about how the selected and discarded variables are related, such as a linear link, which limits their applicability.
Identifying the role of each variable enhances the interpretability and performance of clustering models.

Purpose of the Study:

To propose a generalized model for variable selection in model-based cluster analysis.
To develop a method that does not assume linear relationships between selected and discarded variables.
To provide a statistically sound procedure for determining variable roles in clustering.

Main Methods:

A generalized model is proposed, extending previous work by Raftery and Dean.
The Bayesian information criterion (BIC) is employed for model comparison and selection.
A novel algorithm embeds two backward stepwise procedures, one for variable selection in clustering and one for linear regression, to ascertain variable roles.
Model identifiability and criterion consistency are theoretically established.
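
To make the BIC-based model comparison concrete, here is a minimal sketch in plain Python. It is not the authors' procedure: it only compares a single Gaussian against a two-component Gaussian mixture on one variable, and it does not implement the stepwise variable selection. The function names, the simple EM initialization, the variance floor, and the sign convention (2 × log-likelihood minus a penalty, so that larger is better) are assumptions made for illustration.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def bic(loglik, n_params, n_obs):
    """BIC in the 'larger is better' convention (an assumption of this sketch)."""
    return 2.0 * loglik - n_params * math.log(n_obs)


def bic_one_gaussian(xs):
    """BIC of a single Gaussian fitted by maximum likelihood."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    loglik = sum(gaussian_logpdf(x, mean, var) for x in xs)
    return bic(loglik, 2, n)  # free parameters: mean, variance


def bic_two_gaussians(xs, n_iter=200):
    """BIC of a two-component Gaussian mixture fitted with a basic EM loop."""
    n = len(xs)
    mean = sum(xs) / n
    var = max(sum((x - mean) ** 2 for x in xs) / n, 1e-6)
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]  # crude initialization, hypothetical choice
    variances = [var, var]

    def component_logs(x):
        return [math.log(weights[k]) + gaussian_logpdf(x, means[k], variances[k])
                for k in range(2)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            logs = component_logs(x)
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: update weights, means, and variances (with small floors).
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                1e-6,
            )

    # Mixture log-likelihood via a numerically stable log-sum-exp.
    loglik = 0.0
    for x in xs:
        logs = component_logs(x)
        top = max(logs)
        loglik += top + math.log(sum(math.exp(v - top) for v in logs))
    return bic(loglik, 5, n)  # one weight, two means, two variances


if __name__ == "__main__":
    data = [0.0, 0.1, -0.1, 0.2, -0.2, 10.0, 10.1, 9.9, 10.2, 9.8]
    print("one Gaussian :", bic_one_gaussian(data))
    print("two Gaussians:", bic_two_gaussians(data))
```

On clearly bimodal data such as the toy sample above, the two-component model should obtain the higher BIC despite its larger penalty, which is the kind of comparison a BIC-driven selection procedure relies on at each step.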

Main Results:

The proposed method effectively identifies relevant variables for cluster analysis without prior assumptions on variable relationships.
Numerical experiments on simulated datasets demonstrate the procedure's efficacy.
Application to genomic data highlights the practical utility and performance of the variable selection technique.

Conclusions:

The developed variable selection procedure offers a flexible and robust approach for model-based cluster analysis.
The method enhances the interpretability of clustering results by specifying variable roles.
This technique shows promise for applications in diverse fields, including bioinformatics and data mining.