Latent Class Analysis Variable Selection.

Nema Dean, Adrian E Raftery

Annals of the Institute of Statistical Mathematics

September 10, 2010

Summary

This summary is machine-generated.

This study introduces a novel method for variable selection in latent class analysis, improving clustering accuracy and efficiency. The approach effectively identifies key variables, reducing data complexity while maintaining robust group structure discovery.

Area of Science:

Statistics
Data Mining
Bioinformatics

Background:

Latent class analysis (LCA) is a prevalent model-based clustering technique for discrete data.
Effective variable selection is crucial for accurate and efficient LCA model building.
Existing methods may not optimally identify variables that contribute most to cluster definition.
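
For orientation, the standard latent class model behind this kind of clustering can be written as a finite mixture in which the variables are independent within each class. The notation below (G classes, J categorical variables, class proportions \tau_g, category probabilities p_{jgc}) is introduced here for illustration and is not taken from the summary.

```latex
% Latent class model for an observation y = (y_1, ..., y_J),
% where variable j takes a value in {1, ..., d_j}.
% G        : number of latent classes
% \tau_g   : proportion of class g
% p_{jgc}  : probability that variable j takes category c within class g
\[
  P(y) \;=\; \sum_{g=1}^{G} \tau_g \prod_{j=1}^{J} \prod_{c=1}^{d_j}
             p_{jgc}^{\,\mathbb{1}\{y_j = c\}},
  \qquad
  \sum_{g=1}^{G} \tau_g = 1,
  \quad
  \sum_{c=1}^{d_j} p_{jgc} = 1 .
\]
```

Under this form, a variable is uninformative for clustering when its category probabilities p_{jgc} do not vary with the class g, which is the kind of variable a selection method aims to leave out.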

Purpose of the Study:

To develop and evaluate a novel method for selecting informative variables in latent class analysis.
To enhance the accuracy of cluster allocation and the determination of the number of classes.
To demonstrate the method's efficacy in reducing variable set size without compromising discovered group structures.

Main Methods:

The method assesses each candidate variable by comparing two nested models: one in which the variable contributes to cluster allocation and one in which it does not.
A headlong search algorithm navigates the model space to select the clustering variables.
The method was validated on simulated datasets and two real-world datasets, including genetic data from the International HapMap Project.
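
The search described above can be sketched roughly as follows. This is a minimal illustration of a headlong-style stepwise search, not the authors' implementation: the function name `headlong_search`, the `evidence` callable, the single `threshold`, and the `max_iter` cap are all assumptions made for this example. In a real analysis, `evidence` would be replaced by a comparison of the two fitted nested models for the candidate variable.

```python
from typing import Callable, Hashable, Iterable, List, Sequence


def headlong_search(
    variables: Sequence[Hashable],
    evidence: Callable[[Hashable, Sequence[Hashable]], float],
    start: Iterable[Hashable] = (),
    threshold: float = 0.0,
    max_iter: int = 100,
) -> List[Hashable]:
    """Headlong-style variable search (hypothetical sketch).

    evidence(v, others) is an assumed user-supplied score that is larger
    when v appears to carry clustering information beyond `others`.
    """
    selected = list(start)
    for _ in range(max_iter):
        changed = False
        # Inclusion step: accept the FIRST candidate that clears the
        # threshold instead of scanning every candidate for the best one.
        for v in variables:
            if v not in selected and evidence(v, selected) > threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the FIRST selected variable whose evidence,
        # given the other selected variables, does not clear the threshold.
        for v in list(selected):
            others = [u for u in selected if u != v]
            if evidence(v, others) <= threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break  # neither step changed the set, so the search has settled
    return selected


# Toy usage: pretend only "a" and "c" are informative (made-up scores).
informative = {"a", "c"}


def toy_evidence(v, others):
    return 1.0 if v in informative else -1.0


chosen = headlong_search(["a", "b", "c", "d"], toy_evidence)
```

Taking the first acceptable variable rather than the best one trades some optimality for speed, which matters when each evaluation requires fitting clustering models and the number of candidate variables is large.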

Main Results:

The proposed method successfully identified the correct clustering variables in simulated data.
It also improved classification performance and the accuracy of the chosen number of classes.
In real datasets, the method identified the same group structures using significantly fewer variables, including a substantial reduction in single nucleotide polymorphisms (SNPs).

Conclusions:

The developed variable selection method is effective for latent class analysis.
It enhances model performance and simplifies data requirements by identifying essential variables.
The approach shows promise for applications in genetics and other fields requiring robust clustering of discrete data.