Latent Class Analysis Variable Selection.

Nema Dean, Adrian E Raftery

    Annals of the Institute of Statistical Mathematics | September 10, 2010
    Summary
    This summary is machine-generated.

    This study introduces a novel method for variable selection in latent class analysis, improving clustering accuracy and efficiency. The approach effectively identifies key variables, reducing data complexity while maintaining robust group structure discovery.

    Area of Science:

    • Statistics
    • Data Mining
    • Bioinformatics

    Background:

    • Latent class analysis (LCA) is a prevalent model-based clustering technique for discrete data.
    • Effective variable selection is crucial for accurate and efficient LCA model building.
    • Existing methods may not optimally identify variables that contribute most to cluster definition.
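
    For orientation, the standard latent class model can be written as below. The notation is a common textbook convention assumed here, not taken from this summary: G classes with mixing proportions \tau_g, k categorical variables where variable j has d_j categories, and p_{jgc} as the probability that variable j takes category c in class g. Variables are assumed conditionally independent given class membership.

    ```latex
    P(y_i) = \sum_{g=1}^{G} \tau_g \prod_{j=1}^{k} \prod_{c=1}^{d_j}
             p_{jgc}^{\,\mathbf{1}\{y_{ij} = c\}},
    \qquad \sum_{g=1}^{G} \tau_g = 1,
    \qquad \sum_{c=1}^{d_j} p_{jgc} = 1 .
    ```

    Under this form, a variable is uninformative for clustering when its category probabilities p_{jgc} do not differ across classes g, which is the kind of variable a selection method aims to discard.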

    Purpose of the Study:

    • To develop and evaluate a novel method for selecting informative variables in latent class analysis.
    • To enhance the accuracy of cluster allocation and the determination of the number of classes.
    • To demonstrate the method's efficacy in reducing variable set size without compromising discovered group structures.

    Main Methods:

    • Proposed a variable selection method that compares nested models to assess each variable's contribution to cluster allocation.
    • Used a headlong search algorithm to navigate the model space during variable selection.
    • Validated the method using simulated datasets and two real-world datasets, including genetic data from the International HapMap Project.
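
    The search described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `evidence` callback is hypothetical and stands in for a nested-model comparison (for example, a BIC difference between the model in which a candidate is a clustering variable and the model in which it is not), and the `upper` and `lower` thresholds, the `start` set, and the toy scores are all illustrative choices. The point it shows is the "headlong" rule: accept the first candidate that clears the threshold instead of scanning for the best one.

    ```python
    from typing import Callable, FrozenSet, List, Sequence

    # Hypothetical callback: evidence(candidate, others) should return the
    # evidence (e.g. a BIC difference) that `candidate` is a clustering
    # variable, given the other currently selected variables.
    Evidence = Callable[[str, FrozenSet[str]], float]


    def headlong_search(
        variables: Sequence[str],
        evidence: Evidence,
        start: Sequence[str] = (),
        upper: float = 0.0,      # assumed acceptance threshold
        lower: float = -100.0,   # assumed threshold for discarding a variable
        max_iter: int = 100,     # guard against cycling
    ) -> List[str]:
        selected = list(start)
        pool = [v for v in variables if v not in selected]
        for _ in range(max_iter):
            changed = False
            # Inclusion step: add the FIRST candidate whose evidence exceeds
            # `upper`; candidates scoring below `lower` leave the pool for good.
            for v in list(pool):
                score = evidence(v, frozenset(selected))
                if score > upper:
                    selected.append(v)
                    pool.remove(v)
                    changed = True
                    break
                if score < lower:
                    pool.remove(v)
            # Removal step: drop the first selected variable whose evidence is
            # not above `upper`; return it to the pool unless it is below `lower`.
            for v in list(selected):
                score = evidence(v, frozenset(selected) - {v})
                if score <= upper:
                    selected.remove(v)
                    if score >= lower:
                        pool.append(v)
                    changed = True
                    break
            if not changed:
                break
        return selected


    def toy_evidence(candidate: str, others: FrozenSet[str]) -> float:
        # Made-up scores for illustration only; a real callback would fit
        # two latent class models and compare them.
        scores = {"a": 10.0, "b": -5.0, "c": 10.0, "d": -200.0}
        return scores[candidate]


    print(headlong_search(["a", "b", "c", "d"], toy_evidence))
    ```

    In practice the callback would refit latent class models at each step, so taking the first acceptable variable trades some optimality for far fewer model fits than an exhaustive greedy scan.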

    Main Results:

    • The proposed method successfully identified the correct clustering variables in simulated data.
    • The method improved classification performance and the accuracy of selecting the number of classes.
    • In real datasets, the method identified the same group structures using significantly fewer variables, including a substantial reduction in single nucleotide polymorphisms (SNPs).

    Conclusions:

    • The developed variable selection method is effective for latent class analysis.
    • It enhances model performance and simplifies data requirements by identifying essential variables.
    • The approach shows promise for applications in genetics and other fields requiring robust clustering of discrete data.