Error rate control for classification rules in multiclass mixture models.

Tristan Mary-Huard¹,², Vittorio Perduca³, Marie-Laure Martin-Magniette¹,²

¹MIA-Paris, INRAE, AgroParisTech, Université Paris-Saclay, Paris, 75005, France.

The International Journal of Biostatistics, November 30, 2021

Summary

This summary is machine-generated.

This study introduces a new multiclass False Discovery Rate (FDR)-like rule for finite mixture models, optimizing classification by minimizing Type II errors while controlling Type I errors. The novel rule is less conservative than traditional thresholded Maximum A Posteriori (MAP) rules.

Keywords:

classification rule; error rate control; mixture models

Area of Science:

Statistics
Machine Learning
Data Science

Background:

Finite mixture models are widely used for data clustering and classification.
Controlling classification error rates is crucial for reliable model interpretation.
Existing methods often use thresholded Maximum A Posteriori (MAP) rules, which can be conservative.
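As an illustration of the baseline mentioned above, a thresholded MAP rule can be sketched as follows: each observation is assigned to the class with the largest posterior probability, but only when that posterior reaches a fixed threshold; otherwise the rule abstains. This is a minimal sketch, not the authors' code. The one-dimensional Gaussian mixture, its parameters, the threshold of 0.9, and all function names are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Density of the normal distribution N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posteriors(x, weights, means, sds):
    """Posterior class probabilities P(class k | x) under a Gaussian mixture."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def thresholded_map(x, weights, means, sds, threshold=0.9):
    """Return the MAP class if its posterior reaches `threshold`, else None."""
    post = posteriors(x, weights, means, sds)
    best = max(range(len(post)), key=post.__getitem__)
    return best if post[best] >= threshold else None

# Hypothetical three-component mixture (illustrative parameters only).
WEIGHTS = [0.5, 0.3, 0.2]
MEANS = [0.0, 3.0, 6.0]
SDS = [1.0, 1.0, 1.0]

# x = 1.6 lies between the first two components, so the rule abstains there.
labels = [thresholded_map(x, WEIGHTS, MEANS, SDS) for x in (-1.0, 1.6, 3.0, 6.5)]
print(labels)  # [0, None, 1, 2]
```

Raising the threshold makes the rule abstain more often, which is the sense in which such a rule can be conservative.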

Purpose of the Study:

To develop an optimal classification rule for finite mixture models that minimizes Type II error rates while controlling Type I error rates.
To define and evaluate a multiclass False Discovery Rate (FDR)-like rule.
To compare the performance of the proposed rule against standard thresholded MAP rules.

Main Methods:

Defining Type I and Type II classification error rates by analogy with statistical hypothesis testing.
Identifying an optimal region in the observation space for applying the MAP rule.
Developing a heuristic for computing the optimal classification rule.
Implementing and comparing a multiclass FDR-like rule with thresholded MAP rules.
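To make the comparison in the last step concrete, the sketch below contrasts a thresholded MAP rule with one common posterior-based FDR-style procedure. It is not the heuristic developed in the paper, whose details are not given here. The assumption is that the misclassification rate among classified observations is estimated by the average of 1 minus the maximum posterior, and that observations are classified in order of decreasing confidence for as long as this average stays at or below a target level `alpha`. The posterior values, the function name, and `alpha = 0.1` are hypothetical.

```python
def fdr_like_rule(posterior_rows, alpha=0.05):
    """Classify the most confident observations while the estimated error stays <= alpha.

    Assumption: the error rate among classified observations is estimated by
    the mean of (1 - max posterior). Unclassified observations get None.
    """
    map_class = [max(range(len(row)), key=row.__getitem__) for row in posterior_rows]
    local_error = [1.0 - row[k] for row, k in zip(posterior_rows, map_class)]
    # Visit observations from most to least confident.
    order = sorted(range(len(posterior_rows)), key=local_error.__getitem__)
    labels = [None] * len(posterior_rows)
    running = 0.0
    for n_classified, i in enumerate(order, start=1):
        running += local_error[i]
        if running / n_classified > alpha:
            break  # the running mean is non-decreasing, so later ones fail too
        labels[i] = map_class[i]
    return labels

# Hypothetical posterior probabilities for five observations and three classes.
POSTERIORS = [
    [0.99, 0.005, 0.005],
    [0.02, 0.97, 0.01],
    [0.10, 0.05, 0.85],
    [0.45, 0.40, 0.15],
    [0.96, 0.03, 0.01],
]
ALPHA = 0.1

fdr_labels = fdr_like_rule(POSTERIORS, alpha=ALPHA)

# Thresholded MAP at level 1 - alpha, for comparison.
map_labels = []
for row in POSTERIORS:
    k = max(range(len(row)), key=row.__getitem__)
    map_labels.append(k if row[k] >= 1.0 - ALPHA else None)

print(fdr_labels)  # [0, 1, 2, None, 0]
print(map_labels)  # [0, 1, None, None, 0]
```

In this toy example the averaged criterion also classifies the third observation (maximum posterior 0.85), which the fixed threshold of 0.9 rejects. This illustrates how an FDR-style rule can be less conservative at the same nominal level.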

Main Results:

Finding an optimal classification rule is shown to be equivalent to searching for an optimal region of the observation space.
The shape of this optimal region depends on which misclassification rate is to be controlled.
The proposed multiclass FDR-like optimal rule is less conservative than thresholded MAP rules.
Validation on simulated and real datasets confirms the practical advantages of the FDR-like rule.

Conclusions:

The developed optimal classification rule offers improved performance in finite mixture models.
The multiclass FDR-like rule provides a more powerful alternative to conventional thresholded MAP rules.
This approach enhances the ability to classify observations accurately while maintaining control over error rates.