Related Concept Videos

Outliers and Influential Points

An outlier is an observation that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least squares line in the vertical direction. They have large "errors," where the "error" or residual is the...

What Are Outliers?

Outliers are observed data points that are far from the least squares line. They have unusual values and need to be examined carefully. Although an outlier may result from erroneous data, it may also hold valuable information about the population under study, in which case it should be kept in the data. Hence, it is crucial to examine what causes a data point to be an outlier.
The z score is used to find outliers or unusual values. It should be noted that any values beyond -2 and +2 are...
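As a rough illustration of the z-score idea, the sketch below standardizes each value and flags those whose absolute z score exceeds a cutoff. The cutoff of 2 follows the -2 and +2 bounds mentioned above; the function names, the `cutoff` parameter, and the sample data are assumptions made up for this example.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return the z score of every value, using the sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

def flag_unusual(values, cutoff=2.0):
    """Return the values whose |z| exceeds the cutoff (cutoff is an assumed parameter)."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) > cutoff]

# Hypothetical measurements with one value far from the rest.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]
unusual = flag_unusual(data)
print(unusual)
```

A flagged value is only a candidate outlier; as noted above, it still has to be examined before deciding whether to keep it.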

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes, a data set can have a recorded numerical observation that greatly deviates from the rest of the data. Assuming that the data is normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first, calculate the absolute difference between the outlier and the mean. Then, calculate the ratio between this difference and the standard deviation of the sample. This...
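The two steps described above (absolute difference from the mean, then the ratio to the sample standard deviation) can be sketched as follows. This is a minimal sketch, not a full test: the function name and sample data are hypothetical, and the resulting statistic still has to be compared with a tabulated Grubbs critical value for the sample size and significance level, which this example does not compute.

```python
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, suspect), where suspect is the value farthest from the mean
    and G = |suspect - mean| / sample standard deviation."""
    m = mean(values)
    s = stdev(values)
    suspect = max(values, key=lambda v: abs(v - m))
    return abs(suspect - m) / s, suspect

# Hypothetical measurements, assumed to come from a normal distribution.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 30]
g, suspect = grubbs_statistic(data)
print(f"suspect value = {suspect}, G = {g:.3f}")
```

If G exceeds the critical value looked up for the chosen significance level, the suspect value is treated as an outlier.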

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, there is a need to determine whether they are outliers and whether they should be eliminated from the data set to ensure an accurate representation of the measured value. In many cases, outliers arise from gross errors (or human errors) and do not accurately reflect the underlying phenomenon. In some cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...

Steps in Outbreak Investigation

In the ever-evolving field of public health, statistical analysis serves as a cornerstone for understanding and managing disease outbreaks. By leveraging various statistical tools, health professionals can predict potential outbreaks, analyze ongoing situations, and devise effective responses to mitigate impact. For that to happen, there are a few possible stages of the analysis:

Prediction Intervals

An interval estimate for the value of a variable is known as a prediction interval. It helps gauge how dependable a point estimate is.
A point estimate is most likely close to, but not exactly equal to, the true value. After calculating point estimates, we therefore construct interval estimates, called confidence intervals (for a population parameter) or prediction intervals (for an observed value). Unlike a point estimate, a prediction interval comprises a range of values and is a better predictor of the observed sample value, y.

Related Experiment Video

Updated: Sep 2, 2025

Selecting Multiple Biomarker Subsets with Similarly Effective Binary Classification Performances

Published on: October 11, 2018

Prediction and outlier detection in classification problems.

Leying Guan¹, Robert Tibshirani²

¹Yale University, New Haven, CT, USA.

Journal of the Royal Statistical Society. Series B, Statistical Methodology

August 1, 2022

Summary

This summary is machine-generated.

This study introduces Balanced and Conformal Optimized Prediction Sets (BCOPS) for multi-class classification with differing data distributions. BCOPS optimizes predictions to include correct classes and identify outliers, ensuring reliable performance without distributional assumptions.

Keywords:

BCOPS, conformal inference, distributional change, label shift, set-valued prediction

More Related Videos

Machine Learning Algorithms for Early Detection of Bone Metastases in an Experimental Rat Model

Published on: August 16, 2020

Design and Analysis for Fall Detection System Simplification

Published on: April 6, 2020

Area of Science:

Machine Learning
Statistical Learning Theory
Data Science

Background:

Standard multi-class classification assumes that training and test data follow the same distribution, an assumption that is often violated in real-world scenarios.
Outlier detection and robust prediction set construction are crucial for reliable decision-making when data distributions shift.

Purpose of the Study:

To propose a novel method, Balanced and Conformal Optimized Prediction Sets (BCOPS), for multi-class classification under distribution shifts.
To optimize prediction sets for out-of-sample performance, balancing correct class inclusion and outlier detection.
To provide finite sample coverage guarantees without requiring distributional assumptions.

Main Methods:

BCOPS combines supervised learning algorithms with conformal prediction principles.
It constructs prediction sets C(x) that are subsets of class labels, potentially empty to indicate outliers.
The method minimizes a misclassification loss averaged over the out-of-sample distribution.
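To make the idea of a set-valued prediction C(x) concrete, here is a simplified class-wise split-conformal sketch. It is not the BCOPS procedure itself: BCOPS learns its scores with supervised algorithms and optimizes them for the out-of-sample distribution, whereas this example assumes a hand-picked conformity score (negative distance to a class center on one-dimensional data). All names, data, and the level `alpha` are hypothetical.

```python
from statistics import mean

def conformity(x, center):
    """Assumed conformity score: points closer to the class center score higher."""
    return -abs(x - center)

def conformal_p_value(calib_scores, test_score):
    """Share of calibration scores no larger than the test score, with the +1 correction."""
    n_le = sum(1 for s in calib_scores if s <= test_score)
    return (1 + n_le) / (len(calib_scores) + 1)

def prediction_set(x, centers, calib_by_class, alpha=0.2):
    """Return C(x): every label whose conformal p-value exceeds alpha.
    An empty set means x looks atypical for all classes, i.e. a possible outlier."""
    labels = set()
    for label, calib_points in calib_by_class.items():
        calib_scores = [conformity(v, centers[label]) for v in calib_points]
        p = conformal_p_value(calib_scores, conformity(x, centers[label]))
        if p > alpha:
            labels.add(label)
    return labels

# Hypothetical data: centers are fit on a training split, scores are
# calibrated on a separate held-out split.
train = {"a": [0.0, 0.2, -0.1, 0.1], "b": [5.0, 5.2, 4.9, 5.1]}
centers = {label: mean(points) for label, points in train.items()}
calib = {
    "a": [-0.3, 0.0, 0.1, 0.25, -0.15, 0.05, 0.3, -0.2, 0.15],
    "b": [4.7, 5.0, 5.1, 5.3, 4.85, 5.05, 5.25, 4.8, 5.15],
}

print(prediction_set(0.1, centers, calib))  # typical of class "a"
print(prediction_set(2.5, centers, calib))  # far from both classes
```

In this toy setup a point near one class center receives that single label, while a point between the two clusters receives the empty set, mirroring how an empty C(x) signals an outlier.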

Main Results:

BCOPS provides a finite sample coverage guarantee for prediction sets, irrespective of distributional assumptions.
The method demonstrates the ability to detect outliers by returning an empty prediction set.
Asymptotic consistency and optimality of the proposed methods are proven under stated assumptions.

Conclusions:

BCOPS offers a robust framework for multi-class classification when training and test data distributions differ.
The method enhances prediction reliability by incorporating outlier detection and providing coverage guarantees.
The proposed method for estimating the outlier detection rate helps evaluate the performance of a classification procedure.