Simultaneous feature selection and outlier detection with optimality guarantees.

Luca Insolia¹,², Ana Kenney³, Francesca Chiaromonte²,³

¹Faculty of Sciences, Scuola Normale Superiore, Pisa, Italy.

August 26, 2021

Summary

This summary is machine-generated.

This study introduces a new method for analyzing complex biomedical data with potential errors. The approach effectively handles outliers and selects relevant features, improving data analysis accuracy.

Keywords:

breakdown point; mixed-integer programming; regression analysis; robust regression; sparse estimation; strong oracle property

Area of Science:

Biostatistics
Computational Biology
Genomics

Background:

Biomedical research generates vast datasets with numerous features, increasing the risk of redundant or contaminated data.
Small sample sizes exacerbate challenges posed by data redundancy and outliers in high-dimensional studies.
Robust sparse estimation methods are crucial for accurate analysis in data-rich biomedical research.

Purpose of the Study:

To develop a general framework for robust sparse estimation in high-dimensional regressions.
To simultaneously perform feature selection and outlier detection in the presence of mean-shift outliers in both response and design matrices.
To provide a method with provably optimal guarantees for feature selection and outlier detection.
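For orientation, a mean-shift outlier model with sparsity constraints is commonly written as below. This is a sketch in assumed notation, not the authors' exact formulation: the symbols $\phi$, $k_p$, and $k_n$ are introduced here for illustration, and only shifts in the response are shown.

```latex
% Assumed notation: y is the n-vector of responses, X the n x p design matrix,
% beta the coefficient vector, phi the vector of mean shifts (nonzero only for outliers).
\begin{aligned}
  y &= X\beta + \phi + \varepsilon, \\
  (\hat\beta, \hat\phi) &\in \arg\min_{\beta,\,\phi}\ \lVert y - X\beta - \phi \rVert_2^2
  \quad \text{subject to} \quad \lVert \beta \rVert_0 \le k_p, \ \ \lVert \phi \rVert_0 \le k_n .
\end{aligned}
```

Under this reading, the nonzero entries of $\hat\beta$ identify the selected features and the nonzero entries of $\hat\phi$ flag the detected outliers, so one optimization problem performs both tasks.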

Main Methods:

Utilized mixed-integer programming for simultaneous feature selection and outlier detection.
Developed a framework to address multiple mean-shift outliers in high-dimensional regression models.
Proved theoretical properties including the robustly strong oracle property, optimal parameter estimation, and breakdown point.
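The idea of choosing features and outliers jointly can be illustrated on a toy problem. The sketch below is not the authors' mixed-integer program: it swaps in plain exhaustive enumeration, which reaches the same kind of jointly optimal answer only for very small data, and it handles shifts in the response only. The function name `joint_select`, the budgets `k_features` and `k_outliers`, and the simulated data are all hypothetical.

```python
from itertools import combinations

import numpy as np


def joint_select(X, y, k_features, k_outliers):
    """Exhaustively search feature subsets and outlier subsets (toy sizes only).

    For each pair (feature subset, outlier subset), fit least squares on the
    rows not flagged as outliers and keep the pair with the smallest residual
    sum of squares. Flagged rows contribute zero residual, which mimics giving
    each of them a free mean shift.
    """
    n, p = X.shape
    best_rss, best_feats, best_outs, best_coef = np.inf, None, None, None
    for feats in combinations(range(p), k_features):
        cols = list(feats)
        for outs in combinations(range(n), k_outliers):
            keep = [i for i in range(n) if i not in outs]
            A = X[np.ix_(keep, cols)]
            coef, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
            resid = y[keep] - A @ coef
            rss = float(resid @ resid)
            if rss < best_rss:
                best_rss, best_feats, best_outs, best_coef = rss, feats, outs, coef
    return best_feats, best_outs, best_coef, best_rss


# Simulated data (an assumption for illustration): 2 relevant features out of 5,
# and 2 observations whose responses are shifted by +10 and -10.
rng = np.random.default_rng(0)
n, p = 12, 5
X = rng.normal(size=(n, p))
beta = np.array([3.0, 0.0, -2.0, 0.0, 0.0])
y = X @ beta + 0.1 * rng.normal(size=n)
y[1] += 10.0
y[7] -= 10.0

feats, outs, coef, rss = joint_select(X, y, k_features=2, k_outliers=2)
print("selected features:", feats)
print("flagged outliers:", outs)
print("coefficients:", np.round(coef, 2))
```

Enumeration costs one least-squares fit per pair of subsets, so it becomes infeasible almost immediately as the number of features or observations grows; that scaling problem is what motivates handing the same combinatorial search to a mixed-integer solver.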

Main Results:

Demonstrated a necessary and sufficient condition for the robustly strong oracle property, allowing feature count to grow exponentially with sample size.
Achieved optimal estimation of parameters and established the breakdown point of the developed estimates.
Showcased superior performance against existing heuristic methods via simulations.

Conclusions:

The proposed mixed-integer programming framework offers a robust and optimal solution for feature selection and outlier detection in high-dimensional biomedical data.
The method provides theoretical guarantees and demonstrates superior performance, making it valuable for analyzing complex datasets.
An application to the links between childhood obesity and the human microbiome highlights the method's practical utility.