


Simultaneous feature selection and outlier detection with optimality guarantees.

Luca Insolia (1, 2), Ana Kenney (3), Francesca Chiaromonte (2, 3)

  • (1) Faculty of Sciences, Scuola Normale Superiore, Pisa, Italy.

Biometrics | August 26, 2021
Summary
This summary is machine-generated.

This study introduces a method for high-dimensional regression analysis of biomedical data that may contain outliers. Using mixed-integer programming, it performs feature selection and outlier detection simultaneously and comes with theoretical optimality guarantees.

Keywords:
breakdown point; mixed-integer programming; regression analysis; robust regression; sparse estimation; strong oracle property


Area of Science:

  • Biostatistics
  • Computational Biology
  • Genomics

Background:

  • Biomedical research generates vast datasets with numerous features, increasing the risk of redundant or contaminated data.
  • Small sample sizes exacerbate challenges posed by data redundancy and outliers in high-dimensional studies.
  • Robust sparse estimation methods are crucial for accurate analysis in data-rich biomedical research.

Purpose of the Study:

  • To develop a general framework for robust sparse estimation in high-dimensional regressions.
  • To simultaneously perform feature selection and outlier detection in the presence of mean-shift outliers in both response and design matrices.
  • To provide a method with provable optimality guarantees for feature selection and outlier detection.
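For orientation, one common way to write the mean-shift outlier model and an L0-constrained estimator for it is shown below as a mixed-integer program. The notation (phi for the mean shifts, k_p and k_n for the cardinality bounds, a big-M constant M, binary indicators z and u) is assumed here for illustration, and the paper's exact objective and constraints (for instance, any additional penalty terms) may differ.

```latex
y_i = x_i^\top \beta + \phi_i + \varepsilon_i, \qquad i = 1, \dots, n,
\qquad \phi_i \neq 0 \text{ only for outlying cases}

\min_{\beta,\, \phi,\, z,\, u} \; \lVert y - X\beta - \phi \rVert_2^2
\quad \text{s.t.} \quad
\lvert \beta_j \rvert \le M z_j, \;\; \sum_{j=1}^{p} z_j \le k_p, \;\;
\lvert \phi_i \rvert \le M u_i, \;\; \sum_{i=1}^{n} u_i \le k_n, \;\;
z \in \{0,1\}^p, \; u \in \{0,1\}^n
```

The binary variables z select features and u flags cases, which is what makes the problem mixed-integer. Because phi_i is effectively unrestricted when u_i = 1 (for M large enough), a flagged case can have zero residual and therefore no influence on beta, whether its contamination sits in the response y_i or in the design row x_i.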

Main Methods:

  • Utilized mixed-integer programming for simultaneous feature selection and outlier detection.
  • Developed a framework to address multiple mean-shift outliers in high-dimensional regression models.
  • Established theoretical properties, including the robustly strong oracle property, optimal parameter estimation, and the breakdown point of the estimator.
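The combinatorial problem behind simultaneous feature selection and outlier detection can be made concrete with a brute-force search. The sketch below assumes an L0-constrained mean-shift formulation: minimize the residual sum of squares with at most k_p nonzero coefficients and at most k_n nonzero mean-shift terms, with no intercept and no extra penalty. All function names are hypothetical, and exhaustive enumeration replaces the paper's mixed-integer programming solver, so this is only feasible for toy problem sizes.

```python
from itertools import combinations


def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def fit_rss(X, y, rows, cols):
    """Least squares (no intercept) on the given rows and columns; returns (beta, RSS)."""
    gram = [[sum(X[i][a] * X[i][b] for i in rows) for b in cols] for a in cols]
    xty = [sum(X[i][a] * y[i] for i in rows) for a in cols]
    beta = solve_linear(gram, xty)
    rss = sum((y[i] - sum(X[i][c] * bc for c, bc in zip(cols, beta))) ** 2 for i in rows)
    return beta, rss


def exhaustive_select_and_flag(X, y, k_p, k_n):
    """Minimize ||y - X beta - phi||^2 s.t. ||beta||_0 <= k_p, ||phi||_0 <= k_n by enumeration.

    A flagged case can absorb its whole residual through phi, so the problem reduces to
    best-subset least squares on the unflagged cases. Enumerating supports of exactly
    k_p features and k_n flagged cases suffices, since enlarging either support never
    increases the minimal RSS. Combinations with a singular Gram matrix are skipped.
    """
    n, p = len(X), len(X[0])
    best = None
    for cols in combinations(range(p), k_p):
        for flagged in combinations(range(n), k_n):
            kept = [i for i in range(n) if i not in flagged]
            try:
                beta, rss = fit_rss(X, y, kept, cols)
            except ValueError:
                continue
            if best is None or rss < best[0]:
                best = (rss, cols, flagged, beta)
    return best


# Toy data: y = 2*x0 - x2 exactly, then case 4 receives a mean shift of +10.
X = [[1, 0, 2], [2, 1, 0], [0, 3, 1], [1, 1, 1],
     [3, 0, 1], [2, 2, 3], [0, 1, 2], [1, 3, 0]]
y = [2 * r[0] - r[2] for r in X]
y[4] += 10

rss, features, outliers, beta = exhaustive_select_and_flag(X, y, k_p=2, k_n=1)
print(features, outliers, [round(b, 6) for b in beta])  # (0, 2) (4,) [2.0, -1.0]
```

The search recovers the two active features and flags the shifted case in one pass. Its cost grows with the product of the two binomial coefficients, which is why a formulation that a mixed-integer solver can prune is needed at realistic sizes.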

Main Results:

  • Established a necessary and sufficient condition for the robustly strong oracle property that allows the number of features to grow exponentially with the sample size.
  • Achieved optimal estimation of parameters and established the breakdown point of the developed estimates.
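The breakdown point is, informally, the smallest fraction of contaminated cases that can drive an estimate arbitrarily far. The toy sketch below illustrates the idea on a one-feature regression with no intercept (helper names are hypothetical, and exhaustive trimming stands in for the paper's estimator): a single shifted case pulls the ordinary least-squares slope without bound, while a fit that may discard one case stays at the true slope. It does not reproduce the breakdown point derived in the paper.

```python
from itertools import combinations


def ols_slope(xs, ys):
    """Least-squares slope for the model y = beta * x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def trimmed_slope(xs, ys, n_trim):
    """Slope minimizing the RSS after discarding the best choice of n_trim cases (exhaustive)."""
    n = len(xs)
    best_rss, best_beta = None, None
    for dropped in combinations(range(n), n_trim):
        kx = [xs[i] for i in range(n) if i not in dropped]
        ky = [ys[i] for i in range(n) if i not in dropped]
        beta = ols_slope(kx, ky)
        rss = sum((yv - beta * xv) ** 2 for xv, yv in zip(kx, ky))
        if best_rss is None or rss < best_rss:
            best_rss, best_beta = rss, beta
    return best_beta


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = [1.5 * x for x in xs]  # true slope 1.5, no noise
for shift in (0.0, 10.0, 1e3, 1e6):
    ys = clean[:]
    ys[2] += shift  # contaminate a single case
    print(shift, round(ols_slope(xs, ys), 4), round(trimmed_slope(xs, ys, 1), 4))
```

Here the least-squares slope equals 1.5 + 3 * shift / 55, so it diverges as the shift grows, whereas the trimmed slope remains 1.5 for every shift in the loop.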
  • Showed through simulations that the method outperforms existing heuristic methods.

Conclusions:

  • The proposed mixed-integer programming framework offers a robust and optimal solution for feature selection and outlier detection in high-dimensional biomedical data.
  • The method provides theoretical guarantees and demonstrates superior performance, making it valuable for analyzing complex datasets.
  • The method was applied to investigate links between childhood obesity and the human microbiome, illustrating its practical utility.