Classification of data with missing elements and outliers.

I Stanimirova¹, B Walczak

¹Department of Chemometrics, Institute of Chemistry, Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland.

July 1, 2008

Summary

This summary is machine-generated.

This study introduces a new method, EM-S-SIMCA, to effectively handle experimental data with missing values and outliers. The robust approach improves outlier detection and data analysis for incomplete datasets.

Area of Science:

Chemometrics
Data Analysis
Machine Learning

Background:

Experimental data frequently contains missing values and outliers, complicating analysis.
Outliers distort least squares estimates of model parameters, while missing data makes outliers harder to identify.
Robust methods for incomplete data with outliers are crucial for reliable results.
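
The sensitivity of least squares to outliers can be illustrated with a minimal sketch. The data, the helper name `fit_line`, and the size of the gross error are all assumptions chosen for illustration; none of them come from the study.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    A = np.column_stack([x, np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # array([slope, intercept])

# Hypothetical data: ten points lying exactly on y = 2x + 1.
x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0

# Replace a single value (true value 19) with a gross error.
y_outlier = y_clean.copy()
y_outlier[9] = 100.0

slope_clean, intercept_clean = fit_line(x, y_clean)
slope_out, intercept_out = fit_line(x, y_outlier)

print("clean fit:   slope=%.3f intercept=%.3f" % (slope_clean, intercept_clean))
print("with outlier: slope=%.3f intercept=%.3f" % (slope_out, intercept_out))
```

One bad point out of ten is enough to pull the fitted slope far from 2, which is why a non-robust model is a poor reference for deciding which points are outlying.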

Purpose of the Study:

To present a novel robust method for analyzing incomplete experimental data with outliers.
To introduce the expectation-maximization robust soft independent modeling of class analogy (EM-S-SIMCA) approach.
To address key issues in model complexity, data set selection, and prediction for incomplete data.

Main Methods:

Development of the expectation-maximization robust soft independent modeling of class analogy (EM-S-SIMCA) method.
Incorporation of spherical SIMCA and leverage correction for model complexity.
Use of a uniform design for training/test set selection with incomplete data.
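
The expectation-maximization idea behind such methods can be sketched as iterative imputation with a principal component model: fill the missing cells, fit the model, replace the missing cells with the model's reconstruction, and repeat until the fill values stop changing. The sketch below is not the authors' EM-S-SIMCA. It uses a classical SVD-based PCA rather than a robust spherical one, and the function name `em_pca_impute`, its parameters, and the toy data are assumptions made for illustration.

```python
import numpy as np

def em_pca_impute(X, n_components, n_iter=500, tol=1e-8):
    """Fill NaN entries of X by alternating PCA fitting and reconstruction.

    Classical (non-robust) PCA is used here; a robust variant would swap
    in a robust centre and robust loadings at the model-fitting step.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start by filling each missing cell with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        # Fit a PCA model to the currently completed data.
        mean = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Update only the missing cells; observed values are never altered.
        new_filled = np.where(missing, recon, X)
        change = np.sum((new_filled - filled) ** 2)
        filled = new_filled
        if change < tol:
            break
    return filled

# Hypothetical toy data: 8 samples, 3 variables, one underlying component.
t = np.arange(1.0, 9.0)
X_true = np.outer(t, [1.0, 2.0, 3.0])
X_miss = X_true.copy()
X_miss[2, 1] = np.nan
X_miss[5, 0] = np.nan

X_hat = em_pca_impute(X_miss, n_components=1)
print(np.round(X_hat[[2, 5]], 3))
```

Because the model-fitting step here is ordinary least squares PCA, a few outlying samples would bias both the loadings and the imputed values. That weakness is the motivation for replacing it with a robust estimator.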

Main Results:

EM-S-SIMCA demonstrated superior performance compared to the classic expectation-maximization SIMCA method.
The method effectively handles missing elements and outliers in data analysis.
Satisfactory results were achieved on both simulated and real-world datasets.

Conclusions:

The proposed EM-S-SIMCA method offers a robust solution for analyzing experimental data with missing values and outliers.
It provides improved accuracy and reliability in parameter estimation and outlier detection.
The approach is validated and effective for diverse datasets.