Detecting outlier samples in microarray data.

Albert D Shieh¹, Yeung Sam Hung

  • ¹ Harvard University. shieh@fas.harvard.edu

Statistical Applications in Genetics and Molecular Biology | February 19, 2009
Summary
This summary is machine-generated.

This study introduces an automatic method for detecting outlier samples in microarray data using principal component analysis (PCA) and robust Mahalanobis distances. The method accurately identifies significant outliers, improving downstream data analysis and classifier performance.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Outlier samples with distinct expression patterns are present in microarray data.
  • These outliers can negatively impact the accuracy of data analysis and biological interpretation.
  • Identifying and addressing outliers is crucial for robust microarray analysis.

Purpose of the Study:

  • To develop a fully automatic method for detecting outlier samples in microarray data.
  • To assess the accuracy of the proposed outlier detection method.
  • To evaluate the impact of outlier removal on the performance of predictive classifiers.

Main Methods:

  • Principal Component Analysis (PCA) for dimensionality reduction.
  • Robust estimation of Mahalanobis distances for outlier scoring.
  • Comparison with a prominent robust PCA method for validation.
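
As a rough illustration of the two-step idea listed above (PCA for dimensionality reduction, then robust distances for outlier scoring), the following Python sketch shows one possible reading. It is not the authors' implementation. It assumes NumPy, a samples-by-genes expression matrix, two retained components, and a simplified robust distance that uses the per-component median and MAD in place of a full robust covariance estimate. The function name `robust_pca_outliers`, the cutoff of 3.0, and the synthetic data are all hypothetical.

```python
import numpy as np


def robust_pca_outliers(X, n_components=2, cutoff=3.0):
    """Score samples (rows of X) by a robust distance in PCA space.

    Illustrative assumptions, not taken from the paper: mean-centered PCA
    via SVD, a diagonal robust scatter (median and MAD per component),
    and a fixed, arbitrary cutoff.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # center each gene (column)

    # PCA via SVD: columns of U scaled by the singular values give PC scores.
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]

    # Robust location and scale per component; the factor 1.4826 makes the
    # MAD consistent with the standard deviation under normality.
    med = np.median(scores, axis=0)
    mad = 1.4826 * np.median(np.abs(scores - med), axis=0)

    # Mahalanobis-type distance using a diagonal robust scatter matrix.
    dist = np.sqrt((((scores - med) / mad) ** 2).sum(axis=1))
    return dist, dist > cutoff


# Synthetic demo (made-up data, for illustration only): 20 samples, 50 genes,
# with the last sample shifted upward on every gene.
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))
expr[-1] += 5.0
dist, flagged = robust_pca_outliers(expr)
print(np.argmax(dist), flagged[-1])
```

The diagonal median/MAD scatter keeps the sketch dependency-free. A full robust covariance estimator, for example a minimum covariance determinant estimate, could be substituted in the distance step, and the cutoff could be tied to a chi-square quantile rather than fixed by hand.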

Main Results:

  • The proposed method accurately identifies biologically significant outliers.
  • Removal of detected outliers enhances the prediction accuracy of classifiers.
  • The method demonstrates robustness and effectiveness in outlier detection.

Conclusions:

  • The developed automatic outlier detection method is effective for microarray data.
  • Outlier identification and removal are beneficial for improving the reliability of microarray analyses.
  • This approach offers a valuable tool for researchers working with high-dimensional gene expression data.