Visualizing Big Data Outliers through Distributed Aggregation.

Leland Wilkinson

    IEEE Transactions on Visualization and Computer Graphics | September 4, 2017
    Summary
    This summary is machine-generated.

    This study introduces hdoutliers, an algorithm for detecting multidimensional outliers in large datasets. It handles mixed variable types, scales to datasets with many rows and columns, and tags each outlier with a probability to limit false discoveries.

    Area of Science:

    • Data Science
    • Statistics
    • Machine Learning

    Background:

    • Visualizing outliers in massive datasets necessitates statistical pre-processing for manageable rendering and analysis.
    • Existing methods often struggle with high-dimensional data, mixed variable types, and scale.
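Before distances between rows mean anything, continuous and categorical columns have to be put on a common scale. The sketch below illustrates that kind of pre-processing under stated assumptions: numeric columns are min-max scaled to [0, 1] and categorical columns are one-hot encoded. `encode_mixed` is a hypothetical name, and the encoding hdoutliers itself uses is not described in this summary, so treat this as a simple stand-in rather than the paper's method.

```python
from typing import Dict, List, Sequence, Union

Value = Union[float, str]

def encode_mixed(rows: Sequence[Dict[str, Value]], categorical: Sequence[str]) -> List[List[float]]:
    """Hypothetical pre-processing: min-max scale numeric columns to [0, 1] and
    one-hot encode categorical columns (a stand-in, not the hdoutliers encoding)."""
    if not rows:
        return []
    numeric = [k for k in rows[0] if k not in categorical]
    # Per-column minimum and maximum, used to map numeric values into the unit interval.
    lo = {k: min(float(r[k]) for r in rows) for k in numeric}
    hi = {k: max(float(r[k]) for r in rows) for k in numeric}
    # Sorted category levels give a stable one-hot column order.
    levels = {k: sorted({str(r[k]) for r in rows}) for k in categorical}
    encoded = []
    for r in rows:
        vec = []
        for k in numeric:
            span = hi[k] - lo[k]
            # A constant column carries no information, so map it to 0.
            vec.append((float(r[k]) - lo[k]) / span if span > 0 else 0.0)
        for k in categorical:
            vec.extend(1.0 if str(r[k]) == lvl else 0.0 for lvl in levels[k])
        encoded.append(vec)
    return encoded
```

For rows `{"x": 1.0, "color": "red"}`, `{"x": 3.0, "color": "blue"}`, and `{"x": 2.0, "color": "red"}`, this yields `[[0.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.5, 0.0, 1.0]]`, with the two trailing columns standing for "blue" and "red".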

    Purpose of the Study:

    • To present a new algorithm, hdoutliers, for detecting multidimensional outliers.
    • To address limitations of current outlier detection methods in big data scenarios.

    Main Methods:

    • Developed a novel algorithm, hdoutliers, for multidimensional outlier detection.
    • The algorithm handles mixtures of categorical and continuous variables.
    • It addresses big-p (many columns) and big-n (many rows) data, including masking, in which groups of outliers hide one another from detection.
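One way to handle big-n data is to aggregate rows before looking for outliers. The sketch below uses a one-pass, Leader-style clustering (after Hartigan): each row either joins the nearest existing exemplar within a fixed radius or becomes a new exemplar, so later stages only need to work with the exemplars and their member lists. This is a minimal sketch under that assumption. `leader_aggregate` and `radius` are hypothetical, and the summary does not say how hdoutliers chooses its radius or distributes the work.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]

def leader_aggregate(points: Sequence[Point], radius: float) -> Tuple[List[Point], List[List[int]]]:
    """Single pass over the rows: a row joins its nearest exemplar if that exemplar
    lies within `radius`; otherwise the row becomes a new exemplar.
    Returns the exemplars and, for each exemplar, the indices of its member rows."""
    exemplars: List[Point] = []
    members: List[List[int]] = []
    for i, p in enumerate(points):
        # Find the closest exemplar seen so far (brute force over exemplars).
        best, best_d = -1, math.inf
        for j, e in enumerate(exemplars):
            d = math.dist(p, e)
            if d < best_d:
                best, best_d = j, d
        if best >= 0 and best_d < radius:
            members[best].append(i)
        else:
            exemplars.append(p)
            members.append([i])
    return exemplars, members
```

For example, with `radius=0.1` the points (0, 0), (0.01, 0), (0, 0.02), (0.9, 0.9), and (0.91, 0.9) collapse into two exemplars with member lists `[[0, 1, 2], [3, 4]]`. Because each step only consults the current exemplar list, separate partitions of the data could in principle be aggregated independently and their exemplars merged in a second pass. That merge step is an assumption of this sketch and is not something the summary specifies.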

    Main Results:

    • hdoutliers successfully detects multidimensional outliers in large, complex datasets.
    • The algorithm offers consistent handling of both unidimensional and multidimensional datasets.
    • Outliers are tagged with a probability, reducing false discoveries.
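Probability tags can be illustrated by modeling nearest-neighbor distances with an exponential distribution. For each point, the sketch reports the fitted probability of seeing a distance at least that large. It assumes an exponential model with rate λ estimated from the median (median = ln 2 / λ, so a few extreme gaps do not inflate the estimate), and it uses a hypothetical cutoff `alpha=0.05`. The points can be raw rows or aggregated exemplars. The summary does not describe how hdoutliers fits its distribution, so this is not the published procedure.

```python
import math
from statistics import median
from typing import List, Sequence, Tuple

Point = Sequence[float]

def nearest_neighbor_distances(points: Sequence[Point]) -> List[float]:
    """Brute-force distance from each point to its closest other point (O(m^2))."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]

def tag_outliers(points: Sequence[Point], alpha: float = 0.05) -> List[Tuple[int, float]]:
    """Return (index, tail probability) for points whose nearest-neighbor distance
    is improbably large under an exponential model fitted via the median."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    d = nearest_neighbor_distances(points)
    m = median(d)
    if m == 0:
        raise ValueError("median nearest-neighbor distance is zero; deduplicate or aggregate first")
    rate = math.log(2) / m  # exponential median = ln 2 / rate
    flagged = []
    for i, di in enumerate(d):
        p = math.exp(-rate * di)  # P(D > di) under the fitted exponential
        if p < alpha:
            flagged.append((i, p))
    return flagged
```

Consider ten one-dimensional points at 0, 1, ..., 8 and 50. The median nearest-neighbor distance is 1, so each regular point has a tail probability of 0.5. Only the point at 50 (distance 42, tail probability about 2.3e-13) is flagged at α = 0.05. Reporting the probability rather than a bare yes/no label lets an analyst tighten α to trade sensitivity for fewer false discoveries.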

    Conclusions:

    • hdoutliers provides a robust, statistically grounded method for outlier detection in massive datasets.
    • Its unique features enable effective analysis of complex, high-dimensional data with mixed variable types.
    • The probabilistic tagging of outliers enhances analytical reliability.