Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews.

Ishani Chatterjee¹, Mengchu Zhou¹,², Abdullah Abusorrah²

  • ¹ Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA.

Entropy (Basel, Switzerland) | December 24, 2021
Summary
This summary is machine-generated.

This study introduces a statistics-based outlier detection and correction method (SODCM) to improve sentiment analysis accuracy by identifying and fixing mismatched star ratings in customer reviews. SODCM enhances sentiment analysis performance without data loss.

Keywords:
J-shaped distribution, TextBlob, big data analytics, data scrapping, imbalance dataset, interquartile range, natural language processing, outlier detection, sentiment analysis

Area of Science:

  • Data Science
  • Natural Language Processing
  • Machine Learning

Background:

  • Social networking sites are rich sources for data analytics, sentiment analysis, and natural language processing.
  • In customer reviews, the sentiment of the text usually matches the star rating, but some reviews are outliers whose text sentiment contradicts their rating.
  • Current anomaly detection methods for reviews include manual searching, predefined rules, and traditional machine learning.

Purpose of the Study:

  • To conduct a sentiment analysis and outlier detection case study on Amazon customer reviews.
  • To propose a statistics-based outlier detection and correction method (SODCM) for enhancing sentiment analysis.
  • To evaluate the impact of SODCM on sentiment analysis algorithm performance.

Main Methods:

  • Developed and applied a statistics-based outlier detection and correction method (SODCM).
  • Collected and analyzed datasets of customer reviews scraped from Amazon.com and publicly available sources.
  • Performed sentiment analysis and outlier detection on the curated datasets.
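
This summary does not spell out the individual steps of SODCM. The keyword list mentions the interquartile range and TextBlob, so one plausible reading is an IQR rule applied to sentiment polarity within each star rating. The sketch below illustrates that reading only and should not be taken as the authors' algorithm. The function names, the 1.5 × IQR fence, and the nearest-median correction rule are all assumptions, and polarity scores in [-1, 1] are assumed to have been computed beforehand by a sentiment tool.

```python
from statistics import median, quantiles


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def detect_and_correct(reviews, k=1.5):
    """Flag rating/sentiment mismatches and propose a corrected rating.

    reviews: list of (stars, polarity) pairs, polarity assumed in [-1, 1].
    Returns (stars, polarity, corrected_stars) triples in input order;
    corrected_stars differs from stars only for flagged outliers.
    No review is dropped.
    """
    # Group polarity scores by star rating.
    by_star = {}
    for stars, polarity in reviews:
        by_star.setdefault(stars, []).append(polarity)

    # Fences need at least two scores per group; smaller groups are never flagged.
    fences = {s: iqr_fences(p, k) for s, p in by_star.items() if len(p) >= 2}
    centers = {s: median(p) for s, p in by_star.items()}

    corrected = []
    for stars, polarity in reviews:
        low, high = fences.get(stars, (float("-inf"), float("inf")))
        if low <= polarity <= high:
            corrected.append((stars, polarity, stars))
        else:
            # Hypothetical correction rule: move the review to the star
            # class whose median polarity is closest to its own polarity.
            nearest = min(centers, key=lambda s: abs(centers[s] - polarity))
            corrected.append((stars, polarity, nearest))
    return corrected
```

For example, a 5-star review with polarity -0.6 among 5-star reviews that otherwise score between 0.7 and 0.9 falls below the lower fence and would be reassigned to the class with the closest median polarity. Relabeling the review instead of deleting it matches the "without data loss" claim in the summary.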

Main Results:

  • The proposed SODCM effectively identifies customer reviews where the star rating contradicts the expressed sentiment.
  • SODCM rectifies star ratings, improving the overall quality of the review dataset for analysis.
  • Experimental results show that SODCM achieves higher accuracy and recall than state-of-the-art anomaly detection algorithms.
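
For readers unfamiliar with the two metrics named above, here is a minimal sketch of how accuracy and recall are computed for a binary outlier detector. The function name and the example labels are hypothetical and do not come from the study.

```python
def accuracy_and_recall(true_labels, predicted_labels, positive=True):
    """Return (accuracy, recall) for two equal-length label sequences.

    accuracy = correct predictions / all predictions
    recall   = true positives / (true positives + false negatives)
    """
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label sequences must be non-empty and equal in length")

    pairs = list(zip(true_labels, predicted_labels))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    accuracy = correct / len(pairs)
    # Recall is undefined when there are no positive examples; report 0.0.
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall
```

Here `True` marks a review that really is an outlier. Recall then measures the share of real mismatches the detector finds, while accuracy also credits correctly ignored normal reviews.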

Conclusions:

  • SODCM enhances sentiment analysis algorithm performance by addressing data inconsistencies without data loss.
  • The method is effective for datasets containing customer reviews of various products.
  • This approach offers a valuable tool for improving the reliability of sentiment analysis in e-commerce.