
Related Concept Videos

Quantifying and Rejecting Outliers: The Grubbs Test

Sometimes a data set contains a recorded numerical observation that deviates greatly from the rest of the data. Assuming that the data are normally distributed, a statistical method called the Grubbs test can be used to determine whether the observation is truly an outlier. To perform a two-tailed Grubbs test, first calculate the absolute difference between the suspected outlier and the mean. Then calculate the ratio of this difference to the standard deviation of the sample. This...
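The first two steps of the two-tailed Grubbs test described above can be sketched as follows. This is a minimal illustration: it assumes the sample standard deviation (n - 1 denominator), the function name `grubbs_statistic` and the example data are hypothetical, and the resulting G must still be compared with a tabulated Grubbs critical value for the chosen n and significance level, which the sketch does not compute.

```python
import statistics

def grubbs_statistic(data):
    """Return (G, index) for the observation farthest from the sample mean.

    G = max|x_i - mean| / s, with s the sample standard deviation.
    The caller compares G with a tabulated critical value (not computed here).
    """
    if len(data) < 3:
        raise ValueError("the Grubbs test needs at least 3 observations")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1)
    if s == 0:
        raise ValueError("all observations are identical")
    # Suspected outlier: the value with the largest absolute deviation
    idx = max(range(len(data)), key=lambda i: abs(data[i] - mean))
    return abs(data[idx] - mean) / s, idx

G, idx = grubbs_statistic([10, 10, 10, 10, 20])
print(f"G = {G:.3f} for the value at index {idx}")  # G = 1.789 for the value at index 4
```

If G exceeds the critical value, the suspected point is rejected as an outlier at that significance level.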

Outliers and Influential Points

An outlier is an observation that does not fit the rest of the data; it is sometimes called an extreme value. When graphed, an outlier does not follow the pattern of the rest of the data. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening. Outliers lie far from the least-squares line in the vertical direction. They have large "errors," where the "error," or residual, is the...
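Because outliers show up as large vertical residuals from the least-squares line, one common rule of thumb flags any point whose residual exceeds two standard deviations of the residuals, with s = sqrt(SSE / (n - 2)). The sketch below assumes that rule (the excerpt above is truncated before stating a cutoff); the function names and the toy data with one deliberately corrupted value are illustrative.

```python
import math

def least_squares_residuals(xs, ys):
    """Residuals y - (a + b*x) of the ordinary least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def flag_regression_outliers(xs, ys, k=2.0):
    """Indices whose |residual| exceeds k * s, with s = sqrt(SSE / (n - 2))."""
    res = least_squares_residuals(xs, ys)
    s = math.sqrt(sum(r * r for r in res) / (len(xs) - 2))
    return [i for i, r in enumerate(res) if abs(r) > k * s]

xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[4] = 30  # simulated recording error (should have been 10)
print(flag_regression_outliers(xs, ys))  # [4]
```

Note that a large outlier also inflates s, so a very strong outlier can partly mask weaker ones.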

What Are Outliers?

Outliers are observed data points that lie far from the least-squares line. They have unusual values and need to be examined carefully. Although an outlier may result from erroneous data, at other times it may hold valuable information about the population under study and should be kept in the data. Hence, it is crucial to examine what causes a data point to be an outlier.
The z score is used to find outliers or unusual values. It should be noted that any values beyond -2 and +2 are...
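The z-score screen mentioned above can be sketched as follows, assuming z = (x - mean) / s with the sample standard deviation and the cutoff |z| > 2 suggested by the excerpt. The function names and sample values are illustrative only.

```python
import statistics

def z_scores(data):
    """Standardize each value: z = (x - mean) / s, with s the sample standard deviation."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - mean) / s for x in data]

def unusual_values(data, cutoff=2.0):
    """Return the values whose z score lies beyond -cutoff or +cutoff."""
    return [x for x, z in zip(data, z_scores(data)) if abs(z) > cutoff]

sample = [10] * 9 + [20]
print(unusual_values(sample))  # [20]
```

Flagging a value this way only marks it for inspection; as the text notes, whether to keep it depends on what caused it.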

Detection of Gross Error: The Q Test

When one or more data points appear far from the rest of the data, we need to determine whether they are outliers and whether they should be eliminated from the data set so that the reported result accurately represents the measured value. In many cases, outliers arise from gross (human) errors and do not accurately reflect the underlying phenomenon. In other cases, however, these apparent outliers reflect true phenomenological differences. In these cases, we can use statistical methods...
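In the commonly used form of the Q test (Dixon's Q), the suspect value is the lowest or highest point, and Q is its gap to the nearest neighbor divided by the range of the data. The sketch below assumes that form; the function name and data are hypothetical, and the decision still requires a tabulated Q critical value for the sample size and confidence level (reject the point if Q exceeds it).

```python
def dixon_q(data):
    """Return (Q, suspect), checking whichever extreme value has the larger gap.

    Q = |suspect - nearest neighbor| / (max - min).
    """
    if len(data) < 3:
        raise ValueError("the Q test needs at least 3 observations")
    xs = sorted(data)
    spread = xs[-1] - xs[0]
    if spread == 0:
        raise ValueError("all observations are identical")
    q_low = (xs[1] - xs[0]) / spread      # gap at the low end
    q_high = (xs[-1] - xs[-2]) / spread   # gap at the high end
    return (q_high, xs[-1]) if q_high >= q_low else (q_low, xs[0])

q, suspect = dixon_q([1.0, 1.1, 1.2, 1.3, 2.5])
print(f"Q = {q:.2f} for suspect value {suspect}")  # Q = 0.80 for suspect value 2.5
```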

Modified Boxplots

A standard box-and-whisker plot shows the spread of the data in a given sample. From it, one can identify the minimum value, the maximum value, the first quartile, the second quartile (median), and the third quartile.
However, the standard box plot does not show outliers, values that lie far from the center of the data. We can modify the standard box-and-whisker plot to identify outliers and visualize the actual spread of the data in a sample.
Initially, we calculate the adjusted...
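Because the excerpt is cut off before it defines the adjusted limits, the sketch below assumes the widely used Tukey convention for modified boxplots: points below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are plotted as outliers. Quartile conventions differ between textbooks and software; this sketch relies on Python's `statistics.quantiles` default method, and the function names and data are illustrative.

```python
import statistics

def tukey_fences(data, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR (assumed convention)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def boxplot_outliers(data, k=1.5):
    """Values that a modified boxplot would draw as individual points."""
    low, high = tukey_fences(data, k)
    return [x for x in data if x < low or x > high]

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(tukey_fences(sample))      # (-5.5, 16.5)
print(boxplot_outliers(sample))  # [50]
```

In the modified plot, the whiskers then extend only to the most extreme values inside the fences.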

Trimmed Mean

When using the mean as a measure of the central tendency of a data set, care is needed, whether it is the arithmetic, geometric, or harmonic mean. A single outlier can significantly shift the mean; that is, the mean is sensitive to extreme values in the data set.
Although certain measures of central tendency are not sensitive to outliers, there are alternative versions of the mean that get around the...
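A trimmed mean sorts the data, discards the same number of values from each end, and averages the rest. The sketch below assumes the convention of cutting int(proportion × n) values from each end (the same rounding that `scipy.stats.trim_mean` uses); the function name and sample are illustrative.

```python
def trimmed_mean(data, proportion=0.1):
    """Mean after removing int(proportion * n) values from each end of the sorted data."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    k = int(proportion * len(xs))   # values cut from each end
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

sample = [1, 2, 3, 4, 100]
print(sum(sample) / len(sample))   # 22.0 (ordinary mean, pulled up by 100)
print(trimmed_mean(sample, 0.2))   # 3.0  (1 and 100 removed)
```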


k-Nearest neighbors optimization-based outlier removal.

Abraham Yosipof¹, Hanoch Senderowitz

  • ¹Department of Chemistry, Bar Ilan University, Ramat-Gan, 52900, Israel.

Journal of Computational Chemistry | December 16, 2014
Summary
This summary is machine-generated.

A new k-nearest neighbors algorithm identifies and removes outliers from molecular compound datasets. The method preserves dataset diversity and improves the prediction statistics of quantitative structure-activity relationship (QSAR) models.

Keywords:
distance-based method; k-nearest neighbors; optimization; outlier detection; outlier removal; quantitative structure activity relationship


Area of Science:

  • Computational chemistry
  • Cheminformatics
  • Data science

Background:

  • Molecular compound datasets frequently contain outliers that can negatively impact data interpretation and model generation.
  • Effective outlier removal is crucial for reliable data analysis and predictive modeling in cheminformatics.

Purpose of the Study:

  • To introduce a novel iterative method for identifying and removing outliers using a k-nearest neighbors (KNN) optimization algorithm.
  • To evaluate the performance of the new outlier removal method against existing techniques and random removal.

Main Methods:

  • An iterative outlier identification and removal process based on a KNN optimization algorithm was developed.
  • The algorithm was tested on three distinct molecular compound datasets.
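The abstract does not describe the optimization step itself, so the sketch below is only a generic, greedy stand-in for distance-based KNN outlier removal and not the authors' algorithm. It scores each compound by its mean Euclidean distance to its k nearest neighbors among the compounds still in the set, removes the worst-scoring compound, and repeats. The function names, the Euclidean metric, the default k, and the toy 2-D "descriptor" vectors are all illustrative assumptions.

```python
import math

def knn_score(points, idx, active, k):
    """Mean Euclidean distance from points[idx] to its k nearest active neighbors."""
    dists = sorted(math.dist(points[idx], points[j]) for j in active if j != idx)
    return sum(dists[:k]) / k

def iterative_knn_removal(points, k=3, n_remove=1):
    """Greedy sketch (hypothetical, not the published method): repeatedly drop
    the point whose k nearest neighbors are, on average, farthest away."""
    active = set(range(len(points)))
    removed = []
    for _ in range(n_remove):
        if len(active) <= k:  # each point needs at least k neighbors
            break
        worst = max(active, key=lambda i: knn_score(points, i, active, k))
        active.remove(worst)
        removed.append(worst)
    return removed, sorted(active)

# Toy descriptor vectors: a tight cluster plus one distant compound
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(iterative_knn_removal(pts, k=3, n_remove=1))  # ([5], [0, 1, 2, 3, 4])
```

Recomputing scores after every removal matters because deleting one compound changes the neighborhoods of the others; the published method replaces this greedy choice with an optimization procedure.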

Main Results:

  • The KNN-based method produced filtered datasets that preserved the diversity of the parent datasets better than four alternative methods and random removal.
  • Quantitative structure-activity relationship (QSAR) models built using data processed by the new algorithm exhibited significantly improved prediction statistics.

Conclusions:

  • On the tested molecular datasets, the KNN optimization algorithm outperformed the compared outlier-removal approaches.
  • This method is highly suitable for the pretreatment of datasets prior to quantitative structure-activity relationship (QSAR) modeling, enhancing model accuracy and reliability.