Closest string with outliers.

Christina Boucher¹, Bin Ma

¹David R Cheriton School of Computer Science, University of Waterloo, Waterloo, ON. cabouche@cs.uwaterloo.ca

BMC Bioinformatics

February 24, 2011

Summary

This summary is machine-generated.

The closest string with outliers (CSWO) problem refines the closest string problem by allowing a specified number of input strings to be treated as outliers. It identifies a central pattern together with the potential outliers in a dataset, which is useful in bioinformatics applications.

Area of Science:

Computational biology
Bioinformatics algorithms
Stringology

Background:

The standard closest string problem seeks a center string within a fixed Hamming distance (d) of all input strings.
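This model is sensitive to outliers: because every input string must lie within distance d of the center, a single aberrant string can change the result or leave no valid center at all.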
Identifying common patterns in datasets with potential noise is a key challenge.
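
To make the outlier sensitivity concrete, here is a minimal Python sketch (not taken from the paper) that checks whether a candidate center lies within Hamming distance d of every input string. The function names `hamming` and `is_center` and the toy DNA strings are illustrative assumptions.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_center(candidate: str, strings: list[str], d: int) -> bool:
    """True if candidate is within Hamming distance d of every input string."""
    return all(hamming(candidate, s) <= d for s in strings)


# Toy data (hypothetical): three similar strings plus one outlier.
clean = ["ACGT", "ACGA", "ACTT"]
noisy = clean + ["TTAA"]

print(is_center("ACGT", clean, 1))  # all three strings are within distance 1
print(is_center("ACGT", noisy, 1))  # the added outlier is too far away
```

In this toy case the single added string is enough to make the candidate fail, which is the kind of sensitivity the CSWO model is meant to address.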

Purpose of the Study:

To introduce and analyze the closest string with outliers (CSWO) problem.
To develop algorithms for finding a representative string and identifying outliers in datasets.
To extend existing models for pattern discovery in biological sequences.

Main Methods:

Formalized the closest string with outliers (CSWO) problem, allowing for k outliers.
Developed fixed-parameter tractable algorithms for CSWO with respect to parameters d and k.
Analyzed the computational complexity for both bounded and unbounded alphabets.
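
The problem statement can be illustrated with a brute-force search, sketched below in Python. This is not the fixed-parameter algorithm from the study: it simply enumerates every candidate string over a small alphabet, so it is only usable on tiny inputs. The function name `cswo_brute_force`, the default alphabet, and the example strings are assumptions made for illustration.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def cswo_brute_force(strings, d, k, alphabet="ACGT"):
    """Exhaustive CSWO search (illustrative only, exponential in string length).

    Returns (center, outlier_indices) where at most k input strings are
    farther than d from center, or None if no such center exists.
    """
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all input strings must have the same length")
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        # Indices of input strings that this candidate fails to cover.
        outliers = [i for i, s in enumerate(strings) if hamming(candidate, s) > d]
        if len(outliers) <= k:
            return candidate, outliers
    return None


strings = ["ACGT", "ACGA", "ACTT", "TTAA"]  # hypothetical toy input
print(cswo_brute_force(strings, d=1, k=1))  # a center plus the excluded indices
print(cswo_brute_force(strings, d=1, k=0))  # no outliers allowed
```

Setting k to 0 recovers the standard closest string problem, so the same routine shows how allowing outliers can turn an infeasible instance into a feasible one.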

Main Results:

The CSWO model successfully identifies a center string within distance d of at least n-k input strings.
Fixed-parameter algorithms were provided for CSWO with d and k as parameters, for both bounded and unbounded alphabets.
For unbounded alphabets, the problem was shown to be W[1]-hard with respect to the parameters n-k, ℓ, and d.
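
The condition in the first result can be written symbolically as follows. The notation is assumed here rather than quoted from the study: S is the set of n input strings, Σ is the alphabet, d_H is the Hamming distance, and ℓ is taken to be the common string length.

```latex
\text{Given } S = \{s_1, \dots, s_n\} \subseteq \Sigma^{\ell} \text{ and integers } d, k \ge 0,
\text{ find } s \in \Sigma^{\ell} \text{ and } S^{*} \subseteq S \text{ with } |S^{*}| = n - k
\text{ such that } d_H(s, s_i) \le d \ \text{ for all } s_i \in S^{*}.
```

The k strings in S \ S* are the outliers, and the case k = 0 is the standard closest string problem.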

Conclusions:

The CSWO model provides a robust method for finding common patterns in datasets containing outliers.
The study initiates the investigation into the computability and parameter sensitivity of CSWO.
Further research is suggested on open problems related to this refined model.