OD-seq: outlier detection in multiple sequence alignments.

Peter Jehl¹, Fabian Sievers², Desmond G Higgins³

¹UCD Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland. peter.jehl@ucdconnect.ie

BMC Bioinformatics | August 26, 2015

Summary

This summary is machine-generated.

Outlier sequences can compromise multiple sequence alignment (MSA) accuracy. OD-seq offers a fast, automated method to detect and remove these outliers, improving downstream sequence analysis reliability.

Area of Science:

Bioinformatics
Computational Biology
Sequence Analysis

Background:

Multiple sequence alignments (MSAs) are fundamental in sequence analysis.
Outlier sequences can negatively impact MSA accuracy and downstream analyses.
Automated outlier detection is crucial for reliable sequence data interpretation.

Purpose of the Study:

To introduce OD-seq, a novel method and software for automatic outlier detection in MSAs.
To provide a computationally efficient solution for handling large sequence datasets.

Main Methods:

OD-seq identifies outliers by calculating the average distance of each sequence to all others.
Anomalous distances are detected using the interquartile range or bootstrapping.
The mBed algorithm from Clustal Omega reduces the computational complexity to O(N log N) for N sequences.
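
The first two steps above can be illustrated with a minimal Python sketch. It is not the OD-seq implementation: the distance used here (fraction of differing alignment columns), the Tukey-style fence `Q3 + k * IQR` with `k = 1.5`, the function names, and the toy alignment are all assumptions made for illustration. The sketch also computes every pairwise distance directly, which is O(N²), and does not use mBed or the bootstrapping option.

```python
from statistics import quantiles


def pairwise_distance(a, b):
    """Fraction of alignment columns where two aligned sequences differ.

    Assumption: a simple mismatch fraction stands in for whatever
    distance metric the real tool uses.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def average_distances(msa):
    """Mean distance from each sequence to all others (naive O(N^2))."""
    n = len(msa)
    return [
        sum(pairwise_distance(msa[i], msa[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def iqr_outliers(scores, k=1.5):
    """Indices whose score lies above Q3 + k * IQR (assumed rule)."""
    q1, _, q3 = quantiles(scores, n=4)
    fence = q3 + k * (q3 - q1)
    return [i for i, s in enumerate(scores) if s > fence]


# Hypothetical toy alignment: five similar sequences and one divergent one.
msa = [
    "ACGTACGTAC",
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACGTACGAAC",
    "ACGAACGTAC",
    "TTTTGGGGCC",
]

scores = average_distances(msa)
outliers = iqr_outliers(scores)
print(outliers)
```

In this toy input only the last sequence has an average distance above the fence. The multiplier `k` controls how strict the cutoff is; a larger value flags fewer sequences.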

Main Results:

OD-seq effectively detects outlier sequences in MSAs with high sensitivity and specificity.
The software processes large alignments rapidly, analyzing thousands of sequences in seconds.
Accuracy is lower when the input is a set of unaligned sequences rather than an MSA.
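
For readers unfamiliar with the two metrics named above, the following sketch shows how sensitivity and specificity could be computed for an outlier detector when the true outliers are known. The function name and the example indices are hypothetical and are not results from the study.

```python
def sensitivity_specificity(true_outliers, flagged, n_sequences):
    """Return (sensitivity, specificity) for a set of flagged sequence indices.

    sensitivity = TP / (TP + FN): share of true outliers that were flagged.
    specificity = TN / (TN + FP): share of normal sequences left unflagged.
    """
    true_set, flagged_set = set(true_outliers), set(flagged)
    tp = len(true_set & flagged_set)   # outliers correctly flagged
    fn = len(true_set - flagged_set)   # outliers that were missed
    fp = len(flagged_set - true_set)   # normal sequences wrongly flagged
    tn = n_sequences - tp - fn - fp    # normal sequences correctly kept
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity


# Hypothetical example: 10 sequences, true outliers at indices 3 and 7,
# detector flags indices 3 and 5.
sens, spec = sensitivity_specificity([3, 7], [3, 5], n_sequences=10)
print(sens, spec)
```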

Conclusions:

OD-seq provides a practical and efficient solution for identifying outliers in MSAs.
The method enhances the reliability of sequence analysis by ensuring data quality.
Software is publicly available for use in bioinformatics research.