OD-seq: outlier detection in multiple sequence alignments.

Peter Jehl (1), Fabian Sievers (2), Desmond G Higgins (3)

  • (1) UCD Conway Institute of Biomolecular and Biomedical Sciences, University College Dublin, Dublin 4, Ireland. peter.jehl@ucdconnect.ie

BMC Bioinformatics | August 26, 2015
Summary
This summary is machine-generated.

Outlier sequences can compromise multiple sequence alignment (MSA) accuracy. OD-seq offers a fast, automated method to detect and remove these outliers, improving downstream sequence analysis reliability.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Sequence Analysis

Background:

  • Multiple sequence alignments (MSAs) are fundamental in sequence analysis.
  • Outlier sequences can negatively impact MSA accuracy and downstream analyses.
  • Automated outlier detection is crucial for reliable sequence data interpretation.

Purpose of the Study:

  • To introduce OD-seq, a novel method and software for automatic outlier detection in MSAs.
  • To provide a computationally efficient solution for handling large sequence datasets.

Main Methods:

  • OD-seq identifies outliers by calculating the average distance of each sequence to all others.
  • Anomalous distances are detected using the interquartile range or bootstrapping.
  • The mBed algorithm from Clustal Omega reduces the computational complexity from the O(N²) cost of all pairwise comparisons to O(N log(N)), where N is the number of sequences.
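
The first two steps above can be illustrated with a minimal Python sketch. It is not the OD-seq implementation: the distance measure (fraction of mismatching alignment columns), the function names, and the multiplier `k = 1.5` are assumptions chosen for illustration, and all pairwise distances are computed directly, without the mBed speed-up. Bootstrapping is not shown.

```python
from statistics import quantiles
from typing import Dict, List


def pairwise_distance(a: str, b: str) -> float:
    """Fraction of alignment columns at which two aligned sequences differ.

    Assumption: a simple mismatch fraction stands in for the distance
    measure; the source does not specify which measure OD-seq uses.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        return 0.0
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def average_distances(alignment: Dict[str, str]) -> Dict[str, float]:
    """Average distance of each sequence to all other sequences."""
    names = list(alignment)
    n = len(names)
    if n < 2:
        raise ValueError("need at least two sequences")
    totals = {name: 0.0 for name in names}
    # Naive O(N^2) loop over all pairs (no mBed embedding here).
    for i in range(n):
        for j in range(i + 1, n):
            d = pairwise_distance(alignment[names[i]], alignment[names[j]])
            totals[names[i]] += d
            totals[names[j]] += d
    return {name: totals[name] / (n - 1) for name in names}


def iqr_outliers(avg: Dict[str, float], k: float = 1.5) -> List[str]:
    """Names whose average distance exceeds Q3 + k * IQR.

    Assumption: k = 1.5 is the conventional box-plot multiplier,
    not a value taken from the source.
    """
    q1, _, q3 = quantiles(sorted(avg.values()), n=4, method="inclusive")
    cutoff = q3 + k * (q3 - q1)
    return [name for name, d in avg.items() if d > cutoff]


# Hypothetical toy alignment: five similar sequences and one divergent one.
alignment = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAC",
    "s3": "ACGTACGTAT",
    "s4": "ACGTACGAAC",
    "s5": "ACGTTCGTAC",
    "divergent": "TTTTTTTTTT",
}
flagged = iqr_outliers(average_distances(alignment))
```

In this toy alignment only the sequence named `divergent` has an average distance above the cutoff, so it is the only entry in `flagged`. Only the upper tail is tested, because a sequence that is unusually close to all others is not an outlier in this sense.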

Main Results:

  • OD-seq effectively detects outlier sequences in MSAs with high sensitivity and specificity.
  • The software processes large alignments rapidly, analyzing thousands of sequences in seconds.
  • Accuracy is lower when OD-seq is applied to sets of unaligned sequences than when it is applied to MSAs.

Conclusions:

  • OD-seq provides a practical and efficient solution for identifying outliers in MSAs.
  • The method enhances the reliability of sequence analysis by ensuring data quality.
  • Software is publicly available for use in bioinformatics research.