CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases

G Grillo¹, M Attimonelli, S Liuni

  • ¹ Centro di Studio sui Mitocondri e Metabolismo Energetico, CNR, Italy.

Computer Applications in the Biosciences : CABIOS · February 1, 1996
Summary
This summary is machine-generated.

Related Articles

Articles linked to this work by shared authors, journal, and citation graph.

Same author

Human neural stem cells derived from fetal human brain communicate with each other and rescue ischemic neuronal cells through tunneling nanotubes.

Cell death & disease·2024
Same author

[Translated article] Study of femoral component malrotation as a cause of pain after total knee arthroplasty.

Revista espanola de cirugia ortopedica y traumatologia·2024
Same author

Study of femoral component malrotation as a cause of pain after total knee arthroplasty.

Revista espanola de cirugia ortopedica y traumatologia·2023
Same author

Farnesoid X receptor activation by the novel agonist TC-100 (3α, 7α, 11β-Trihydroxy-6α-ethyl-5β-cholan-24-oic Acid) preserves the intestinal barrier integrity and promotes intestinal microbial reshaping in a mouse model of obstructed bile acid flow.

Biomedicine & pharmacotherapy = Biomedecine & pharmacotherapie·2022
Same author

Gene electrotransfer of IL-2 and IL-12 plasmids effectively eradicated murine B16.F10 melanoma.

Bioelectrochemistry (Amsterdam, Netherlands)·2021
Same author

Nilotinib in steroid-refractory cGVHD: prospective parallel evaluation of response, according to NIH criteria and exploratory response criteria (GITMO criteria).

Bone marrow transplantation·2020
Same journal

DCA: an efficient implementation of the divide-and-conquer approach to simultaneous multiple sequence alignment.

Computer applications in the biosciences : CABIOS·1998
Same journal

Two applications to facilitate the viewing of database search result files on the Macintosh.

Computer applications in the biosciences : CABIOS·1998
Same journal

BioWish: a molecular biology command extension to Tcl/Tk.

Computer applications in the biosciences : CABIOS·1998
Same journal

The Sequence Alerting Server--a new WEB server.

Computer applications in the biosciences : CABIOS·1998
Same journal

A software tool for the analysis of mass spectrometric disulfide mapping experiments.

Computer applications in the biosciences : CABIOS·1998
Same journal

SAMBA: hardware accelerator for biological sequence comparison.

Computer applications in the biosciences : CABIOS·1998

Redundant entries in nucleotide sequence databases can bias statistical analyses. This study introduces a new algorithm that identifies and removes highly similar nucleotide sequences, enabling more accurate statistical analysis and faster database searches.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Publicly available nucleotide sequence databases contain numerous redundant entries, leading to biased statistical analyses and inefficient database searching.
  • Redundancy in sequence data can cause patterns that are not significant to be assigned erroneously high significance.
  • Unbiased statistical analysis and efficient database searching necessitate the removal of redundant sequences.

Purpose of the Study:

  • To develop a novel algorithm for identifying and removing redundancy in nucleotide sequence collections.
  • To enable the generation of non-redundant sequence datasets for improved biological data analysis.
  • To provide a quantitative method for assessing sequence redundancy based on similarity thresholds.

Main Methods:

  • The study employs an 'approximate string matching' algorithm to quantify sequence similarity.
  • The algorithm calculates the degree of similarity and overlap between all sequence pairs within a database.
  • A user-defined similarity threshold determines which sequences are classified as redundant.
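
The similarity and overlap measures described above can be sketched in Python. This is a minimal illustration, assuming unit-cost edit distance as the approximate-string-matching score; the helper names (`edit_distance`, `similarity`, `containment_distance`, `overlap`) and both normalizations are hypothetical and are not taken from CLEANUP. `similarity` compares two entries end to end, while `overlap` is modeled here as approximate containment of the shorter sequence inside the longer one, so that a fragment of a longer entry can be detected even when its global score is low.

```python
# Hypothetical sketch of approximate string matching between two nucleotide
# sequences; not the CLEANUP implementation. Scores are assumed normalizations.


def edit_distance(a: str, b: str) -> int:
    """Global Levenshtein distance: substitutions, insertions and deletions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,               # delete a base of `a`
                curr[j - 1] + 1,           # insert a base of `b`
                prev[j - 1] + (ca != cb),  # match (0) or substitution (1)
            )
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Global similarity in [0, 1]: 1 - distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def containment_distance(short: str, long: str) -> int:
    """Fewest edits needed to match `short` somewhere inside `long`.

    Approximate substring matching: unmatched leading and trailing bases
    of `long` cost nothing.
    """
    prev = [0] * (len(long) + 1)  # zero row: the match may start anywhere in `long`
    for i, cs in enumerate(short, start=1):
        curr = [i] + [0] * len(long)
        for j, cl in enumerate(long, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != cl))
        prev = curr
    return min(prev)  # the match may end anywhere in `long`


def overlap(a: str, b: str) -> float:
    """Fraction of the shorter sequence matched approximately inside the longer one."""
    short, long = sorted((a, b), key=len)
    if not short:
        return 1.0
    return 1.0 - containment_distance(short, long) / len(short)
```

For example, the fragment `CGTA` scores only 0.5 against `TTCGTATT` with `similarity`, but 1.0 with `overlap`, because it occurs intact inside the longer sequence. Both functions fill a dynamic-programming table in O(mn) time for sequences of lengths m and n. That is fine for a demonstration but too slow for an all-against-all comparison of a full database.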

Main Results:

  • The algorithm determines the overall similarity between each pair of sequences in a nucleotide database.
  • The procedure automatically generates nucleotide sequence collections that are free of redundancy.
  • These non-redundant collections provide cleaner datasets for downstream analyses.
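
A non-redundant collection can be produced from a pairwise score and a user-defined threshold. The sketch below is a hypothetical greedy procedure, not the published CLEANUP algorithm. It swaps in the standard library's `difflib.SequenceMatcher.ratio()` (a Ratcliff/Obershelp score) as the similarity measure, processes entries from longest to shortest, and discards an entry when it is an exact substring of an already kept entry or when its similarity to one reaches the threshold. The function name `purge_redundant` and the longest-first ordering are assumptions made for illustration.

```python
# Hypothetical sketch of threshold-based redundancy removal; not CLEANUP's code.
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Ratcliff/Obershelp ratio (2 * matches / total length), a stand-in for an
    # approximate-string-matching score. autojunk=False stops difflib from
    # treating frequent characters as junk, which would wreck scores on a
    # four-letter alphabet once sequences reach 200 bases.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def purge_redundant(records: dict[str, str], threshold: float) -> dict[str, str]:
    """Return the entries kept as representatives of the collection.

    An entry is treated as redundant when it is an exact substring of a kept
    entry or when its similarity to a kept entry reaches `threshold`.
    """
    # Longest sequences first, so fragments are checked against the longer
    # entries that may contain them; ties are broken by identifier.
    order = sorted(records, key=lambda rid: (-len(records[rid]), rid))
    kept: dict[str, str] = {}
    for rid in order:
        seq = records[rid]
        redundant = any(
            seq in rep or similarity(seq, rep) >= threshold
            for rep in kept.values()
        )
        if not redundant:
            kept[rid] = seq
    return kept
```

Given entries `A = ACGTACGTACGT`, an exact duplicate of `A`, the fragment `GTACGTAC` and an unrelated poly-T entry, a threshold of 0.9 keeps only `A` and the poly-T entry. An entry that differs from `A` only in its last base scores 22/24 (about 0.92), so it is discarded at 0.9 but kept at 0.95. The loop compares every entry with every kept representative. It therefore includes none of the optimizations a fast tool would need on a large database.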

Conclusions:

  • The new algorithm provides an effective way to manage redundancy in nucleotide sequence databases.
  • Purging redundant sequences supports more accurate statistical analyses and faster database searches.
  • Reliable removal of redundant entries improves the quality of the data used in bioinformatics and computational biology research.