
RSDB: representative protein sequence databases have high information content.

J Park¹, L Holm, A Heger

  • ¹The European Bioinformatics Institute, EMBL Outstation, Cambridge CB10 1SD, UK. jong@biosophy.org

Bioinformatics (Oxford, England) | June 28, 2000
Summary
This summary is machine-generated.


Reducing biological sequence databases to 50% mutual sequence identity retains their homology information. A representative sequence database (RSDB) at 50% identity is one-third the size of the full database, which enables faster homology searching.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Biological sequence databases suffer from high redundancy due to multiple data sources and natural sequence similarity from gene duplication.
  • Assessing the impact of redundancy reduction on homology information is crucial for efficient data management.

Purpose of the Study:

  • To determine how far sequence identity can be reduced before critical homology information is lost.
  • To evaluate if a reduced database can maintain the same biological information content as a full database.

Main Methods:

  • Generation of nine representative sequence databases (RSDB) with varying levels of sequence identity reduction.
  • Comparative analysis of information content and homology searching effectiveness between full and reduced databases.
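
The summary does not say how the representative sets were built. As a rough illustration of what "reducing a database to a given sequence identity" can mean, the sketch below uses a greedy selection: sequences are visited longest first, and one is kept only if it falls below the identity threshold against every representative kept so far. The function names (`toy_identity`, `pick_representatives`), the example sequences, and the ungapped identity measure are all assumptions made for this example; a real pipeline would compute identity from proper pairwise alignments.

```python
def toy_identity(a: str, b: str) -> float:
    """Fraction of identical positions in an ungapped, left-anchored comparison.

    Toy stand-in for alignment-based sequence identity (an assumption of
    this sketch, not the measure used in the paper).
    """
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / shorter


def pick_representatives(seqs: dict[str, str], threshold: float) -> list[str]:
    """Greedy selection of representatives.

    A sequence is kept only if its identity to every representative
    kept so far is strictly below `threshold`.
    """
    reps: list[str] = []
    # Longest first, so the longest member represents each group; ties by name.
    for name in sorted(seqs, key=lambda n: (-len(seqs[n]), n)):
        if all(toy_identity(seqs[name], seqs[r]) < threshold for r in reps):
            reps.append(name)
    return reps


# Hypothetical toy sequences, chosen only to exercise the thresholds.
sequences = {
    "a1": "MKTAYIAKQR",
    "a2": "MKTAYIAKQL",  # 9 of 10 positions identical to a1
    "a3": "MKTAYWWWWW",  # 5 of 10 positions identical to a1
    "b1": "GGSLLPPDDE",  # no positions identical to a1
}

print(pick_representatives(sequences, 0.9))  # ['a1', 'a3', 'b1']
print(pick_representatives(sequences, 0.5))  # ['a1', 'b1']
```

Lowering the threshold from 0.9 to 0.5 shrinks the toy set from three representatives to two, which mirrors the idea of generating several reduced databases at different identity levels and comparing each against the full set.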

Main Results:

  • Information content in sequence databases is not directly proportional to size.
  • A representative sequence database reduced to 50% mutual sequence identity (RSDB50) proved equivalent to the full database for homology searching.
  • RSDB50 achieved this equivalence at one-third the size of the full database, making iterative profile searching six times faster.

Conclusions:

  • Significant redundancy can be removed from biological sequence databases without compromising homology information.
  • Reduced sequence databases offer a more efficient approach to homology searching, balancing data size and information retrieval effectiveness.