UniqueProt: Creating representative protein sequence sets.

Sven Mika¹, Burkhard Rost

  • ¹ CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA. mika@cubic.bioc.columbia.edu

Nucleic Acids Research | June 26, 2003
Summary
This summary is machine-generated.

UniqueProt is a fast, practical web service for building representative, unbiased protein sequence sets. It uses a greedy algorithm, with sequence similarity measured by the HSSP-value, to find the largest possible representative set, which reduces bias in data sets used for downstream analyses.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Structural Biology

Background:

  • Protein sequence datasets often contain biases that can affect downstream analyses.
  • Developing methods to create representative and unbiased datasets is crucial for reliable biological research.

Purpose of the Study:

  • To introduce UniqueProt, a web service for generating representative and unbiased protein sequence datasets.
  • To offer a practical and user-friendly solution for addressing data bias in protein sequence data.

Main Methods:

  • Uses a greedy algorithm to find the largest possible representative sets.
  • Uses the HSSP-value to measure similarity between protein sequences.
  • Provides both a web service and a downloadable command-line version for Linux.
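
The greedy selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not UniqueProt's actual implementation: the summary does not say which greedy rule the service applies, so the sketch uses one common choice (repeatedly drop the sequence with the most remaining similar neighbors). The function names, the `threshold` parameter, and the input layout are hypothetical. The HSSP-curve formula is the commonly cited form from the sequence-alignment literature and is not stated in this summary.

```python
import math
from typing import Dict, List, Set, Tuple


def hssp_value(percent_identity: float, aligned_length: int) -> float:
    """Distance of an alignment from the length-dependent HSSP curve.

    Assumed form of the curve (not given in the summary above):
    100 for L <= 11, 480 * L**(-0.32 * (1 + exp(-L / 1000))) for
    11 < L <= 450, and 19.5 for L > 450.
    """
    if aligned_length <= 11:
        curve = 100.0
    elif aligned_length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
        curve = 480.0 * aligned_length ** exponent
    else:
        curve = 19.5
    return percent_identity - curve


def greedy_representatives(
    ids: List[str],
    alignments: Dict[Tuple[str, str], Tuple[float, int]],
    threshold: float = 0.0,  # hypothetical cutoff on the HSSP-value
) -> List[str]:
    """Return a subset of ids in which no pair reaches the threshold.

    alignments maps a pair of ids to (percent identity, aligned length).
    Pairs that are absent are treated as not similar.
    """
    # Build the similarity graph: an edge joins two sequences whose
    # HSSP-value is at or above the threshold.
    neighbors: Dict[str, Set[str]] = {i: set() for i in ids}
    for (a, b), (pide, length) in alignments.items():
        if hssp_value(pide, length) >= threshold:
            neighbors[a].add(b)
            neighbors[b].add(a)

    remaining = set(ids)
    while remaining:
        # Greedy step: find the sequence with the most remaining neighbors.
        # Sorting first makes tie-breaking deterministic.
        worst = max(sorted(remaining), key=lambda i: len(neighbors[i] & remaining))
        if not neighbors[worst] & remaining:
            break  # no similar pairs are left
        remaining.discard(worst)
    return sorted(remaining)


if __name__ == "__main__":
    pairs = {
        ("A", "B"): (90.0, 200),  # clearly similar
        ("B", "C"): (85.0, 150),  # clearly similar
        ("C", "D"): (15.0, 300),  # below the curve, not similar
    }
    print(greedy_representatives(["A", "B", "C", "D"], pairs))
```

In this toy input, B is similar to both A and C, so removing B alone leaves a set with no similar pairs. Removing the most connected sequence first is a heuristic for keeping the set large; it does not guarantee the largest possible set in general.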

Main Results:

  • UniqueProt successfully generates representative protein sequence datasets.
  • The service offers a fast and practical solution for mitigating bias in biological data.
  • UniqueProt is not a clustering program in the strict sense: its 'representatives' are not the centers of well-defined clusters, because the definition of such clusters is problem-specific.

Conclusions:

  • UniqueProt is an effective tool for creating unbiased protein sequence datasets.
  • The service enhances the reliability of analyses relying on protein sequence data.
  • Accessible online and via command-line, UniqueProt offers a versatile solution for researchers.