
Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT.

E Kretschmann¹, W Fleischmann, R Apweiler

  • ¹ The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. kretsch@ebi.ac.uk

Bioinformatics (Oxford, England) | October 24, 2001
Summary
This summary is machine-generated.

Automated data mining of SWISS-PROT generated 11,306 rules for protein functional annotation. Applied to unannotated proteins, these rules can supply 33% of keyword annotations at a 1.5% error rate, helping researchers understand protein function.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Protein Science

Background:

  • The increasing volume of protein sequence data outpaces manual functional annotation efforts.
  • Existing automated annotation systems offer limited information, creating a need for enhanced tools.
  • Detecting inconsistencies in manual annotations requires automated support.

Purpose of the Study:

  • To develop and apply automated data mining techniques for improving protein functional annotation.
  • To generate reliable annotation rules for protein sequences.
  • To support manual annotation processes and enhance data quality in public databases.

Main Methods:

  • A standard data mining algorithm was employed to extract knowledge from SWISS-PROT protein annotations.

  • 11,306 annotation rules were generated based on organism taxonomy and sequence signature matches.
  • A web-accessible database was created to store and apply these generated rules.
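
The title names C4.5 as the data mining algorithm. C4.5 chooses which attribute to split on by gain ratio (information gain divided by the split information of the attribute). The following is a minimal sketch of that criterion only, not the authors' pipeline: the rows, the attribute names `taxon` and `signature`, and the signature labels `SIG_A` and `SIG_B` are made-up placeholders, and a full C4.5 run would also handle pruning and missing values.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio of splitting `rows` on `attribute` with respect to `target`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target that remains after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy entries: one taxonomy attribute, one signature-match attribute.
ROWS = [
    {"taxon": "Bacteria", "signature": "SIG_A", "keyword": "Transferase"},
    {"taxon": "Bacteria", "signature": "SIG_A", "keyword": "Transferase"},
    {"taxon": "Eukaryota", "signature": "SIG_A", "keyword": "Transferase"},
    {"taxon": "Eukaryota", "signature": "SIG_B", "keyword": "Hydrolase"},
    {"taxon": "Bacteria", "signature": "SIG_B", "keyword": "Hydrolase"},
    {"taxon": "Eukaryota", "signature": "SIG_B", "keyword": "Hydrolase"},
]


def best_attribute(rows, attributes, target):
    """Return the attribute with the highest gain ratio."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))
```

In this toy table the signature match separates the two keywords perfectly, so `best_attribute(ROWS, ["taxon", "signature"], "keyword")` picks `signature` as the first split. Each root-to-leaf path of the resulting tree can then be read as one annotation rule.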
Main Results:

  • The data mining approach successfully generated a comprehensive set of 11,306 annotation rules.
  • Applying these rules can automatically generate 33% of keyword annotations for unannotated proteins with a 1.5% error rate.
  • Annotation coverage can be extended to 60% by accepting a 5% error rate.
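
The trade-off between coverage and error rate can be illustrated with a small sketch. It assumes, hypothetically, that each predicted keyword carries a rule confidence score and that predictions below a chosen threshold are discarded; the summary does not say how the rules were actually ranked or filtered. The data below are invented and do not reproduce the reported 33%/1.5% or 60%/5% figures.

```python
def coverage_and_error(predictions, min_confidence):
    """Return (coverage, error_rate) when keeping predictions at or above a threshold.

    `predictions` is a list of (confidence, is_correct) pairs, one per candidate
    keyword annotation. Coverage is the fraction of candidates kept; error rate
    is the fraction of kept predictions that are wrong.
    """
    accepted = [ok for confidence, ok in predictions if confidence >= min_confidence]
    coverage = len(accepted) / len(predictions)
    error_rate = accepted.count(False) / len(accepted) if accepted else 0.0
    return coverage, error_rate


# Hypothetical (confidence, is_correct) pairs for ten candidate annotations.
PREDICTIONS = [
    (0.99, True), (0.98, True), (0.97, True), (0.95, True), (0.90, True),
    (0.85, False), (0.80, True), (0.75, True), (0.60, False), (0.50, False),
]

strict = coverage_and_error(PREDICTIONS, 0.9)   # fewer annotations, fewer errors
relaxed = coverage_and_error(PREDICTIONS, 0.7)  # more annotations, more errors
```

Lowering the threshold admits more predictions, so coverage rises while the share of wrong annotations among those kept rises with it, which is the same pattern as the reported move from 33% to 60% coverage.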

Conclusions:

  • Automated data mining is effective in generating reliable protein functional annotations.
  • The developed rules significantly enhance the annotation process for protein sequences.
  • This method provides a valuable tool for researchers to improve protein data quality and accessibility.