Automatic rule generation for protein annotation with the C4.5 data mining algorithm applied on SWISS-PROT.

E Kretschmann¹, W Fleischmann, R Apweiler

¹The EMBL Outstation, The European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK. kretsch@ebi.ac.uk

Bioinformatics (Oxford, England)

October 24, 2001

Summary

This summary is machine-generated.

Automated data mining generated 11,306 rules to improve protein functional annotation in SWISS-PROT. Applied to unannotated proteins, the rules can assign 33% of keyword annotations with a 1.5% error rate, helping researchers understand protein function.

Area of Science:

Bioinformatics
Computational Biology
Protein Science

Background:

The increasing volume of protein sequence data outpaces manual functional annotation efforts.
Existing automated annotation systems offer limited information, creating a need for enhanced tools.
Detecting inconsistencies in manual annotations requires automated support.

Purpose of the Study:

To develop and apply automated data mining techniques for improving protein functional annotation.
To generate reliable annotation rules for protein sequences.
To support manual annotation processes and enhance data quality in public databases.

Main Methods:

The standard C4.5 data mining algorithm was applied to SWISS-PROT protein annotations to extract annotation rules.
A total of 11,306 annotation rules were generated from organism taxonomy and sequence signature matches.
A web-accessible database was created to store and apply the generated rules.
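C4.5 grows a decision tree by repeatedly splitting on the attribute with the highest gain ratio, and its tree paths can then be turned into if-then rules. The sketch below shows only that attribute-selection step, on a hypothetical toy table in which each protein is described by whether it matches a sequence signature (`sig_A`, an invented name) and by its kingdom, with a keyword as the class label. It leaves out C4.5's pruning, its handling of continuous and missing values, and its restriction to attributes with at least average information gain, so it illustrates the split criterion rather than the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio (the C4.5 split criterion) for a categorical attribute."""
    n = len(rows)
    # Group class labels by the value each row takes for `attr`.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    weights = [len(p) / n for p in partitions.values()]
    # Information gain: entropy before the split minus weighted entropy after it.
    remainder = sum(w * entropy(p) for w, p in zip(weights, partitions.values()))
    gain = entropy(labels) - remainder
    # Split information penalizes attributes that fragment the data into many values.
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows, labels, attrs):
    """Attribute with the highest gain ratio (simplified C4.5 selection)."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))


# Hypothetical toy data: signature match and kingdom -> keyword label.
rows = [
    {"sig_A": True, "kingdom": "Bacteria"},
    {"sig_A": True, "kingdom": "Eukaryota"},
    {"sig_A": True, "kingdom": "Eukaryota"},
    {"sig_A": False, "kingdom": "Bacteria"},
    {"sig_A": False, "kingdom": "Eukaryota"},
    {"sig_A": False, "kingdom": "Archaea"},
]
labels = ["Kinase", "Kinase", "Kinase", "none", "none", "none"]

print(best_attribute(rows, labels, ["sig_A", "kingdom"]))  # sig_A
```

Here the signature attribute separates the classes perfectly, so it wins over the kingdom attribute, whose three values also incur a larger split-information penalty.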

Main Results:

The data mining approach generated a set of 11,306 annotation rules.
Applying these rules can automatically generate 33% of keyword annotations for unannotated proteins with a 1.5% error rate.
Annotation coverage can be extended to 60% by accepting a 5% error rate.
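One plausible way a coverage/error trade-off like the one above can arise is by filtering rules on an estimated confidence: a stricter threshold keeps fewer but more reliable rules. The sketch below assumes that mechanism. It also assumes definitions of coverage (share of known keyword annotations that are reproduced) and error rate (share of predicted keywords that are wrong), and it uses a hypothetical `Rule` type with invented signatures, keywords and confidences. The paper's actual evaluation procedure may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical annotation rule: if every condition holds, assign `keyword`."""
    conditions: frozenset  # set of (attribute, value) pairs
    keyword: str
    confidence: float      # assumed estimate of how often the rule is correct


def predict(rules, features, min_conf):
    """Keywords from rules that match `features` and meet the confidence threshold."""
    return {
        r.keyword
        for r in rules
        if r.confidence >= min_conf and r.conditions <= features
    }


def coverage_and_error(rules, proteins, min_conf):
    """Score rules on proteins whose keywords are already known.

    Assumed definitions: coverage = correctly predicted keywords / known keywords,
    error = wrong predictions / all predictions.
    """
    predicted = correct = known = 0
    for features, true_keywords in proteins:
        keywords = predict(rules, features, min_conf)
        predicted += len(keywords)
        correct += len(keywords & true_keywords)
        known += len(true_keywords)
    coverage = correct / known if known else 0.0
    error = (predicted - correct) / predicted if predicted else 0.0
    return coverage, error


# Invented rules and labelled proteins, for illustration only.
rules = [
    Rule(frozenset({("sig", "SIG_A")}), "Kinase", 0.99),
    Rule(frozenset({("sig", "SIG_B"), ("taxon", "Bacteria")}), "Transport", 0.80),
]
proteins = [
    (frozenset({("sig", "SIG_A"), ("taxon", "Eukaryota")}), {"Kinase"}),
    (frozenset({("sig", "SIG_B"), ("taxon", "Bacteria")}), {"Membrane"}),
    (frozenset({("sig", "SIG_B"), ("taxon", "Bacteria")}), {"Transport"}),
    (frozenset({("taxon", "Archaea")}), {"Ribosomal protein"}),
]

for threshold in (0.95, 0.5):
    print(threshold, coverage_and_error(rules, proteins, threshold))
```

On this toy data, lowering the threshold from 0.95 to 0.5 raises coverage from 0.25 to 0.5, while the error rate rises from 0 to one third, mirroring in miniature the direction of the trade-off reported above.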

Conclusions:

Automated data mining is effective in generating reliable protein functional annotations.
The generated rules substantially extend automatic annotation of protein sequences.
This method provides a valuable tool for researchers to improve protein data quality and accessibility.