Feature selection and the class imbalance problem in predicting protein function from sequence.

Ali Al-Shahib¹, Rainer Breitling, David Gilbert

  • ¹ Bioinformatics Research Centre, Department of Computing Science, University of Glasgow, Glasgow, UK. alshahib@dcs.gla.ac.uk

Applied Bioinformatics | October 20, 2005
Summary
This summary is machine-generated.

Support vector machine classifiers predict protein function from amino acid sequence more accurately when feature subset selection is combined with undersampling of the majority class. In this study, the combined approach outperformed the other learning algorithms it was compared with.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Machine Learning

Background:

  • Predicting protein function from amino acid sequence is crucial when homology-based methods fail.
  • Machine learning offers an alternative by analyzing sequence features directly.
  • Challenges include identifying relevant features and handling imbalanced training datasets.
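
As a rough illustration of what "analyzing sequence features directly" can look like, the sketch below turns a protein sequence into a fixed-length numeric vector. The summary does not say which features the authors used, so amino acid composition is an assumption chosen only because it is simple; the function name `amino_acid_composition` and the example sequence are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Illustrative feature only; not necessarily one used in the study.
    """
    seq = sequence.upper()
    # Ignore ambiguous or non-standard residue codes such as X or B.
    counts = Counter(res for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical 10-residue sequence, used only to exercise the function.
features = amino_acid_composition("MKVLAAGIVK")
```

Each sequence maps to a 20-element vector whose entries sum to 1, which is the kind of input a classifier can consume without any homology search.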

Purpose of the Study:

  • To improve machine learning models for protein function prediction directly from amino acid sequences.
  • To address feature selection and data imbalance issues in model training.

Main Methods:

  • Applied feature subset selection to identify discriminatory sequence features.
  • Utilized undersampling of the majority class to balance the training data.
  • Developed and compared support vector machine (SVM) classifiers with other algorithms.
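
The first two steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not name the feature subset selection method, so a simple filter score (absolute difference of class means) stands in for it, and the helper names `undersample_majority` and `rank_features` as well as the toy data are hypothetical. SVM training is left out because it would require a third-party library.

```python
import random


def undersample_majority(X, y, seed=0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority sample.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def rank_features(X, y):
    """Rank feature indices by |mean(positive) - mean(negative)|, best first.

    A stand-in filter score; the study's selection method may differ.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        mean_pos = sum(row[j] for row in pos) / len(pos)
        mean_neg = sum(row[j] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: 2 positives and 6 negatives. Feature 0 separates the classes,
# feature 1 does not.
X = [[0.9, 0.5], [0.8, 0.4], [0.1, 0.5], [0.2, 0.4],
     [0.1, 0.6], [0.2, 0.5], [0.15, 0.4], [0.1, 0.45]]
y = [1, 1, 0, 0, 0, 0, 0, 0]

X_bal, y_bal = undersample_majority(X, y)
top_features = rank_features(X_bal, y_bal)
```

Undersampling is done before ranking here so that the feature scores are computed on the balanced set; whether the study applied the steps in that order is not stated in the summary.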

Main Results:

  • Feature selection and undersampling significantly improved SVM classifier performance.
  • Balancing the training datasets through undersampling enhanced predictive accuracy.
  • The combined approach (SVMs with feature selection and undersampling) outperformed other learning algorithms.
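
To see why class imbalance matters when judging a classifier, consider a small worked example. The numbers are invented for illustration (5 positives among 100 proteins) and do not come from the study; the function name `evaluate` is hypothetical. A model that always predicts the majority class looks accurate while finding none of the positives.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall, balanced accuracy) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn)          # fraction of true positives recovered
    specificity = tn / (tn + fp)     # fraction of true negatives recovered
    return accuracy, recall, (recall + specificity) / 2


# Invented imbalanced set: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial model that always predicts the majority (negative) class.
y_pred = [0] * 100

accuracy, recall, balanced_accuracy = evaluate(y_true, y_pred)
```

Here the trivial model scores 0.95 accuracy with a recall of 0 and a balanced accuracy of 0.5, which is why raw accuracy on an imbalanced set can overstate how useful a classifier is.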

Conclusions:

  • The combined approach of feature selection and undersampling generates powerful machine learning classifiers for protein function prediction.
  • Selected features may offer insights into the sequence-function relationship.
  • This method provides a robust alternative to homology-based approaches for predicting protein function directly from sequence data.