Protein classification based on text document classification techniques.

Betty Yee Man Cheng¹, Jaime G Carbonell, Judith Klein-Seetharaman

  • ¹ Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, Pennsylvania, USA.

Proteins | January 13, 2005 | PubMed
Summary
This summary is machine-generated.

Naive Bayes classifiers applied to n-gram counts classify G-protein coupled receptors (GPCRs) more accurately than previously reported SVM results, and the approach extends to other protein families.

Area of Science:

  • Computational Biology
  • Bioinformatics
  • Protein Science

Background:

  • Automated protein classification is crucial due to rapid biotechnological advancements and the discovery of new proteins.
  • G-protein coupled receptors (GPCRs) present a classification challenge due to their extensive diversity.
  • Previous studies indicated the need for complex classifiers like Support Vector Machines (SVMs) for high accuracy.

Purpose of the Study:

  • To develop and evaluate novel, accurate, and automated protein classification methods.
  • To apply n-gram counts and feature selection with simpler classifiers (Naive Bayes, Decision Tree) to protein classification.
  • To compare the performance of these new methods against established techniques like SVM and Hidden Markov Models (HMMs).

Main Methods:

  • Used Naive Bayes and Decision Tree classifiers with chi-square feature selection on n-gram counts (counts of short contiguous amino-acid subsequences).
  • Applied the methodology to the G-protein coupled receptor (GPCR) dataset and evaluation protocol from prior studies.
  • Validated the approach on the nuclear receptor superfamily to assess generalizability.
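
The pipeline described in the methods above (n-gram counts, chi-square feature selection, Naive Bayes) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the toy sequences, the family labels "A" and "B", the choice of n = 2, the presence/absence form of the chi-square statistic, the add-one smoothing, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def ngram_counts(sequence, n=2):
    """Count overlapping n-grams (length-n substrings) in a sequence."""
    return Counter(sequence[i:i + n] for i in range(len(sequence) - n + 1))


def chi_square_scores(docs, labels):
    """Chi-square between n-gram presence/absence and class (an assumed variant)."""
    classes = sorted(set(labels))
    total = len(docs)
    class_sizes = Counter(labels)
    present = {}  # n-gram -> Counter(class -> number of sequences containing it)
    for counts, label in zip(docs, labels):
        for gram in counts:
            present.setdefault(gram, Counter())[label] += 1
    scores = {}
    for gram, by_class in present.items():
        n_with = sum(by_class.values())
        score = 0.0
        for c in classes:
            cells = ((n_with, by_class[c]),
                     (total - n_with, class_sizes[c] - by_class[c]))
            for row_total, observed in cells:
                expected = row_total * class_sizes[c] / total
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[gram] = score
    return scores


def train_naive_bayes(docs, labels, vocabulary):
    """Multinomial Naive Bayes with add-one smoothing over the selected n-grams."""
    classes = sorted(set(labels))
    log_prior = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    totals = {c: Counter() for c in classes}
    for counts, label in zip(docs, labels):
        for gram in vocabulary:
            totals[label][gram] += counts[gram]
    log_likelihood = {}
    for c in classes:
        denom = sum(totals[c].values()) + len(vocabulary)
        log_likelihood[c] = {g: math.log((totals[c][g] + 1) / denom)
                             for g in vocabulary}
    return log_prior, log_likelihood


def predict(counts, model):
    """Return the class with the highest posterior log-score."""
    log_prior, log_likelihood = model

    def score(c):
        return log_prior[c] + sum(counts[g] * lp
                                  for g, lp in log_likelihood[c].items())

    return max(log_prior, key=score)


# Hypothetical toy sequences and family labels, invented for illustration only.
train_seqs = ["AAGAAGAAG", "AAGAAGGAA", "CCTCCTCCT", "CCTTCCCCT"]
train_labels = ["A", "A", "B", "B"]
docs = [ngram_counts(s) for s in train_seqs]
scores = chi_square_scores(docs, train_labels)
# Keep the 4 highest-scoring n-grams; ties are broken alphabetically.
vocabulary = sorted(scores, key=lambda g: (-scores[g], g))[:4]
model = train_naive_bayes(docs, train_labels, vocabulary)
print(predict(ngram_counts("GAAGAAGA"), model))
```

Real protein sequences use a 20-letter amino-acid alphabet, so the number of candidate n-grams grows quickly with n. That growth is why a feature-selection step such as chi-square is useful before fitting the classifier.
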
Main Results:

  • Naive Bayes achieved 93.0% (Level I) and 92.4% (Level II) accuracy for GPCR subfamily classification, outperforming SVM (88.4% and 86.3%).
  • This represents a 39.7% and 44.5% reduction in residual error for Level I and Level II GPCR classification, respectively.
  • The method demonstrated strong performance on nuclear receptors (up to 97.8% accuracy) and results comparable to PFAM searches for known families.
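
The residual-error figures can be reproduced from the reported accuracies, assuming "reduction in residual error" means the fraction of the SVM's error rate that Naive Bayes removes. The function name below is hypothetical.

```python
def residual_error_reduction(baseline_accuracy, new_accuracy):
    """Fraction of the baseline's error removed by the new method.

    Both accuracies are given in percent.
    """
    baseline_error = 100.0 - baseline_accuracy
    new_error = 100.0 - new_accuracy
    return (baseline_error - new_error) / baseline_error


# Accuracies reported in the summary: SVM first, Naive Bayes second.
level_1 = residual_error_reduction(88.4, 93.0)
level_2 = residual_error_reduction(86.3, 92.4)
print(f"Level I: {level_1:.1%}, Level II: {level_2:.1%}")
# prints "Level I: 39.7%, Level II: 44.5%"
```

For Level I, the error falls from 11.6% to 7.0%, and 4.6 / 11.6 ≈ 0.397. For Level II, it falls from 13.7% to 7.6%, and 6.1 / 13.7 ≈ 0.445.
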

Conclusions:

  • Naive Bayes classifiers with n-gram features offer a more accurate and potentially simpler alternative for protein classification, especially for diverse superfamilies like GPCRs.
  • The developed method shows significant improvements over existing techniques, reducing classification errors.
  • The approach is generalizable to other protein families, demonstrating its broad applicability in bioinformatics.