Semi-supervised protein classification using cluster kernels.

Jason Weston¹, Christina Leslie, Eugene Ie

¹NEC Research Institute, 4 Independence Way, Princeton, NJ 08540, USA. jasonw@nec-labs.com

Bioinformatics (Oxford, England)

May 21, 2005

Summary

This summary is machine-generated.

This study introduces cluster kernel techniques to enhance protein sequence representation using unlabeled data. These methods improve protein classification accuracy and computational efficiency compared to existing approaches.

Area of Science:

Bioinformatics
Computational Biology
Structural Bioinformatics

Background:

Accurate protein classification relies on effective amino acid sequence representation.
String kernels achieve state-of-the-art performance but primarily use labeled data.
Unlabeled protein sequence data is significantly more abundant than labeled data.

Purpose of the Study:

To develop scalable cluster kernel techniques for incorporating unlabeled data into protein sequence representation.
To improve the classification performance of existing string kernel methods.
To offer a computationally efficient alternative for utilizing unlabeled protein data.

Main Methods:

Development of novel cluster kernel techniques.
Integration of unlabeled protein sequence data into sequence representation.

Comparative analysis against standard methods and existing cluster kernel approaches.
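
The summary does not spell out how the cluster kernels are constructed, so the following is only an illustrative sketch of one way unlabeled sequences could be folded into a string kernel. It assumes a simple k-mer (spectrum) base kernel and a neighborhood-averaging scheme; the function names and the `neighbors` mapping (each sequence mapped to itself plus similar unlabeled sequences, found by some sequence search) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def spectrum_features(seq, k=3):
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Normalized spectrum kernel: cosine similarity of k-mer counts."""
    fa, fb = spectrum_features(a, k), spectrum_features(b, k)
    dot = sum(count * fb[kmer] for kmer, count in fa.items())
    norm = sqrt(sum(c * c for c in fa.values()) * sum(c * c for c in fb.values()))
    return dot / norm if norm else 0.0


def neighborhood_kernel(x, y, neighbors, k=3):
    """Average the base kernel over all pairs drawn from the two neighborhoods.

    `neighbors` is an assumed, precomputed mapping from a sequence to a
    non-empty list containing the sequence itself and its unlabeled neighbors.
    """
    nx, ny = neighbors[x], neighbors[y]
    total = sum(spectrum_kernel(a, b, k) for a in nx for b in ny)
    return total / (len(nx) * len(ny))


# Toy usage with made-up sequences; real neighborhoods would come from a
# similarity search over a large unlabeled sequence database.
toy_neighbors = {
    "ACDEF": ["ACDEF", "ACDEG"],
    "CDEFG": ["CDEFG", "CDEFH"],
}
similarity = neighborhood_kernel("ACDEF", "CDEFG", toy_neighbors)
```

Averaging over neighborhoods lets two labeled sequences be compared through the unlabeled sequences near them, which is the general idea behind using unlabeled data to refine the representation. The resulting kernel matrix could then be passed to any kernel classifier.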

Main Results:

Demonstrated significant improvement in protein classification performance.
Outperformed standard methods for utilizing unlabeled data, including adding close homologs.
Achieved performance equal to or superior to previous cluster kernel methods with enhanced computational efficiency.

Conclusions:

Cluster kernel techniques effectively leverage unlabeled data for improved protein sequence representation and classification.
The proposed methods offer a scalable and computationally efficient solution for protein classification.
This work advances the utilization of large unlabeled biological datasets in machine learning applications.