Support vector machine learning from heterogeneous data: an empirical analysis using protein sequence and structure.

Darrin P Lewis¹, Tony Jebara, William Stafford Noble

¹Department of Computer Science, Columbia University, New York, NY, 10027.

Bioinformatics (Oxford, England)

September 13, 2006

Summary

This summary is machine-generated.

Kernel methods, such as the support vector machine (SVM), can integrate diverse biological data. An unweighted SVM approach performs well when combining two data types, but weighted methods perform better when combining multiple noisy datasets.

Area of Science:

Bioinformatics
Computational Biology
Machine Learning

Background:

Integrating diverse biological data (DNA, protein sequences, structures, expression data, networks) requires robust theoretical frameworks.
Kernel methods, particularly the support vector machine (SVM), offer a powerful approach for combining heterogeneous biological datasets.
SVM extensions allow for weighting datasets based on their utility in classification tasks.
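
One common way to write the weighting idea described above is as a non-negative combination of per-dataset kernels. The notation here (M data sources, kernels K_m, weights \mu_m) is assumed for illustration and is not taken from the paper:

```latex
K(x, x') \;=\; \sum_{m=1}^{M} \mu_m \, K_m(x, x'), \qquad \mu_m \ge 0 .
```

Setting every \mu_m = 1 gives the unweighted sum of kernels. Because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the combined K can be handed to a standard SVM solver in place of a single kernel.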

Purpose of the Study:

To empirically evaluate the performance of the SVM for inferring gene functional annotations using combined protein sequence and structure data.
To compare the effectiveness of weighted versus unweighted SVM approaches when integrating multiple biological data sources.

Main Methods:

Empirical investigation of support vector machine (SVM) performance.
Utilizing combined protein sequence and structure data for gene functional annotation inference.
Comparison of unweighted and weighted kernel methods.

Main Results:

The SVM demonstrates robustness to noise in biological datasets.
For two data types, an unweighted SVM performs comparably to or better than weighted methods.
When integrating multiple noisy datasets, weighted approaches outperform naive unweighted combinations.
A naive unweighted sum of kernels may suffice for many applications.
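
The difference between an unweighted and a weighted sum of kernels can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the two toy feature views, the labels, and the weights (0.75, 0.25) are hypothetical, and a simple kernel perceptron stands in for the SVM solver so that the example needs no external library.

```python
def linear_gram(rows):
    """Gram matrix of dot products between all pairs of feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def combine(grams, weights=None):
    """Weighted sum of Gram matrices; weights=None gives the unweighted sum."""
    if weights is None:
        weights = [1.0] * len(grams)
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative to keep a valid kernel")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]


def kernel_perceptron(gram, labels, epochs=10):
    """Stand-in for an SVM solver: learns one mistake count per training point."""
    n = len(labels)
    alpha = [0] * n
    for _ in range(epochs):
        for j in range(n):
            score = sum(alpha[i] * labels[i] * gram[i][j] for i in range(n))
            if labels[j] * score <= 0:  # misclassified or on the boundary
                alpha[j] += 1
    return alpha


def predict(gram, labels, alpha):
    """Predicted labels (+1 or -1) for the training points themselves."""
    n = len(labels)
    scores = [
        sum(alpha[i] * labels[i] * gram[i][j] for i in range(n)) for j in range(n)
    ]
    return [1 if s > 0 else -1 for s in scores]


# Hypothetical toy data: four proteins described by two feature views.
seq_view = [(1, 0), (2, 1), (-1, 0), (-2, -1)]    # informative view
struct_view = [(0, 1), (0, -1), (0, 1), (0, -1)]  # uninformative (noisy) view
labels = [1, 1, -1, -1]

K_seq = linear_gram(seq_view)
K_struct = linear_gram(struct_view)

# Unweighted sum of kernels versus a weighted sum with hypothetical weights.
K_sum = combine([K_seq, K_struct])
K_weighted = combine([K_seq, K_struct], weights=[0.75, 0.25])

preds_sum = predict(K_sum, labels, kernel_perceptron(K_sum, labels))
preds_weighted = predict(K_weighted, labels, kernel_perceptron(K_weighted, labels))
print(preds_sum, preds_weighted)
```

In a real analysis the Gram matrices would come from sequence and structure kernels, and the combined matrix would be passed to an SVM implementation that accepts precomputed kernels. The weights would be chosen by a learning procedure rather than fixed by hand.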

Conclusions:

The support vector machine is a versatile tool for integrating diverse biological data.
The choice between weighted and unweighted kernel methods depends on the number and noise level of the integrated datasets.
For simpler integration tasks, unweighted approaches are efficient and effective.