A regression-based K nearest neighbor algorithm for gene function prediction from heterogeneous data.

Zizhen Yao¹, Walter L Ruzzo

  • ¹ Department of Computer Science and Engineering, AC101 Paul G. Allen Center, University of Washington, Seattle WA 98195, USA. yzizhen@cs.washington.edu

BMC Bioinformatics | May 26, 2006
Summary
This summary is machine-generated.

This study introduces an enhanced k-nearest-neighbor (KNN) algorithm that predicts gene function by integrating heterogeneous biological data. The method significantly outperforms naive KNN, performs competitively with support vector machines (SVMs), and gains substantial accuracy when multiple data sources are combined.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • The increasing volume of functional genomic and proteomic data necessitates advanced methods for integrating heterogeneous sources.
  • Current functional analysis methodologies struggle to effectively combine diverse data types for gene function prediction.

Purpose of the Study:

  • To develop a general framework for gene function prediction that integrates heterogeneous data sources.
  • To improve the accuracy and reliability of gene function prediction using an enhanced k-nearest-neighbor (KNN) algorithm.

Main Methods:

  • Proposed a novel framework for gene function prediction utilizing the k-nearest-neighbor (KNN) algorithm.
  • Developed a regression-based approach to infer an optimal similarity metric for heterogeneous data.
  • Introduced a new voting scheme for generating confidence scores to estimate prediction accuracy.

Main Results:

  • The proposed KNN algorithm significantly outperformed naive KNN methods in gene function prediction.
  • The method demonstrated competitive performance against support vector machine (SVM) algorithms for integrating heterogeneous data.
  • Combining multiple data sources led to a substantial improvement in prediction accuracy.

Conclusions:

  • The enhanced KNN framework, featuring automatic feature weighting and probabilistic inference, significantly improves prediction accuracy.
  • The method is efficient, intuitive, and flexible, offering a robust solution for gene function prediction.
  • This general framework is adaptable to other classification problems involving heterogeneous datasets.
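
Illustrative sketch:

The approach summarized under Main Methods can be illustrated with a small sketch. It is not the authors' implementation. It assumes that the "regression-based similarity metric" can be approximated by a logistic regression over per-source pairwise similarities, and that the confidence score can be approximated by a noisy-OR combination of the neighbors' similarities. All function names, the synthetic data, the choice of Pearson correlation, and the noisy-OR vote are hypothetical choices made for illustration only.

```python
import numpy as np

def pair_features(sources, i, j):
    """One similarity per data source for genes i and j (Pearson correlation here)."""
    return np.array([np.corrcoef(s[i], s[j])[0, 1] for s in sources])

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Gradient-descent logistic regression; the last weight is the bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y)) / len(y)
    return w

def learned_similarity(w, feats):
    """Estimated probability that two genes share a function (assumed metric)."""
    return 1.0 / (1.0 + np.exp(-(feats @ w[:-1] + w[-1])))

def knn_confidence(sources, labels, train_idx, query, w, k=3):
    """Score each class from the k most similar training genes.

    The noisy-OR vote below is an assumption, not the paper's exact scheme.
    """
    sims = np.array([learned_similarity(w, pair_features(sources, query, t))
                     for t in train_idx])
    top = np.argsort(sims)[::-1][:k]
    scores = {}
    for c in sorted(set(int(l) for l in labels[train_idx])):
        votes = [sims[n] for n in top if labels[train_idx[n]] == c]
        scores[c] = float(1.0 - np.prod([1.0 - v for v in votes]))
    return scores

# Synthetic data (hypothetical): 40 genes, 2 functional classes, 2 data sources.
rng = np.random.default_rng(0)
proto = np.linspace(-1.0, 1.0, 10)
labels = np.array([0] * 20 + [1] * 20)
signs = np.where(labels == 0, 1.0, -1.0)[:, None]
informative = signs * proto + 0.3 * rng.normal(size=(40, 10))  # tracks the class
noise = rng.normal(size=(40, 10))                              # carries no signal
sources = [informative, noise]

# Hold out gene 39, then train the metric on all pairs of the remaining genes.
train_idx = [g for g in range(40) if g != 39]
pairs = [(i, j) for a, i in enumerate(train_idx) for j in train_idx[a + 1:]]
X = np.array([pair_features(sources, i, j) for i, j in pairs])
y = np.array([float(labels[i] == labels[j]) for i, j in pairs])

w = fit_logistic(X, y)
scores = knn_confidence(sources, labels, train_idx, query=39, w=w)
print("source weights:", w[:-1])
print("confidence per class for gene 39:", scores)
```

In this toy setting the regression step plays the role of automatic feature weighting: the source whose similarities track shared function receives a larger weight than the uninformative one, and the fitted probability then serves as the neighbor similarity used for voting.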