Mining phenotypes for gene function prediction.

Philip Groth [1], Bertram Weiss, Hans-Dieter Pohlenz

  • [1] Research Laboratories of Bayer Schering Pharma AG, Berlin, Germany. Contact: groth@informatik.hu-berlin.de

BMC Bioinformatics | March 5, 2008
Summary
This summary is machine-generated.

We used text clustering on phenotype data to predict gene function, grouping genes with similar phenotype descriptions. The resulting clusters are biologically coherent and can be used to infer new gene annotations.

Area of Science:

  • Genomics
  • Bioinformatics
  • Computational Biology

Background:

  • Organism health and disease are linked to observable traits called phenotypes.
  • Identifying genetic causes of disease often requires precise phenotype definition.
  • High-throughput phenotyping technologies have advanced gene function discovery, but the resulting data are rarely used beyond establishing genotype-phenotype links.

Purpose of the Study:

  • To predict gene function using large-scale textual phenotype data.
  • To group genes based on shared phenotype descriptions via text clustering.
  • To leverage these clusters for inferring gene annotations.

Main Methods:

  • Text clustering applied to phenotype descriptions to group genes.
  • Analysis of cluster coherence using Gene Ontology (GO) functional annotations and protein-protein interactions.
  • Cross-validation to evaluate the precision and recall of predicted GO-term annotations.

Main Results:

  • Gene clusters based on phenotype descriptions show significant biological coherence.
  • The method achieved up to 72.6% precision and 16.7% recall in predicting GO-term annotations for biological processes.
  • Manual verification confirmed high biological coherence within clusters, such as a cluster grouping Drosophila odorant receptors.

Conclusions:

  • Phenotype data are valuable for inferring novel gene functions because they reflect gene activity.
  • Systematic, large-scale analysis of phenotype data offers significant potential for gene functional annotation.
  • Text clustering is an effective computational method for analyzing phenotype data to predict gene function.
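
Illustrative sketch:

The workflow described under Main Methods (cluster genes by phenotype text, transfer annotations within a cluster, then score the predictions by precision and recall) can be illustrated with a small Python sketch. This is not the authors' pipeline: the bag-of-words vectors, the greedy cosine-similarity grouping, the similarity threshold, the majority-share rule, and all gene names, descriptions, and annotation labels are assumptions chosen only to keep the example short. The toy scores it produces are unrelated to the 72.6% precision and 16.7% recall reported above.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: gene names, phenotype texts, and annotation labels are invented.
DESCRIPTIONS = {
    "geneA": "reduced response to odor, olfactory behavior defective",
    "geneB": "olfactory behavior defective, odor avoidance reduced",
    "geneC": "wing vein missing, wing shape abnormal",
    "geneD": "wing shape abnormal, wing margin notched",
}
ANNOTATIONS = {
    "geneA": {"sensory perception of smell"},
    "geneB": {"sensory perception of smell", "signal transduction"},
    "geneC": {"wing development"},
    "geneD": {"wing development"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(descriptions, threshold=0.3):
    """Greedy one-pass grouping: a gene joins the first cluster whose
    seed (first member) is at least `threshold` similar, else starts a new one."""
    vectors = {gene: Counter(tokenize(text)) for gene, text in descriptions.items()}
    clusters = []
    for gene in sorted(vectors):
        for members in clusters:
            if cosine(vectors[gene], vectors[members[0]]) >= threshold:
                members.append(gene)
                break
        else:
            clusters.append([gene])
    return clusters


def predict_terms(gene, members, annotations, min_share=0.5):
    """Leave-one-out prediction: terms carried by at least `min_share`
    of the other genes in the same cluster."""
    others = [m for m in members if m != gene]
    if not others:
        return set()
    counts = Counter(t for m in others for t in annotations.get(m, ()))
    return {t for t, c in counts.items() if c / len(others) >= min_share}


def precision_recall(clusters, annotations):
    """Compare leave-one-out predictions with the known annotations."""
    tp = fp = fn = 0
    for members in clusters:
        for gene in members:
            predicted = predict_terms(gene, members, annotations)
            known = set(annotations.get(gene, ()))
            tp += len(predicted & known)
            fp += len(predicted - known)
            fn += len(known - predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


clusters = cluster(DESCRIPTIONS)
precision, recall = precision_recall(clusters, ANNOTATIONS)
print(clusters)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy input the two olfactory genes and the two wing genes end up in separate clusters. Raising the hypothetical `min_share` parameter makes predictions more conservative, which tends to trade recall for precision, the same kind of trade-off visible in the reported figures.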