Gene name identification and normalization using a model organism database.

Alexander A Morgan¹, Lynette Hirschman, Marc Colosimo

  • ¹ MITRE Corporation, 202 Burlington Road, Mail Stop K325, Bedford, MA 01730-1420, USA. amorgan@mitre.org

Journal of Biomedical Informatics | November 16, 2004

Summary
This summary is machine-generated.

Natural language processing (NLP) can assist biological database curation by identifying gene mentions in text and normalizing them to database identifiers. This study developed and evaluated methods for gene mention tagging and gene name normalization, using FlyBase as the model organism database.

Area of Science:

  • Bioinformatics
  • Computational Biology
  • Genomics

Background:

  • Biology is increasingly data-driven, requiring efficient methods for organizing research findings.
  • Biological databases like FlyBase are crucial for researchers, but manual curation is time-consuming.
  • Automating the extraction of gene and gene product information from literature is essential.

Purpose of the Study:

  • To apply natural language processing (NLP) techniques to assist in the curation of biological databases, specifically FlyBase.
  • To develop and evaluate methods for identifying and normalizing gene and gene product mentions within scientific articles.
  • To improve the efficiency and accuracy of biological data curation.

Main Methods:

  • Gene mention tagging using a statistical approach, a hidden Markov model (HMM), trained on reverse-engineered gene lists.
  • Gene name normalization using pattern matching with synonym lists and filtering.
  • A hybrid approach combining HMM tagging with disambiguation filters for normalization.
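
The normalization step listed above (matching text against synonym lists, then filtering) can be illustrated with a minimal sketch. It is not the authors' implementation: the gene identifiers, the synonyms, the `AMBIGUOUS` stop list, and the function names are all hypothetical placeholders, matching is limited to single whole tokens, and a stop list of ambiguous synonyms is only one assumed kind of filter, since the summary does not say which filters were used.

```python
import re

# Hypothetical synonym lexicon: identifier -> known names and synonyms.
# These identifiers are placeholders, not real FlyBase records.
LEXICON = {
    "GENE:0001": ["wingless", "wg"],
    "GENE:0002": ["white", "w"],
    "GENE:0003": ["hedgehog", "hh"],
}

# Hypothetical filter: synonyms that collide with ordinary English words
# or single letters and so tend to produce false matches.
AMBIGUOUS = {"white", "w"}


def build_index(lexicon):
    """Invert the lexicon into a lowercase synonym -> set of identifiers map."""
    index = {}
    for gene_id, names in lexicon.items():
        for name in names:
            index.setdefault(name.lower(), set()).add(gene_id)
    return index


def normalize(text, index, ambiguous=frozenset()):
    """Return the sorted identifiers whose synonyms occur as tokens in text."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.lower())
    found = set()
    for token in tokens:
        if token in ambiguous:
            continue  # filtered out: too ambiguous to trust on its own
        found |= index.get(token, set())
    return sorted(found)


index = build_index(LEXICON)
text = "The white eyes of flies lacking wg and hh signalling"

print(normalize(text, index))
# ['GENE:0001', 'GENE:0002', 'GENE:0003']
print(normalize(text, index, AMBIGUOUS))
# ['GENE:0001', 'GENE:0003']
```

In this toy input, "white" describes eye colour rather than naming the gene, so unfiltered matching returns a false positive. The filter removes it, but it would equally remove a genuine mention of that gene, which mirrors the loss of recall that accompanies the gain in precision.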

Main Results:

  • Noisy training data for gene mention tagging achieved 78% precision and 88% recall.
  • The HMM tagger achieved 78% precision and 71% recall for gene mention tagging.
  • Pattern matching for normalization yielded 95% recall and 2% precision, improved to 50% precision and 72% recall with filters.
  • The HMM-based normalization approach achieved an F-measure of 0.72 (88% precision, 61% recall).

Conclusions:

  • FlyBase's lexical resources are sufficient for high recall in gene list extraction.
  • Accurate disambiguation is critical for effective gene name normalization.
  • Different NLP strategies for tagging and normalization involve a trade-off between recall and precision.
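
The recall and precision trade-off can be made concrete by combining each pair of figures from the Main Results into a balanced F-measure, the harmonic mean of precision and recall. The sketch below assumes that the reported F-measure is this balanced (F1) form. Only the 0.72 value is stated in the summary; the other values are derived here for comparison, and the dictionary labels are descriptive names chosen for this example.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the Main Results.
reported = {
    "pattern matching, no filters": (0.02, 0.95),
    "pattern matching, with filters": (0.50, 0.72),
    "HMM-based normalization": (0.88, 0.61),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f_measure(p, r):.2f}")
# pattern matching, no filters: F = 0.04
# pattern matching, with filters: F = 0.59
# HMM-based normalization: F = 0.72
```

The last line reproduces the reported F-measure of 0.72. The first line shows why 95% recall alone is of little use when precision is 2%.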