Gene name identification and normalization using a model organism database.

Alexander A Morgan¹, Lynette Hirschman, Marc Colosimo

¹MITRE Corporation, 202 Burlington Road, Mail Stop K325, Bedford, MA 01730-1420, USA. amorgan@mitre.org

Journal of Biomedical Informatics

November 16, 2004

Summary

This summary is machine-generated.

Natural language processing can aid biological database curation by identifying gene mentions in text and normalizing them to database entries. This study developed and evaluated methods for gene mention tagging and gene name normalization in support of FlyBase curation.

Area of Science:

Bioinformatics
Computational Biology
Genomics

Background:

Biology is increasingly data-driven, requiring efficient methods for organizing research findings.
Biological databases like FlyBase are crucial for researchers, but manual curation is time-consuming.
Automating the extraction of gene and gene product information from literature is essential.

Purpose of the Study:

To apply natural language processing (NLP) techniques to assist in the curation of biological databases, specifically FlyBase.
To develop and evaluate methods for identifying and normalizing gene and gene product mentions within scientific articles.
To improve the efficiency and accuracy of biological data curation.

Main Methods:

Gene mention tagging using a statistical approach: a Hidden Markov Model (HMM) trained on noisy training data obtained by reverse-engineering gene lists.
Gene name normalization using pattern matching against synonym lists, followed by filtering.
A hybrid approach combining HMM tagging with disambiguation filters for normalization.
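
The second method above (synonym-list pattern matching followed by filtering) can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon entries, the identifiers `G1` to `G4`, the `COMMON_WORDS` list, and both function names are hypothetical, and the two filters shown (dropping common English words and dropping synonyms that map to more than one gene) are only assumed examples of what a disambiguation filter might do.

```python
import re
from collections import defaultdict

# Toy synonym lexicon (hypothetical entries, not real FlyBase records):
# normalized gene identifier -> names and symbols that may appear in text.
LEXICON = {
    "G1": ["hedgehog", "hh"],
    "G2": ["wingless", "wg"],
    "G3": ["foraging", "for"],
    "G4": ["toy-gene", "hh"],  # shares the symbol "hh" with G1 on purpose
}

# Assumed stop list of ordinary English words that collide with gene symbols.
COMMON_WORDS = {"for", "a", "the", "not"}


def build_index(lexicon):
    """Invert the lexicon: lowercased synonym -> set of gene identifiers."""
    index = defaultdict(set)
    for gene_id, names in lexicon.items():
        for name in names:
            index[name.lower()].add(gene_id)
    return index


def tokenize(text):
    """Split text into lowercased word-like tokens (a simplification:
    real gene symbols can be case-sensitive and span several tokens)."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def match_all(text, index):
    """High-recall matching: accept every gene linked to any matched token."""
    found = set()
    for token in tokenize(text):
        found |= index.get(token, set())
    return found


def match_filtered(text, index):
    """Same matching, then drop common words and ambiguous synonyms."""
    found = set()
    for token in tokenize(text):
        gene_ids = index.get(token, set())
        if token in COMMON_WORDS:
            continue  # likely an ordinary English word, not a gene mention
        if len(gene_ids) != 1:
            continue  # unmatched, or synonym shared by several genes
        found |= gene_ids
    return found


INDEX = build_index(LEXICON)
TEXT = "We screened for mutations in hedgehog and hh targets."
unfiltered = match_all(TEXT, INDEX)
filtered = match_filtered(TEXT, INDEX)
```

On this toy sentence the unfiltered matcher also returns the genes reached through the word "for" and the shared symbol "hh", while the filtered matcher keeps only the gene reached through the unambiguous name "hedgehog". That is the same direction of trade-off the summary reports: filters remove false matches but can also discard true ones.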

Main Results:

The noisy training data for gene mention tagging had 78% precision and 88% recall.
The HMM tagger achieved 78% precision and 71% recall for gene mention tagging.
Pattern matching for normalization yielded 95% recall but only 2% precision; adding filters raised precision to 50% while recall fell to 72%.
The HMM-based normalization approach achieved an F-measure of 0.72 (88% precision, 61% recall).
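
The reported F-measure can be checked against the balanced F-measure, the harmonic mean of precision and recall. The short sketch below assumes that standard definition; the function name `f_measure` and the `beta` parameter are illustrative. The value computed for the filtered pattern-matching system is derived here from the precision and recall listed above and is not itself stated in this summary.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1.0 gives the balanced harmonic mean of P and R."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall values taken from the results listed above.
hmm_normalization = f_measure(0.88, 0.61)   # HMM tagger plus filters
filtered_matching = f_measure(0.50, 0.72)   # pattern matching plus filters
raw_matching = f_measure(0.02, 0.95)        # pattern matching, no filters

print(round(hmm_normalization, 2))
print(round(filtered_matching, 2))
print(round(raw_matching, 2))
```

The first value rounds to 0.72, matching the reported figure. The unfiltered matcher scores very low despite 95% recall because the harmonic mean is dominated by the smaller of the two inputs.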

Conclusions:

FlyBase's lexical resources are sufficient for high recall in gene list extraction.
Accurate disambiguation is critical for effective gene name normalization.
Different NLP strategies for tagging and normalization involve a trade-off between recall and precision.