From memorization to generalization: fine-tuning large language models for biomedical term-to-identifier

Suswitha Pericharla¹, Daniel B Hier², Tayo Obafemi-Ajayi³

  • ¹Computer Science Department, Missouri State University, Springfield, MO, United States.

Frontiers in Digital Health | April 20, 2026
Summary
This summary is machine-generated.

Large language models struggle with biomedical term-to-identifier mapping. Fine-tuning improves accuracy, especially for popular and lexicalized terms, but generalization remains limited for ontologies like HPO and GO.

Keywords:
HGNC; biomedical ontologies; fine-tuning; gene ontology; generalization; human phenotype ontology; large language models; lexicalization

Area of Science:

  • Biomedical Informatics
  • Natural Language Processing
  • Computational Biology

Background:

  • Biomedical data integration relies on term-to-identifier normalization to make data computable and interoperable.
  • Large language models (LLMs) excel at clinical text tasks but show lower accuracy in mapping biomedical terms to ontology identifiers.
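Term-to-identifier normalization can be pictured as a lookup from a surface term to a stable ontology code. The minimal Python sketch below is a deterministic dictionary baseline, not the LLM-based method studied here. The table, its three entries, and the function names `normalize_term` and `term_to_identifier` are illustrative assumptions, and the identifiers should be checked against current HPO, GO, and HGNC releases before reuse.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical three-entry table (one HPO, one GO, one HGNC example);
# verify identifiers against current ontology releases before use.
TERM_TO_ID = {
    "seizure": "HP:0001250",
    "apoptotic process": "GO:0006915",
    "brca1": "HGNC:1100",
}

def normalize_term(term: str) -> str:
    """Apply Unicode NFKC, case-fold, and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", term).casefold()
    return re.sub(r"\s+", " ", text).strip()

def term_to_identifier(term: str) -> Optional[str]:
    """Return the identifier for an exact match after normalization, else None."""
    return TERM_TO_ID.get(normalize_term(term))

print(term_to_identifier("  Apoptotic   Process "))  # GO:0006915
print(term_to_identifier("BRCA1"))                   # HGNC:1100
```

A lookup like this only covers terms already in the table. That coverage gap is why model-based mapping, and the memorization-versus-generalization question, matter.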

Purpose of the Study:

  • To investigate the roles of memorization and generalization in LLM term-to-code mapping.
  • To assess performance across two ontologies (the Human Phenotype Ontology and the Gene Ontology) and the HGNC gene naming system.
  • To evaluate the impact of model size and fine-tuning on mapping accuracy.

Main Methods:

  • Examined the term-to-code mapping performance of LLMs across the Human Phenotype Ontology (HPO), the Gene Ontology (GO), and the HGNC gene naming system.
  • Assessed the performance of multiple base models, both before and after task-specific fine-tuning.
  • Analyzed embedding spaces for semantic alignment between terms and identifiers.
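Mapping accuracy in both directions can be scored by exact-match comparison. This sketch makes three assumptions. First, "forward" means term to identifier and "reverse" means identifier to term. Second, each model is wrapped as a plain function that returns a string. Third, predictions are compared case-insensitively. The stub predictors, gold pairs, and function names are hypothetical stand-ins for real LLM calls and benchmark data.

```python
from typing import Callable, Dict

def _matches(predicted: str, target: str) -> bool:
    """Exact match after trimming whitespace and ignoring case."""
    return predicted.strip().casefold() == target.strip().casefold()

def mapping_accuracy(gold: Dict[str, str], predict: Callable[[str], str]) -> float:
    """Fraction of gold keys for which predict(key) matches the gold value."""
    if not gold:
        raise ValueError("gold mapping is empty")
    correct = sum(_matches(predict(key), target) for key, target in gold.items())
    return correct / len(gold)

def evaluate_both_directions(
    term_to_id: Dict[str, str],
    predict_id: Callable[[str], str],
    predict_term: Callable[[str], str],
) -> Dict[str, float]:
    """Score term -> identifier (forward) and identifier -> term (reverse)."""
    id_to_term = {code: term for term, code in term_to_id.items()}
    return {
        "forward": mapping_accuracy(term_to_id, predict_id),
        "reverse": mapping_accuracy(id_to_term, predict_term),
    }

# Hypothetical gold pairs and stub "models"; GO:9999999 is a deliberately wrong answer.
GOLD = {"seizure": "HP:0001250", "apoptotic process": "GO:0006915", "BRCA1": "HGNC:1100"}
_fake_ids = {"seizure": "HP:0001250", "apoptotic process": "GO:9999999", "BRCA1": "HGNC:1100"}
_fake_terms = {"HP:0001250": "Seizure", "GO:0006915": "apoptotic process", "HGNC:1100": "BRCA1"}

scores = evaluate_both_directions(
    GOLD,
    predict_id=lambda term: _fake_ids.get(term, ""),
    predict_term=lambda code: _fake_terms.get(code, ""),
)
print(scores)  # forward = 2/3, reverse = 1.0
```

Scoring each direction separately keeps the two accuracies distinct, so a model can be strong in one direction and weak in the other.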

Main Results:

  • Accuracy increased with model size, with GPT-4o outperforming Llama 3.1 models.
  • Fine-tuning enhanced forward mappings, particularly for GO, but showed minimal gains for gene name-to-HGNC identifier mappings.
  • Generalization occurred for HGNC gene symbols but not for HPO/GO identifiers; embedding analysis showed semantic alignment for gene names but not for HPO/GO concepts.
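One simple way to probe semantic alignment is to check whether each term's embedding is closer to its own identifier's embedding than to the other identifiers' embeddings. The sketch below assumes one vector per term and one per identifier, for example pooled hidden states from the model, and it uses toy two-dimensional vectors. It illustrates the idea only. The function `alignment_gap` and all data are hypothetical, and this is not the embedding analysis used in the study.

```python
import math
from typing import Dict, List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm

def alignment_gap(
    term_vecs: Dict[str, List[float]],
    id_vecs: Dict[str, List[float]],
    pairs: Dict[str, str],
) -> float:
    """Mean cosine of true (term, identifier) pairs minus mean cosine of mismatched pairs.

    A clearly positive gap means terms sit closer to their own identifiers than
    to other identifiers; a gap near zero means no such alignment.
    Assumes a one-to-one mapping with at least two pairs.
    """
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to form mismatched pairs")
    matched: List[float] = []
    mismatched: List[float] = []
    for term, code in pairs.items():
        for other in pairs.values():
            sim = cosine(term_vecs[term], id_vecs[other])
            if other == code:
                matched.append(sim)
            else:
                mismatched.append(sim)
    return sum(matched) / len(matched) - sum(mismatched) / len(mismatched)

# Toy vectors: identifiers either mirror their terms or collapse to one point.
pairs = {"termA": "ID:1", "termB": "ID:2"}
terms = {"termA": [1.0, 0.0], "termB": [0.0, 1.0]}
aligned_ids = {"ID:1": [1.0, 0.0], "ID:2": [0.0, 1.0]}
opaque_ids = {"ID:1": [1.0, 1.0], "ID:2": [1.0, 1.0]}

print(alignment_gap(terms, aligned_ids, pairs))  # 1.0
print(alignment_gap(terms, opaque_ids, pairs))   # 0.0
```

The "opaque" case loosely mirrors identifiers whose embeddings carry no information about the concept they name.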

Conclusions:

  • Fine-tuning success depends on term popularity and lexicalization.
  • Popularity influences baseline accuracy and memorization gains, while lexicalization enables generalization.
  • Findings offer a framework to predict fine-tuning effectiveness for biomedical term normalization.
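Popularity effects like those described above can be examined by stratifying accuracy into popularity bins. This sketch assumes two things about each evaluated term: it carries a numeric popularity proxy, such as a corpus frequency count, and a correct/incorrect flag. The proxy, bin edges, records, and the function name `accuracy_by_popularity` are hypothetical, and the function is not the authors' predictive framework.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def accuracy_by_popularity(
    records: Iterable[Tuple[float, bool]], edges: List[float]
) -> Dict[int, float]:
    """Per-bin accuracy for (popularity, correct) records.

    With sorted edges e_0 < e_1 < ..., bin 0 holds p < e_0, bin i holds
    e_{i-1} <= p < e_i, and the last bin holds p >= the final edge.
    Empty bins are omitted from the result.
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for popularity, correct in records:
        b = bisect_right(edges, popularity)  # number of edges <= popularity
        totals[b] += 1
        hits[b] += int(correct)
    return {b: hits[b] / totals[b] for b in sorted(totals)}

# Hypothetical records: (popularity proxy, was the predicted identifier correct?)
records = [(3, False), (7, False), (8, True), (40, True), (60, False), (300, True), (900, True)]
print(accuracy_by_popularity(records, edges=[10, 100]))  # bin 0: 1/3, bin 1: 0.5, bin 2: 1.0
```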