Benchmarking large language models for biomedical natural language processing applications and recommendations.

Qingyu Chen (1,2), Yan Hu (3), Xueqing Peng (1)

  • (1) Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, USA.

arXiv | October 1, 2025 | PubMed
Summary
This summary is machine-generated.

Large language models (LLMs) show potential in biomedical natural language processing (BioNLP), but traditional fine-tuned models often perform better. Closed-source LLMs excel at reasoning-intensive tasks, while open-source models need fine-tuning to be competitive on BioNLP tasks.

Area of Science:

  • Biomedical Natural Language Processing (BioNLP)
  • Artificial Intelligence in Healthcare
  • Computational Linguistics

Background:

  • The exponential growth of biomedical literature necessitates automated knowledge extraction.
  • Biomedical Natural Language Processing (BioNLP) offers a solution for efficient information synthesis.
  • The efficacy of Large Language Models (LLMs) in specialized BioNLP tasks is not well established.

Purpose of the Study:

  • To systematically evaluate the performance of leading Large Language Models (LLMs) on diverse BioNLP benchmarks.
  • To compare LLM performance (zero-shot, few-shot, fine-tuning) against traditional fine-tuned models like BERT and BART.
  • To identify practical challenges and provide insights for LLM application in BioNLP.

Main Methods:

  • Evaluation of four LLMs (representatives of the GPT and LLaMA families) across 12 BioNLP benchmarks and six application types.
  • Comparative analysis of zero-shot, few-shot, and fine-tuning approaches for LLMs.
  • Benchmarking against fine-tuned BERT and BART models, including analysis of inconsistencies, hallucinations, and cost.
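
The difference between the zero-shot and few-shot settings listed above can be illustrated with a minimal Python sketch. It is not the study's actual prompt format: the `Example` class, the `build_prompt` helper, the instruction wording, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Example:
    """A labeled demonstration (hypothetical structure, not from the study)."""
    text: str
    label: str


def build_prompt(instruction: str, query: str,
                 shots: Optional[Sequence[Example]] = None) -> str:
    """Build a zero-shot prompt (no shots) or a few-shot prompt (with shots)."""
    parts = [instruction.strip()]
    # Few-shot: prepend worked input/output pairs before the real query.
    for ex in shots or []:
        parts.append(f"Input: {ex.text}\nOutput: {ex.label}")
    # The query is left open-ended so the model completes the output.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


# Illustrative task and data (assumptions, not taken from the benchmarks).
instruction = "Classify the sentence as 'treatment' or 'diagnosis'."
shots = [Example("Aspirin was given for pain.", "treatment")]
query = "An MRI confirmed the lesion."

zero_shot = build_prompt(instruction, query)
few_shot = build_prompt(instruction, query, shots)
```

Fine-tuning differs from both settings in that the labeled examples update the model's weights rather than appearing in the prompt.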

Main Results:

  • Traditional fine-tuned models generally outperform zero- or few-shot LLMs on most BioNLP tasks.
  • Closed-source LLMs (e.g., GPT-4) demonstrate superior performance in reasoning-intensive tasks like medical question answering.
  • Open-source LLMs require fine-tuning to achieve competitive performance, and issues like information omission and hallucinations were observed.
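
One simple way to track output problems alongside task performance is sketched below, assuming a closed-label task. This is a crude proxy rather than the study's evaluation protocol: the `score_outputs` function, the allowed label set, and the sample data are hypothetical, and flagging out-of-set responses does not detect hallucinated content as such.

```python
from typing import Dict, Sequence, Set


def score_outputs(gold: Sequence[str], predicted: Sequence[str],
                  allowed_labels: Set[str]) -> Dict[str, float]:
    """Return accuracy and the share of outputs outside the allowed label set."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and equal in length")
    correct = 0
    invalid = 0
    for g, p in zip(gold, predicted):
        norm = p.strip().lower()
        if norm not in allowed_labels:
            # Free-text or malformed response: counted as wrong and as invalid.
            invalid += 1
        elif norm == g:
            correct += 1
    n = len(gold)
    return {"accuracy": correct / n, "invalid_rate": invalid / n}


# Illustrative data only (not results from the study).
gold = ["yes", "no", "yes", "no"]
predicted = ["Yes", "no", "The answer is likely yes.", "yes"]
result = score_outputs(gold, predicted, {"yes", "no"})
```

Reporting the invalid rate separately keeps formatting failures from being confused with genuine misclassifications.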

Conclusions:

  • Fine-tuning remains a robust strategy for BioNLP, often surpassing basic LLM prompting.
  • Specific LLMs show promise for complex reasoning tasks, but require careful validation.
  • Practical guidelines are needed to address LLM limitations and optimize their use in biomedical knowledge processing.