Benchmarking large language models for biomedical natural language processing applications and recommendations.

Qingyu Chen¹,², Yan Hu³, Xueqing Peng¹

¹Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, USA.

October 1, 2025

Summary

This summary is machine-generated.

Large Language Models (LLMs) show potential in biomedical natural language processing (BioNLP), but traditional fine-tuned models often perform better on most tasks. Closed-source LLMs excel at reasoning-intensive tasks, while open-source models need fine-tuning to be competitive on BioNLP tasks.

Area of Science:

Biomedical Natural Language Processing (BioNLP)
Artificial Intelligence in Healthcare
Computational Linguistics

Background:

The exponential growth of biomedical literature necessitates automated knowledge extraction.
Biomedical Natural Language Processing (BioNLP) offers a solution for efficient information synthesis.
The efficacy of Large Language Models (LLMs) in specialized BioNLP tasks is not well-established.

Purpose of the Study:

To systematically evaluate the performance of leading Large Language Models (LLMs) on diverse BioNLP benchmarks.
To compare LLM performance (zero-shot, few-shot, fine-tuning) against traditional fine-tuned models like BERT and BART.
To identify practical challenges and provide insights for LLM application in BioNLP.

Main Methods:

Evaluation of four LLMs (representatives of the GPT and LLaMA families) across 12 BioNLP benchmarks spanning six application types.
Comparative analysis of zero-shot, few-shot, and fine-tuning approaches for LLMs.
Benchmarking against fine-tuned BERT and BART models, including analysis of inconsistencies, hallucinations, and cost.
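
To make the zero-shot versus few-shot comparison concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes a classification-style task, a hypothetical `Input:`/`Label:` prompt template, and macro-averaged F1 as the score; the function names `build_prompt` and `macro_f1` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# (input text, gold label) pair used as an in-context demonstration.
Example = Tuple[str, str]


def build_prompt(instruction: str, query: str, shots: Sequence[Example] = ()) -> str:
    """Build a prompt: zero-shot when `shots` is empty, few-shot otherwise.

    The "Input:/Label:" template is an assumption made for illustration.
    """
    parts = [instruction.strip()]
    for text, label in shots:
        parts.append(f"Input: {text}\nLabel: {label}")
    # The query is left with an open label for the model to complete.
    parts.append(f"Input: {query}\nLabel:")
    return "\n\n".join(parts)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Macro-averaged F1 over the labels that appear in `gold`."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = []
    for lab in sorted(set(gold)):
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    instruction = "Classify the sentence as 'yes' or 'no'."
    # Same query, with and without a demonstration.
    print(build_prompt(instruction, "Aspirin reduced fever."))
    print(build_prompt(instruction, "Aspirin reduced fever.",
                       shots=[("Drug X caused nausea.", "yes")]))
    print(macro_f1(["yes", "no"], ["yes", "no"]))
```

Under this setup the same scoring function can be applied to the outputs of a prompted LLM and of a fine-tuned baseline, so the only thing that varies between conditions is how the prediction was produced.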

Main Results:

Traditional fine-tuned models generally outperform zero- or few-shot LLMs on most BioNLP tasks.
Closed-source LLMs (e.g., GPT-4) demonstrate superior performance in reasoning-intensive tasks like medical question answering.
Open-source LLMs require fine-tuning to achieve competitive performance, and issues like information omission and hallucinations were observed.

Conclusions:

Fine-tuning remains a robust strategy for BioNLP, often surpassing basic LLM prompting.
Specific LLMs show promise for complex reasoning tasks, but require careful validation.
Practical guidelines are needed to address LLM limitations and optimize their use in biomedical knowledge processing.