Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks.

Jack Gallifant¹, Shan Chen²,³,⁴, Pedro Moreira¹,⁵

¹MIT.

Findings of ACL. EMNLP. Conference on Empirical Methods in Natural Language Processing

December 4, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) show decreased performance on medical question-answering tasks when brand and generic drug names are swapped. Test data contamination in pre-training datasets may explain this fragility.

Area of Science:

Artificial Intelligence
Medical Informatics
Natural Language Processing

Background:

Medical knowledge is context-dependent, requiring consistent reasoning across semantic variations in natural language.
Patient communication often uses brand-name drugs (e.g., Advil, Tylenol) instead of generic equivalents, posing challenges for medical AI.
Evaluating the robustness of medical AI to these variations is critical for reliable clinical applications.

Purpose of the Study:

To create a novel robustness dataset, RABBITS, for evaluating LLM performance on medical benchmarks.
To assess the impact of substituting brand and generic drug names on LLM accuracy.
To identify potential causes for performance degradation in medical LLMs.

Main Methods:

Developed the RABBITS dataset with physician expert annotations for brand-generic drug name substitutions.
Evaluated open-source and API-based Large Language Models (LLMs) on established medical question-answering datasets (MedQA, MedMCQA).
Analyzed performance differences before and after drug name substitutions.
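The substitution step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the two-entry mapping, the function names `swap_drug_names` and `invert`, and the whole-word, case-insensitive matching rule are all assumptions made for illustration. The actual RABBITS mapping is physician-annotated and is not reproduced here.

```python
import re

# Hypothetical brand -> generic pairs, for illustration only. The RABBITS
# mapping itself is physician-annotated and is not reproduced here.
BRAND_TO_GENERIC = {
    "Advil": "ibuprofen",
    "Tylenol": "acetaminophen",
}


def swap_drug_names(text, mapping):
    """Replace whole-word, case-insensitive matches of each key with its value."""
    if not mapping:
        return text
    # Try longer names first so a multi-word name wins over a shorter prefix.
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        flags=re.IGNORECASE,
    )
    lookup = {name.lower(): replacement for name, replacement in mapping.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


def invert(mapping):
    """Build the generic -> brand direction from a brand -> generic mapping."""
    return {generic: brand for brand, generic in mapping.items()}


question = "A patient takes Advil daily. Which adverse effect is most likely?"
print(swap_drug_names(question, BRAND_TO_GENERIC))
print(swap_drug_names("ibuprofen 400 mg", invert(BRAND_TO_GENERIC)))
```

A plain string replacement like this ignores ambiguous names and combination products, which is presumably one reason the dataset relies on expert annotation rather than automatic matching.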

Main Results:

LLMs showed a consistent performance drop of 1% to 10% after drug names were swapped.
Both open-source and API-based LLMs exhibited this fragility.
Analysis suggested potential test data contamination in pre-training datasets as a contributing factor.
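As a rough illustration of how such a drop can be measured, the sketch below compares accuracy on the original and the name-swapped versions of the same multiple-choice questions. The function names and the toy answer lists are made up for this example, and the summary does not say whether the reported 1-10% is an absolute or a relative difference; the sketch computes the absolute difference.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold answers."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(pred == answer for pred, answer in zip(predictions, gold))
    return correct / len(gold)


def accuracy_drop(original_preds, swapped_preds, gold):
    """Absolute accuracy lost after the drug names in the questions are swapped."""
    return accuracy(original_preds, gold) - accuracy(swapped_preds, gold)


# Toy data (not from the paper): the same four questions answered before
# and after name substitution.
gold = ["A", "B", "C", "D"]
original_preds = ["A", "B", "C", "A"]  # 3 of 4 correct
swapped_preds = ["A", "B", "D", "A"]   # 2 of 4 correct

drop = accuracy_drop(original_preds, swapped_preds, gold)
print(f"accuracy drop: {drop:.2%}")
```

Because the gold answers are unchanged by the substitution, any drop reflects sensitivity to the drug name alone rather than a change in question difficulty.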

Conclusions:

LLMs are fragile to variations in drug nomenclature in medical question-answering tasks.
The RABBITS dataset provides a valuable resource for assessing and improving LLM robustness in healthcare.
Mitigating test data contamination and enhancing contextual understanding are crucial for reliable medical AI.