First, do NOHARM: towards clinically safe large language models.

David Wu, Fateme Nateghi Haredasht, Saloni Kumar Maharaj

January 14, 2026

Summary

This summary is machine-generated.

Large language models (LLMs) can give harmful medical advice, posing a risk of severe harm in up to 22.2% of cases. A new benchmark, NOHARM, exposes safety problems in AI medical recommendations and highlights the need for explicit clinical safety evaluation.

Area of Science:

Artificial Intelligence
Medical Informatics
Clinical Safety

Background:

Large language models (LLMs) are increasingly used for medical advice by both physicians and patients.
The clinical safety profiles of LLM-generated medical advice are not well understood.
Existing benchmarks do not adequately assess the potential harm from AI in medical contexts.

Purpose of the Study:

To introduce NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a novel benchmark for evaluating the clinical safety of LLM-generated medical recommendations.
To quantify the frequency and severity of harm associated with LLM advice across various medical specialties.
To assess the correlation between LLM safety performance and existing AI/medical knowledge benchmarks.

Main Methods:

Developed NOHARM using 100 real primary care-to-specialist consultation cases spanning 10 specialties.
Collected 12,747 expert annotations on 4,249 clinical management options generated by 31 LLMs.
Analyzed the frequency and severity of harm, distinguishing between errors of commission and omission.
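
The distinction between errors of commission and omission can be illustrated with a small sketch. It is not the study's scoring code: the function name, the option labels, and the idea of representing each case as sets of "harmful" and "necessary" options are assumptions made for illustration. (As a side note, 12,747 annotations over 4,249 options works out to three annotations per option on average.)

```python
def classify_errors(recommended, harmful_options, necessary_options):
    """Split one model response into commission and omission errors.

    All three arguments are sets of option identifiers (hypothetical format):
      recommended       - options the model advised
      harmful_options   - options experts judged harmful for this case
      necessary_options - options experts judged necessary for this case
    """
    # Commission: the model recommended something judged harmful.
    commission = recommended & harmful_options
    # Omission: the model left out something judged necessary.
    omission = necessary_options - recommended
    return commission, omission


def omission_share(commission, omission):
    """Fraction of all errors that are omissions (0.0 when there are none)."""
    total = len(commission) + len(omission)
    return len(omission) / total if total else 0.0


# Illustrative, made-up case: option names are placeholders, not study data.
commission, omission = classify_errors(
    recommended={"order_mri", "start_opioid"},
    harmful_options={"start_opioid", "stop_anticoagulant"},
    necessary_options={"order_mri", "refer_urgently"},
)
share = omission_share(commission, omission)
```

In this made-up case the model commits one error of each kind, so omissions make up half of the errors. The study reports a much higher share of omissions across its real cases.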

Main Results:

LLM recommendations pose a risk of severe harm in up to 22.2% of cases.
Harm of omission constitutes the majority of errors, accounting for 76.6% of all identified harms.
LLM safety performance showed only moderate correlation (r = 0.61-0.64) with existing benchmarks.
The best-performing LLMs demonstrated superior safety compared to generalist physicians.
A multi-agent approach using diverse models improved safety over solo LLM performance.
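
To put the reported correlation in context, the sketch below computes a Pearson coefficient and the share of variance it implies. The function names are hypothetical and no study data is used. Squaring the reported range r = 0.61-0.64 gives roughly 0.37-0.41, so under a simple linear reading, existing benchmark scores would account for well under half of the variance in safety performance.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def explained_variance(r):
    """Share of variance in one score linearly accounted for by the other."""
    return r * r


# The reported range of correlations, squared.
low, high = explained_variance(0.61), explained_variance(0.64)
```

Here `low` and `high` evaluate to about 0.37 and 0.41, which is one way to read the "only moderate correlation" finding.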

Conclusions:

Despite proficiency in existing evaluations, widely used LLMs can generate severely harmful medical advice at substantial rates.
Clinical safety must be recognized as a distinct and critical performance dimension for medical AI, requiring explicit measurement.
The NOHARM benchmark provides a crucial tool for assessing and improving the safety of AI in healthcare.