Benchmarking large language model-based agent systems for clinical decision tasks.

Yunsong Liu¹,², Zunamys I Carrero², Xiaofeng Jiang²,³

¹Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.

NPJ Digital Medicine

February 18, 2026

Summary

This summary is machine-generated.

Agentic artificial intelligence (AI) systems show limited performance gains in healthcare despite advanced tools. Current systems offer modest benefits with high computational costs, highlighting the need for improved AI solutions.

Area of Science:

Artificial Intelligence
Medical Informatics
Computational Medicine

Background:

Agentic AI systems, capable of autonomous reasoning and tool use, show potential in healthcare applications.
Systematic real-world performance evaluation of these advanced AI systems in medicine is currently limited.
Existing benchmarks do not fully capture the complexities of clinical decision-making and tool integration.

Purpose of the Study:

To systematically benchmark the real-world performance of two agentic AI systems in healthcare settings.
To evaluate the efficacy of agentic AI across diverse medical tasks, including diagnostics, QA, and complex examinations.
To assess the trade-offs between performance gains, resource utilization, and hallucination rates in medical AI agents.

Main Methods:

Evaluated OpenManus (Llama-4 based) and Manus (proprietary multistep architecture) on AgentClinic, MedAgentsBench, and Humanity's Last Exam (HLE) benchmarks.
Assessed performance on text-based and multimodal medical question-answering and diagnostic simulations.
Quantified accuracy, token usage, latency, and hallucination rates, with in-agent safeguards in place.
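
The four quantities named above (accuracy, token usage, latency, hallucination rate) can be aggregated from per-question logs. The following is a minimal sketch, not the study's evaluation code: the `RunRecord` schema, its field names, and the `summarize` helper are hypothetical, and the demo records are made up for illustration only.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One benchmark question answered by one system (hypothetical schema)."""
    correct: bool        # answer matched the benchmark key
    tokens: int          # total tokens consumed for this question
    latency_s: float     # wall-clock seconds to produce the answer
    hallucinated: bool   # response was flagged as containing unsupported content


def summarize(records: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-question records into the four reported metrics."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    return {
        "accuracy": sum(r.correct for r in records) / n,
        "mean_tokens": mean(r.tokens for r in records),
        "mean_latency_s": mean(r.latency_s for r in records),
        "hallucination_rate": sum(r.hallucinated for r in records) / n,
    }


# Made-up records purely for illustration; these are not data from the study.
demo = [
    RunRecord(correct=True, tokens=1200, latency_s=4.0, hallucinated=False),
    RunRecord(correct=False, tokens=3000, latency_s=10.0, hallucinated=True),
    RunRecord(correct=True, tokens=1800, latency_s=6.0, hallucinated=False),
    RunRecord(correct=False, tokens=2000, latency_s=8.0, hallucinated=False),
]
summary = summarize(demo)
```

Running `summarize` once per system and per benchmark would give directly comparable rows, which is the kind of comparison needed to weigh accuracy gains against token and latency costs.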

Main Results:

Agentic AI systems provided modest accuracy improvements over baseline LLMs, with significant increases in token usage and latency.
Accuracy reached 60.3% on AgentClinic MedQA, 30.3% on MedAgentsBench, and 8.6% on HLE text.
Multimodal accuracy was low (15.5% on HLE, 29.2% on AgentClinic NEJM), and hallucinations persisted despite safeguards.

Conclusions:

Current agentic AI designs offer limited performance benefits in healthcare relative to their substantial computational and workflow costs.
There is a critical need for the development of more accurate, efficient, and clinically viable agent systems for medical applications.
Further research is required to optimize agentic AI architectures for practical healthcare deployment.