Updated: Jan 18, 2026

Out-of-the-Box Large Language Models for Detecting and Classifying Critical Findings in Radiology Reports Using

Ish A Talati¹, Juan M Zambrano Chaves², Avisha Das³

¹Department of Radiology, Stanford University, 300 Pasteur Dr, Palo Alto, CA 94304.

AJR. American Journal of Roentgenology

September 10, 2025

Summary

This summary is machine-generated.

Large language models (LLMs) can detect critical findings in radiology reports. Of the prompting strategies tested, few-shot static prompting with GPT-4 performed best at identifying true critical findings.

Keywords:

critical findings, large language models, model evaluation, prompt engineering, radiology reports

Area of Science:

Artificial Intelligence in Radiology
Natural Language Processing in Healthcare

Background:

Radiology reports are growing in complexity and volume, which makes timely communication of critical findings harder.
Effective methods are needed to extract critical information from radiology reports efficiently.

Purpose of the Study:

To evaluate the performance of two large language models (LLMs), GPT-4 and Mistral-7B, in detecting and classifying critical findings in radiology reports.
To compare the effectiveness of various prompt strategies for LLM-based analysis of radiology reports.

Main Methods:

The study analyzed 252 radiology reports from MIMIC-III and 180 chest radiography reports from CheXpert Plus.
Prompt engineering was performed, and a final prompt was selected for zero-shot, few-shot static, and few-shot dynamic prompting.
GPT-4 and Mistral-7B were run on the test sets, and their outputs were evaluated with automated metrics (BLEU-1, ROUGE-F1, G-Eval) and manual metrics (precision, recall).
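
The three prompting strategies named above can be illustrated with a minimal Python sketch. It is not the study's code: the instruction wording, the `Example` type, and every function name are hypothetical. It also assumes that "dynamic" means the examples are chosen per report by similarity, and it uses plain word overlap as a stand-in for whatever retrieval method the authors used.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """A labeled report used as an in-context example (hypothetical type)."""
    report: str
    findings: str


# Hypothetical instruction text; the study's final prompt is not given here.
INSTRUCTION = (
    "List any critical findings in the radiology report below. "
    "Answer 'none' if there are none."
)


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def build_prompt(report: str, examples=()) -> str:
    """Join the instruction, any worked examples, and the target report."""
    parts = [INSTRUCTION]
    for ex in examples:
        parts.append(f"Report: {ex.report}\nCritical findings: {ex.findings}")
    parts.append(f"Report: {report}\nCritical findings:")
    return "\n\n".join(parts)


def zero_shot(report: str) -> str:
    """No examples: the model sees only the instruction and the report."""
    return build_prompt(report)


def few_shot_static(report: str, fixed_examples) -> str:
    """The same fixed examples are shown for every report."""
    return build_prompt(report, fixed_examples)


def few_shot_dynamic(report: str, pool, k: int = 2) -> str:
    """Pick the k pool examples most similar to this particular report."""
    ranked = sorted(pool, key=lambda ex: token_overlap(report, ex.report),
                    reverse=True)
    return build_prompt(report, ranked[:k])
```

The only difference between the two few-shot variants in this sketch is where the examples come from: a fixed list versus a per-report selection from a larger pool.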

Main Results:

Few-shot static prompting with GPT-4 achieved the highest ROUGE-F1 score (0.797) for true critical findings.
GPT-4 demonstrated superior precision and recall compared to Mistral-7B across both holdout and external test sets for all critical finding categories.
GPT-4 achieved 90.1% precision and 86.9% recall for true critical findings in the holdout test set.
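
To make the reported scores concrete, here is a small Python sketch of how precision, recall, and a unigram-overlap F1 can be computed. It rests on several assumptions: the study's precision and recall came from manual review, which this sketch replaces with exact string matching; the source does not say which ROUGE variant was used, so `unigram_f1` is only a ROUGE-1-style approximation; and both function names are hypothetical.

```python
from collections import Counter


def precision_recall(predicted, reference):
    """Precision and recall over sets of finding labels (exact match only)."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def unigram_f1(candidate: str, reference: str) -> float:
    """F1 of clipped unigram overlap between two strings (ROUGE-1-style)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)
```

For example, if a model outputs "large right pneumothorax" and the reference is "right pneumothorax", two of the three output words match and both reference words are covered, so the unigram precision is 2/3, the recall is 1, and the F1 is 0.8.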

Conclusions:

Out-of-the-box LLMs can effectively detect and classify critical findings in radiology reports.
Few-shot static prompting was the best-performing strategy for detecting true critical findings with LLMs.
General-purpose LLMs show promise in adapting to specialized medical tasks with minimal data annotation.