Out-of-the-Box Large Language Models for Detecting and Classifying Critical Findings in Radiology Reports Using

Ish A Talati (1), Juan M Zambrano Chaves (2), Avisha Das (3)

  • (1) Department of Radiology, Stanford University, 300 Pasteur Dr, Palo Alto, CA 94304.

AJR. American Journal of Roentgenology | September 10, 2025
Summary
This summary is machine-generated.

Large language models (LLMs) can detect critical findings in radiology reports. Among the prompting strategies tested, few-shot static prompting with GPT-4 performed best at identifying true critical findings.

Keywords:
critical findings, large language models, model evaluation, prompt engineering, radiology reports

Area of Science:

  • Artificial Intelligence in Radiology
  • Natural Language Processing in Healthcare

Background:

  • The growing complexity and volume of radiology reports make it harder to communicate critical findings in a timely manner.
  • Effective methods are needed to extract critical information from radiology reports efficiently.

Purpose of the Study:

  • To evaluate the performance of two large language models (LLMs), GPT-4 and Mistral-7B, in detecting and classifying critical findings in radiology reports.
  • To compare the effectiveness of various prompt strategies for LLM-based analysis of radiology reports.

Main Methods:

  • 252 radiology reports from MIMIC-III and 180 chest radiography reports from CheXpert Plus were analyzed.
  • Prompt engineering was performed, and a final prompt was selected for zero-shot, few-shot static, and few-shot dynamic prompting.
  • GPT-4 and Mistral-7B were run on the test sets, and their outputs were evaluated with automated metrics (BLEU-1, ROUGE-F1, G-Eval) and manually assessed metrics (precision, recall).
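
The three prompting strategies named above can be sketched roughly as follows. This is a minimal illustration, not the study's prompt or code: the instruction text, the function names, and the example pool are hypothetical, and the summary does not say how dynamic examples were chosen, so the token-overlap (Jaccard) ranking here is an assumption standing in for whatever retrieval the authors used.

```python
from typing import List, Tuple

# (report text, expected answer) pairs; hypothetical structure.
Example = Tuple[str, str]

# Hypothetical instruction; NOT the prompt used in the study.
INSTRUCTION = "List any critical findings in the radiology report below."


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two texts (assumed similarity measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def build_prompt(report: str, examples: List[Example]) -> str:
    """Join the instruction, any worked examples, and the target report."""
    parts = [INSTRUCTION]
    for ex_report, ex_answer in examples:
        parts.append(f"Report: {ex_report}\nCritical findings: {ex_answer}")
    parts.append(f"Report: {report}\nCritical findings:")
    return "\n\n".join(parts)


def zero_shot(report: str) -> str:
    """No worked examples at all."""
    return build_prompt(report, [])


def few_shot_static(report: str, pool: List[Example], k: int = 2) -> str:
    """The same fixed k examples for every report."""
    return build_prompt(report, pool[:k])


def few_shot_dynamic(report: str, pool: List[Example], k: int = 2) -> str:
    """The k pool examples most similar to this particular report."""
    ranked = sorted(pool, key=lambda ex: jaccard(report, ex[0]), reverse=True)
    return build_prompt(report, ranked[:k])
```

The only difference between the static and dynamic variants is which examples are placed in the prompt: a fixed set versus a per-report selection. The resulting string would then be sent to whichever model is under evaluation.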

Main Results:

  • Few-shot static prompting with GPT-4 achieved the highest ROUGE-F1 score (0.797) for true critical findings.
  • GPT-4 demonstrated superior precision and recall compared to Mistral-7B across both holdout and external test sets for all critical finding categories.
  • GPT-4 achieved 90.1% precision and 86.9% recall for true critical findings in the holdout test set.
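
For readers less familiar with the metrics reported above, the sketch below shows how precision, recall, and a unigram ROUGE F1 score are typically computed. It is illustrative only: the summary does not state which ROUGE variant or tokenization the authors used, so the whitespace-tokenized ROUGE-1 here is an assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between model output and reference text.

    Assumes lowercased whitespace tokens; no stemming or stopword removal.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    p = overlap / sum(cand.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)
```

With toy counts (not taken from the study), `precision_recall(9, 1, 3)` gives a precision of 0.9 and a recall of 0.75: nine of ten flagged findings were correct, and nine of twelve true findings were caught.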

Conclusions:

  • Out-of-the-box LLMs can effectively detect and classify critical findings in radiology reports.
  • Among the strategies tested, few-shot static prompting performed best for detecting true critical findings.
  • General-purpose LLMs show promise in adapting to specialized medical tasks with minimal data annotation.