A Clinically-Informed Framework for Evaluating Vision-Language Models in Radiology Report Generation: Taxonomy of Errors and Risk-Aware Metric
Summary
This summary is machine-generated. This study introduces a new framework to evaluate AI-generated radiology reports, focusing on clinical risk and safety. It identifies common errors in vision-language models (VLMs) to improve diagnostic accuracy.
Area Of Science
- Artificial Intelligence
- Medical Imaging
- Natural Language Processing
Background
- Vision-language models (VLMs) show promise for automatic radiology report generation.
- Existing evaluation metrics for these reports are insufficient, lacking clinical specificity.
- There is a need for robust evaluation methods that consider patient safety and clinical relevance.
Purpose Of The Study
- To develop a clinically informed evaluation framework for VLM-generated radiology reports.
- To introduce a novel risk-aware metric for assessing the safety impact of AI-generated reports.
- To analyze the performance and identify vulnerabilities of current leading VLMs in radiology.
Main Methods
- Defined a taxonomy of 12 radiology-specific error types, annotated with clinical risk levels (low, medium, high) by physicians.
- Conducted a comprehensive error analysis of three VLMs (DeepSeek VL2, CXR-LLaVA, CheXagent) on 685 expert-annotated MIMIC-CXR cases.
- Introduced the Clinical Risk-weighted Error Score for Text-generation (CREST) metric to quantify safety impact.
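The idea of weighting annotated errors by clinical risk can be illustrated with a minimal sketch. This summary does not give the CREST formula, so the code below is not CREST itself: the numeric weights (1, 2, 3), the plain sum, the per-report averaging, and the error labels `type_a` and `type_b` are all assumptions chosen for illustration. Only the three risk levels (low, medium, high) come from the study description.

```python
from dataclasses import dataclass

# Hypothetical weights: the summary names three risk levels
# but does not state numeric weights for them.
RISK_WEIGHTS = {"low": 1.0, "medium": 2.0, "high": 3.0}


@dataclass(frozen=True)
class AnnotatedError:
    error_type: str  # placeholder label, not one of the study's 12 categories
    risk: str        # "low", "medium", or "high"


def risk_weighted_score(errors, weights=RISK_WEIGHTS):
    """Sum the risk weights of all errors annotated in one report.

    Illustrative only: an assumed aggregation, not the published metric.
    """
    return sum(weights[e.risk] for e in errors)


def mean_score(reports):
    """Average the risk-weighted score over reports (each a list of errors)."""
    if not reports:
        return 0.0
    return sum(risk_weighted_score(r) for r in reports) / len(reports)


# Usage: one report with a low-risk and a high-risk error, one error-free report.
report_a = [AnnotatedError("type_a", "low"), AnnotatedError("type_b", "high")]
report_b = []
print(risk_weighted_score(report_a))     # 1.0 + 3.0
print(mean_score([report_a, report_b]))  # averaged over two reports
```

Under these assumptions, a single high-risk error contributes as much as three low-risk ones, which is the kind of distinction a count of errors alone would miss.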
Main Results
- Identified critical model vulnerabilities and common error patterns across evaluated VLMs.
- Revealed condition-specific risk profiles associated with different types of errors.
- Demonstrated the limitations of traditional NLP metrics in capturing clinically significant inaccuracies.
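One way to see why word-overlap metrics can miss clinically significant errors is a small sketch with a dropped negation. The summary does not name the traditional NLP metrics that were examined, so the unigram F1 below is a stand-in chosen for simplicity, and both sentences are invented examples rather than MIMIC-CXR data.

```python
from collections import Counter


def unigram_f1(reference: str, candidate: str) -> float:
    """Token-overlap F1 between two whitespace-tokenized, lowercased strings."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection size
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


# Invented example: the candidate drops "no", reversing the finding.
reference = "no pneumothorax is seen"
candidate = "pneumothorax is seen"
score = unigram_f1(reference, candidate)
print(round(score, 3))  # 0.857
```

The candidate asserts the opposite finding, yet three of the four reference tokens match, so the overlap score stays high. A risk-aware evaluation would instead treat this as a single, potentially high-risk error.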
Conclusions
- The proposed framework provides a safety-centric foundation for evaluating medical report generation models.
- Findings offer actionable insights for developing and deploying more reliable AI tools in radiology.
- The CREST metric and error taxonomy can guide future improvements in VLM performance for clinical applications.

