Concordance between artificial intelligence and radiologists in BIRADS classification of breast ultrasound: A study using ChatGPT-4o
Summary
This summary is machine-generated. ChatGPT-4o shows moderate to substantial agreement with radiologists in breast ultrasound image analysis. While promising for clear-cut benign and malignant cases, it struggles with intermediate-risk lesions and imaging artifacts, suggesting its use as a supplementary tool rather than a replacement.
Area Of Science
- Artificial Intelligence in Medical Imaging
- Radiology and Diagnostic Imaging
- Breast Ultrasound Interpretation
Background
- Large language models (LLMs) are increasingly explored for medical image analysis.
- Evaluating AI diagnostic performance against human experts is crucial for clinical integration.
- Breast imaging relies on standardized classification systems like BI-RADS.
Purpose Of The Study
- To assess ChatGPT-4o's diagnostic concordance in assigning BI-RADS categories on breast ultrasound images.
- To compare ChatGPT-4o's performance with experienced radiologists.
- To evaluate the consistency of AI in breast lesion classification.
Main Methods
- Retrospective analysis of 405 breast ultrasound images from 350 patients.
- Independent review and BI-RADS categorization by two experienced radiologists.
- Evaluation of the same images by ChatGPT-4o using a standardized prompt.
- Statistical analysis using Cohen's kappa and Fleiss' kappa to measure agreement.
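Cohen's kappa, the agreement statistic used above, corrects raw percent agreement for the agreement two raters would reach by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is chance agreement from each rater's marginal category frequencies. A minimal sketch with made-up BI-RADS labels (not the study's data):

```python
from collections import Counter

def cohens_kappa(r1, r2):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(r1)
    # observed proportion of cases where both raters assign the same category
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    # expected agreement if each rater labeled independently at their own marginal rates
    p_e = sum(c1[k] * c2[k] for k in set(r1) | set(r2)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical BI-RADS categories from two readers (illustrative only)
reader_a = [1, 2, 3, 4, 5, 2, 4, 1, 3, 5]
reader_b = [1, 2, 4, 4, 5, 2, 4, 1, 2, 5]
print(round(cohens_kappa(reader_a, reader_b), 3))  # → 0.75
```

Fleiss' kappa generalizes the same chance-correction idea to three or more raters (here, two radiologists plus the model), which is why both statistics appear in the analysis.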
Main Results
- High interobserver agreement between radiologists (Cohen's κ = 0.832).
- ChatGPT-4o demonstrated moderate to substantial agreement with radiologists (κ = 0.593–0.621).
- Highest concordance for BI-RADS 1 and 5; lower agreement for BI-RADS 3.
- Overall agreement among radiologists and AI was substantial (Fleiss' κ = 0.682).
- AI occasionally upstaged BI-RADS 3 cases and misclassified anatomical structures.
Conclusions
- ChatGPT-4o shows potential in interpreting breast ultrasound, especially for distinct benign and malignant findings.
- Limitations exist in classifying intermediate-risk lesions and interpreting artifacts.
- Current AI performance suggests its use as an adjunct to, not a replacement for, expert radiologists.

