ChatGPT's diagnostic performance based on textual vs. visual information compared to radiologists' diagnostic performance in musculoskeletal radiology
Summary
This summary is machine-generated. Generative Pre-trained Transformer (GPT)-4-based ChatGPT showed diagnostic accuracy comparable to that of a radiology resident in musculoskeletal radiology, outperforming GPT-4V but not a board-certified radiologist.
Area Of Science
- Artificial intelligence in medical diagnostics
- Musculoskeletal radiology
- Large language models
Background
- Generative Pre-trained Transformer (GPT)-4-based ChatGPT and GPT-4 with vision (GPT-4V) are advanced AI models with potential applications in medical diagnosis.
- Assessing the diagnostic accuracy of these AI models in specialized fields like musculoskeletal radiology is crucial for understanding their clinical utility.
Purpose Of The Study
- To compare the diagnostic accuracy of GPT-4-based ChatGPT, GPT-4V-based ChatGPT, and human radiologists in musculoskeletal radiology.
- To evaluate the performance of AI models against different levels of human expertise.
Main Methods
- 106 "Test Yourself" cases from Skeletal Radiology were used.
- GPT-4-based ChatGPT received medical history and imaging findings; GPT-4V-based ChatGPT received medical history and images.
- Diagnoses from both AI models were compared against ground truth and independently assessed diagnoses from a radiology resident and a board-certified radiologist.
Main Results
- GPT-4-based ChatGPT achieved 43% accuracy, significantly outperforming GPT-4V-based ChatGPT (8%; p < 0.001).
- A radiology resident achieved 41% accuracy, and a board-certified radiologist achieved 53% accuracy.
- GPT-4-based ChatGPT's accuracy was comparable to the resident's (p = 0.78) and numerically lower than the board-certified radiologist's, although that difference did not reach statistical significance (p = 0.22).
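The accuracy comparisons above can be illustrated with a two-proportion significance test. The sketch below implements a two-sided Fisher's exact test in pure Python; note that the counts used in the usage example are hypothetical reconstructions from the reported percentages (e.g. 43% of 106 ≈ 46 correct), and the study's actual statistical method is not stated here (a paired design on the same 106 cases would typically call for McNemar's test instead).

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two readers (correct, incorrect); returns the p-value as the
    sum of hypergeometric probabilities of all tables at least as extreme as
    the observed one, with margins held fixed.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c  # fixed margins

    def p_table(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = p_table(a)
    lo = max(0, row1 + col1 - n)  # smallest feasible top-left cell
    hi = min(row1, col1)          # largest feasible top-left cell
    # Small tolerance guards against floating-point ties.
    return sum(p_table(x) for x in range(lo, hi + 1) if p_table(x) <= p_obs + 1e-12)

# Hypothetical counts out of 106 cases, derived from the reported percentages:
# GPT-4 (~43% -> 46 correct) vs GPT-4V (~8% -> 9 correct) is highly significant,
# while GPT-4 vs the resident (~41% -> 43 correct) is not.
p_gpt4_vs_gpt4v = fisher_exact_two_sided(46, 60, 9, 97)
p_gpt4_vs_resident = fisher_exact_two_sided(46, 60, 43, 63)
```

With these illustrative counts, the first comparison yields p < 0.001 and the second a large, non-significant p, matching the direction of the reported results even though the exact p-values depend on the study's true counts and test choice.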
Conclusions
- GPT-4-based ChatGPT demonstrates superior diagnostic performance compared to GPT-4V-based ChatGPT in musculoskeletal radiology.
- While comparable to radiology residents, GPT-4-based ChatGPT did not reach the diagnostic accuracy of board-certified radiologists.
- For optimal use, radiologists should provide ChatGPT with detailed textual descriptions of imaging findings rather than the images themselves.

