Evaluating ChatGPT's Diagnostic Accuracy in Interpreting Fundus Images
Summary
This summary is machine-generated. ChatGPT shows potential for diagnosing retinal conditions from fundus images, accurately identifying 4 out of 12 diseases. However, current AI diagnostic accuracy is insufficient for clinical use due to frequent hallucinations.
Area Of Science
- Ophthalmology
- Artificial Intelligence
- Medical Imaging
Background
- Artificial intelligence (AI) is increasingly utilized in healthcare, particularly in ophthalmology for image analysis.
- Large language models like ChatGPT are expanding into image analysis, presenting new diagnostic opportunities.
- Limited research exists on the diagnostic accuracy of AI in interpreting retinal fundus images.
Purpose Of The Study
- To evaluate the diagnostic accuracy of ChatGPT 4.0 in identifying retinal diseases from fundus images.
- To assess the potential of large language models in ophthalmological diagnostics.
Main Methods
- Twelve fundus images representing key ophthalmological diseases were selected.
- ChatGPT 4.0 was prompted to diagnose each image.
- Model diagnoses were compared against confirmed conditions to determine accuracy.
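The comparison step above can be sketched in code. This is a hypothetical illustration, not the authors' actual protocol: the grading rules (exact match = correct, substring overlap = partial) and the disease labels are assumptions made for the example.

```python
def grade(diagnosis: str, confirmed: str) -> str:
    """Grade one model diagnosis against the confirmed condition.

    Illustrative rule: exact match is 'correct'; a diagnosis that
    names part of the condition (or vice versa) is 'partial';
    anything else is 'incorrect'.
    """
    d, c = diagnosis.lower().strip(), confirmed.lower().strip()
    if d == c:
        return "correct"
    if d in c or c in d:  # e.g. "retinopathy" within "diabetic retinopathy"
        return "partial"
    return "incorrect"

def accuracy(pairs):
    """Tally grades over (model diagnosis, confirmed condition) pairs."""
    counts = {"correct": 0, "partial": 0, "incorrect": 0}
    for diagnosis, confirmed in pairs:
        counts[grade(diagnosis, confirmed)] += 1
    return counts

# Made-up pairs purely to show the scoring mechanics:
results = accuracy([
    ("glaucoma", "glaucoma"),
    ("retinopathy", "diabetic retinopathy"),
    ("macular hole", "retinal detachment"),
])
print(results)  # {'correct': 1, 'partial': 1, 'incorrect': 1}
```

A tally like this yields the study's headline figures (4 correct, 1 partial, 7 incorrect out of 12), though the actual grading was presumably done by clinicians rather than string matching.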
Main Results
- ChatGPT accurately diagnosed 4 out of 12 conditions, including papilloedema, dry age-related macular degeneration (ARMD), glaucoma, and vitreous hemorrhage.
- A partial diagnosis was achieved for diabetic retinopathy.
- The model misdiagnosed the remaining 7 conditions, including various retinal detachments and vascular occlusions, and frequently hallucinated when uncertain.
Conclusions
- ChatGPT demonstrates preliminary potential for diagnosing retinal conditions via fundus photography.
- Current AI diagnostic accuracy is insufficient for clinical application due to unreliability and hallucinations.
- Further research and model refinement are necessary to enhance AI's diagnostic capabilities in ophthalmology.

