Multimodal Performance of GPT-4 in Complex Ophthalmology Cases
Summary
This summary is machine-generated. GPT-4's diagnostic accuracy in ophthalmology decreases when it is given images alone but improves when figure descriptions are added. It shows potential as an assistive tool, performing comparably to human experts on some reasoning tasks.
Area Of Science
- Artificial Intelligence in Medicine
- Ophthalmology
- Multimodal AI
Background
- GPT-4's multimodal capabilities represent an advance in artificial intelligence for ophthalmology.
- The utility of GPT-4 for complex diagnostic and reasoning tasks in ophthalmology is not fully understood.
- Evaluating AI performance against human expertise is crucial for clinical integration.
Purpose Of The Study
- To assess GPT-4's multimodal performance on diagnostic and next-step reasoning in complex ophthalmology cases.
- To compare GPT-4's accuracy with board-certified ophthalmologists across different input modalities.
- To identify limitations and potential applications of multimodal AI in ophthalmic diagnostics.
Main Methods
- GPT-4 was evaluated on three arms: text with figure descriptions, text with figures, and figures only.
- Performance was measured by diagnostic and next-step accuracy in complex ophthalmology cases.
- GPT-4's results were benchmarked against three board-certified ophthalmologists.
Main Results
- GPT-4 achieved 38.4% diagnostic accuracy and 57.8% next-step accuracy with figures only.
- Diagnostic accuracy was significantly lower with figures alone than with text-only prompts (p=0.007).
- Adding figure descriptions improved diagnostic accuracy to 49.3%, comparable to text-only prompts.
Conclusions
- GPT-4's diagnostic performance declines when relying solely on ophthalmic images, indicating current multimodal limitations.
- GPT-4's diagnostic and next-step reasoning performance was comparable to that of at least one of the ophthalmologists.
- GPT-4 shows promise as an assistive tool in ophthalmology; future research should focus on prompt optimization.

