Evaluation of AI models for radiology exam preparation: DeepSeek vs. ChatGPT-3.5
Summary
This summary is machine-generated. DeepSeek-V3, an open-source AI chatbot, significantly outperformed ChatGPT-3.5 in answering radiology board-style questions, showing promise for medical education. It excels at foundational knowledge recall, though further refinement is needed for complex tasks.
Area Of Science
- Artificial Intelligence in Medical Education
- Radiology Training and Assessment
- Large Language Models in Healthcare
Background
- AI chatbots are rapidly advancing, with growing interest in their application in medical education.
- Assessing the efficacy of AI models in specialized fields like radiology is crucial for educational integration.
Purpose Of The Study
- To evaluate the performance of the open-source AI chatbot DeepSeek-V3 on radiology board-style questions.
- To compare DeepSeek-V3's accuracy with that of ChatGPT-3.5 in the context of radiology education.
Main Methods
- 161 radiology board-style questions (comprising 207 items in total) were selected from a qualification examination.
- DeepSeek-V3 and ChatGPT-3.5 were tested on the same question set over a seven-day period.
- Differences in accuracy were compared using Pearson's chi-square and Fisher's exact tests (see the sketch after this list).
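The paper does not publish its analysis code, but a minimal sketch of this kind of accuracy comparison, assuming Python with scipy, might look like the following. The counts below are hypothetical placeholders, not the study's data.

```python
# Sketch: comparing two models' accuracy on the same question set.
# The counts below are hypothetical placeholders, not the study's data.
from scipy.stats import chi2_contingency, fisher_exact

# Rows: model; columns: [correct, incorrect]
table = [
    [150, 57],   # model A (hypothetical counts)
    [115, 92],   # model B (hypothetical counts)
]

# Pearson's chi-square test (scipy applies Yates continuity
# correction by default for 2x2 tables).
chi2, p_chi2, dof, expected = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p_chi2:.4f}")

# Fisher's exact test is preferred when expected cell counts are small.
odds_ratio, p_fisher = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, Fisher p = {p_fisher:.4f}")
```

Fisher's exact test typically supplements the chi-square test for subgroup comparisons (e.g., individual subspecialties) where cell counts are too small for the chi-square approximation to hold.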
Main Results
- DeepSeek-V3 achieved 72% overall accuracy, significantly higher than ChatGPT-3.5's 55.6% (P < 0.001; see the worked check after this list).
- DeepSeek-V3 excelled at single-choice questions (87.1% accuracy) but performed worse on multiple-choice questions (55.7%) and case analysis (68.0%).
- DeepSeek-V3 outperformed ChatGPT-3.5 across multiple clinical subspecialties, including the nervous, respiratory, circulatory, and musculoskeletal systems.
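As a sanity check on the headline figure, the reported percentages can be converted back into approximate correct-answer counts out of 207 items and re-tested. The counts here are inferred from the published percentages, not taken from the paper's raw data.

```python
# Reconstruct approximate 2x2 counts from the reported accuracies
# (72% vs. 55.6% of 207 items) and re-run the chi-square test.
# These counts are inferred, not the study's raw data.
from scipy.stats import chi2_contingency

n_items = 207
deepseek_correct = round(0.720 * n_items)   # ~149
chatgpt_correct = round(0.556 * n_items)    # ~115

table = [
    [deepseek_correct, n_items - deepseek_correct],
    [chatgpt_correct, n_items - chatgpt_correct],
]

chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-square = {chi2:.2f}, p = {p:.5f}")
```

With these inferred counts the test yields p of roughly 0.0007, consistent with the reported P < 0.001.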
Conclusions
- DeepSeek-V3 shows potential as an AI tool for radiology education, particularly for foundational knowledge and recall.
- The model requires further development to improve performance on higher-order cognitive tasks and complex question formats.
- Future research should explore DeepSeek-V3's image-based question capabilities and compare it with more advanced AI models.

