Is ChatGPT 3.5 smarter than Otolaryngology trainees? A comparison study of board style exam questions
Summary
This summary is machine-generated. The artificial intelligence (AI) platform ChatGPT outperformed early-level medical trainees on Otolaryngology board-style exam questions. However, it did not surpass higher-level trainees, indicating limited clinical utility in its current form.
Area Of Science
- Medical Education
- Artificial Intelligence in Medicine
- Otolaryngology
Background
- The integration of artificial intelligence (AI) into medical education and practice is rapidly evolving.
- Assessing the capabilities of AI platforms like ChatGPT against human expertise is crucial for understanding their potential roles.
Purpose Of The Study
- To compare the performance of ChatGPT (version 3.5) with Otolaryngology trainees at various educational levels on board-style examination questions.
- To evaluate the relationship between trainees' educational level and their performance relative to ChatGPT.
Main Methods
- A set of 30 Otolaryngology board-style questions was administered to 31 medical students (MS) and 17 Otolaryngology residents (OR).
- ChatGPT (version 3.5) completed the same test five times.
- Performance comparisons were conducted using one-way ANOVA with Tukey's post hoc test and regression analysis.
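As a rough illustration of the one-way ANOVA step named above, the sketch below computes the F statistic from between-group and within-group variance. It is a minimal sketch, not the study's analysis: the function name `one_way_anova_f` and the scores in `example_scores` are hypothetical placeholders, not data from the study, and the Tukey post hoc comparisons and regression analysis are not shown.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-group score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of groups (e.g. training levels)
    n = len(all_scores)   # total number of scores

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder scores out of 30; NOT the study's data.
example_scores = {
    "MS1": [8, 10, 9, 11],
    "ChatGPT": [16, 17, 15, 16, 17],
    "PGY5": [24, 25, 23],
}

f_stat, df_b, df_w = one_way_anova_f(list(example_scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large F value only indicates that at least one group mean differs from the others; a post hoc procedure such as Tukey's test is what identifies which specific pairs differ, for example ChatGPT versus a given trainee level.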
Main Results
- Average scores increased progressively from MS1 to PGY5.
- ChatGPT significantly outperformed MS1, MS2, and MS3 trainees (p-values ranging from < 0.001 to 0.019).
- PGY4 and PGY5 residents significantly outperformed ChatGPT (p = 0.033 and 0.002, respectively), while no significant difference was found for MS4, PGY1, PGY2, and PGY3 trainees.
Conclusions
- ChatGPT demonstrates proficiency in Otolaryngology board-style questions, surpassing junior trainees.
- Higher-level trainees possess superior knowledge and application skills compared to ChatGPT, particularly in complex medical reasoning.
- Current AI models like ChatGPT may have limited clinical utility for Otolaryngologists due to the need for advanced clinical judgment beyond rote memorization.

