Solving Complex Pediatric Surgical Case Studies: A Comparative Analysis of Copilot, ChatGPT-4, and Experienced Pediatric Surgeons' Performance
Summary
This summary is machine-generated. Large language models (LLMs) such as ChatGPT-4 and Copilot show limited accuracy when diagnosing pediatric surgical cases and answering related clinical questions. Although AI has potential, experienced human surgeons still outperform these models.
Area Of Science
- Medical Artificial Intelligence
- Pediatric Surgical Diagnostics
- Clinical Decision Support Systems
Background
- Large language models (LLMs) are advancing rapidly across sectors, including medicine.
- The application and efficacy of LLMs in pediatric surgery are not well-established.
- Assessing AI capabilities in complex pediatric surgical cases is crucial for future integration.
Purpose Of The Study
- To evaluate the diagnostic and clinical question-answering abilities of ChatGPT-4 and Microsoft Copilot in pediatric surgery.
- To compare the performance of these artificial intelligence (AI) models against experienced pediatric surgeons.
- To assess the completeness and accuracy of AI-generated diagnostic recommendations.
Main Methods
- Utilized 13 complex clinical case vignettes of classic pediatric surgical diseases.
- Compared AI model responses (ChatGPT-4, Copilot) with those of a cohort of pediatric surgeons.
- Pediatric surgeons rated AI diagnostic recommendations; statistical analyses determined performance differences.
Main Results
- Pediatric surgeons achieved the highest performance score (68.8%), followed by ChatGPT-4 (52.1%) and Copilot (47.9%).
- Statistically significant performance differences were observed between AI models and human surgeons (p < 0.01).
- ChatGPT-4 showed superior differential diagnosis generation compared to Copilot (p < 0.05); AI recommendations were rated as average.
Conclusions
- Current AI models demonstrate substantial limitations in accuracy and reliability for pediatric surgical clinical decision-making.
- While LLMs show promise, their current capabilities do not match experienced human surgeons in this specialized field.
- Further research and development are necessary to enhance AI performance and validate its clinical utility in pediatric surgery.

