Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons?
Summary
This summary is machine-generated. Large language models (LLMs) show promise in improving neurosurgical differential diagnoses. While accuracy varies, LLMs can assist in identifying conditions like epilepsy, potentially reducing diagnostic delays.
Area Of Science
- Medical Informatics
- Artificial Intelligence in Medicine
- Neurosurgery
Background
- Accurate differential diagnoses are critical in neurosurgery.
- Diagnostic delays in neurosurgery lead to significant health and economic challenges.
- Large language models (LLMs) are emerging as potential tools in healthcare.
Purpose Of The Study
- To evaluate the role of LLMs in assisting neurosurgeons with differential diagnoses.
- To assess the diagnostic accuracy of various LLMs in neurosurgical cases.
Main Methods
- Used three chat-based LLM platforms: ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI.
- Prompted LLMs with clinical vignettes for 20 neurosurgical disorders.
- Determined LLM accuracy based on correct identification of the target disease within top differentials.
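The accuracy measure described above can be read as a top-k accuracy: the share of vignettes whose target disease appears among the first k differentials a model returns. The following is a minimal sketch under that assumption, not the study's actual scoring procedure. The function name `top_k_accuracy`, the case-insensitive exact string match, and the example vignette responses are all hypothetical; the study may well have judged correctness by clinician review instead.

```python
def top_k_accuracy(ranked_differentials, targets, k):
    """Fraction of vignettes whose target diagnosis is among the first k differentials.

    ranked_differentials: one list of diagnoses per vignette, most likely first.
    targets: the intended diagnosis for each vignette, in the same order.
    Matching is a case-insensitive exact string comparison (an assumption).
    """
    if len(ranked_differentials) != len(targets):
        raise ValueError("each vignette needs exactly one target diagnosis")
    if not targets:
        raise ValueError("at least one vignette is required")

    hits = 0
    for differentials, target in zip(ranked_differentials, targets):
        top_k = [d.strip().lower() for d in differentials[:k]]
        if target.strip().lower() in top_k:
            hits += 1
    return hits / len(targets)


# Hypothetical model output for four vignettes (illustrative only, not study data).
responses = [
    ["epilepsy", "syncope", "migraine"],
    ["ischemic stroke", "vasculitis", "moyamoya disease"],
    ["meningioma", "glioma", "metastasis"],
    ["migraine", "tension headache", "cluster headache"],
]
targets = ["Epilepsy", "Moyamoya disease", "Glioma", "Subarachnoid hemorrhage"]

top1 = top_k_accuracy(responses, targets, k=1)
top3 = top_k_accuracy(responses, targets, k=3)
print(f"top-1: {top1:.2%}, top-3: {top3:.2%}")
```

In this toy example only one target is ranked first, while three of the four appear within the first three differentials, which illustrates why accuracy for a wider list is never lower than accuracy for the first differential alone.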
Main Results
- ChatGPT 3.5 and 4.0 showed initial accuracies of 52.63% and 53.68%, respectively.
- Perplexity AI and Bard AI achieved 40.00% and 29.47% accuracy, respectively.
- ChatGPT 3.5 reached 77.89% accuracy for the top 5 differentials; Bard AI improved to 62.11% in the top 5.
- LLMs performed well on common conditions like epilepsy but struggled with complex diseases (e.g., Moyamoya disease).
Conclusions
- LLMs demonstrate potential to enhance diagnostic accuracy in neurosurgery.
- These AI tools may help decrease the incidence of missed diagnoses.

