ChatGPT versus NASS clinical guidelines for degenerative spondylolisthesis: a comparative analysis
Summary
This summary is machine-generated. ChatGPT showed variable accuracy when providing clinical recommendations for degenerative spondylolisthesis: it aligned with guidelines in some areas but was over-conclusive or inaccurate in others. Clinicians should use caution when consulting AI tools for medical advice.
Area Of Science
- Artificial Intelligence in Medicine
- Spine Surgery
- Clinical Decision Support
Background
- Clinical guidelines aid surgeon decision-making, with AI and large language models (LLMs) showing potential in healthcare.
- OpenAI's ChatGPT can synthesize medical literature, offering a potential tool for clinical decision-making in spine care.
- Limited research exists on ChatGPT's utility for degenerative spondylolisthesis clinical decision support.
Purpose Of The Study
- To compare ChatGPT's recommendations against The North American Spine Society (NASS) guidelines for degenerative spondylolisthesis.
- To evaluate ChatGPT's accuracy in the context of current medical literature.
Main Methods
- ChatGPT-3.5 and 4.0 were prompted with questions from the NASS guideline on degenerative spondylolisthesis.
- Responses were graded as 'concordant' or 'nonconcordant' with NASS recommendations.
- Nonconcordant responses were further categorized as 'Insufficient' or 'Over-conclusive'; GPT-3.5 and 4.0 results were compared using Chi-squared tests.
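The concordance comparison described above can be sketched as a Pearson Chi-squared test on a 2x2 table of concordant versus nonconcordant counts. The counts below are hypothetical, inferred from the reported percentages under the assumption of 28 graded questions per model (13/28 ≈ 46.4%, 19/28 ≈ 67.9%); the study's actual tabulation may differ.

```python
from math import erfc, sqrt

def chi2_2x2(table):
    """Pearson chi-squared statistic and p-value (df = 1) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-squared survival function
    # reduces to the complementary error function.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (concordant, nonconcordant), assuming 28 questions:
# GPT-3.5 -> 13 vs. 15; GPT-4.0 -> 19 vs. 9
stat, p = chi2_2x2([(13, 15), (19, 9)])
```

This pure-standard-library version avoids a SciPy dependency; `scipy.stats.chi2_contingency` would give the same statistic (without Yates correction) and is the usual choice in practice.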
Main Results
- ChatGPT-3.5 achieved 46.4% concordance with NASS guidelines, with higher accuracy on defined recommendations (66.7%) versus undefined areas (36.8%).
- ChatGPT-4.0 demonstrated improved concordance at 67.9%, with similar performance for defined (66.7%) and undefined (68.4%) recommendations.
- Nonconcordant responses from ChatGPT-3.5 were predominantly 'over-conclusive' (80%).
Conclusions
- LLMs such as ChatGPT offer genuine utility in clinical settings but remain prone to inaccuracy.
- ChatGPT generally aligns with NASS recommendations where evidence is clear but struggles with areas lacking defined best practices.
- Clinicians must exercise caution and verify AI-generated recommendations against the current literature, given the potential for inaccuracies and fabricated information.

