Confirmation of Large Language Models in Head and Neck Cancer Staging

  • Department of Medical Oncology, Ankara University Faculty of Medicine, Ankara University, Ankara 06620, Turkey.

Summary

This summary is machine-generated.

Large language models (LLMs) show high accuracy in staging head and neck cancer (HNC). These AI tools, including ChatGPT, DeepSeek, and Grok, could become valuable in oncology with further study.

Area Of Science

  • Oncology
  • Artificial Intelligence
  • Medical Informatics

Background

  • Head and neck cancer (HNC) staging is crucial for treatment and prognosis.
  • Large language models (LLMs) are emerging AI tools with potential applications in oncology.
  • The clinical utility of LLMs for HNC staging requires evaluation.

Purpose Of The Study

  • To assess the accuracy and concordance of LLM-generated staging for HNC.
  • To compare LLM staging performance against clinician-assigned stages.

Main Methods

  • Retrospective review of 202 HNC patient records.
  • Clinician staging performed by researchers.
  • LLM staging (ChatGPT, DeepSeek, Grok) conducted by a blinded researcher using de-identified data.
  • Comparison of LLM and clinician staging outcomes.
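The comparison step can be illustrated with a minimal sketch. It assumes concordance means simple percent agreement between each LLM-assigned stage and the clinician-assigned stage for the same patient; the summary does not state the exact metric, and the function name and toy stage labels below are hypothetical, not taken from the study.

```python
from typing import Sequence


def percent_concordance(llm_stages: Sequence[str],
                        clinician_stages: Sequence[str]) -> float:
    """Percent of patients whose LLM stage matches the clinician stage.

    Assumption: concordance is plain percent agreement, with stage labels
    compared after trimming whitespace and ignoring case.
    """
    if len(llm_stages) != len(clinician_stages):
        raise ValueError("Both lists must cover the same patients.")
    if not clinician_stages:
        raise ValueError("At least one patient is required.")
    matches = sum(
        a.strip().upper() == b.strip().upper()
        for a, b in zip(llm_stages, clinician_stages)
    )
    return 100.0 * matches / len(clinician_stages)


# Hypothetical toy data (five patients), for illustration only.
clinician = ["I", "II", "III", "IVA", "IVB"]
llm = ["I", "II", "III", "IVA", "III"]

print(percent_concordance(llm, clinician))  # 4 of 5 match -> 80.0
```

Percent agreement does not correct for chance agreement; a study may report a chance-corrected statistic instead, which this sketch does not attempt.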

Main Results

  • ChatGPT achieved the highest concordance (85.6%), followed by Grok (75.2%) and DeepSeek (67.3%).
  • No statistically significant differences were observed between the LLMs.
  • Concordance was similar across different data inputs (imaging, pathology, physical exam) and staging types (pathological, surgical).
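As a rough arithmetic check, the reported percentages can be converted back to patient counts. This sketch assumes every model staged all 202 records, which the summary does not state explicitly; the counts it produces are back-calculated estimates, not figures reported by the study.

```python
N_PATIENTS = 202  # from the Main Methods section

# Concordance percentages as reported in the Main Results section.
reported = {"ChatGPT": 85.6, "Grok": 75.2, "DeepSeek": 67.3}


def implied_matches(percent: float, n: int) -> int:
    """Nearest whole number of concordant cases for a reported percentage."""
    return round(percent / 100 * n)


# Back-calculated counts (assumption: denominator is 202 for every model).
implied = {name: implied_matches(pct, N_PATIENTS)
           for name, pct in reported.items()}

for name, count in implied.items():
    # Re-derive the percentage to confirm it rounds to the reported value.
    recomputed = round(100 * count / N_PATIENTS, 1)
    print(f"{name}: about {count}/{N_PATIENTS} concordant ({recomputed}%)")
```

Under that assumption, each reported percentage corresponds to a whole number of patients out of 202, so the figures are internally consistent with the stated sample size.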

Conclusions

  • LLMs staged HNC with 67.3% to 85.6% concordance with clinician-assigned stages.
  • These AI models show promise as supportive tools in oncological practice.
  • Further prospective studies are recommended for clinical implementation.