Artificial intelligence in thoracic surgery consultations: evaluating the concordance between a large language model and expert clinical decisions
Summary
This summary is machine-generated. Artificial intelligence demonstrated moderate agreement with thoracic surgeon decisions, particularly in oncology cases. AI shows potential to aid outpatient workflows but should complement, not replace, expert clinical judgment.
Area Of Science
- Medical Informatics
- Artificial Intelligence in Medicine
- Thoracic Surgery
Background
- The integration of artificial intelligence (AI) and large language models (LLMs) into clinical settings is growing.
- However, their specific application in thoracic surgery decision-making requires further investigation.
Purpose Of The Study
- To evaluate the concordance between AI-generated recommendations (Scholar GPT) and decisions made by board-certified thoracic surgeons.
- To assess the potential utility of AI in thoracic surgery outpatient consultations.
Main Methods
- A retrospective observational study analyzed 81 outpatient consultations in thoracic surgery.
- Scholar GPT's diagnostic and therapeutic recommendations were compared against surgeon decisions using a 6-point concordance scale.
- Statistical analysis included descriptive statistics, t-tests/ANOVA, and chi-square tests.
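As a rough illustration of the kind of analysis listed above, the sketch below summarizes a set of concordance scores and computes a Pearson chi-square statistic for a 2x2 table. It is not the study's code. The scores and table counts are made up, the 0 to 5 coding of the 6-point scale and the cutoff of 4 for "high" concordance are assumptions, and the function names `summarize` and `chi_square_2x2` are hypothetical.

```python
from statistics import mean, stdev

# Hypothetical concordance scores, assuming the 6-point scale is coded 0-5.
# Illustrative values only; these are NOT data from the study.
scores = [5, 4, 4, 3, 2, 5, 4, 1, 3, 4]


def summarize(scores, high_cutoff=4):
    """Return n, mean, sample SD, and the share of scores >= high_cutoff."""
    n = len(scores)
    high = sum(1 for s in scores if s >= high_cutoff)
    return {
        "n": n,
        "mean": mean(scores),
        "sd": stdev(scores),  # sample standard deviation (n - 1 denominator)
        "high_share": high / n,
    }


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


summary = summarize(scores)
print(f"n={summary['n']}, mean={summary['mean']:.2f}, "
      f"SD={summary['sd']:.2f}, high={summary['high_share']:.1%}")

# Hypothetical 2x2 table: rows = oncologic / non-oncologic cases,
# columns = high / low concordance. Counts are invented for illustration.
print(f"chi-square = {chi_square_2x2(12, 4, 6, 8):.2f}")
```

The chi-square statistic would still need to be compared against a chi-square distribution with one degree of freedom to obtain a p-value; a statistics package is the usual route for that step.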
Main Results
- The mean concordance score between Scholar GPT and surgeons was 3.67 ± 1.17.
- High concordance (scores 4-5) was observed in 56.8% of cases, especially for oncological diagnoses like mediastinal and pleural tumors.
- Lower concordance was noted in complex cases such as metastatic lung disease and thoracic outlet syndrome.
Conclusions
- Scholar GPT shows promise in aligning with surgeon decisions for structured oncologic cases but exhibits variability in complex scenarios.
- AI tools may assist in streamlining outpatient workflows but should be used as a supplement to, not a replacement for, expert clinical judgment.
- Findings are exploratory, warranting caution due to the study's small sample size and single-center, short-duration design.

