Utility of Large Language Models for Health Care Professionals and Patients in Navigating Hematopoietic Stem Cell Transplantation: Comparison of the Performance of ChatGPT-3.5, ChatGPT-4, and Bard
Summary
This summary is machine-generated. Large language models (LLMs) show potential in explaining complex topics like hematopoietic stem cell transplantation (HSCT). However, because their responses contain inaccuracies and cite unverifiable sources, LLMs are not yet ready for unsupervised clinical use or patient advice.
Area Of Science
- Artificial Intelligence in Healthcare
- Natural Language Processing Applications
- Medical Information Dissemination
Background
- Artificial intelligence (AI) and large language models (LLMs) are increasingly integrated into various workflows.
- LLMs offer potential for delivering health information to both providers and patients.
- Hematopoietic stem cell transplantation (HSCT) is a complex medical field with a challenging knowledge base for non-specialists.
Purpose Of The Study
- To evaluate the applicability of prominent LLMs (ChatGPT-3.5, ChatGPT-4, Bard) for guiding healthcare professionals and informing patients about HSCT.
- To assess LLM responses for consistency, veracity, comprehensibility, specificity, and the presence of hallucinations.
Main Methods
- Seventy-two open-ended HSCT questions were posed to the three LLMs.
- Responses were evaluated for consistency, veracity, comprehensibility, specificity, and hallucinations.
- The top-performing LLMs were re-challenged with difficult questions and prompted to use audience-specific language and to provide verifiable sources.
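
The per-criterion evaluation described above could be tallied along the following lines. This is only a minimal sketch, not the study's actual scoring procedure: the summary does not describe the rating scale, so the binary (true/false) ratings, the `summarize` function, the record layout, and the placeholder model names and values are all assumptions made for illustration.

```python
from collections import defaultdict

# Criteria named in the summary; treating each as a yes/no rating is an assumption.
CRITERIA = ("consistency", "veracity", "comprehensibility",
            "specificity", "hallucination")


def summarize(ratings):
    """Return, per model, the fraction of responses flagged for each criterion.

    `ratings` is an iterable of dicts with a "model" key and one boolean
    per criterion (a hypothetical record layout, not the study's format).
    """
    counts = defaultdict(lambda: {c: 0 for c in CRITERIA})
    totals = defaultdict(int)
    for r in ratings:
        totals[r["model"]] += 1
        for c in CRITERIA:
            counts[r["model"]][c] += bool(r[c])
    return {m: {c: counts[m][c] / totals[m] for c in CRITERIA} for m in totals}


# Placeholder ratings for illustration only; these are not study data.
example = [
    {"model": "model_a", "consistency": True, "veracity": True,
     "comprehensibility": True, "specificity": False, "hallucination": False},
    {"model": "model_a", "consistency": True, "veracity": False,
     "comprehensibility": True, "specificity": False, "hallucination": True},
    {"model": "model_b", "consistency": False, "veracity": True,
     "comprehensibility": True, "specificity": True, "hallucination": False},
]
summary = summarize(example)
```

Reporting a per-model fraction for each criterion keeps the comparison readable when every model answers the same question set; a graded scale or multiple raters would need a different aggregation.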
Main Results
- ChatGPT-4 demonstrated superior consistency, veracity, and specificity compared to ChatGPT-3.5 and Bard.
- ChatGPT-3.5 and ChatGPT-4 showed higher language comprehensibility than Bard.
- All LLMs exhibited hallucinations; none consistently provided accurate, verifiable sources, hindering clinical applicability.
Conclusions
- Current LLMs are not suitable for unsupervised clinical use or patient counseling in complex fields like HSCT due to errors and unreliable references.
- Future applications may become feasible with LLMs that are trained on specialized datasets and that have improved capabilities for accessing and citing current research.

