Comparative analysis of GPT-4 and Google Gemini's consistency with pediatric otolaryngology guidelines
Summary
This summary is machine-generated. Large language models (LLMs) such as GPT-4 and Google Gemini show high accuracy and completeness in interpreting pediatric otolaryngology guidelines, positioning them as potential assistive tools for healthcare professionals.
Area Of Science
- Medical Informatics
- Artificial Intelligence in Medicine
- Otolaryngology
Background
- Clinical practice guidelines (CPGs) are essential for evidence-based medicine.
- Large language models (LLMs) are increasingly explored for healthcare applications.
- Accurate interpretation of CPGs is crucial for patient care.
Purpose Of The Study
- To assess the accuracy and completeness of LLMs in understanding pediatric otolaryngology guidelines.
- To compare the performance of GPT-4 and Google Gemini in this domain.
- To evaluate the potential of AI as a supportive tool in clinical decision-making.
Main Methods
- Responses from GPT-4 and Google Gemini to queries based on American Academy of Otolaryngology-Head and Neck Surgery Foundation (AAO-HNSF) clinical practice guidelines were analyzed.
- Two independent reviewers rated each response for accuracy (1-5 scale) and completeness (1-3 scale).
- Statistical analysis included inter-rater reliability (Cohen's kappa) and paired model comparison (Wilcoxon signed-rank test); a sketch of this analysis appears below.
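For readers who want to see how such an analysis might be run, the sketch below computes Cohen's kappa and a Wilcoxon signed-rank test in Python using scikit-learn and SciPy. The ratings are invented placeholders, not the study's data, and the quadratic weighting for kappa is an assumption; the summary does not describe the paper's exact analysis pipeline.

```python
# Illustrative sketch of the rating-comparison workflow; all scores
# below are made-up placeholders, NOT data from the study.
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

# Hypothetical accuracy ratings (1-5) from two independent reviewers
# for the same set of model responses.
reviewer_a = [5, 4, 5, 5, 3, 4, 5, 5, 4, 5]
reviewer_b = [5, 4, 4, 5, 3, 4, 5, 5, 5, 5]

# Inter-rater reliability: Cohen's kappa on the two reviewers' ratings.
# Quadratic weights are a common choice for ordinal scales; the paper's
# actual weighting scheme is not stated in this summary.
kappa = cohen_kappa_score(reviewer_a, reviewer_b, weights="quadratic")
print(f"Cohen's kappa: {kappa:.3f}")

# Paired model comparison: Wilcoxon signed-rank test on per-question
# accuracy scores for GPT-4 vs. Gemini (placeholder values).
gpt4_scores   = np.array([5, 4, 5, 5, 4, 5, 3, 4, 5, 5])
gemini_scores = np.array([5, 5, 5, 4, 4, 5, 5, 5, 5, 4])
stat, p_value = wilcoxon(gpt4_scores, gemini_scores)
print(f"Wilcoxon statistic: {stat}, p-value: {p_value:.3f}")
```

The Wilcoxon test is used here because the same guideline-derived questions are posed to both models, yielding paired ordinal scores for which a nonparametric paired test is appropriate.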
Main Results
- Both LLMs achieved high mean accuracy (GPT-4: 4.74, Gemini: 4.82 out of 5) and completeness (GPT-4: 2.94, Gemini: 2.98 out of 3).
- No statistically significant differences were observed between the models for accuracy (p=0.134) or completeness (p=0.34).
- AI responses highlighted patient-specific factors and the need for professional consultation.
Conclusions
- GPT-4 and Google Gemini show promise as assistive tools in pediatric otolaryngology.
- Limitations include reliance on pre-trained data and subjective evaluation.
- Continuous AI development and integration alongside human expertise are vital for clinical use.

