A Comparative Analysis of GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, and Gemini 1.5 in Answering Total Knee Replacement-Related Questions
Summary
This summary is machine-generated. Five AI chatbots were evaluated for the accuracy of their total knee replacement (TKR) information. GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 provided accurate responses, while Gemini Advanced performed worse.
Area Of Science
- Orthopaedic Surgery
- Artificial Intelligence in Medicine
- Medical Information Systems
Background
- Artificial intelligence (AI) chatbots are increasingly used to disseminate medical information.
- Systematic evaluations of AI chatbot accuracy and reliability in orthopaedic surgery, specifically for total knee replacement (TKR), are limited.
Purpose Of The Study
- To systematically compare and evaluate the performance of five AI chatbots.
- To assess the accuracy and reliability of AI chatbot responses concerning TKR.
Main Methods
- 43 TKR-related frequently asked questions (FAQs) were curated.
- Five AI chatbot models (GPT-3.5, GPT-4, GPT-4 Omni, Gemini Advanced, Gemini 1.5) were used to generate responses.
- Two orthopaedic surgeons evaluated response accuracy and relevance using a 5-point Likert scale.
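The scoring described above can be illustrated with a minimal sketch of how per-chatbot summary figures might be aggregated. The summary does not state the study's aggregation rules, so the following are assumptions made for illustration only: each question's accuracy score is the average of the two surgeons' 1-5 ratings, and relevance is a yes/no judgment reported as a percentage of questions. The function name `summarize` and the example ratings are hypothetical.

```python
from statistics import mean


def summarize(ratings):
    """Return (mean accuracy score, percent of responses rated relevant).

    `ratings` is a list of (score_rater1, score_rater2, relevant) tuples,
    one per question. Averaging the two raters and treating relevance as
    a binary judgment are illustrative assumptions, not the study's
    documented method.
    """
    if not ratings:
        raise ValueError("no ratings supplied")
    # Reject anything outside the 5-point Likert range before aggregating
    for a, b, _ in ratings:
        if not (1 <= a <= 5 and 1 <= b <= 5):
            raise ValueError("Likert scores must be between 1 and 5")
    # Average the two raters for each question, then across questions
    per_question = [(a + b) / 2 for a, b, _ in ratings]
    accuracy = mean(per_question)
    # Share of questions whose response was judged relevant
    relevant = sum(1 for _, _, rel in ratings if rel)
    relevance_pct = 100 * relevant / len(ratings)
    return round(accuracy, 2), round(relevance_pct, 1)


# Three made-up questions for a single hypothetical chatbot
example = [(5, 5, True), (4, 5, True), (3, 4, False)]
print(summarize(example))  # (4.33, 66.7)
```

Under the binary-relevance assumption, a relevance figure of 83.7% across 43 questions would correspond to 36 relevant responses (36/43 ≈ 0.837); whether the study computed relevance this way is not stated in the summary.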
Main Results
- GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 demonstrated high accuracy (≥4.8/5).
- Gemini Advanced scored significantly lower in accuracy (4.1/5) and relevance (83.7%).
- For questions on general information, risks, pain, and postoperative activities, accuracy did not differ significantly among most chatbots.
Conclusions
- GPT-3.5, GPT-4, GPT-4 Omni, and Gemini 1.5 are reliable sources for TKR-related queries.
- Gemini Advanced underperformed in both accuracy and relevance for TKR information.

