Assessing the validity of ChatGPT-4o and Google Gemini Advanced when responding to frequently asked questions in endodontics
Summary
This summary is machine-generated. Artificial intelligence (AI) chatbots such as ChatGPT-4o and Google Gemini Advanced (GGA) show high validity for endodontic information under lenient criteria. However, their reliability decreases markedly under stricter evaluation, so professional oversight is still required when they are used for patient education.
Area Of Science
- Dental informatics
- Artificial intelligence in healthcare
- Endodontic information dissemination
Background
- Large language models (LLMs) are increasingly used for patient information in endodontics.
- Continuous evaluation of AI model responses is crucial as new versions are released.
Purpose Of The Study
- To assess the validity of responses from advanced LLMs (Google Gemini Advanced and ChatGPT-4o) to common endodontic questions.
Main Methods
- A cross-sectional study compiled the top 20 endodontic FAQs.
- Nine endodontic specialists evaluated LLM responses using a five-point Likert scale.
- Validity was assessed against a high threshold (mean score 4.5-5) and a low threshold (mean score ≥4); a sketch of how such thresholds classify responses follows this list.
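To make the two validity criteria concrete, the minimal sketch below shows one plausible way mean Likert scores from the nine evaluators could be mapped to the low (≥4) and high (4.5-5) thresholds. The question texts, ratings, and aggregation are illustrative assumptions only, not the study's actual data or statistical procedure.

```python
from statistics import mean

# Hypothetical Likert ratings (1-5) from nine evaluators per FAQ response.
# These values are placeholders; the study's real data are not reproduced here.
ratings_per_question = {
    "Q1: What is a root canal?": [5, 5, 4, 5, 5, 4, 5, 5, 5],
    "Q2: Is root canal treatment painful?": [4, 4, 5, 4, 4, 3, 4, 4, 4],
}

def classify_validity(ratings, high=4.5, low=4.0):
    """Classify one response by its mean Likert score against both thresholds."""
    score = mean(ratings)
    return {
        "mean": round(score, 2),
        "valid_low": score >= low,         # lenient criterion (>= 4)
        "valid_high": high <= score <= 5,  # strict criterion (4.5-5)
    }

results = {q: classify_validity(r) for q, r in ratings_per_question.items()}

# Share of questions judged valid under each criterion
n = len(results)
pct_low = 100 * sum(r["valid_low"] for r in results.values()) / n
pct_high = 100 * sum(r["valid_high"] for r in results.values()) / n
print(results)
print(f"Valid at low threshold: {pct_low:.0f}% | Valid at high threshold: {pct_high:.0f}%")
```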
Main Results
- Both models achieved 95% validity at the low threshold (≥4).
- At the high threshold (4.5-5), ChatGPT-4o showed 35% validity, and GGA showed 40% validity.
Conclusions
- LLM responses demonstrate high validity under lenient criteria but are less reliable under strict evaluation.
- While promising for patient education, AI chatbots require professional monitoring due to current validity limitations in endodontics.

