Abstract
BACKGROUND
Osteoporosis is a disease with marked sex differences, and postmenopausal osteoporosis (PMOP) has been a focus of public health research worldwide. The purpose of this study is to evaluate the quality and readability of responses generated by three artificial intelligence large language models (AI-LLMs), ChatGPT-4o mini, ChatGPT-4o, and Gemini Advanced, to questions related to PMOP.
METHODS
We collected 48 frequently asked questions (FAQs) about PMOP from offline counseling sessions and online medical community forums. We also prepared 24 specific questions about PMOP based on the Management of Postmenopausal Osteoporosis: 2022 ACOG Clinical Practice Guideline No. 2 (2022 ACOG-PMOP Guideline). The questions were entered into each AI-LLM (ChatGPT-4o mini, ChatGPT-4o, and Gemini Advanced), and the resulting responses were randomly assigned to four orthopedic surgeons, who independently rated their satisfaction with each response on a 5-point Likert scale. In addition, a Flesch Reading Ease (FRE) score was calculated for each response to assess the readability of the text generated by each LLM.
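The Flesch Reading Ease score combines average sentence length and average syllables per word: FRE = 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), with higher scores indicating easier text. The sketch below illustrates that calculation only. The function names and the vowel-group syllable counter are illustrative assumptions, not the tool used in this study. Dictionary-based syllable counters will give somewhat different scores, and this sketch ignores numerals such as "2022".

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption): count runs of consecutive vowels a, e, i, o, u, y."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' (as in "bone") as non-syllabic, but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """FRE = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    # Sentences are split naively on runs of '.', '!' or '?'; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are alphabetic tokens, optionally with an apostrophe (e.g. "don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


if __name__ == "__main__":
    # Hypothetical snippet of a model response, scored for illustration only.
    print(round(flesch_reading_ease("The cat sat."), 2))
```

Longer sentences and more polysyllabic words both lower the score, which is why FRE is often used as a quick proxy for how accessible patient-facing text is.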
RESULTS
For questions related to both PMOP and the 2022 ACOG-PMOP Guideline, ChatGPT-4o and Gemini Advanced provided more concise answers than ChatGPT-4o mini. For the overall PMOP FAQs, ChatGPT-4o had a significantly higher accuracy rate than ChatGPT-4o mini and Gemini Advanced. For questions related to the 2022 ACOG-PMOP Guideline, ChatGPT-4o mini and ChatGPT-4o both had significantly higher response accuracy than Gemini Advanced. All three models (ChatGPT-4o mini, ChatGPT-4o, and Gemini Advanced) showed good self-correction.
CONCLUSIONS
Our results show that Gemini Advanced and ChatGPT-4o provide more concise and intuitive answers than ChatGPT-4o mini. ChatGPT-4o performed best in answering FAQs related to PMOP. For questions related to the 2022 ACOG-PMOP Guideline, ChatGPT-4o mini and ChatGPT-4o performed significantly better than Gemini Advanced. All three models, ChatGPT-4o mini, ChatGPT-4o, and Gemini Advanced, demonstrated a strong ability to self-correct.
CLINICAL TRIAL NUMBER
Not applicable.