Multidimensional assessment of large language model responses to patient questions on gestational diabetes mellitus
Summary
This summary is machine-generated. Large language models (LLMs) show varied performance in educating patients about gestational diabetes mellitus (GDM). Grok and Gemini offered higher quality responses, but all models require improvement for patient readability.
Area Of Science
- Medical Informatics
- Artificial Intelligence in Healthcare
- Patient Education
Background
- Gestational diabetes mellitus (GDM) is common and requires clear patient education.
- The reliability and readability of large language models (LLMs) for GDM patient education are not well-established.
- Accurate health information is crucial for managing GDM effectively.
Purpose Of The Study
- To evaluate the performance of four leading LLMs (ChatGPT-4o, Gemini 2.5 Pro, Grok 3.0, DeepSeek R-1) in providing patient-oriented education on GDM.
- To assess the quality, readability, and lexical diversity of LLM-generated responses to clinical GDM scenarios.
- To identify potential areas for LLM improvement in health information delivery.
Main Methods
- Utilized 25 patient-oriented GDM questions derived from clinical scenarios.
- Evaluated responses from four LLMs: ChatGPT-4o, Gemini 2.5 Pro, Grok 3.0, and DeepSeek R-1.
- Had seven endocrinologists rate response quality using the modified DISCERN (mDISCERN) instrument and the Global Quality Score (GQS).
- Analyzed readability with Flesch Reading Ease (FRES), Flesch-Kincaid Grade Level (FKGL), Gunning Fog Index (GFI), Coleman-Liau Index (CLI), and SMOG.
- Measured lexical diversity using type-token ratio (TTR).
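As a rough illustration of three of the metrics listed above, the sketch below computes FRES, FKGL, and TTR using the standard published formulas. It is not the study's tooling: the syllable counter is a crude vowel-group heuristic, and the helper names (`count_syllables`, `text_stats`, and so on) are hypothetical. Dedicated readability software typically uses dictionaries or better heuristics, so its scores will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Crude heuristic (an assumption, not a linguistic standard):
    count vowel groups and drop one for a likely silent trailing 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[int, int, int]:
    """Return (sentence count, word count, syllable count) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(sentences), len(words), syllables


def flesch_reading_ease(text: str) -> float:
    """FRES: higher is easier; about 60 or above is commonly recommended for lay readers."""
    s, w, syl = text_stats(text)
    return 206.835 - 1.015 * (w / s) - 84.6 * (syl / w)


def flesch_kincaid_grade(text: str) -> float:
    """FKGL: approximate US school grade level needed to read the text."""
    s, w, syl = text_stats(text)
    return 0.39 * (w / s) + 11.8 * (syl / w) - 15.59


def type_token_ratio(text: str) -> float:
    """TTR: number of unique words divided by total words (case-insensitive)."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z']+", text)]
    return len(set(tokens)) / len(tokens)
```

For example, `type_token_ratio("the cat and the dog")` is 4 unique words over 5 total, or 0.8. Because TTR tends to fall as texts get longer, comparisons between responses of very different lengths should be read with caution.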
Main Results
- Grok and Gemini achieved the highest expert-rated quality scores (mDISCERN and GQS).
- ChatGPT-4o scored significantly lower on quality than Grok and Gemini (p < 0.05).
- DeepSeek provided the most readable content, while Grok's responses were the longest and most complex.
- All LLMs produced content below the recommended FRES threshold (60) for lay audiences.
- Response length positively correlated with quality; lexical diversity was inversely related to quality but positively associated with readability.
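Relationships like those in the last result (length versus quality, lexical diversity versus quality) are typically quantified with a correlation coefficient. This summary does not say which coefficient the authors used, so the sketch below assumes Spearman's rank correlation, a common choice for ordinal ratings such as GQS. The function names are hypothetical, and any inputs would be per-response values such as word counts and expert scores.

```python
def ranks(values: list[float]) -> list[float]:
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))
```

A value near +1 means longer responses tend to receive higher ratings, a value near -1 means the opposite, and values near 0 indicate no monotonic relationship.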
Conclusions
- Significant variability exists in the performance of LLMs for GDM patient education.
- Grok and Gemini demonstrate higher potential for reliable GDM information delivery among the tested models.
- Across all LLMs, readability must improve and content complexity must be reduced to meet patient comprehension standards.
- Model-specific optimization is essential to ensure LLMs provide safe and effective health information for conditions like GDM.

