Artificial intelligence-large language models (AI-LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice
Summary
This summary is machine-generated. ChatGPT-4o demonstrated superior performance in cardiotocography (CTG) interpretation, outperforming the other AI-LLMs evaluated as well as junior doctors. Its performance closely matched that of senior doctors, suggesting promise for improving obstetric care.
Area Of Science
- Medical Imaging
- Artificial Intelligence
- Obstetrics
Background
- Accurate cardiotocography (CTG) interpretation is crucial for monitoring fetal well-being during pregnancy and labor.
- Advanced artificial intelligence (AI) tools, specifically AI-large language models (AI-LLMs), have the potential to improve CTG interpretation accuracy.
- The clinical utility and performance of AI-LLMs in CTG interpretation require thorough evaluation.
Purpose Of The Study
- To assess the performance of three AI-LLMs (ChatGPT-4o, Gemini Advanced, Copilot) in interpreting CTG images.
- To compare the AI-LLMs' interpretations with those of junior human doctors (JHDs) and senior human doctors (SHDs).
- To evaluate the reliability of AI-LLMs in clinical decision-making for obstetric care.
Main Methods
- Seven CTG images were interpreted by three AI-LLMs, five SHDs, and five JHDs.
- Expert evaluations were conducted by five blinded maternal-fetal medicine specialists using a Likert scale for relevance, clarity, depth, focus, and coherence.
- Statistical analysis was used to assess the homogeneity of the expert ratings and to compare performance across groups.
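The summary does not state how the experts' Likert ratings were turned into the overall scores reported below. As a purely illustrative sketch, the code pools hypothetical ratings for one group over images and experts, averages each of the five criteria, and then takes an unweighted mean. The data layout, the 1 to 5 scale, the equal weighting, and all function names are assumptions, not the authors' method.

```python
from statistics import mean

# The five rating criteria named in the summary.
CRITERIA = ("relevance", "clarity", "depth", "focus", "coherence")

# Hypothetical layout (assumption): a list of images, each holding one dict
# per expert that maps criterion -> Likert rating (assumed 1..5).
TOY_RATINGS = [
    [  # image 1
        {"relevance": 4, "clarity": 4, "depth": 4, "focus": 4, "coherence": 4},
        {"relevance": 5, "clarity": 3, "depth": 4, "focus": 4, "coherence": 4},
    ],
]


def criterion_means(ratings_for_group):
    """Mean rating per criterion, pooled over all images and experts."""
    pooled = {c: [] for c in CRITERIA}
    for image_ratings in ratings_for_group:
        for expert_ratings in image_ratings:
            for c in CRITERIA:
                pooled[c].append(expert_ratings[c])
    return {c: mean(values) for c, values in pooled.items()}


def overall_mean(ratings_for_group):
    """Unweighted mean of the per-criterion means (equal weighting assumed)."""
    return mean(criterion_means(ratings_for_group).values())


if __name__ == "__main__":
    print(criterion_means(TOY_RATINGS))
    print(overall_mean(TOY_RATINGS))
```

Keeping per-criterion means separate mirrors how the results discuss individual parameters such as depth; any real analysis would follow the study's own scoring rules.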
Main Results
- ChatGPT-4o achieved a score of 77.86, outperforming Gemini Advanced (57.14), Copilot (47.29), and JHDs (61.57).
- ChatGPT-4o's performance (77.86) closely approached that of SHDs (80.43), with no statistically significant difference (p > 0.05).
- ChatGPT-4o particularly excelled on the depth parameter and was only marginally inferior to SHDs on the other assessed parameters.
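The reported group scores can be ranked and compared directly. This minimal sketch uses only the numbers given in the results; the scale of these scores is not stated in the summary, and the raw differences it prints are descriptive only. They are not a substitute for the study's significance testing (for example, the p > 0.05 result for ChatGPT-4o versus SHDs). Function names are illustrative.

```python
# Overall scores exactly as reported in the summary's results.
REPORTED_SCORES = {
    "SHDs": 80.43,
    "ChatGPT-4o": 77.86,
    "JHDs": 61.57,
    "Gemini Advanced": 57.14,
    "Copilot": 47.29,
}


def rank_groups(scores):
    """Return (group, score) pairs sorted from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap(scores, a, b):
    """Raw score difference a - b, rounded to two decimals."""
    return round(scores[a] - scores[b], 2)


if __name__ == "__main__":
    for rank, (group, score) in enumerate(rank_groups(REPORTED_SCORES), start=1):
        print(f"{rank}. {group}: {score:.2f}")
    print("SHDs - ChatGPT-4o:", gap(REPORTED_SCORES, "SHDs", "ChatGPT-4o"))
    print("ChatGPT-4o - JHDs:", gap(REPORTED_SCORES, "ChatGPT-4o", "JHDs"))
```

The ranking places ChatGPT-4o second, 2.57 points below SHDs and 16.29 points above JHDs.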
Conclusions
- ChatGPT-4o performed best among the evaluated AI-LLMs in CTG interpretation.
- ChatGPT-4o surpassed the performance of junior doctors and closely matched that of senior doctors.
- AI-LLMs, exemplified by ChatGPT-4o, show considerable potential as assistive tools for obstetricians, with the aim of enhancing diagnostic accuracy and improving obstetric patient care.

