A comparative evaluation of publicly available large language models in the assessment of CTG traces according to the FIGO criteria
Summary
This summary is machine-generated. Large language models (LLMs) struggle to interpret cardiotocography (CTG) traces for fetal monitoring. Current AI tools show significant limitations in classifying CTG results accurately according to established medical criteria.
Area Of Science
- Medical Technology
- Artificial Intelligence in Healthcare
- Fetal Monitoring
Background
- Cardiotocography (CTG) is vital for fetal surveillance but suffers from interpretation variability.
- Artificial intelligence (AI), specifically large language models (LLMs), presents an opportunity to enhance diagnostic consistency and reduce clinician burden.
Purpose Of The Study
- To assess and compare the diagnostic accuracy of multiple LLMs for cardiotocography interpretation.
- To base the evaluation on the International Federation of Gynecology and Obstetrics (FIGO) 2015 criteria.
Main Methods
- Sixty cardiotocography (CTG) traces, pre-classified by expert clinicians, were analyzed.
- Four LLMs (Chat-GPT-4.0, Google Gemini, Bing Copilot, DeepSeek) were tested in a two-phase protocol using normal and abnormal CTG traces presented as screenshots.
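Comparing model output with expert pre-classification comes down to a per-trace agreement count. The sketch below is one minimal way to score such a comparison in Python; it is not the study's own analysis. It assumes the three FIGO 2015 classes (normal, suspicious, pathological) as labels, and the function name `score_model`, the use of `None` for "the model gave no classification", and the example labels are all hypothetical and do not come from the study.

```python
from collections import Counter

# The three CTG classes defined by the FIGO 2015 guidelines.
FIGO_CLASSES = ("normal", "suspicious", "pathological")


def score_model(expert_labels, model_labels):
    """Score one model against expert labels (hypothetical helper).

    Returns (overall accuracy, per-class fraction of traces labelled correctly).
    A model label of None means the model produced no classification and is
    counted as incorrect.
    """
    if not expert_labels or len(expert_labels) != len(model_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    total = Counter()    # traces per expert class
    correct = Counter()  # traces where the model agreed with the expert
    for expert, model in zip(expert_labels, model_labels):
        total[expert] += 1
        if model == expert:
            correct[expert] += 1

    accuracy = sum(correct.values()) / len(expert_labels)
    per_class = {c: correct[c] / total[c] for c in FIGO_CLASSES if total[c]}
    return accuracy, per_class


# Illustrative labels only; these are NOT results from the study.
expert = ["normal", "normal", "pathological", "suspicious"]
model = ["normal", "normal", "normal", None]
accuracy, per_class = score_model(expert, model)
print(accuracy, per_class)
```

Reporting agreement per class as well as overall makes it visible when a model handles normal traces well but misses abnormal ones, which a single accuracy figure can hide.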
Main Results
- DeepSeek and Google Gemini were either unable to interpret CTG traces or performed poorly.
- Chat-GPT-4.0 showed partial success, while Bing Copilot accurately interpreted normal CTGs but failed on abnormal ones.
Conclusions
- Current large language models exhibit significant limitations in reliably interpreting cardiotocography (CTG) traces.
- Further development is needed for AI tools to meet clinical standards for fetal monitoring interpretation.

