Mahdi Mahdavi¹,², Sarah White³, Sandeep S Hothi⁴,⁵
¹Department of Management and Enterprise, Faculty of Business and Law, University of Roehampton, London, United Kingdom.
This study investigated how well an AI-based stress echocardiography tool matches the diagnoses of cardiologists. Researchers found that while the AI and doctors often agree, they disagree more frequently when patients have complex health histories, such as hypertension, diabetes, or existing coronary artery disease. Doctors generally view AI as an advisory tool, preferring to rely on their own expertise and additional testing when conflicts occur.
Background:
Integrating automated diagnostic tools into clinical workflows remains a complex challenge for modern healthcare. Prior research has shown that assessments of model performance often fail to account for the nuances of human decision-making. No prior work had established how clinicians reconcile conflicting automated outputs with their own professional assessments during cardiac evaluations, which created a need for a detailed examination of diagnostic alignment in real-world settings. Existing literature also often overlooks the specific patient factors that influence whether a physician accepts or rejects an algorithmic suggestion. These gaps motivated an investigation into the interaction between cardiologists and software designed for stress echocardiography. Understanding these dynamics is necessary to ensure that technology supports rather than complicates patient care. The study therefore examines how professional judgment interacts with machine-generated insights in the assessment of coronary artery disease.
Purpose of the Study:
The researchers propose that cardiologists prioritize their own clinical judgment over AI, seeking corroboration through secondary testing or peer review when disagreements arise. This behavior indicates that clinicians treat automated outputs as advisory rather than definitive, although they require high confidence in their own assessment before disregarding a contradictory algorithmic suggestion.
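The advisory-use pattern described above can be expressed as a simple decision rule. The sketch below is a toy model under stated assumptions, not the authors' method: the `resolve` function, the `Action` labels, and the default threshold of 0.65 are hypothetical, with the threshold loosely taken from the 65% to 69% confidence range the survey reported. It assumes binary diagnoses and a clinician confidence expressed between 0 and 1.

```python
from enum import Enum


class Action(Enum):
    KEEP_OWN = "keep own diagnosis"
    SEEK_CORROBORATION = "seek second opinion or additional testing"


# Hypothetical default, loosely based on the 65% to 69% range reported in the survey.
OVERRIDE_THRESHOLD = 0.65


def resolve(own_dx: bool, ai_dx: bool, own_confidence: float,
            threshold: float = OVERRIDE_THRESHOLD) -> Action:
    """Toy model of a clinician treating an AI read as advisory, not definitive."""
    if not 0.0 <= own_confidence <= 1.0:
        raise ValueError("own_confidence must lie between 0 and 1")
    if own_dx == ai_dx:
        # Concordant reads: the AI output simply corroborates the clinician.
        return Action.KEEP_OWN
    if own_confidence >= threshold:
        # Discordant, but confident enough to disregard the AI suggestion.
        return Action.KEEP_OWN
    # Discordant and below threshold: look for corroboration rather than defer.
    return Action.SEEK_CORROBORATION


if __name__ == "__main__":
    print(resolve(True, False, 0.80))  # clinician keeps own diagnosis
    print(resolve(True, False, 0.50))  # clinician seeks corroboration
```

Raising `threshold` is one way to represent clinicians who trust the AI more and therefore need greater confidence in their own read before overriding it.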
The study used EchoGo Pro, an AI-driven stress echocardiography system designed to help identify coronary artery disease. The tool analyzes echocardiographic images, but it frequently rejects scans because of insufficient image quality, particularly in male patients and in those with specific family histories.
The researchers identify hypertension, diabetes, and pre-existing coronary artery disease as key factors to consider, because each is associated with significantly lower concordance between cardiologists and the AI system. Patients with these comorbidities show lower agreement than healthier patient cohorts.
This study aimed to examine the diagnostic alignment between an AI-driven stress echocardiography system and the professional assessments of cardiologists. The researchers sought to identify specific patient characteristics that predict discordance between human and machine interpretations. A secondary goal involved exploring the decision-making strategies employed by clinicians when faced with conflicting algorithmic recommendations. The investigation addressed the critical knowledge gap regarding how automated tools integrate into real-world cardiac diagnostic workflows. By analyzing both quantitative performance metrics and qualitative survey data, the authors intended to clarify the role of AI in clinical practice. The study also aimed to determine the frequency and predictors of scan rejection by the software. This research was motivated by the need to understand how technology influences diagnostic accuracy for patients with suspected coronary artery disease. Ultimately, the work provides evidence on whether current AI systems effectively support or potentially complicate the diagnostic process for cardiologists.
Main Methods:
The research team used a mixed-methods design to evaluate diagnostic alignment between clinicians and the automated software. The quantitative analysis examined data from 854 participants enrolled in the PROTEUS randomized controlled trial. Logistic regression models identified predictors of agreement, disagreement, and scan rejection, adjusting for cardiovascular risk factors including age, sex, smoking status, and body mass index. The qualitative strand distributed surveys to 61 UK consultant cardiologists recruited via the Qualtrics platform, exploring their perceptions of the risks and benefits of following algorithmic recommendations. The two strands were then synthesized to understand clinicians' decision-making strategies when human and machine interpretations conflict. This dual approach provided both broad performance metrics and deeper insight into the human factors that influence clinical adoption.
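To illustrate what adjusting for a covariate means in a logistic regression, the following sketch fits a multivariable model by plain gradient descent using only the Python standard library. It is an illustration under assumptions, not the study's analysis code: `fit_logistic` and `make_synthetic_data` are hypothetical helpers, the records are invented, the outcome is coded 1 for AI-cardiologist disagreement, and only centred age is included as an adjustment covariate rather than the full set used in the study. Exponentiating a fitted coefficient gives the odds ratio for that covariate, holding the others fixed.

```python
import math
from typing import List, Sequence, Tuple


def fit_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                 lr: float = 0.1, epochs: int = 5000) -> List[float]:
    """Fit an unpenalised logistic regression by batch gradient descent.

    Returns [intercept, b_1, ..., b_k]. Each b_j is a log odds ratio for
    covariate j, adjusted for the other covariates in X.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, target in zip(X, y):
            z = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of disagreement
            err = p - target
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        # Step against the mean gradient of the negative log-likelihood.
        beta = [b - lr * g / n for b, g in zip(beta, grad)]
    return beta


def make_synthetic_data() -> Tuple[List[List[float]], List[int]]:
    """Invented records: columns are [diabetes (0/1), centred age]; label 1 = disagreement."""
    rows, labels = [], []
    # (diabetes flag, disagreeing records, agreeing records): illustrative counts only.
    for diabetes, n_disagree, n_agree in [(1, 6, 4), (0, 3, 7)]:
        for label, count in [(1, n_disagree), (0, n_agree)]:
            for _ in range(count):
                for age in (-0.5, 0.5):  # keep age balanced within every cell
                    rows.append([float(diabetes), age])
                    labels.append(label)
    return rows, labels


if __name__ == "__main__":
    X, y = make_synthetic_data()
    intercept, b_diabetes, b_age = fit_logistic(X, y)
    print(f"age-adjusted odds ratio for diabetes: {math.exp(b_diabetes):.2f}")
```

On the invented data the adjusted odds ratio for diabetes comes out at about 3.5, because the diabetic records were built with higher odds of disagreement. A statistics package such as statsmodels would also report standard errors and confidence intervals, which this sketch omits.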
Main Results:
The AI system and cardiologists agreed in 60% of the 854 analyzed cases. Agreement was significantly lower for patients with hypertension, diabetes, or pre-existing coronary artery disease. The software rejected 26.1% of scans because of insufficient image quality, with higher rejection rates in male patients. Combining human and machine diagnoses increased the proportion of cases identified as positive from 17.9% to 22.1%. Surveyed cardiologists required between 65% and 69% confidence in their own initial diagnosis before they would disregard a contradictory AI recommendation. Respondents consistently treated algorithmic outputs as advisory rather than definitive, preferring to seek second opinions or additional testing. Notably, cardiologists who reported higher trust in AI tools required even greater confidence in their own diagnosis to override the system. The primary cause of discordance was identified as the software's inability to incorporate patient history and clinical context.
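The agreement and combined-positivity figures are simple proportions over paired reads, and the sketch below shows one way they could be computed. The `Read` record, the `comorbid` flag, and the helper names are hypothetical. It assumes that scans rejected by the software have already been excluded (the summary does not state the denominator) and that "combining" means a case counts as positive when either the cardiologist or the AI reads it as positive. Under that assumed rule the combined positive rate can never fall below the clinician-only rate, which is consistent with the reported rise from 17.9% to 22.1%.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Read:
    clinician_positive: bool
    ai_positive: bool
    comorbid: bool  # hypothetical flag: hypertension, diabetes or known CAD


def agreement_rate(reads: Sequence[Read]) -> float:
    """Share of paired reads where the cardiologist and the AI give the same call."""
    if not reads:
        raise ValueError("no reads supplied")
    return sum(r.clinician_positive == r.ai_positive for r in reads) / len(reads)


def agreement_by_group(reads: Sequence[Read]) -> Dict[bool, float]:
    """Agreement computed separately for comorbid and non-comorbid patients."""
    groups: Dict[bool, List[Read]] = {}
    for r in reads:
        groups.setdefault(r.comorbid, []).append(r)
    return {flag: agreement_rate(group) for flag, group in groups.items()}


def positive_rates(reads: Sequence[Read]) -> Dict[str, float]:
    """Positive rates for each reader and for an assumed 'either reader' rule."""
    if not reads:
        raise ValueError("no reads supplied")
    n = len(reads)
    return {
        "clinician": sum(r.clinician_positive for r in reads) / n,
        "ai": sum(r.ai_positive for r in reads) / n,
        # Assumed combination rule: positive if either reader calls it positive.
        "either": sum(r.clinician_positive or r.ai_positive for r in reads) / n,
    }


if __name__ == "__main__":
    toy = [  # invented reads, for illustration only
        Read(True, True, False),
        Read(False, False, False),
        Read(True, False, True),
        Read(False, True, True),
        Read(False, False, True),
    ]
    print(agreement_rate(toy), agreement_by_group(toy), positive_rates(toy))
```

Applied to real paired reads, `agreement_by_group` would give the kind of subgroup comparison described above, such as agreement among patients with versus without the listed comorbidities.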
Conclusions:
The authors propose that automated systems currently function best as advisory aids rather than definitive diagnostic authorities. Clinicians prioritize their own expertise and seek further verification when algorithmic outputs contradict their initial assessments. This synthesis suggests that the inability of software to incorporate broader patient history remains a primary driver of diagnostic discordance. The researchers note that high rates of scan rejection for poor image quality may limit the utility of these tools in diverse patient populations. Implications for future development include the need to integrate comprehensive clinical data to improve diagnostic accuracy. The findings imply that clinicians use these tools primarily to prompt deeper scrutiny of their own findings. The authors indicate that current models may inadvertently exacerbate existing healthcare inequities if they are not trained on representative datasets. These results highlight the need for systems that better align with the complex, multimorbid nature of real-world clinical practice.
The quantitative component analyzed data from 854 participants in the PROTEUS trial, while the qualitative portion involved survey responses from 61 UK consultant cardiologists. These datasets allow for a comprehensive assessment of both diagnostic performance metrics and the professional perceptions of the clinicians using the technology.
The study measured a 60% agreement rate between the AI and cardiologists. It also found that the system rejected 26.1% of scans because of poor image quality, highlighting a substantial technical limitation of the current diagnostic software.
The authors suggest that future diagnostic systems must integrate wider patient data with imaging to minimize bias. They argue that failing to incorporate clinical context and comorbidities will continue to limit the effectiveness of these tools in complex, real-world patient care environments.