Comparison of AI-integrated pathways with human-AI interaction in population mammographic screening for breast cancer

Affiliations
  • 1. St Vincent’s BreastScreen, St Vincent’s Hospital Melbourne, Melbourne, VIC, Australia. helen.frazer@svha.org.au.
  • 2. BreastScreen Victoria, Caulfield, VIC, Australia.
  • 3. Faculty of Medicine, Dentistry & Health Sciences, University of Melbourne, Melbourne, VIC, Australia.
  • 4. Bioinformatics and Cellular Genomics Unit, St Vincent’s Institute of Medical Research, Fitzroy, VIC, Australia.
  • 5. Melbourne Integrative Genomics, School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia.
  • 6. School of Computer Science, Australian Institute for Machine Learning, University of Adelaide, Adelaide, SA, Australia.
  • 7. St Vincent’s BreastScreen, St Vincent’s Hospital Melbourne, Melbourne, VIC, Australia.
  • 8. Department of Surgery, St Vincent’s Hospital Melbourne, Melbourne, VIC, Australia.
  • 9. Department of Surgery, University of Melbourne, Melbourne, VIC, Australia.
  • 10. Centre for Epidemiology & Biostatistics, Melbourne School of Population and Global Health, University of Melbourne, Melbourne, VIC, Australia.
  • 11. Department of Radiology, St Vincent’s Hospital Melbourne, Melbourne, VIC, Australia.
  • 12. Centre for Vision, Speech and Signal Processing (CVSSP), The University of Surrey, Surrey, UK.


Abstract

Artificial intelligence (AI) readers of mammograms compare favourably to individual radiologists in detecting breast cancer. However, AI readers cannot perform at the level of the multi-reader systems used by screening programs in countries such as Australia, Sweden, and the UK. Implementation therefore demands human-AI collaboration. Here, we use a large, high-quality retrospective mammography dataset from Victoria, Australia, to conduct detailed simulations of five potential AI-integrated screening pathways, and we examine human-AI interaction effects to explore automation bias. Operating an AI reader as a second reader or as a high-confidence filter improves current screening outcomes by 1.9–2.5% in sensitivity and by up to 0.6% in specificity, while achieving a 4.6–10.9% reduction in assessments and a 48–80.7% reduction in human reads. Automation bias degrades performance in multi-reader settings but improves it for single readers. This study provides insight into feasible approaches for AI-integrated screening pathways and into the prospective studies needed before clinical adoption.
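To make the headline metrics concrete, the following is a minimal sketch of how a "high-confidence filter" pathway might be replayed over retrospective screening episodes. It assumes that the AI score alone decides episodes below a low threshold (no recall, no human read) or above a high threshold (recall, no human read), and that the existing human reading decision is kept for everything in between. The `Episode` fields, the thresholds `low` and `high`, and the six episodes are hypothetical and synthetic. This is not the authors' pathway definition, thresholds, or data; it only shows how sensitivity, specificity, recalls, and human reads are tallied for a pathway and compared against the current human-only baseline.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    # All fields are hypothetical, for illustration only.
    cancer: bool        # ground-truth outcome for the screening episode
    ai_score: float     # AI malignancy score in [0, 1]
    human_recall: bool  # recall decision of the existing human reading pathway
    human_reads: int    # number of human reads that decision consumed


def baseline(ep):
    """Current pathway: keep the human decision and its reads."""
    return ep.human_recall, ep.human_reads


def filter_pathway(ep, low, high):
    """Hypothetical high-confidence filter: the AI alone decides confident cases."""
    if ep.ai_score < low:
        return False, 0          # confident normal: no recall, no human read
    if ep.ai_score >= high:
        return True, 0           # confident abnormal: recall, no human read
    return ep.human_recall, ep.human_reads  # uncertain band: defer to humans


def evaluate(episodes, decide):
    """Tally a confusion matrix and human workload for one pathway."""
    tp = fn = tn = fp = reads = 0
    for ep in episodes:
        recall, used = decide(ep)
        reads += used
        if ep.cancer:
            tp, fn = (tp + 1, fn) if recall else (tp, fn + 1)
        else:
            fp, tn = (fp + 1, tn) if recall else (fp, tn + 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "recalls": tp + fp,
        "human_reads": reads,
    }


# Synthetic episodes (not study data).
episodes = [
    Episode(True, 0.95, True, 2),
    Episode(True, 0.60, False, 3),
    Episode(False, 0.02, False, 2),
    Episode(False, 0.01, False, 2),
    Episode(False, 0.50, True, 3),
    Episode(False, 0.03, True, 3),
]

base = evaluate(episodes, baseline)
filt = evaluate(episodes, lambda ep: filter_pathway(ep, low=0.05, high=0.9))
read_reduction = 1 - filt["human_reads"] / base["human_reads"]

if __name__ == "__main__":
    print(base)
    print(filt)
    print(f"reduction in human reads: {read_reduction:.0%}")
```

On these toy inputs the filter removes one false-positive recall and cuts human reads from 15 to 6 while leaving sensitivity unchanged, which illustrates why filter thresholds are typically tuned to trade workload against missed cancers. Real evaluations would of course use the full retrospective cohort and the pathway rules actually specified by the study.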