A transfer learning-enhanced deep learning framework for efficient and interpretable soil heavy metal pollution prediction under data scarcity and spatial heterogeneity

  • 1. College of Electrical and Information Engineering and Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha 410082, PR China; Key Laboratory of Jiangxi Province for Persistent Pollutants Prevention Control and Resource Reuse, Nanchang Hangkong University, Nanchang 330063, PR China. Electronic address: binyang@hnu.edu.cn.
  • 2. College of Electrical and Information Engineering and Key Laboratory of Visual Perception and Artificial Intelligence of Hunan Province, Hunan University, Changsha 410082, PR China. Electronic address: anqihe@hnu.edu.cn.
  • 3. Key Laboratory of Jiangxi Province for Persistent Pollutants Prevention Control and Resource Reuse, Nanchang Hangkong University, Nanchang 330063, PR China. Electronic address: renzhong424@163.com.
  • 4. Key Laboratory of Jiangxi Province for Persistent Pollutants Prevention Control and Resource Reuse, Nanchang Hangkong University, Nanchang 330063, PR China. Electronic address: recarudo@yeah.net.
  • 5. Jiangxi Academy of Eco-environmental Sciences and Planning, Nanchang 330000, PR China. Electronic address: zhaogang6766@126.com.
  • 6. Jiangxi Academy of Eco-environmental Sciences and Planning, Nanchang 330000, PR China. Electronic address: fanych@sthjt.jiangxi.gov.cn.
  • 7. National-Regional Joint Engineering Research Center for Soil Pollution Control and Remediation in South China, Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Institute of Eco-environmental and Soil Sciences, Guangdong Academy of Science, Guangzhou 510650, PR China. Electronic address: Wangqi@soil.gd.cn.
  • 8. Key Laboratory of Jiangxi Province for Persistent Pollutants Prevention Control and Resource Reuse, Nanchang Hangkong University, Nanchang 330063, PR China. Electronic address: sllou@hnu.edu.cn.

Abstract

Estimating soil heavy metal pollution risk at large scales remains challenging because of data scarcity and spatial heterogeneity. Although traditional machine learning (ML) methods offer notable predictive capability, they often struggle with high-dimensional, heterogeneous data and limited labeled samples, and they offer insufficient interpretability. In this study, we propose a transfer learning (TL)-based deep learning (DL) framework built on convolutional neural networks (CNN), termed TL-CNN, that integrates remote sensing-based (RSs), web-based (WBs), and field-sampled datasets (including spatial regionalization features, SRs) to efficiently predict soil heavy metal pollution. By coupling hierarchical feature extraction with a GradSHAP interpretability module, the approach provides both predictive accuracy and explanatory insight. Results from Shaoguan City (2018–2022) demonstrate that the TL-CNN model substantially outperforms conventional ML methods, particularly under multi-metal pollution scenarios, with overall accuracy exceeding 84%. By leveraging TL, the model adapts to data scarcity, reducing the need for costly field sampling and mitigating interpolation errors. The RSs- and WBs-derived features capture critical environmental variability and anthropogenic emissions, while the SRs refine local pollution patterns. GradSHAP analyses highlight the pivotal role of RSs features and spatial metrics in large-scale predictions. Overall, the proposed TL-CNN model underscores the potential of multi-source heterogeneous datasets and TL-based DL strategies to promote sustainable soil management.
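To make the interpretability step more concrete, the following is a minimal sketch of the expected-gradients idea that underlies GradSHAP-style attribution. It is not the authors' implementation: the abstract does not give model or feature details, so a small logistic model stands in for the CNN, and the three features, the weights, and the baseline samples are hypothetical placeholders chosen only for illustration.

```python
import math
import random


def model(x, weights, bias):
    """Toy stand-in for the trained network: a logistic pollution score in (0, 1)."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, weights, bias):
    """Analytic gradient of the logistic score with respect to each input feature."""
    p = model(x, weights, bias)
    return [p * (1.0 - p) * w for w in weights]


def expected_gradients(x, baselines, weights, bias, n_samples=2000, seed=0):
    """Monte Carlo estimate of expected-gradients attributions.

    Each sample draws a baseline and a point on the straight line between that
    baseline and x, then accumulates (x_i - baseline_i) * dF/dx_i at that point.
    """
    rng = random.Random(seed)
    totals = [0.0] * len(x)
    for _ in range(n_samples):
        base = rng.choice(baselines)
        alpha = rng.random()
        point = [b + alpha * (xi - b) for xi, b in zip(x, base)]
        grads = gradient(point, weights, bias)
        for i in range(len(x)):
            totals[i] += (x[i] - base[i]) * grads[i]
    return [t / n_samples for t in totals]


# Hypothetical inputs: three scaled features for one grid cell (assumed names).
feature_names = ["rs_index", "wb_emission_proxy", "sr_region_code"]
weights = [2.0, -1.0, 0.0]   # assumed model weights, not fitted to any data
bias = -0.5
x = [1.0, 0.5, 0.8]
baselines = [[0.0, 0.0, 0.0], [0.2, 0.1, 0.4]]  # assumed reference samples

attributions = expected_gradients(x, baselines, weights, bias)
for name, value in zip(feature_names, attributions):
    print(f"{name}: {value:+.4f}")
```

In this sketch the attributions sum approximately to the gap between the score at `x` and the mean score over the baselines, which is the property that makes per-feature contributions comparable. A feature with zero weight receives zero attribution. For a real CNN the gradients would come from automatic differentiation rather than a closed-form expression.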