Prescription data and demographics: An explainable machine learning exploration of colorectal cancer risk factors based on data from Danish national registries

  • SDU Health Informatics and Technology, The Maersk Mc-Kinney Moller Institute, University of Southern Denmark, 5230 Odense, Denmark.

Summary

This summary is machine-generated.

Machine learning models can predict colorectal cancer risk from patient demographics and prescription data. The models are precise, but their moderate recall means they miss a substantial share of at-risk patients and need refinement before clinical use.

Area Of Science

  • Computational biology
  • Oncology
  • Health informatics

Background

  • Colorectal cancer remains a significant global health challenge despite advances in treatment and prevention.
  • Predictive models are crucial for early detection and personalized risk management.

Purpose Of The Study

  • To evaluate machine learning models for predicting colorectal cancer risk.
  • To utilize demographic and prescribed drug data for risk prediction.
  • To enhance model interpretability using explainable AI techniques.

Main Methods

  • Developed and assessed five machine learning algorithms: Logistic Regression, XGBoost, Random Forests, k-nearest neighbors (kNN), and a Voting Classifier.
  • Evaluated predictive performance across four time horizons (3, 6, 12, and 36 months).
  • Employed explainable AI for feature contribution analysis (age, sex, social status, medications).
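
As a rough illustration of how a voting classifier combines its base models, the sketch below averages per-model probabilities (soft voting) and thresholds the mean. The summary does not say whether hard or soft voting was used, so soft voting is an assumption here; the function name `soft_vote`, the 0.5 threshold, and the example probabilities are all hypothetical.

```python
from statistics import mean


def soft_vote(probabilities, threshold=0.5):
    """Average per-model positive-class probabilities and threshold the mean.

    `probabilities` maps a model name to its predicted probability that a
    patient is at risk. Names, values, and threshold are illustrative only.
    """
    if not probabilities:
        raise ValueError("at least one model probability is required")
    average = mean(probabilities.values())
    return average, average >= threshold


# Hypothetical outputs of the four base models for a single patient.
patient = {
    "logistic_regression": 0.25,
    "xgboost": 0.75,
    "random_forest": 0.5,
    "knn": 1.0,
}
average, at_risk = soft_vote(patient)
print(f"mean probability = {average}, flagged = {at_risk}")
```

Raising the threshold in a setup like this generally trades recall for precision, which is one plausible way an ensemble ends up with very high precision and only moderate recall.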

Main Results

  • The Voting Classifier demonstrated high precision (>0.99) in identifying at-risk patients.
  • Recall was moderate (~0.6), indicating room for improvement in comprehensive detection.
  • Model performance was consistent across different prediction timeframes.
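
To make the gap between the two metrics concrete, the sketch below computes precision and recall from confusion-matrix counts. The counts are invented to roughly reproduce the reported pattern (precision above 0.99, recall near 0.6); they are not taken from the study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive,
    and false-negative counts, using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 1,000 true cases, of which 600 are flagged,
# plus 5 patients flagged incorrectly.
precision, recall = precision_recall(tp=600, fp=5, fn=400)
print(f"precision = {precision:.4f}, recall = {recall:.2f}")
```

With these illustrative counts, almost every flagged patient is a true case, yet 400 of the 1,000 true cases go unflagged.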

Conclusions

  • Machine learning effectively identifies individuals at elevated risk for colorectal cancer.
  • These predictive models can support early intervention and personalized strategies.
  • Further research is necessary before widespread clinical implementation.