Pan-cancer predictive survival model development and evaluation using electronic health record and genetic data across 10 cancer types

  • Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, UK. jurgita.gammall.20@ucl.ac.uk.

Summary

This summary is machine-generated.

This study developed machine learning models that predict cancer survival from patient data and genetic information. The models performed well, and adding genetic data improved predictions for some cancers, which could support treatment decisions.

Area Of Science

  • Oncology
  • Bioinformatics
  • Machine Learning

Background

  • Rising cancer incidence necessitates advanced analytical approaches.
  • Large-scale healthcare data availability offers opportunities for improved cancer analysis.
  • Accurate cancer prognosis is crucial for effective patient management and treatment.

Purpose Of The Study

  • To develop and evaluate prognostic cancer survival models for ten common cancer types.
  • To compare the performance of various machine learning algorithms in cancer prognosis.
  • To assess the added value of genetic information and improve model explainability for clinical adoption.

Main Methods

  • Utilized data from 9,977 cancer patients across ten cancer types.
  • Integrated genetic data (100,000 Genomes Project) with clinical, demographic, and hospital data.
  • Developed and compared four machine learning algorithms: elastic net Cox regression, random survival forest, gradient boosting survival analysis, and DeepSurv.
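Of the four algorithms, the elastic net Cox model has the most compact definition: the Cox negative log partial likelihood plus a mix of L1 and L2 penalties on the coefficients. The sketch below only evaluates that objective (it does not fit it) in plain Python. It assumes the Breslow treatment of tied event times and a penalty controlled by two hypothetical parameters, `alpha` (overall strength) and `l1_ratio` (L1/L2 mix). It illustrates the technique and is not the study's implementation.

```python
import math
from typing import Sequence


def elastic_net_cox_objective(
    times: Sequence[float],
    events: Sequence[int],
    X: Sequence[Sequence[float]],
    beta: Sequence[float],
    alpha: float,
    l1_ratio: float,
) -> float:
    """Mean negative Cox log partial likelihood plus an elastic net penalty.

    Assumptions (illustrative only): Breslow handling of ties, so every
    subject with time >= t_i is in the risk set of event i; the penalty is
    alpha * (l1_ratio * ||beta||_1 + (1 - l1_ratio) / 2 * ||beta||_2^2).
    """
    n = len(times)
    # Linear predictor x_j . beta for every subject.
    eta = [sum(x * b for x, b in zip(row, beta)) for row in X]

    log_lik = 0.0
    for i in range(n):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        # Risk set: everyone still under observation at time t_i.
        risk = [eta[j] for j in range(n) if times[j] >= times[i]]
        # Log-sum-exp shifted by the maximum for numerical stability.
        m = max(risk)
        log_denom = m + math.log(sum(math.exp(e - m) for e in risk))
        log_lik += eta[i] - log_denom

    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    penalty = alpha * (l1_ratio * l1 + (1.0 - l1_ratio) / 2.0 * l2)
    return -log_lik / n + penalty


if __name__ == "__main__":
    # Toy data: three patients, one covariate, all deaths observed.
    t = [1.0, 2.0, 3.0]
    e = [1, 1, 1]
    X = [[1.0], [0.0], [-1.0]]
    print(elastic_net_cox_objective(t, e, X, beta=[1.0], alpha=0.1, l1_ratio=0.5))
```

A fitting routine would minimise this objective over `beta`. The L1 term drives uninformative coefficients to exactly zero, which is one reason penalised Cox models are popular when many genetic features are added to clinical ones.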

Main Results

  • Models achieved C-indices ranging from 60% (bladder cancer) to 80% (glioma), averaging 72%.
  • The four algorithms performed similarly, with DeepSurv slightly underperforming the others.
  • Genetic data enhanced model performance for endometrial, glioma, ovarian, and prostate cancers.
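The C-index reported above measures how often a model ranks pairs of patients correctly. Among pairs in which the patient with the shorter survival time had an observed event, it is the fraction in which that patient received the higher predicted risk. A value of 0.5 is random ranking and 1.0 is perfect, so the 72% average corresponds to 0.72. The sketch below is a simplified Harrell's C-index in plain Python. It assumes pairs with tied survival times are skipped and tied risk scores count as half. The study's evaluation code may handle ties differently.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Simplified Harrell's concordance index.

    A pair (i, j) is comparable when patient i has an observed event and
    t_i < t_j. It is concordant when the model gives i the higher risk
    score; tied scores count as half. Pairs with tied times are skipped
    (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Four hypothetical patients; patient 1 is censored.
    # A higher score means higher predicted risk.
    t = [2.0, 5.0, 3.0, 8.0]
    e = [1, 0, 1, 1]
    scores = [0.9, 0.8, 0.7, 0.1]
    print(f"C-index: {harrell_c_index(t, e, scores):.0%}")  # → C-index: 80%
```

Here four of the five comparable pairs are ranked correctly. The one error is patient 2 (event at time 3) scoring below patient 1, who survived longer.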

Conclusions

  • Developed robust machine learning models for cancer survival prediction.
  • Demonstrated the utility of integrating genetic data into prognostic models.
  • Identified key prognostic features including age, stage, TP53 mutations, and tumour mutational burden.
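Two of the genetic features named above can be derived from a tumour's somatic variant calls. One is a binary TP53 mutation flag. The other is tumour mutational burden (TMB), commonly expressed as somatic mutations per megabase of sequence examined. The sketch below is a minimal illustration under several assumptions. The variant records are hypothetical dicts with `gene` and `consequence` keys. Only protein-altering consequences (named with Sequence Ontology terms) are counted, and the denominator is a caller-supplied region size. TMB definitions vary, and this is not the 100,000 Genomes Project pipeline or the study's feature code.

```python
from typing import Iterable, Mapping

# Sequence Ontology consequence terms treated here as protein-altering.
# Which consequences count toward TMB is an assumption; definitions vary.
PROTEIN_ALTERING = {
    "missense_variant",
    "stop_gained",
    "stop_lost",
    "start_lost",
    "frameshift_variant",
    "inframe_insertion",
    "inframe_deletion",
    "splice_acceptor_variant",
    "splice_donor_variant",
}


def genomic_features(
    variants: Iterable[Mapping[str, str]],
    region_size_bp: int,
) -> dict:
    """Derive a TP53 mutation flag and TMB (mutations per megabase).

    Each variant is a hypothetical record with "gene" and "consequence"
    keys describing one somatic mutation in the tumour.
    """
    if region_size_bp <= 0:
        raise ValueError("region_size_bp must be positive")
    # Keep only variants whose consequence is in the assumed set.
    counted = [v for v in variants if v["consequence"] in PROTEIN_ALTERING]
    tp53 = any(v["gene"] == "TP53" for v in counted)
    tmb = len(counted) / (region_size_bp / 1_000_000)
    return {"TP53_mutated": int(tp53), "TMB": tmb}


if __name__ == "__main__":
    somatic = [
        {"gene": "TP53", "consequence": "missense_variant"},
        {"gene": "KRAS", "consequence": "synonymous_variant"},
        {"gene": "PTEN", "consequence": "frameshift_variant"},
    ]
    # Hypothetical 2 Mb region: 2 counted mutations give 1.0 per Mb.
    print(genomic_features(somatic, region_size_bp=2_000_000))
```

The resulting dictionary can be joined to a patient's clinical and demographic features before model fitting. The synonymous KRAS variant is excluded from both features under the assumed consequence set.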