Predicting pathological response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer with two step feature selection and ensemble learning

  • School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China.

Summary

This summary is machine-generated.

This study identifies a 32-gene pair signature (32-GPS) as a robust biomarker for predicting response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC). The BoostForest model built on this signature reached an accuracy of 0.988 on the test set and 0.898 in the validation cohort.

Area Of Science

  • Oncology
  • Genomics
  • Bioinformatics

Background

  • Locally advanced rectal cancer (LARC) patients exhibit significant variability in response to neoadjuvant chemoradiotherapy (nCRT).
  • Predicting nCRT response in LARC is challenging because of differences between individual patients and the imbalance between response classes.
  • Accurate prediction of treatment response is crucial for optimizing LARC management.

Purpose Of The Study

  • To identify predictive biomarkers for nCRT response in LARC patients.
  • To develop an ensemble learning model for predicting nCRT response in LARC.
  • To enhance personalized treatment strategies for LARC through improved response prediction.

Main Methods

  • A two-step feature selection approach using relative expression orderings (REOs) to identify stable gene pairs.
  • Preliminary screening with MDFS, Boruta, MCFS, and VSOLassoBag, followed by Incremental Feature Selection (IFS) with Extreme Gradient Boosting (XGBoost).
  • Development of the BoostForest ensemble model for prediction and SHAP for interpretability.
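The idea behind relative expression orderings can be illustrated with a small sketch. A REO feature records only which gene of a pair is expressed more highly within one sample, so it does not depend on the absolute scale of the measurements. The code below is a minimal illustration, not the study's pipeline: the gene names, the expression values, the `min_fraction` threshold, and the definition of "stable" as "same ordering in at least that fraction of samples" are all assumptions made for the example.

```python
from itertools import combinations


def reo_features(expression, genes):
    """Binary REO features for one sample.

    For each gene pair (a, b) the feature is 1 if a is expressed above b,
    otherwise 0 (ties count as 0, an arbitrary choice for this sketch).
    """
    return {(a, b): int(expression[a] > expression[b])
            for a, b in combinations(genes, 2)}


def stable_pairs(samples, genes, min_fraction=0.9):
    """Return pairs whose ordering agrees in at least min_fraction of samples.

    The returned dict maps each stable pair to its dominant ordering (1 or 0).
    The threshold is a hypothetical parameter, not a value from the study.
    """
    n = len(samples)
    counts = {}
    for sample in samples:
        for pair, value in reo_features(sample, genes).items():
            counts[pair] = counts.get(pair, 0) + value

    stable = {}
    for pair, count in counts.items():
        fraction = count / n
        if fraction >= min_fraction:
            stable[pair] = 1
        elif 1 - fraction >= min_fraction:
            stable[pair] = 0
    return stable


# Hypothetical expression profiles for three samples and three genes.
genes = ["G1", "G2", "G3"]
samples = [
    {"G1": 5.0, "G2": 3.0, "G3": 4.0},
    {"G1": 6.0, "G2": 2.0, "G3": 7.0},
    {"G1": 4.5, "G2": 1.0, "G3": 3.0},
]

stable = stable_pairs(samples, genes)
print(stable)
```

In this toy data, G1 exceeds G2 in every sample and G2 never exceeds G3, so both pairs are kept; the G1/G3 ordering flips in one sample and is dropped. In the study, pairs surviving this kind of screening would then go through the feature selection methods listed above.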

Main Results

  • A 32-gene pair signature (32-GPS) was identified as a robust predictive biomarker.
  • The BoostForest model achieved high performance: AUPRC of 0.983 and accuracy of 0.988 in the test set; AUPRC of 0.785 and accuracy of 0.898 in the validation cohort.
  • BoostForest outperformed Random Forest, Support Vector Machine (SVM), and XGBoost, with 32-GPS showing superior predictive capability over alternative gene sets.
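Because response classes are imbalanced, AUPRC and accuracy can tell different stories, which is a plausible reason for reporting both. The sketch below computes AUPRC with the average-precision estimator (mean of the precision values at the rank of each true positive) alongside plain accuracy. The summary does not state which AUPRC estimator the authors used, so this choice is an assumption, and the labels and scores are invented for illustration.

```python
def average_precision(y_true, scores):
    """AUPRC estimated as average precision.

    Samples are ranked by descending score; precision is evaluated at the
    rank of each positive sample and averaged over all positives.
    Tied scores keep their input order (a simplification for this sketch).
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positives")

    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank
    return total / n_pos


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical cohort: 2 responders (label 1) out of 6 patients.
y_true = [1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auprc = average_precision(y_true, scores)

# A trivial model that predicts "non-responder" for everyone.
baseline_accuracy = accuracy(y_true, [0] * len(y_true))

print(round(auprc, 3), round(baseline_accuracy, 3))
```

The trivial baseline still gets two thirds of the labels right without identifying a single responder, which is why a precision-recall metric is a useful complement to accuracy on data like this.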

Conclusions

  • The two-step feature selection method effectively identified robust predictive biomarkers for nCRT response in LARC.
  • The BoostForest model, utilizing the 32-GPS, demonstrates superior performance in predicting treatment response.
  • This approach holds promise for improving personalized treatment selection and outcomes for LARC patients.