Predicting pathological response to neoadjuvant chemoradiotherapy in locally advanced rectal cancer with two step feature selection and ensemble learning
- Changshun Qian 1,2, Shuxin Yang 1, Yijing Chen 2,3, Ran Ge 1, Fangmin Shi 2,3, Chengnan Liu 2,3,4, Hui Wang 5, You Guo 6,7
- 1 School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China.
- 2 Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, 341000, China.
- 3 School of Public Health and Health Management, Gannan Medical University, Ganzhou, 341000, China.
- 4 State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
- 5 State Key Laboratory of Oncogenes and Related Genes, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China. huiwang@shsmu.edu.cn.
- 6 School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, 341000, China. gy@gmu.edu.cn.
- 7 Medical Big Data and Bioinformatics Research Centre, First Affiliated Hospital of Gannan Medical University, Ganzhou, 341000, China. gy@gmu.edu.cn.
Summary
This summary is machine-generated. This study identifies a 32-gene pair signature (32-GPS) as a robust biomarker for predicting response to neoadjuvant chemoradiotherapy (nCRT) in patients with locally advanced rectal cancer (LARC). The BoostForest model built on this signature predicts treatment response with high accuracy in both the test set and an independent validation cohort.
Area Of Science
- Oncology
- Genomics
- Bioinformatics
Background
- Locally advanced rectal cancer (LARC) patients exhibit significant variability in response to neoadjuvant chemoradiotherapy (nCRT).
- Predicting nCRT response in LARC is challenging because of differences between individual patients and the imbalance between response classes.
- Accurate prediction of treatment response is crucial for optimizing LARC management.
Purpose Of The Study
- To identify predictive biomarkers for nCRT response in LARC patients.
- To develop an ensemble learning model for predicting nCRT response in LARC.
- To enhance personalized treatment strategies for LARC through improved response prediction.
Main Methods
- A two-step feature selection approach using relative expression orderings (REOs) to identify stable gene pairs.
- Preliminary screening with MDFS, Boruta, MCFS, and VSOLassoBag, followed by Incremental Feature Selection (IFS) with Extreme Gradient Boosting (XGBoost).
- Development of the BoostForest ensemble model for prediction and SHAP for interpretability.
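The idea behind relative expression orderings can be illustrated with a minimal sketch. A gene pair is encoded per sample as 1 if the first gene is expressed above the second and 0 otherwise, so the feature depends only on within-sample rank. The stability rule used here (an ordering that holds in at least 90% of one group and is reversed in at least 90% of the other) is an assumption for illustration, as are the gene names, the expression values, and the function names; the summary does not state the study's actual criterion, and the screening and IFS steps are not shown.

```python
from itertools import combinations


def reo_features(sample, pairs):
    """Encode one sample as 0/1 flags: 1 if gene a is expressed above gene b."""
    return [1 if sample[a] > sample[b] else 0 for a, b in pairs]


def stable_reversed_pairs(responders, non_responders, genes, threshold=0.9):
    """Keep pairs whose ordering is consistent within each group and flips between groups.

    The 0.9 threshold is an assumed value, not one taken from the study.
    """
    def frac_greater(samples, a, b):
        return sum(s[a] > s[b] for s in samples) / len(samples)

    kept = []
    for a, b in combinations(genes, 2):
        fr = frac_greater(responders, a, b)
        fn = frac_greater(non_responders, a, b)
        flips_one_way = fr >= threshold and fn <= 1 - threshold
        flips_other_way = fr <= 1 - threshold and fn >= threshold
        if flips_one_way or flips_other_way:
            kept.append((a, b))
    return kept


# Toy expression profiles (hypothetical genes and values, for illustration only).
responders = [
    {"G1": 5.0, "G2": 2.0, "G3": 3.0},
    {"G1": 6.1, "G2": 1.5, "G3": 3.2},
]
non_responders = [
    {"G1": 1.0, "G2": 4.0, "G3": 0.5},
    {"G1": 0.8, "G2": 3.9, "G3": 0.3},
]
genes = ["G1", "G2", "G3"]

pairs = stable_reversed_pairs(responders, non_responders, genes)
responder_vector = reo_features(responders[0], pairs)
non_responder_vector = reo_features(non_responders[0], pairs)
```

In this toy data the pair (G1, G3) is dropped because G1 exceeds G3 in both groups, so its ordering carries no information about response. Because each feature compares two genes within the same sample, the encoding is unchanged by any monotonic rescaling of that sample's measurements.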
Main Results
- A 32-gene pair signature (32-GPS) was identified as a robust predictive biomarker.
- The BoostForest model achieved high performance: AUPRC of 0.983 and accuracy of 0.988 in the test set; AUPRC of 0.785 and accuracy of 0.898 in the validation cohort.
- BoostForest outperformed Random Forest, Support Vector Machine (SVM), and XGBoost, with 32-GPS showing superior predictive capability over alternative gene sets.
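To make the two reported metrics concrete, the sketch below computes accuracy and a step-wise estimate of the area under the precision-recall curve (average precision) on toy labels and scores. Average precision is one common AUPRC estimator; the summary does not say which estimator the study used, and the labels, scores, and 0.5 threshold here are invented for illustration. The sketch also assumes no tied scores.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Samples are ranked by descending score; precision is averaged over
    the ranks at which a positive sample is found. Assumes no tied scores.
    """
    ranked = sorted(zip(scores, y_true), key=lambda item: -item[0])
    n_pos = sum(y_true)
    true_positives = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            total += true_positives / rank
    return total / n_pos


# Toy data: 2 positives among 6 samples (hypothetical values).
y_true = [1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2, 0.7, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

model_accuracy = accuracy(y_true, y_pred)
model_auprc = average_precision(y_true, scores)

# A trivial classifier that always predicts the majority (negative) class.
baseline_accuracy = accuracy(y_true, [0] * len(y_true))
```

The always-negative baseline still scores an accuracy of 4/6 on this toy set without identifying a single positive case, which is why a precision-recall metric is a useful companion to accuracy when the response classes are imbalanced.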
Conclusions
- The two-step feature selection method effectively identified robust predictive biomarkers for nCRT response in LARC.
- The BoostForest model, utilizing the 32-GPS, demonstrates superior performance in predicting treatment response.
- This approach holds promise for improving personalized treatment selection and outcomes for LARC patients.

