Machine learning-driven gene expression profiling for lung cancer stage determination

  • Department of Biostatistics, University of Michigan-Ann Arbor, Ann Arbor, MI, USA.

Summary

This summary is machine-generated.

This study used machine learning and RNA sequencing (RNA-Seq) data to classify early versus late-stage lung cancer. The resulting XGBoost model achieved a test AUC of 0.6534 and identified 40 key predictive genes, pointing toward improved cancer staging and treatment guidance.

Area Of Science

  • Oncology
  • Bioinformatics
  • Computational Biology

Background

  • Lung cancer is a major cause of cancer mortality, necessitating accurate staging for effective treatment.
  • Next-generation sequencing (NGS) and machine learning (ML) offer advanced methods for precise cancer classification.
  • Traditional imaging methods have limitations in detailed lung cancer staging.

Purpose Of The Study

  • To classify early versus late-stage lung cancer using RNA-Seq data.
  • To apply the XGBoost machine learning algorithm with cross-validation (CV) for improved staging accuracy.
  • To identify key predictive genes from RNA sequencing data.

Main Methods

  • Utilized RNA-Seq data from 993 patients in The Cancer Genome Atlas (TCGA) cohort.
  • Performed gene selection using the Wilcoxon rank-sum test on training data.
  • Optimized the XGBoost model through cross-validation and assessed performance using Area Under the Curve (AUC).
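
The gene-selection and evaluation steps above can be sketched in a few lines. This is a minimal, standard-library illustration and not the authors' pipeline: it ranks genes by the absolute Wilcoxon rank-sum z-score (normal approximation, no tie correction) computed on training samples only, and computes AUC through the rank-sum identity. The function names, the gene names, the toy expression values, and the coding of early stage as 0 and late stage as 1 are all assumptions made for the example. The XGBoost fitting and cross-validation step is omitted.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_z(early: Sequence[float], late: Sequence[float]) -> float:
    """Wilcoxon rank-sum z-score; both groups must be non-empty."""
    n1, n2 = len(early), len(late)
    ranks = average_ranks(list(early) + list(late))
    w = sum(ranks[:n1])  # rank sum of the early-stage group
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean_w) / sd_w


def select_genes(train_expr: Dict[str, List[float]],
                 train_labels: List[int], k: int) -> List[str]:
    """Rank genes by |z| using training samples only; return the top k names."""
    scores = {}
    for gene, values in train_expr.items():
        early = [v for v, y in zip(values, train_labels) if y == 0]
        late = [v for v, y in zip(values, train_labels) if y == 1]
        scores[gene] = abs(rank_sum_z(early, late))
    return sorted(scores, key=scores.get, reverse=True)[:k]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via the rank-sum identity: (W_pos - n_pos(n_pos+1)/2) / (n_pos * n_neg)."""
    ranks = average_ranks(scores)
    pos = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos), len(labels) - len(pos)
    u = sum(pos) - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Hypothetical toy data: three genes, six training samples (0 = early, 1 = late).
toy_labels = [0, 0, 0, 1, 1, 1]
toy_expr = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_B": [3.0, 1.0, 5.0, 2.0, 6.0, 4.0],
    "GENE_C": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
top_genes = select_genes(toy_expr, toy_labels, k=2)
```

Restricting the test to training samples keeps the held-out AUC from being inflated by information leaking in from the test set. In practice a library routine with tie handling and exact p-values would likely replace the hand-written statistic.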

Main Results

  • The XGBoost model achieved a test AUC of 0.6534.
  • Identified 40 key genes that supported predictive accuracy while limiting overfitting.
  • Identified classification thresholds of 0.3 and 0.4 as giving the best balance between sensitivity and specificity.
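
To make the threshold trade-off concrete, the sketch below computes sensitivity and specificity at the two reported cut-offs. The labels and predicted probabilities are made up for illustration and are not the study's data. Treating late stage (label 1) as the positive class is also an assumption, since the summary does not say which class was positive.

```python
from typing import List, Tuple


def sensitivity_specificity(labels: List[int], probs: List[float],
                            threshold: float) -> Tuple[float, float]:
    """Predict positive when prob >= threshold; return (sensitivity, specificity)."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    tn = sum(1 for y, p in zip(labels, probs) if y == 0 and p < threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up labels (assumed 1 = late stage) and predicted probabilities.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
probs = [0.10, 0.25, 0.35, 0.50, 0.28, 0.38, 0.45, 0.70]

for threshold in (0.3, 0.4):
    sens, spec = sensitivity_specificity(labels, probs, threshold)
    print(f"threshold={threshold}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

On this toy data, lowering the threshold from 0.4 to 0.3 raises sensitivity and lowers specificity, which is the general direction of the trade-off a threshold choice has to balance.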

Conclusions

  • Integrating RNA-Seq data with machine learning enhances lung cancer staging accuracy.
  • The study highlights the potential of ML-driven genomic analysis in oncology.
  • Future work should involve larger datasets, model benchmarking, and multi-omics integration for clinical translation.