Machine learning-driven gene expression profiling for lung cancer stage determination
- Yinbo Wang 1, Kai Fu 2
- 1Department of Biostatistics, University of Michigan-Ann Arbor, Ann Arbor, MI, USA.
- 2Department of Molecular, Cellular and Developmental Biology, University of California Los Angeles, Los Angeles, CA, USA.
Summary
This summary is machine-generated. This study used machine learning and RNA sequencing data to classify early versus late-stage lung cancer. The developed XGBoost model identified key genes, improving cancer staging for better treatment guidance.
Area Of Science
- Oncology
- Bioinformatics
- Computational Biology
Background
- Lung cancer is a major cause of cancer mortality, necessitating accurate staging for effective treatment.
- Next-generation sequencing (NGS) and machine learning (ML) offer advanced methods for precise cancer classification.
- Traditional imaging methods have limitations in detailed lung cancer staging.
Purpose Of The Study
- To classify early versus late-stage lung cancer using RNA-Seq data.
- To apply the XGBoost machine learning algorithm with cross-validation (CV) for improved staging accuracy.
- To identify key predictive genes from RNA sequencing data.
Main Methods
- Utilized RNA-Seq data from 993 patients in The Cancer Genome Atlas (TCGA) cohort.
- Performed gene selection using the Wilcoxon rank-sum test on training data.
- Optimized the XGBoost model through cross-validation and assessed performance using Area Under the Curve (AUC).
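The gene-selection step above can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy expression values, and the 0/1 stage encoding (0 = early, 1 = late) are assumptions made for illustration, and the p-value uses a normal approximation without tie correction. In practice a library routine such as `scipy.stats.ranksums` could be used instead. The point it shows is that genes are ranked using training samples only, so the held-out test data play no part in selection.

```python
import math
from statistics import NormalDist


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i and give each the average rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def select_top_genes(train_expr, train_labels, k):
    """Rank genes by rank-sum p-value on TRAINING samples only; return the k best."""
    pvals = {}
    for gene, values in train_expr.items():
        early = [v for v, s in zip(values, train_labels) if s == 0]
        late = [v for v, s in zip(values, train_labels) if s == 1]
        pvals[gene] = rank_sum_pvalue(early, late)
    return sorted(pvals, key=pvals.get)[:k]


# Hypothetical toy training data: 4 early-stage (0) and 4 late-stage (1) samples.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_expr = {
    "GENE_A": [1, 2, 3, 4, 10, 11, 12, 13],  # clearly separated between stages
    "GENE_B": [5, 1, 7, 3, 6, 2, 8, 4],      # interleaved, little signal
}
top_genes = select_top_genes(toy_expr, toy_labels, k=1)
```

The selected gene list would then define the feature columns passed to the classifier during cross-validation.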
Main Results
- The XGBoost model achieved a test AUC of 0.6534.
- Identified 40 key genes crucial for predictive accuracy and minimizing overfitting.
- Determined optimal classification thresholds of 0.3 and 0.4 for balancing sensitivity and specificity.
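To make the AUC and threshold results above concrete, the following sketch computes AUC and the sensitivity and specificity at two cutoffs. The scores and labels are invented toy values, not study data, and the function names are illustrative. Scores stand in for predicted late-stage probabilities, with label 1 meaning late stage.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is called late stage."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities for 4 early (0) and 4 late (1) samples.
toy_scores = [0.10, 0.25, 0.35, 0.45, 0.20, 0.38, 0.55, 0.70]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

toy_auc = auc(toy_scores, toy_labels)
for t in (0.3, 0.4):
    sensitivity, specificity = sens_spec(toy_scores, toy_labels, t)
    print(f"threshold={t}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

On these toy values, lowering the cutoff from 0.4 to 0.3 raises sensitivity and lowers specificity, which is the trade-off a threshold choice has to balance.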
Conclusions
- Integrating RNA-Seq data with machine learning enhances lung cancer staging accuracy.
- The study highlights the potential of ML-driven genomic analysis in oncology.
- Future work should involve larger datasets, model benchmarking, and multi-omics integration for clinical translation.

