Application of Transcriptome-Based Gene Set Featurization for Machine Learning Model to Predict the Origin of Metastatic Cancer
Summary
This summary is machine-generated. A new AI tool, ONCOfind-AI, accurately predicts the origin of metastatic cancers from gene expression data. The approach integrates RNA sequencing and microarray data, which could improve diagnosis and treatment planning for cancers of unknown primary.
Area of Science
- Computational biology
- Genomics
- Machine learning in oncology
Background
- Identifying the primary tumor site in cancer of unknown primary (CUP) is critical for effective treatment and patient outcomes.
- Current diagnostic methods struggle to pinpoint the origin of metastatic cancers, leading to challenges in patient management.
- CUP accounts for a significant number of cancer-related deaths, highlighting the need for improved diagnostic tools.
Purpose of the Study
- To introduce ONCOfind-AI, a machine learning framework designed to predict the primary site of metastatic cancers.
- To leverage transcriptome-based gene set features for enhanced accuracy in cancer origin prediction.
- To demonstrate the framework's ability to integrate RNA sequencing and microarray data for improved model performance.
Main Methods
- Development of a machine learning framework (ONCOfind-AI) utilizing gene set scores derived from transcriptome data.
- Characterization of transcriptome profiles from different data platforms (RNA sequencing and microarrays) using gene set scores.
- Integration of data from diverse platforms to train and enhance machine learning models for cancer origin prediction.
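The summary does not say how ONCOfind-AI computes its gene set scores. As one illustration, assuming a simple rank-based single-sample score (each gene's value replaced by its within-sample rank scaled to [0, 1], then averaged over the genes of each set), the sketch below shows why gene-set-level features let RNA-seq and microarray samples share one feature matrix. The gene sets `lung_like` and `colon_like`, the helper names, the `min_genes` threshold and all expression values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean


def rank_normalize(sample: dict[str, float]) -> dict[str, float]:
    """Replace each gene's value with its within-sample rank scaled to [0, 1].

    Ranks depend only on the ordering of values inside one sample, so RNA-seq
    counts and log2 microarray intensities end up on the same scale.
    Tied values share the average of their ranks.
    """
    genes = sorted(sample, key=sample.get)
    n = len(genes)
    ranks: dict[str, float] = {}
    i = 0
    while i < n:
        # Extend j over the block of values tied with position i.
        j = i
        while j + 1 < n and sample[genes[j + 1]] == sample[genes[i]]:
            j += 1
        avg_rank = (i + j) / 2
        for k in range(i, j + 1):
            ranks[genes[k]] = avg_rank / (n - 1) if n > 1 else 0.5
        i = j + 1
    return ranks


def gene_set_scores(sample, gene_sets, min_genes=2):
    """Score each set as the mean scaled rank of its genes measured on this platform.

    Sets with fewer than `min_genes` measured genes get NaN (a hypothetical rule).
    """
    ranks = rank_normalize(sample)
    scores = {}
    for name, genes in gene_sets.items():
        present = [ranks[g] for g in genes if g in ranks]
        scores[name] = mean(present) if len(present) >= min_genes else math.nan
    return scores


def build_feature_matrix(samples, gene_sets):
    """Stack per-sample gene set scores into rows with a shared column order."""
    names = sorted(gene_sets)
    rows = []
    for sample in samples:
        scores = gene_set_scores(sample, gene_sets)
        rows.append([scores[name] for name in names])
    return names, rows


# Toy gene sets and expression values, for illustration only.
gene_sets = {
    "lung_like": ["NKX2-1", "NAPSA", "SFTPB"],
    "colon_like": ["CDX2", "KRT20", "VIL1"],
}
rnaseq_sample = {"NKX2-1": 850.0, "NAPSA": 1200.0, "SFTPB": 3000.0,
                 "CDX2": 2.0, "KRT20": 5.0, "VIL1": 1.0}  # raw counts
array_sample = {"NKX2-1": 11.2, "NAPSA": 12.5,             # log2 intensities;
                "CDX2": 4.1, "KRT20": 3.9, "VIL1": 4.0}    # SFTPB not on the array

names, X = build_feature_matrix([rnaseq_sample, array_sample], gene_sets)
print(names)                                       # ['colon_like', 'lung_like']
print([[round(v, 3) for v in row] for row in X])   # [[0.2, 0.8], [0.25, 0.875]]
```

Because each score is computed within a single sample, a platform that lacks some genes (here SFTPB on the array) still yields a comparable feature as long as enough set members are measured. A real pipeline would draw on curated gene set collections and may use a more robust scoring method than this plain mean of ranks.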
Main Results
- Integration of data from different platforms significantly improved the accuracy of machine learning models for predicting cancer origins.
- External validation using clinical samples achieved a top-1 accuracy of 0.80-0.86 (true origin ranked first) and a top-2 accuracy of 0.90 (true origin among the two highest-ranked predictions).
- The use of curated gene sets facilitated the merging of gene expression data from disparate platforms.
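Top-k accuracy counts a prediction as correct when the true primary site is among the k highest-scoring classes, so top-2 accuracy can never be lower than top-1 accuracy. The sketch below computes both from per-class probabilities; the helper name `top_k_accuracy`, the four tissue classes and the probabilities are hypothetical toy values, not data from the study. scikit-learn's `sklearn.metrics.top_k_accuracy_score` offers a library implementation of the same metric.

```python
def top_k_accuracy(probabilities, true_labels, class_names, k):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    Ties are broken in favour of the class listed first in `class_names`,
    because Python's sort is stable (also with reverse=True).
    """
    if len(probabilities) != len(true_labels) or not true_labels:
        raise ValueError("need one probability vector per label and at least one sample")
    hits = 0
    for probs, truth in zip(probabilities, true_labels):
        # Class indices ordered from highest to lowest predicted probability.
        ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
        top_k = {class_names[i] for i in ranked[:k]}
        hits += truth in top_k
    return hits / len(true_labels)


# Toy predictions for five metastatic samples (illustration only).
classes = ["breast", "colorectal", "lung", "stomach"]
probs = [
    [0.70, 0.10, 0.15, 0.05],  # true: breast     -> ranked 1st
    [0.05, 0.40, 0.10, 0.45],  # true: colorectal -> ranked 2nd
    [0.10, 0.08, 0.75, 0.07],  # true: lung       -> ranked 1st
    [0.20, 0.30, 0.40, 0.10],  # true: stomach    -> ranked 4th
    [0.05, 0.80, 0.05, 0.10],  # true: colorectal -> ranked 1st
]
truth = ["breast", "colorectal", "lung", "stomach", "colorectal"]

print(top_k_accuracy(probs, truth, classes, k=1))  # 0.6
print(top_k_accuracy(probs, truth, classes, k=2))  # 0.8
```

Reporting top-2 alongside top-1 is useful in the CUP setting because a short list of candidate origins can still guide confirmatory workup when the single best guess is wrong.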
Conclusions
- ONCOfind-AI demonstrates a promising approach for improving the accuracy of metastatic cancer origin prediction.
- Incorporating biological knowledge via gene sets enhances data compatibility across different platforms, crucial for robust machine learning models.
- This framework has the potential to aid in the diagnosis of CUP, guiding more effective treatment decisions and potentially improving patient outcomes.

