The spike-and-slab quantile LASSO for robust variable selection in cancer genomics studies
Summary
This summary is machine-generated. This study introduces a robust spike-and-slab quantile LASSO method to address data irregularities in cancer genomics. The new approach improves gene selection and predictive modeling for complex traits, outperforming existing methods in simulations and real cancer data analysis.
Area Of Science
- Genomics
- Biostatistics
- Computational Biology
Background
- Cancer genomics studies often exhibit data irregularities like outliers and heavy-tailed distributions.
- Robust variable selection methods are crucial for identifying genes linked to heterogeneous disease traits and building accurate predictive models.
Purpose Of The Study
- To develop a robust variable selection method that combines the strengths of quantile LASSO and Bayesian regularized quantile regression for high-dimensional genomics data.
- To overcome limitations of existing methods by proposing a fully Bayesian spike-and-slab formulation using the asymmetric Laplace distribution (ALD).
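For reference, the building blocks named in the purpose statement can be written in their standard form. This is a sketch assuming the usual parameterization (the paper's exact notation may differ): the check loss $\rho_\tau$ at quantile level $\tau$, the ALD likelihood with scale $\sigma$, the quantile LASSO objective, and a spike-and-slab prior that mixes two double-exponential (Laplace) densities $\mathrm{DE}(0, s) \propto \exp(-|\beta|/s)/(2s)$ with a small spike scale $s_0$ and a large slab scale $s_1$.

```latex
\begin{align*}
\rho_\tau(u) &= u\,\{\tau - I(u < 0)\}, \qquad 0 < \tau < 1,\\
f(y_i \mid \mathbf{x}_i, \boldsymbol\beta, \sigma)
  &= \frac{\tau(1-\tau)}{\sigma}
     \exp\!\left\{-\frac{\rho_\tau(y_i - \mathbf{x}_i^\top\boldsymbol\beta)}{\sigma}\right\},\\
\hat{\boldsymbol\beta}_{\mathrm{QL}}
  &= \arg\min_{\boldsymbol\beta}
     \sum_{i=1}^{n} \rho_\tau(y_i - \mathbf{x}_i^\top\boldsymbol\beta)
     + \lambda \sum_{j=1}^{p} |\beta_j|,\\
\beta_j \mid \gamma_j &\sim (1-\gamma_j)\,\mathrm{DE}(0, s_0) + \gamma_j\,\mathrm{DE}(0, s_1),
  \qquad \gamma_j \sim \mathrm{Bernoulli}(\theta), \quad 0 < s_0 < s_1.
\end{align*}
```

Because the negative ALD log-likelihood is $\sum_i \rho_\tau(y_i - \mathbf{x}_i^\top\boldsymbol\beta)/\sigma$ plus a constant, maximizing it over $\boldsymbol\beta$ is the same as minimizing the check loss, and a single Laplace prior on each $\beta_j$ makes the quantile LASSO a posterior mode (with $\lambda$ absorbing $\sigma$). The spike-and-slab prior replaces the common $\lambda$ with a coefficient-specific penalty driven by whether $\beta_j$ looks like spike (noise) or slab (signal).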
Main Methods
- The proposed method, spike-and-slab quantile LASSO, utilizes a robust likelihood based on the asymmetric Laplace distribution (ALD).
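- It inherits the selective shrinkage and self-adaptivity properties of the spike-and-slab LASSO: coefficients judged important are shrunk only lightly, negligible ones are shrunk heavily, and the degree of shrinkage is learned from the data.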
- Computational efficiency is achieved through Expectation-Maximization (EM) steps within a coordinate descent framework.
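To make the EM and coordinate descent combination concrete, here is a minimal pure-Python sketch. It is not the authors' code: it assumes the double-exponential spike-and-slab prior with hypothetical scales `s0` and `s1`, an unpenalized intercept, a mixing weight `theta` updated from the slab probabilities, and an ALD scale `sigma` re-estimated by maximum likelihood, mirroring the EM coordinate descent commonly used for spike-and-slab LASSO models. All function names are illustrative. The E-step turns each current coefficient into a slab probability p_j; the M-step then solves a weighted quantile LASSO whose penalty on beta_j is sigma * (p_j/s1 + (1 - p_j)/s0), and each coordinate update is solved exactly as a weighted-quantile problem by scanning sorted kinks.

```python
import math
import random


def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def weighted_quantile_min(z, w, taus):
    """Minimize sum_k w_k * rho_{tau_k}(z_k - b) over a scalar b.

    The objective is convex and piecewise linear with kinks at z_k, so we
    walk the sorted kinks and return the first one where the right
    derivative becomes non-negative.
    """
    pts = sorted(zip(z, w, taus))
    slope = -sum(wk * tk for _, wk, tk in pts)  # derivative as b -> -inf
    for zk, wk, _ in pts:
        slope += wk  # passing kink z_k raises the slope by w_k
        if slope >= 0:
            return zk
    return pts[-1][0]


def _sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def ss_quantile_lasso(X, y, tau=0.5, s0=0.03, s1=1.0, n_em=30, n_cd=10, tol=1e-8):
    """Hypothetical EM-within-coordinate-descent sketch (not the paper's code).

    Assumed prior: beta_j ~ (1 - g_j) DE(0, s0) + g_j DE(0, s1), g_j ~ Bern(theta).
    Returns (intercept, coefficients, slab probabilities from the last E-step).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    b0, beta = 0.0, [0.0] * p
    theta, sigma = 0.5, 1.0
    resid = list(y)  # y - b0 - X @ beta (all parameters start at zero)
    for _ in range(n_em):
        # E-step: P(beta_j drawn from the slab | beta_j, theta)
        incl = [_sigmoid(math.log(theta / (1 - theta)) + math.log(s0 / s1)
                         + abs(bj) / s0 - abs(bj) / s1) for bj in beta]
        # Expected penalty E[1/S_j], rescaled by the ALD scale sigma
        lam = [sigma * (pj / s1 + (1 - pj) / s0) for pj in incl]
        old = [b0] + beta
        # M-step: coordinate descent on sum rho_tau(resid) + sum lam_j |beta_j|
        for _ in range(n_cd):
            partial = [r + b0 for r in resid]  # unpenalized intercept
            b0 = weighted_quantile_min(partial, [1.0] * n, [tau] * n)
            resid = [r - b0 for r in partial]
            for j, xj in enumerate(cols):
                partial = [r + x * beta[j] for r, x in zip(resid, xj)]
                z, w, t = [0.0], [2 * lam[j]], [0.5]  # lam_j*|b| as a kink at 0
                for r, x in zip(partial, xj):
                    if x != 0:
                        # rho_tau(r - x*b) = |x| * rho_{tau'}(r/x - b)
                        z.append(r / x)
                        w.append(abs(x))
                        t.append(tau if x > 0 else 1 - tau)
                beta[j] = weighted_quantile_min(z, w, t)
                resid = [r - x * beta[j] for r, x in zip(partial, xj)]
        # Update the mixing weight and the ALD scale (MLE given residuals)
        theta = min(max(sum(incl) / p, 1e-6), 1 - 1e-6)
        sigma = max(sum(check_loss(r, tau) for r in resid) / n, 1e-8)
        if max(abs(a - b) for a, b in zip(old, [b0] + beta)) < tol:
            break
    return b0, beta, incl
```

Coordinate descent on the non-smooth check loss is not guaranteed to reach the global optimum, so this is an illustration of the E-step/M-step structure rather than a production solver. The adaptive penalty is where the selective shrinkage comes from: a coefficient with a high slab probability pays only the light penalty sigma/s1, while one that looks like noise pays roughly sigma/s0 and is pushed to exactly zero.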
Main Results
- Comprehensive simulations demonstrate the superiority of the spike-and-slab quantile LASSO over competing methods, especially with heavy-tailed errors in various settings.
- The method effectively handles both homogeneous and heterogeneous model conditions.
Conclusions
- The spike-and-slab quantile LASSO offers a robust and computationally advantageous approach for variable selection in cancer genomics.
- Its effectiveness is validated through simulations and applied successfully to lung adenocarcinoma (LUAD) and skin cutaneous melanoma (SKCM) datasets from The Cancer Genome Atlas (TCGA).

