Strengths and limitations of non-disclosive data analysis: a comparison of breast cancer survival classifiers using VisualSHIELD
Summary
This summary is machine-generated. Data privacy is crucial for research on patient data. This study evaluated breast cancer survival classifiers using the federated DataSHIELD system. Logistic regression was among the best performers, although the federated version scored approximately 4% lower than the same classifier trained with unconstrained data access.
Area Of Science
- Bioinformatics
- Computational Biology
- Biostatistics
Background
- Patient data privacy is paramount in research.
- DataSHIELD offers privacy-aware statistical analysis in a federated setting.
- Challenges include infrastructure complexity, performance, and usability.
Purpose Of The Study
- To review breast cancer classifiers within a federated, privacy-preserving environment.
- To assess the performance and limitations of non-disclosive tools in a realistic setting.
- To compare federated classifier performance against unconstrained analysis.
Main Methods
- Five independent breast cancer survival gene expression datasets were pooled via a federated infrastructure (DataSHIELD).
- Three published and two new 5-year cancer-free survival risk score classifiers were trained.
- A reference classifier was trained with unconstrained data access for comparison.
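The idea behind the federated setup above can be illustrated with a minimal sketch. It is not the study's code and not DataSHIELD's API (DataSHIELD is R-based). It assumes plain gradient descent on a logistic regression, three hypothetical sites, and made-up toy data with one risk-score feature. Each site returns only a summed gradient and a row count, never patient-level rows, and the coordinator combines those aggregates.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def site_gradient(rows, labels, weights):
    """Runs at one site: returns only the summed log-loss gradient and the row count."""
    grad = [0.0] * len(weights)
    for x, y in zip(rows, labels):
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        for j, xi in enumerate(x):
            grad[j] += (p - y) * xi
    return grad, len(rows)

def federated_fit(sites, n_features, lr=0.5, n_iter=500):
    """Coordinator: sums per-site aggregates and takes a gradient step each round."""
    weights = [0.0] * n_features
    for _ in range(n_iter):
        total = [0.0] * n_features
        n = 0
        for rows, labels in sites:
            g, k = site_gradient(rows, labels, weights)
            total = [t + gj for t, gj in zip(total, g)]
            n += k
        weights = [w - lr * t / n for w, t in zip(weights, total)]
    return weights

# Hypothetical toy data: each row is [intercept, risk score]; label 1 = event within 5 years.
site_a = ([[1.0, 0.2], [1.0, 1.5], [1.0, 0.9], [1.0, 2.1]], [0, 1, 0, 1])
site_b = ([[1.0, 0.4], [1.0, 1.8], [1.0, 1.1]], [0, 1, 1])
site_c = ([[1.0, 0.7], [1.0, 1.3], [1.0, 2.4]], [1, 0, 1])

# Federated fit: the coordinator never sees individual rows.
federated = federated_fit([site_a, site_b, site_c], n_features=2)

# Reference fit: all rows pooled in one place (unconstrained access).
all_rows = site_a[0] + site_b[0] + site_c[0]
all_labels = site_a[1] + site_b[1] + site_c[1]
pooled = federated_fit([(all_rows, all_labels)], n_features=2)

print(federated, pooled)
```

In this simplified setting the federated and pooled fits agree up to floating-point rounding, because summed gradients carry the same information as the pooled rows. The performance gap reported in the study therefore has to come from other constraints of non-disclosive analysis, which this sketch does not model.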
Main Results
- Published classifiers showed poor generalization across different patient cohorts.
- Logistic regression and random forest demonstrated the best average performance among tested methods.
- The unconstrained logistic regression classifier outperformed its federated counterpart by approximately 4%.
Conclusions
- Federated analysis using DataSHIELD is feasible for breast cancer survival prediction.
- Logistic regression offers a robust method within this framework, despite a minor performance trade-off.
- VisualSHIELD enhances DataSHIELD's usability and reproducibility for non-technical users.

