Capability and accuracy of usual statistical analyses in a real-world setting using a federated approach
Summary
This summary is machine-generated. Federated analysis using DataSHIELD successfully reproduced most results from a centralized analysis of a real-world oncology cohort. The approach preserves data privacy while supporting a broad range of statistical analyses.
Area Of Science
- Health Informatics
- Biostatistics
- Data Science
Background
- Federated analysis offers a privacy-preserving alternative to centralized data analysis: only non-disclosive aggregates leave each participating site (see the sketch after this list).
- Real-world validation is crucial for assessing the utility of federated analysis in healthcare.
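As a minimal illustration of that principle, the R sketch below pools a mean from per-site aggregates. It is not taken from the study; the site data are made-up numbers, and the point is only that the coordinator never sees individual records.

```r
# Illustrative sketch only: each site returns aggregates (sum and count),
# never individual records; the coordinator pools them into one estimate.

site_a <- c(61, 58, 72, 66)   # hypothetical patient ages held at site A
site_b <- c(54, 69, 63)       # hypothetical patient ages held at site B
site_c <- c(70, 57, 65, 60)   # hypothetical patient ages held at site C

# Each site computes and shares non-disclosive summary statistics only.
aggregates <- lapply(list(site_a, site_b, site_c),
                     function(x) list(sum = sum(x), n = length(x)))

# The coordinator combines the site aggregates into the pooled mean.
pooled_mean <- sum(sapply(aggregates, `[[`, "sum")) /
               sum(sapply(aggregates, `[[`, "n"))

# Identical to the centralized result, but no raw data left any site.
stopifnot(all.equal(pooled_mean, mean(c(site_a, site_b, site_c))))
```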
Purpose Of The Study
- To evaluate the capability of DataSHIELD's federated analysis approach in a real-world oncology setting.
- To compare the accuracy and feasibility of federated analysis against traditional centralized methods.
Main Methods
- Anonymized, synthetic longitudinal oncology data were split across three local databases.
- DataSHIELD was used to perform descriptive statistics, survival analysis, regression, and correlation analyses in a federated manner.
- Results from the federated approach were compared with those from a centralized analysis of the pooled data (a workflow sketch follows this list).
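The sketch below shows what such a DataSHIELD workflow typically looks like using the real DSI, DSOpal, and dsBaseClient packages. It is a hedged illustration, not the authors' actual pipeline: the server URLs, credentials, table name, and variable names (age, stage, event) are hypothetical placeholders.

```r
library(DSI)
library(DSOpal)        # provides the Opal driver
library(dsBaseClient)  # client-side DataSHIELD analysis functions

# Register the three local databases (all connection details are placeholders).
builder <- DSI::newDSLoginBuilder()
builder$append(server = "site1", url = "https://opal.site1.example",
               user = "dsuser", password = "dspass",
               table = "oncology.cohort", driver = "OpalDriver")
builder$append(server = "site2", url = "https://opal.site2.example",
               user = "dsuser", password = "dspass",
               table = "oncology.cohort", driver = "OpalDriver")
builder$append(server = "site3", url = "https://opal.site3.example",
               user = "dsuser", password = "dspass",
               table = "oncology.cohort", driver = "OpalDriver")
logindata <- builder$build()

# Log in to all three sites; each server-side session holds its own
# partition of the cohort under the symbol "D".
conns <- DSI::datashield.login(logins = logindata, assign = TRUE, symbol = "D")

# Descriptive statistics: only aggregates cross the network.
ds.mean("D$age", datasources = conns)

# Federated GLM: fitted by iterative exchange of score vectors and
# information matrices between sites, not individual records.
ds.glm(formula = "D$event ~ D$age + D$stage",
       family = "binomial", datasources = conns)

DSI::datashield.logout(conns)
```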
Main Results
- DataSHIELD successfully reproduced most analyses, with minor limitations imposed by its data disclosure rules (illustrated after this list).
- Descriptive statistics yielded equivalent results, and regression models produced similar estimates, with a slight loss of accuracy in multivariate analyses.
- The federated approach proved effective across the range of analyses while preserving individual-level data privacy.
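Disclosure rules of the kind mentioned above typically work by withholding any aggregate computed on too few records, which is why some analyses cannot be reproduced exactly. A minimal sketch of that idea, not DataSHIELD's actual server-side code:

```r
# Hypothetical threshold, analogous in spirit to DataSHIELD's "nfilter"
# disclosure settings.
MIN_COUNT <- 5

# A site-side guard: refuse to return aggregates for small subgroups.
site_aggregate <- function(x) {
  if (length(x) < MIN_COUNT) {
    stop("Disclosure risk: subgroup too small, aggregate withheld")
  }
  c(sum = sum(x), n = length(x))
}

site_aggregate(c(61, 58, 72, 66, 54, 69))  # returned: count is large enough
# site_aggregate(c(61, 58))                # would error: subgroup too small
```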
Conclusions
- DataSHIELD provides a practical and effective federated analysis solution for real-world healthcare data.
- Balancing data privacy and analytical accuracy requires pre-defined privacy requirements and data quality assessments.
- Federated analysis using DataSHIELD is a viable method for collaborative research while safeguarding sensitive patient information.