Cross-site imputation can recover missing variables in federated multicenter studies

  • Department of Global Public Health, Karolinska Institutet, Stockholm, Sweden; Department of Neurobiology, Social Care Sciences and Society, Karolinska Institutet, Stockholm, Sweden. Electronic address: robert.thiesmeier@ki.se.
Journal of clinical epidemiology


Abstract

OBJECTIVES

In multisite studies, it is common for some sites not to have recorded key variables. In principle, data from sites with recorded observations can be used to impute the missing values, but this becomes challenging when data pooling is not feasible because of logistic or legal constraints. We therefore propose a multiple imputation approach, cross-site imputation, to recover variables across sites without the need to pool individual-level data.

METHODS

Cross-site imputation involves transporting predicted regression coefficients and variances from studies with observed data to impute missing variables at sites without data. The approach is illustrated in an applied example of recovering systematically missing confounders across Swedish hospitals, and theoretical considerations are outlined.
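The transport step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a continuous missing variable, a linear imputation model, and simulated data, and it treats the residual variance as fixed for simplicity. The function names `fit_imputation_model` and `impute_at_site` and all variable names are hypothetical. Only the coefficient vector, its covariance matrix, and the residual variance leave the site with observed data.

```python
import numpy as np

def fit_imputation_model(X, z):
    """Fit a linear imputation model for z at a site where z is recorded.

    Returns summary quantities only (no individual-level data).
    """
    design = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof                    # residual variance
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    return {"beta": beta, "cov_beta": cov_beta, "sigma2": sigma2}

def impute_at_site(X_new, summary, n_imputations, rng):
    """Impute z at a site where it was never recorded.

    Each imputation draws coefficients from a normal approximation
    centred on the transported estimates, then adds residual noise.
    Simplifying assumption: sigma2 is treated as known.
    """
    design = np.column_stack([np.ones(len(X_new)), X_new])
    imputations = []
    for _ in range(n_imputations):
        beta_star = rng.multivariate_normal(summary["beta"], summary["cov_beta"])
        noise = rng.normal(0.0, np.sqrt(summary["sigma2"]), size=len(X_new))
        imputations.append(design @ beta_star + noise)
    return np.array(imputations)

# Simulated example: site A recorded the variable z, site B did not.
rng = np.random.default_rng(0)
X_a = rng.normal(size=(500, 2))
z_a = 1.0 + X_a @ np.array([0.5, -0.3]) + rng.normal(size=500)
X_b = rng.normal(size=(300, 2))

summary = fit_imputation_model(X_a, z_a)        # computed at site A
imputed = impute_at_site(X_b, summary, 5, rng)  # computed at site B
print(imputed.shape)
```

Each row of `imputed` is one completed version of the missing variable at site B. In a standard multiple imputation workflow, the analysis model would be fitted to each completed dataset and the estimates combined afterwards.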

RESULTS

Cross-site imputation successfully recovered systematically missing confounding variables independently at study sites where data were not recorded. The approach allowed us to include all hospitals in the fully adjusted analysis.

CONCLUSION

Given the increasing importance of multisite studies in observational research, cross-site imputation could offer a practical approach for imputing variables that have not been recorded in some study sites.
