Statistical inference with large-scale trait imputation | JoVE Visualize

Area of Science:

Genetics
Statistical Genetics
Bioinformatics

Background:

Large-scale trait imputation from Genome-Wide Association Study (GWAS) summary data and individual-level genotypes is crucial for downstream genetic analyses.
The existing LS-imputation method assumes that imputed trait values are independent, a simplification that can reduce the accuracy of downstream analyses.
Calculating the full covariance matrix of the imputed trait values is computationally challenging because the matrix has one row and one column per individual, so its size grows quadratically with sample size.
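
To give a sense of scale, the sketch below estimates the memory a dense covariance matrix would need. The sample size and batch size are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def dense_covariance_gib(n_individuals: int, bytes_per_entry: int = 8) -> float:
    """Memory (GiB) needed to store a dense n x n matrix."""
    return n_individuals ** 2 * bytes_per_entry / 2 ** 30

# Hypothetical sizes, for illustration only (not from the study).
n_total = 100_000      # imputed individuals
batch_size = 5_000     # individuals per batch

full_gib = dense_covariance_gib(n_total)        # one matrix over everyone
block_gib = dense_covariance_gib(batch_size)    # one within-batch block

print(f"full matrix:  {full_gib:.1f} GiB")
print(f"single block: {block_gib:.2f} GiB")
```

Because storage grows with the square of the number of individuals, working on one batch-sized block at a time keeps the memory footprint far smaller than holding the full matrix.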

Purpose of the Study:

To develop a method that accounts for the covariance matrix of imputed trait values in large-scale genetic analyses.
To relax the assumption of independence among imputed trait values, thereby improving the accuracy of downstream analyses.
To enhance the utility of GWAS summary data for individual-level genetic studies.

Main Methods:

Proposed a "divide and conquer/combine" strategy to estimate and incorporate the covariance matrix of imputed trait values.
Implemented batch processing to manage the computational complexity of covariance matrix estimation.
Applied the revised imputation method to UK Biobank data for marginal association analysis.
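
The batch-wise idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the covariance of the imputed values is available for each batch and is treated as zero between batches, it tests a single centered SNP without an intercept, and it pools batch results by inverse-variance weighting. The function names `gls_single_snp` and `divide_and_combine` and the simulated data are hypothetical.

```python
import numpy as np

def gls_single_snp(g, y, V):
    """GLS slope (no intercept) of y on genotype vector g, with Cov(y) = V."""
    Vinv_g = np.linalg.solve(V, g)            # V^{-1} g without forming the inverse
    info = g @ Vinv_g                         # g' V^{-1} g
    return (Vinv_g @ y) / info, 1.0 / info    # estimate and its variance

def divide_and_combine(g, y, cov_blocks):
    """Run GLS within each batch, then pool by inverse-variance weighting.

    cov_blocks is a list of square covariance blocks, one per batch, in the
    same order as the individuals in g and y.
    """
    estimates, variances = [], []
    start = 0
    for V in cov_blocks:
        stop = start + V.shape[0]
        b, v = gls_single_snp(g[start:stop], y[start:stop], V)
        estimates.append(b)
        variances.append(v)
        start = stop
    w = 1.0 / np.array(variances)             # inverse-variance weights
    pooled = np.sum(w * np.array(estimates)) / np.sum(w)
    return pooled, 1.0 / np.sum(w)

# Toy data (simulated for illustration, not from the study).
rng = np.random.default_rng(0)
batch_sizes = [40, 50, 30]
cov_blocks = []
for m in batch_sizes:
    A = rng.normal(size=(m, m)) * 0.1
    cov_blocks.append(np.eye(m) + A @ A.T)    # symmetric positive definite block

n = sum(batch_sizes)
g = rng.binomial(2, 0.3, size=n).astype(float)
g -= g.mean()                                 # center the genotype
y = 0.5 * g + rng.normal(size=n)              # stand-in for imputed trait values

beta_hat, var_hat = divide_and_combine(g, y, cov_blocks)
print(beta_hat, var_hat ** 0.5)
```

Under these assumptions, the pooled estimate coincides with a single GLS fit that uses the block-diagonal covariance matrix, so the combine step loses nothing relative to that fit while only ever solving batch-sized linear systems.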

Main Results:

In specific cases, the new method improved marginal association analysis relative to the original LS-imputation method.
The original LS-imputation method demonstrated robust performance, attributed to near-constant variances and weak correlations among imputed values in the tested dataset.
The findings suggest that while the independence assumption is technically incorrect, its impact may be limited in datasets with specific covariance structures.
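
One way to see why near-constant variances and weak correlations limit the impact of the independence assumption is the standard generalized least squares (GLS) formula for a single centered SNP. The notation here is introduced for illustration and is not taken from the study: g is the genotype vector, y the vector of imputed trait values, V its covariance matrix, and sigma^2 a common variance.

```latex
\hat{\beta}_{\mathrm{GLS}} = \frac{g^{\top} V^{-1} y}{g^{\top} V^{-1} g},
\qquad
V = \sigma^{2} I \;\Rightarrow\;
\hat{\beta}_{\mathrm{GLS}} = \frac{g^{\top} y}{g^{\top} g} = \hat{\beta}_{\mathrm{OLS}},
\qquad
\operatorname{Var}\bigl(\hat{\beta}_{\mathrm{GLS}}\bigr) = \frac{\sigma^{2}}{g^{\top} g}.
```

When V is close to a constant multiple of the identity, the covariance-aware estimate and the one computed under independence are therefore close, which is consistent with the robust performance reported for the original method.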

Conclusions:

The proposed "divide and conquer/combine" strategy offers a way to account for the covariance of imputed trait values, addressing a limitation of previous methods.
The practical benefits of the new method may vary depending on the characteristics of the trait and the dataset.
Further research is warranted to explore the performance of the improved imputation method across diverse genetic datasets and traits.