Joint testing of rare variant burden scores using non-negative least squares

Affiliations
  • 1 Regeneron Genetics Center, Tarrytown, NY, USA.
  • 2 Regeneron Genetics Center, Tarrytown, NY, USA. Electronic address: jonathan.marchini@regeneron.com.

Abstract

Gene-based burden tests are a popular and powerful approach for the analysis of exome-wide association studies. These approaches combine sets of variants within a gene into a single burden score that is then tested for association. Typically, multiple burden scores are calculated and tested across a range of annotation classes and frequency bins. Correlation between these tests can complicate the multiple testing correction and hamper interpretation of the results. We introduce a method called the sparse burden association test (SBAT) that tests the joint set of burden scores under the assumption that causal burden scores act in the same effect direction. The method simultaneously assesses the significance of the model fit and selects the set of burden scores that best explains the association. Using simulated data, we show that the method is well calibrated and highlight scenarios where the test outperforms existing gene-based tests. We apply the method to 73 quantitative traits from the UK Biobank, showing that SBAT is a valuable additional gene-based test when combined with other existing approaches. The test is implemented in the REGENIE software.
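
The core idea, a joint fit of several burden scores with all effects constrained to one direction, can be illustrated with a minimal Python sketch. This is not the REGENIE implementation: the function names `nnls_burden_fit` and `joint_test`, the simulated data, the fitting of both effect directions, and the permutation-based p-value are all assumptions made for illustration only. The sketch relies on `scipy.optimize.nnls` for the non-negative least squares fit.

```python
import numpy as np
from scipy.optimize import nnls


def nnls_burden_fit(burden, y):
    """Regress y on the burden scores with all coefficients constrained >= 0.

    Returns the coefficients and the sum of squares explained by the fit.
    (Hypothetical helper, not part of any published software.)
    """
    X = burden - burden.mean(axis=0)   # centre scores so no intercept is needed
    r = y - y.mean()                   # centre the phenotype
    coef, rnorm = nnls(X, r)           # rnorm is the residual 2-norm
    explained = float(r @ r - rnorm ** 2)
    return coef, explained


def joint_test(burden, y, n_perm=200, seed=0):
    """Joint test of all burden scores under a shared effect direction.

    Assumption: the direction is unknown, so the model is fitted for
    positive effects (y) and negative effects (-y) and the better fit is
    kept. Significance comes from a simple permutation null, which is an
    illustrative stand-in for an analytic null distribution.
    """
    rng = np.random.default_rng(seed)

    def best_fit(pheno):
        coef_pos, stat_pos = nnls_burden_fit(burden, pheno)
        coef_neg, stat_neg = nnls_burden_fit(burden, -pheno)
        if stat_pos >= stat_neg:
            return coef_pos, stat_pos
        return -coef_neg, stat_neg

    coef, stat = best_fit(y)
    null_stats = [best_fit(rng.permutation(y))[1] for _ in range(n_perm)]
    p_value = (1 + sum(s >= stat for s in null_stats)) / (n_perm + 1)
    selected = np.flatnonzero(coef != 0)   # scores kept by the sparse fit
    return coef, selected, p_value


# Simulated example: 4 burden scores (carrier indicators), only score 1 causal.
rng = np.random.default_rng(42)
n_samples = 500
burden = rng.binomial(1, 0.1, size=(n_samples, 4)).astype(float)
y = 2.0 * burden[:, 1] + rng.normal(size=n_samples)

coef, selected, p_value = joint_test(burden, y)
print("coefficients:", np.round(coef, 3))
print("selected burden scores:", selected)
print("permutation p-value:", p_value)
```

Because the non-negativity constraint drives many coefficients exactly to zero, the same fit yields both a single test statistic for the gene and a selected subset of burden scores, which is the dual role described in the abstract.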
