Abstract
Whole-genome sequencing data from simplex families with autism spectrum disorder (ASD) were analyzed by searching for statistical interactions between loci. The resulting variant pairs mapped to 411 genes, of which 368 had not previously been associated with ASD. The variants were used to build an ASD predictor with an open-source machine learning library. The predictor correctly classifies over 78% of samples from a test set, with an average significance level of 8.9 × 10⁻¹⁵⁸. Gene Ontology (GO) enrichment analysis of the identified risk genes points to functions related to the development of the central nervous system (CNS). Clustering cases on the basis of risk variants improves predictor accuracy and reveals additional overrepresented GO terms. Some of the detected statistical interactions can be linked to known biological interactions between genes involved in CNS development. Analysis of the statistical interactions also points to genes whose biological functions are not yet known.