Haplotype-based eQTL mapping finds evidence for complex gene regulatory regions poorly tagged by marginal SNPs
Abstract

Motivation
Expression quantitative trait loci (eQTLs), variations in the genome that impact gene expression, are identified through eQTL studies that test for a relationship between single nucleotide polymorphisms (SNPs) and gene expression levels. These studies typically assume an underlying additive model. Non-additive tests have been proposed, but they are limited by the increased multiple testing burden and are potentially biased by filtering criteria that rely on marginal association data. Here we propose using combinations of short haplotypes instead of SNPs as predictors for gene expression. Essentially, this method looks for genomic regions where haplotypes have different effect sizes. The differences in effect can be due to multiple genetic architectures, such as a single SNP, a burden of rare SNPs, multiple SNPs with independent effects, or multiple SNPs with an interaction effect occurring on the same haplotype.

Results
Simulations show that when haplotypes, rather than SNPs, are assigned non-zero effect sizes, our method has increased power compared to the marginal SNP method. In the GEUVADIS gene expression data, our method finds 101 more eGenes than the marginal method (5,202 vs. 5,101). The two methods do not fully overlap in the eGenes that they find. Of the 5,202 eGenes found by our method, 707 are not found by the marginal method, even though it has a lower significance threshold. This indicates that many genes have regulatory architectures that are not well tagged by marginal SNPs and demonstrates the need to better model alternative architectures.
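
The contrast between haplotype predictors and marginal SNP predictors can be illustrated with a small simulation. The sketch below is not the implementation used in this work; it is a minimal example under stated assumptions: phased genotypes in a three-SNP window are simulated at random, expression is given a hypothetical effect only when the alternate alleles of the first two SNPs sit on the same chromosome, and ordinary least squares is used for both models. The function names `haplotype_dosage` and `r_squared`, the effect size, and the window length are all illustrative choices.

```python
import numpy as np


def haplotype_dosage(hap_a, hap_b):
    """Count copies (0, 1 or 2) of each distinct haplotype per individual.

    hap_a, hap_b: (n_individuals, n_snps) 0/1 arrays, one per phased chromosome.
    Returns (dosage, haplotypes); dosage has shape (n_individuals, n_haplotypes).
    """
    n = hap_a.shape[0]
    stacked = np.vstack([hap_a, hap_b])
    haplotypes, inverse = np.unique(stacked, axis=0, return_inverse=True)
    inverse = np.asarray(inverse).reshape(-1)  # flatten; shape differs across NumPy versions
    dosage = np.zeros((n, haplotypes.shape[0]))
    np.add.at(dosage, (np.arange(n), inverse[:n]), 1)  # first chromosome
    np.add.at(dosage, (np.arange(n), inverse[n:]), 1)  # second chromosome
    return dosage, haplotypes


def r_squared(X, y):
    """In-sample R^2 of a least-squares fit of y on X plus an intercept.

    lstsq returns a minimum-norm solution, so a rank-deficient design
    (haplotype dosages always sum to 2) still gives correct fitted values.
    """
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()


rng = np.random.default_rng(0)
n, n_snps = 400, 3  # assumed sample size and window length
hap_a = rng.integers(0, 2, size=(n, n_snps))
hap_b = rng.integers(0, 2, size=(n, n_snps))

# Hypothetical architecture: expression shifts only when the alternate alleles
# of SNP 0 and SNP 1 occur together on the same chromosome.
carrier_count = (hap_a[:, 0] & hap_a[:, 1]) + (hap_b[:, 0] & hap_b[:, 1])
expression = 0.8 * carrier_count + rng.normal(size=n)

# Haplotype model: one predictor per distinct haplotype in the window.
dosage, haplotypes = haplotype_dosage(hap_a, hap_b)
r2_hap = r_squared(dosage, expression)

# Marginal model: each SNP's allele dosage tested on its own.
snp_dosage = hap_a + hap_b
r2_snp = np.array([r_squared(snp_dosage[:, [j]], expression) for j in range(n_snps)])

print("haplotype model R^2:", round(float(r2_hap), 3))
print("best marginal SNP R^2:", round(float(r2_snp.max()), 3))
```

Because each SNP dosage is a sum of haplotype dosages, the haplotype model can never fit worse in-sample than a single-SNP model. A real analysis would therefore need a test that accounts for the extra degrees of freedom and the resulting multiple testing burden, rather than a raw comparison of R^2 values.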