Massive variation of short tandem repeats with functional consequences across strains of Arabidopsis thaliana
Abstract
Short tandem repeat (STR) mutations may be responsible for more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assess the scope of this contribution across a collection of 96 strains of Arabidopsis thaliana by massively parallel STR genotyping. We find that 95% of examined STRs are polymorphic, with a median of six alleles per STR in these strains. Most strains carry modest STR expansions, and some of these expansions have evident functional effects. For instance, three of six intronic STR expansions are associated with intron retention. Coding STRs are depleted of variation relative to non-coding STRs, consistent with the action of purifying selection, and some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detect dozens of novel STR-phenotype associations that could not be detected with SNPs alone, and we validate several with follow-up experiments. Our results demonstrate that STRs comprise a large, unascertained reservoir of functionally relevant genomic variation.
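The two summary statistics quoted above (the fraction of polymorphic STRs and the median number of alleles per STR) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the data layout, the locus and strain names, and the functions `allele_counts` and `summarize` are all hypothetical, and the toy genotypes are invented purely for illustration. The sketch assumes each genotype call is a single repeat copy number per strain, with `None` marking a missing call.

```python
from statistics import median

# Hypothetical toy data (not from the study):
# STR locus -> strain -> repeat copy number, None = no genotype call.
genotypes = {
    "STR_A": {"s1": 10, "s2": 12, "s3": 10, "s4": None},
    "STR_B": {"s1": 7, "s2": 7, "s3": 7, "s4": 7},
    "STR_C": {"s1": 15, "s2": 18, "s3": 21, "s4": 16},
}

def allele_counts(genotypes):
    """Count distinct called alleles at each locus, ignoring missing calls."""
    return {
        locus: len({allele for allele in calls.values() if allele is not None})
        for locus, calls in genotypes.items()
    }

def summarize(genotypes):
    """Return (fraction of polymorphic loci, median alleles per locus).

    Loci with no called genotypes are excluded from both statistics.
    """
    called = [n for n in allele_counts(genotypes).values() if n > 0]
    if not called:
        raise ValueError("no locus has any called genotype")
    polymorphic = sum(1 for n in called if n > 1)
    return polymorphic / len(called), median(called)

fraction_polymorphic, median_alleles = summarize(genotypes)
```

Here a locus counts as polymorphic when more than one distinct allele is observed among the strains with a call; a real analysis would also need to handle heterozygous or ambiguous calls, which this sketch ignores.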