Evidence of transcription at polyT short tandem repeats

2019 ◽  
Author(s):  
Chloé Bessière ◽  
Manu Saraswat ◽  
Mathys Grapotte ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Abstract Background Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Results Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGEs are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (~81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also by instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription. Conclusions Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.
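The T homopolymers mentioned in this abstract can be located with a simple sequence scan. The sketch below is illustrative only and is not the authors' pipeline: the function name `find_poly_t_runs` and the default minimum run length of 9 nucleotides are assumptions made for the example.

```python
import re
from typing import List, Tuple


def find_poly_t_runs(sequence: str, min_length: int = 9) -> List[Tuple[int, int]]:
    """Return (start, length) for every maximal run of T of at least min_length.

    min_length is a hypothetical threshold chosen for illustration;
    it is not taken from the abstract.
    """
    if min_length < 1:
        raise ValueError("min_length must be at least 1")
    # The greedy quantifier makes each match extend over the whole run of T.
    pattern = re.compile("T{%d,}" % min_length)
    return [
        (match.start(), match.end() - match.start())
        for match in pattern.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    example = "ACG" + "T" * 12 + "GCA" + "T" * 4 + "AC"
    # Only the 12-nt run passes the default threshold; the 4-nt run is skipped.
    print(find_poly_t_runs(example))
```

Positions are 0-based offsets into the input string. Lower-case input is accepted because the sequence is upper-cased before matching.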

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mathys Grapotte ◽  
Manu Saraswat ◽  
Chloé Bessière ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Abstract Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of transcription start sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we develop Cap Trap RNA-seq, a technology which combines cap trapping and long-read MinION sequencing. We train sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveil the importance of STR surrounding sequences not only to distinguish STR classes, but also to predict the level of transcription initiation. Importantly, genetic variants linked to human diseases are preferentially found at STRs with high transcription initiation level, supporting the biological and clinical relevance of transcription initiation at STRs. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


2020 ◽  
Author(s):  
Mathys Grapotte ◽  
Manu Saraswat ◽  
Chloé Bessière ◽  
Christophe Menichelli ◽  
Jordan A. Ramilowski ◽  
...  

Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers. Here, we probed these unassigned TSSs and showed that, in all species studied, a significant fraction of CAGE peaks initiate at microsatellites, also called short tandem repeats (STRs). To confirm this transcription, we developed Cap Trap RNA-seq, a technology which combines cap trapping and long-read MinION sequencing. We trained sequence-based deep learning models able to predict CAGE signal at STRs with high accuracy. These models unveiled the importance of STR surrounding sequences not only to distinguish STR classes, as defined by the repeated DNA motif, from one another, but also to predict their transcription. Excitingly, our models predicted that genetic variants linked to human diseases affect STR-associated transcription and correspond precisely to the key positions identified by our models to predict transcription. Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
David Jakubosky ◽  
Matteo D’Antonio ◽  
Marc Jan Bonder ◽  
Craig Smail ◽  
...  

2018 ◽  
Author(s):  
Shubham Saini ◽  
Ileena Mitra ◽  
Nima Mousavi ◽  
Stephanie Feupe Fotsing ◽  
Melissa Gymrek

Abstract Short tandem repeats (STRs) are involved in dozens of Mendelian disorders and have been implicated in a variety of complex traits. However, existing technologies focusing on single nucleotide polymorphisms (SNPs) have not allowed for systematic STR association studies. Here, we leverage next-generation sequencing data from 479 families to create a SNP+STR reference haplotype panel for genome-wide imputation of STRs into SNP data. Imputation achieved an average of 97% concordance between genotyped and imputed STR genotypes in an external dataset, compared to 63% expected under a random model. Performance varied widely across STRs, with near-perfect concordance at bi-allelic STRs vs. 70% at highly polymorphic forensic markers. Using simulated phenotypes and gene expression data, we demonstrate that imputation increases power over individual SNPs to detect STR associations. This resource will enable the first large-scale STR association studies using existing SNP datasets, and will likely yield new insights into complex traits.
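The concordance figures quoted in this abstract compare genotyped and imputed STR calls. The abstract does not give the exact definition used, so the sketch below assumes the simplest one: the fraction of samples whose imputed diploid genotype equals the genotyped one, ignoring allele order. The function name `genotype_concordance` and the representation of a genotype as a pair of repeat counts are assumptions for illustration.

```python
from typing import Sequence, Tuple

Genotype = Tuple[int, int]  # hypothetical encoding: repeat count of each allele


def genotype_concordance(
    genotyped: Sequence[Genotype], imputed: Sequence[Genotype]
) -> float:
    """Fraction of samples whose imputed genotype matches the genotyped one.

    Allele order is ignored, so (10, 12) and (12, 10) count as a match.
    This is an assumed definition, not necessarily the one used in the study.
    """
    if len(genotyped) != len(imputed):
        raise ValueError("both inputs must cover the same samples")
    if not genotyped:
        raise ValueError("at least one sample is required")
    matches = sum(
        1 for truth, call in zip(genotyped, imputed) if sorted(truth) == sorted(call)
    )
    return matches / len(genotyped)


if __name__ == "__main__":
    truth = [(10, 12), (11, 11), (9, 13), (10, 10)]
    calls = [(12, 10), (11, 11), (9, 12), (10, 10)]
    print(genotype_concordance(truth, calls))
```

A per-allele definition, which gives partial credit when only one of the two alleles matches, would yield different values on the same data.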


2022 ◽  
Vol 13 (1) ◽  
Author(s):  
Zhongzi Wu ◽  
Huanfa Gong ◽  
Zhimin Zhou ◽  
Tao Jiang ◽  
Ziqi Lin ◽  
...  

Abstract Background Short tandem repeats (STRs) were recently found to have significant impacts on gene expression and diseases in humans, but their roles in gene expression and complex traits in pigs remain unexplored. This study investigates the effects of STRs on gene expression in liver tissues based on the whole-genome sequences and RNA-Seq data of a discovery cohort of 260 F6 individuals and a validation population of 296 F7 individuals from a heterogeneous population generated from crosses among eight pig breeds. Results We identified 5203 and 5868 significant expression STRs (eSTRs, FDR < 1%) in the F6 and F7 populations, respectively, most of which could be reciprocally validated (π1 = 0.92). The eSTRs explained 27.5% of the cis-heritability of gene expression traits on average. We further identified 235 and 298 fine-mapped STRs through the Bayesian fine-mapping approach in the F6 and F7 pigs, respectively, which were significantly enriched in intron, ATAC peak, compartment A and H3K4me3 regions. We identified 20 fine-mapped STRs located in 100 kb windows upstream and downstream of published complex trait-associated SNPs, which colocalized with epigenetic markers such as H3K27ac and ATAC peaks. These included eSTRs of the CLPB, PGLS, PSMD6 and DHDH genes, which are linked with genome-wide association study (GWAS) SNPs for blood-related traits, leg conformation, growth-related traits, and meat quality traits, respectively. Conclusions This study provides insights into the effects of STRs on gene expression traits. The identified eSTRs are valuable resources for prioritizing causal STRs for complex traits in pigs.


2019 ◽  
Author(s):  
David Jakubosky ◽  
Matteo D’Antonio ◽  
Marc Jan Bonder ◽  
Craig Smail ◽  
Margaret K.R. Donovan ◽  
...  

Abstract Structural variants (SVs) and short tandem repeats (STRs) comprise a broad group of diverse DNA variants which vastly differ in their sizes and distributions across the genome. Here, we show that different SV classes and STRs differentially impact gene expression and complex traits. Functional differences between SV classes and STRs include their genomic locations relative to eGenes, likelihood of being associated with multiple eGenes, associated eGene types (e.g., coding, noncoding, level of evolutionary constraint), effect sizes, linkage disequilibrium with tagging single nucleotide variants used in GWAS, and likelihood of being associated with GWAS traits. We also identified a set of high-impact SVs/STRs associated with the expression of three or more eGenes via chromatin loops and showed they are highly enriched for being associated with GWAS traits. Our study provides insights into the genomic properties of structural variant classes and short tandem repeats that impact gene expression and human traits.


2017 ◽  
Author(s):  
Maximilian O. Press ◽  
Rajiv C. McCoy ◽  
Ashley N. Hall ◽  
Joshua M. Akey ◽  
Christine Queitsch

Abstract Short tandem repeat (STR) mutations may be responsible for more than half of the mutations in eukaryotic coding DNA, yet STR variation is rarely examined as a contributor to complex traits. We assess the scope of this contribution across a collection of 96 strains of Arabidopsis thaliana by massively parallel STR genotyping. We found that 95% of examined STRs are polymorphic, with a median of six alleles per STR in these strains. Modest STR expansions are found in most strains, some of which have evident functional effects. For instance, three of six intronic STR expansions are associated with intron retention. Coding STRs are depleted of variation relative to non-coding STRs, consistent with the action of purifying selection, and some STRs show hypervariable patterns consistent with diversifying selection. Finally, we detect dozens of novel STR-phenotype associations that could not be detected with SNPs alone, validating several with follow-up experiments. Our results demonstrate that STRs comprise a large, unascertained reservoir of functionally relevant genomic variation.


2018 ◽  
Author(s):  
Stephanie Feupe Fotsing ◽  
Jonathan Margoliash ◽  
Catherine Wang ◽  
Shubham Saini ◽  
Richard Yanicky ◽  
...  

Abstract Short tandem repeats (STRs) have been implicated in a variety of complex traits in humans. However, genome-wide studies of the effects of STRs on gene expression thus far have had limited power to detect associations and provide insights into putative mechanisms. Here, we leverage whole genome sequencing and expression data for 17 tissues from the Genotype-Tissue Expression Project (GTEx) to identify STRs for which repeat number is associated with expression of nearby genes (eSTRs). Our analysis reveals more than 28,000 eSTRs. We employ fine-mapping to quantify the probability that each eSTR is causal and characterize a group of the top 1,400 fine-mapped eSTRs. We identify hundreds of eSTRs linked with published GWAS signals and implicate specific eSTRs in complex traits including height, schizophrenia, inflammatory bowel disease, and intelligence. Overall, our results support the hypothesis that eSTRs contribute to a range of human phenotypes and will serve as a valuable resource for future studies of complex traits.
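An eSTR, as defined in this abstract, is an STR whose repeat number is associated with expression of a nearby gene. The sketch below shows the bare idea as a single-variable least-squares fit of expression on repeat dosage. It is a simplified illustration under stated assumptions, not the study's method: the function name `repeat_expression_association` and the use of the mean repeat count of the two alleles as "dosage" are assumptions, and no covariates, normalization, or multiple-testing correction are included.

```python
import math
from typing import Sequence, Tuple


def repeat_expression_association(
    repeat_dosage: Sequence[float], expression: Sequence[float]
) -> Tuple[float, float]:
    """Return (slope, pearson_r) for expression regressed on repeat dosage.

    repeat_dosage: per-sample STR dosage (assumed here to be the mean
    repeat count of the two alleles).
    expression: per-sample expression of one gene, in any consistent unit.
    """
    n = len(repeat_dosage)
    if n != len(expression):
        raise ValueError("inputs must have the same length")
    if n < 3:
        raise ValueError("at least three samples are required")
    mean_x = sum(repeat_dosage) / n
    mean_y = sum(expression) / n
    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in repeat_dosage)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(repeat_dosage, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary across samples")
    slope = sxy / sxx
    pearson_r = sxy / math.sqrt(sxx * syy)
    return slope, pearson_r


if __name__ == "__main__":
    dosage = [10.0, 10.5, 11.0, 12.0, 12.5]
    expr = [2.1, 2.4, 2.2, 3.0, 3.3]
    print(repeat_expression_association(dosage, expr))
```

The slope gives the change in expression per additional repeat unit, and the correlation indicates how tightly the samples follow that trend.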

