Accurate promoter and enhancer identification in 127 ENCODE and Roadmap Epigenomics cell types and tissues by GenoSTAN

2016 ◽  
Author(s):  
Benedikt Zacher ◽  
Margaux Michel ◽  
Björn Schwalb ◽  
Patrick Cramer ◽  
Achim Tresch ◽  
...  

Abstract: Accurate maps of promoters and enhancers are required for understanding transcriptional regulation. Promoters and enhancers are usually mapped by integration of chromatin assays charting histone modifications, DNA accessibility, and transcription factor binding. However, current algorithms are limited by unrealistic data distribution assumptions. Here we propose GenoSTAN (Genomic STate ANnotation), a hidden Markov model overcoming these limitations. We map promoters and enhancers for 127 cell types and tissues from the ENCODE and Roadmap Epigenomics projects, today’s largest compendium of chromatin assays. Extensive benchmarks demonstrate that GenoSTAN consistently identifies promoters and enhancers with significantly higher accuracy than previous methods. Moreover, GenoSTAN-derived promoters and enhancers show significantly higher enrichment of complex trait-associated genetic variants than current annotations. Altogether, GenoSTAN provides an easy-to-use tool to define promoters and enhancers in any system, and our annotation of human transcriptional cis-regulatory elements constitutes a rich resource for future research in biology and medicine.
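To make the hidden Markov model idea concrete, the following is a minimal sketch of Viterbi decoding over binned read counts. It is not GenoSTAN's model or code: the two states, the Poisson emission distribution, the rates, and the transition probabilities are all illustrative assumptions, and the function names (`poisson_logpmf`, `viterbi`) are hypothetical.

```python
import math

def poisson_logpmf(k, lam):
    """Log-probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def viterbi(counts, states, log_start, log_trans, rates):
    """Return the most likely state path for a sequence of per-bin counts."""
    # score[t][s]: best log-probability of any path ending in state s at bin t
    score = [{s: log_start[s] + poisson_logpmf(counts[0], rates[s]) for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at bin t
    for t in range(1, len(counts)):
        row, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda p: score[-1][p] + log_trans[p][s])
            row[s] = (score[-1][prev] + log_trans[prev][s]
                      + poisson_logpmf(counts[t], rates[s]))
            ptr[s] = prev
        score.append(row)
        back.append(ptr)
    # Trace back from the best final state
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Assumed toy parameters: a low-signal and a high-signal chromatin state
states = ["background", "active"]
rates = {"background": 1.0, "active": 10.0}
log_start = {s: math.log(0.5) for s in states}
log_trans = {
    p: {s: math.log(0.9 if p == s else 0.1) for s in states} for p in states
}

counts = [0, 1, 0, 12, 9, 11, 1, 0]  # hypothetical read counts per genomic bin
path = viterbi(counts, states, log_start, log_trans, rates)
print(path)
```

In this toy setting the three high-count bins are decoded as one contiguous "active" segment. A real annotation would use several assay tracks per bin and emission distributions fitted to the data.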

Science ◽  
2020 ◽  
Vol 370 (6518) ◽  
pp. eaba7612 ◽  
Author(s):  
Silvia Domcke ◽  
Andrew J. Hill ◽  
Riza M. Daza ◽  
Junyue Cao ◽  
Diana R. O’Day ◽  
...  

The chromatin landscape underlying the specification of human cell types is of fundamental interest. We generated human cell atlases of chromatin accessibility and gene expression in fetal tissues. For chromatin accessibility, we devised a three-level combinatorial indexing assay and applied it to 53 samples representing 15 organs, profiling ~800,000 single cells. We leveraged cell types defined by gene expression to annotate these data and cataloged hundreds of thousands of candidate regulatory elements that exhibit cell type–specific chromatin accessibility. We investigated the properties of lineage-specific transcription factors (such as POU2F1 in neurons), organ-specific specializations of broadly distributed cell types (such as blood and endothelial), and cell type–specific enrichments of complex trait heritability. These data represent a rich resource for the exploration of in vivo human gene regulation in diverse tissues and cell types.


2007 ◽  
Vol 7 (1) ◽  
pp. 38-48 ◽  
Author(s):  
Oier Etxebeste ◽  
Min Ni ◽  
Aitor Garzia ◽  
Nak-Jung Kwon ◽  
Reinhard Fischer ◽  
...  

Abstract: The fungal colony is a complex multicellular unit consisting of various cell types and functions. Asexual spore formation (conidiation) is integrated through sensory and regulatory elements into the general morphogenetic plan, in which the activation of the transcription factor BrlA is the first determining step. A number of early regulatory elements acting upstream of BrlA (fluG and flbA-E) have been identified, but their functional relations remain to be further investigated. In this report we describe FlbB as a putative basic-zipper-type transcription factor restricted to filamentous fungi. FlbB accumulates at the hyphal apex during early vegetative growth but is later found in apical nuclei, suggesting that an activating modification triggers nuclear import. Moreover, proper temporal and quantitative expression of FlbB is a prerequisite for brlA transcription, and mis-scheduled overexpression inhibits conidiation. We also present evidence that FlbB activation results in the production of a second diffusible signal, acting downstream from the FluG factor, to induce conidiation.


2016 ◽  
Author(s):  
Rachel E. Gate ◽  
Christine S. Cheng ◽  
Aviva P. Aiden ◽  
Atsede Siba ◽  
Marcin Tabaka ◽  
...  

Abstract: Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements leading to gene expression variation in specific cell types. To identify such cases, we analyzed Assay for Transposase-Accessible Chromatin sequencing (ATAC-seq) and RNA-seq profiles from activated CD4+ T cells of up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, in patterns consistent with the 3D organization of chromosomes measured by in situ Hi-C in T cells. 15% of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak through disrupting binding sites for transcription factors important for T cell differentiation and activation. These ATAC quantitative trait nucleotides (ATAC-QTNs) have the largest effects on co-accessible peaks, are associated with gene expression from the same aliquot of cells, rarely affect core binding motifs, and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression in primary immune cells that play a key role in many human diseases.


Author(s):  
Chaitanya Srinivasan ◽  
BaDoi N. Phan ◽  
Alyssa J. Lawler ◽  
Easwaran Ramamurthy ◽  
Michael Kleyman ◽  
...  

Abstract: Recent large genome-wide association studies (GWAS) have identified multiple confident risk loci linked to addiction-associated behavioral traits. Genetic variants linked to addiction-associated traits lie largely in non-coding regions of the genome, likely disrupting cis-regulatory element (CRE) function. CREs tend to be highly cell type-specific and may contribute to the functional development of the neural circuits underlying addiction. Yet, a systematic approach for predicting the impact of risk variants on the CREs of specific cell populations is lacking. To dissect the cell types and brain regions underlying addiction-associated traits, we applied LD score regression to compare GWAS to genomic regions collected from human and mouse assays for open chromatin, which is associated with CRE activity. We found enrichment of addiction-associated variants in putative regulatory elements marked by open chromatin in neuronal (NeuN+) nuclei collected from multiple prefrontal cortical areas and striatal regions known to play major roles in reward and addiction. To further dissect the cell type-specific basis of addiction-associated traits, we also identified enrichments in human orthologs of open chromatin regions of mouse neuron subtypes: cortical excitatory, PV, D1, and D2. Lastly, we developed machine learning models from mouse cell type-specific regions of open chromatin to further dissect human NeuN+ open chromatin regions into cortical excitatory or striatal D1 and D2 neurons and predict the functional impact of addiction-associated genetic variants. Our results suggest that different neuron subtypes within the reward system play distinct roles in the variety of traits that contribute to addiction.
Significance Statement: Our study on cell types and brain regions contributing to heritability of addiction-associated traits suggests that the conserved non-coding regions within cortical excitatory and striatal medium spiny neurons contribute to genetic predisposition for nicotine, alcohol, and cannabis use behaviors. This computational framework can flexibly integrate epigenomic data across species to screen for putative causal variants in a cell type- and tissue-specific manner across numerous complex traits.


2021 ◽  
Author(s):  
Tyler Hansen ◽  
Emily Hodges

Transcriptional enhancers control cell-type-specific gene expression in humans, and their dysfunction can lead to debilitating diseases, including cancer. Identifying bona fide enhancers is difficult due to a lack of spatial or sequence constraints. In addition, only a small percentage of the genome is accessible in mature cell types; therefore, most enhancers are inactive due to their chromatin context rather than intrinsic properties of the DNA sequence itself. For this reason, we decided to assay regulatory activity exclusively within accessible chromatin. To do this, we combined assay for transposase-accessible chromatin using sequencing (ATAC-seq) with self-transcribing active regulatory region sequencing (STARR-seq); we call this method ATAC-STARR-seq. With ATAC-STARR-seq, we identify both active and silent regulatory elements in GM12878 B cells; these active and silent elements are enriched for transcription factor motifs and histone modifications associated with activating and repressing regulation, respectively. We also show that ATAC-STARR-seq quantifies chromatin accessibility and transcription factor binding. We integrate this information and subset active regions based on transcription factor binding profiles. Depending on the transcription factors bound, subsets are enriched for distinct Reactome pathways. Altogether, this highlights the power of ATAC-STARR-seq to investigate the transcriptional regulatory landscape of the human genome.


2018 ◽  
Author(s):  
Tiffany Amariuta ◽  
Yang Luo ◽  
Steven Gazal ◽  
Emma E. Davenport ◽  
Bryce van de Geijn ◽  
...  

Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures at sites where specific transcription factors (TFs) are bound. To link these two identifying features, we introduce IMPACT, a genome annotation strategy which identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT predicts TF motif binding with high accuracy (average AUC 0.92, s.e. 0.03; across 8 TFs), a significant improvement (all p<6.9e-15) over intersecting motifs with open chromatin (average AUC 0.66, s.e. 0.11). Second, an IMPACT annotation trained on RNA polymerase II is more enriched for peripheral blood cis-eQTL variation (N=3,754) than sequence-based annotations, such as promoters and regions around the TSS (permutation p<1e-3, 25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N=38,242) and East Asian (N=22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% (s.e. 19.4%) of RA h² (p<1.6e-5) and that the top 9.8% of Treg IMPACT regulatory elements, consisting of all SNPs with a non-zero annotation value, capture 97.3% (s.e. 18.2%) of RA h² (p<7.6e-7), the most comprehensive explanation for RA h² to date. In comparison, the average RA h² captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Finally, integration with RA fine-mapping data (N=27,345) revealed a significant enrichment (2.87, p<8.6e-3) of putatively causal variants across 20 RA associated loci in the top 1% of CD4+ Treg IMPACT regulatory regions. 
Overall, we find that IMPACT generalizes well to other cell types in identifying complex trait associated regulatory elements.
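The AUC values quoted above measure how well a score ranks bound sites above unbound ones. As a minimal sketch of that metric only (not IMPACT's evaluation code), the function below computes AUC as the fraction of positive/negative pairs ranked correctly, counting ties as half. The function name `auc` and the toy scores and labels are illustrative assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted values, higher means more likely positive
    labels: truthy for positives (e.g. TF-bound motif sites), falsy otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # Each correctly ordered pair counts 1, each tie counts 0.5
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical scores for five motif sites, two of them truly bound
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
value = auc(scores, labels)
print(round(value, 3))
```

The pairwise form is quadratic in the number of sites, which is fine for a small illustration; a rank-based computation would be the usual choice for genome-scale data.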


2021 ◽  
Vol 25 (1) ◽  
pp. 18-29
Author(s):  
E. V. Ignatieva ◽  
E. A. Matrosova

Whole genome and whole exome sequencing technologies play a very important role in studies of the genetic aspects of the pathogenesis of various diseases. The widespread use of genome-wide and exome-wide association study methodology (GWAS and EWAS) has made it possible to identify a large number of genetic variants associated with diseases. This information is accumulated in databases such as GWAS Central, GWAS Catalog, OMIM, ClinVar, etc. Most of the variants identified by the GWAS technique are located in the noncoding regions of the human genome. According to the ENCODE project, the fraction of regions in the human genome potentially involved in transcriptional control is many times greater than the fraction of coding regions. Thus, genetic variation in noncoding regions of the genome can increase the susceptibility to diseases by disrupting various regulatory elements (promoters, enhancers, silencers, insulator regions, etc.). However, identification of the mechanisms by which pathogenic genetic variants influence disease risk is difficult due to a wide variety of regulatory elements. The present review focuses on the molecular genetic mechanisms by which pathogenic genetic variants affect gene expression. At the same time, attention is concentrated on the transcriptional level of regulation as an initial step in the expression of any gene. A triggering event mediating the effect of a pathogenic genetic variant on the level of gene expression can be, for example, a change in the functional activity of transcription factor binding sites (TFBSs) or a DNA methylation change, which, in turn, affects the functional activity of promoters or enhancers. Dissecting the regulatory roles of polymorphic loci would be impossible without close integration of modern experimental approaches with computer analysis of a growing wealth of genetic and biological data obtained using omics technologies. 
The review provides a brief description of a number of the most well-known public genomic information resources containing data obtained using omics technologies, including (1) resources that accumulate data on the chromatin states and the regions of transcription factor binding derived from ChIP-seq experiments; (2) resources containing data on genomic loci, for which allele-specific transcription factor binding was revealed based on ChIP-seq technology; (3) resources containing in silico predicted data on the potential impact of genetic variants on the transcription factor binding sites.


2017 ◽  
Author(s):  
Blue B. Lake ◽  
Song Chen ◽  
Brandon C. Sos ◽  
Jean Fan ◽  
Yun Yung ◽  
...  

Abstract: Detailed characterization of the cell types comprising the highly complex human brain is essential to understanding its function. Such tasks require highly scalable experimental approaches to examine different aspects of the molecular state of individual cells, as well as the computational integration to produce unified cell state annotations. Here we report the development of two highly scalable methods (snDrop-Seq and scTHS-Seq) that we have used to acquire nuclear transcriptome and DNA accessibility maps for thousands of single cells from the human adult visual and frontal cortex. This has led to the best-resolved human neuronal subtypes to date, identification of a majority of the non-neuronal cell types, as well as the cell-type specific nuclear transcriptome and DNA accessibility maps. Integrative analysis allowed us to identify transcription factors and regulatory elements shaping the state of different brain cell types, and to map genetic risk factors of common human brain diseases to specific pathogenic cell types and subtypes.


2016 ◽  
Author(s):  
J. Kenneth Baillie ◽  
Andrew Bretherick ◽  
Christopher S. Haley ◽  
Sara Clohisey ◽  
Alan Gray ◽  
...  

Abstract: Genetic variants underlying complex traits, including disease susceptibility, are enriched within the transcriptional regulatory elements, promoters and enhancers. There is emerging evidence that regulatory elements associated with particular traits or diseases share patterns of transcriptional regulation. Accordingly, shared transcriptional regulation (coexpression) may help prioritise loci associated with a given trait, and help to identify the biological processes underlying it. Using cap analysis of gene expression (CAGE) profiles of promoter and enhancer-derived RNAs across 1824 human samples, we have quantified coexpression of RNAs originating from trait-associated regulatory regions using a novel analytical method (network density analysis; NDA). For most traits studied, sequence variants in regulatory regions were linked to tightly coexpressed networks that are likely to share important functional characteristics. These networks implicate particular cell types and tissues in disease pathogenesis; for example, variants associated with ulcerative colitis are linked to expression in gut tissue, whereas Crohn’s disease variants are restricted to immune cells. We show that this coexpression signal provides additional independent information for fine mapping likely causative variants. This approach identifies additional genetic variants associated with specific traits, including an association between the regulation of the OCT1 cation transporter and genetic variants underlying circulating cholesterol levels. This approach enables a deeper biological understanding of the causal basis of complex traits.
One Sentence Summary: We discover that variants associated with a specific disease share expression profiles across tissues and cell types, enabling fine mapping and identification of new disease-associated variants, illuminating key cell types involved in disease pathogenesis.


2020 ◽  
Author(s):  
Michelle R. Jones ◽  
Pei-Chen Peng ◽  
Simon G. Coetzee ◽  
Jonathan Tyrer ◽  
Alberto L. Reyes ◽  
...  

Abstract: Quantifying the functional effects of complex disease risk variants can provide insights into mechanisms underlying disease biology. Genome-wide association studies (GWAS) have identified 39 regions associated with risk of epithelial ovarian cancer (EOC). The vast majority of these variants lie in the non-coding genome, suggesting they mediate their function through the regulation of gene expression by their interaction with tissue-specific regulatory elements (REs). In this study, by intersecting germline genetic risk data with regulatory landscapes of active chromatin in ovarian cancers and their precursor cell types, we first estimated the heritability explained by known common low-penetrance risk alleles. The narrow-sense heritability of both EOC overall and high-grade serous ovarian cancer (HGSOC) was estimated to be 5-6%. Partitioned SNP-heritability across broad functional categories indicated a significant contribution of regulatory elements to EOC heritability. We collated epigenomic profiling data for 77 cell and tissue types from public resources (Roadmap Epigenomics and ENCODE), and H3K27Ac ChIP-Seq data generated in 26 ovarian cancer-relevant cell types. We identified significant enrichment of risk SNPs in active REs marked by H3K27Ac in HGSOCs. To further investigate how risk SNPs in active REs influence predisposition to ovarian cancer, we used motifbreakR to predict the disruption of transcription factor binding sites. We identified 469 candidate causal risk variants in H3K27Ac peaks that break TF motifs (enrichment P-Value < 1×10⁻⁵ compared to control variants). The most frequently broken motif was REST (P-Value = 0.0028), which has been reported as both a tumor suppressor and an oncogene. These systematic functional annotations with epigenomic data highlight the specificity of the regulatory landscape and demonstrate that functional annotation of germline risk variants is most informative when performed in highly relevant cell types.
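The step of intersecting risk variants with active chromatin peaks and comparing against control variants can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline and not motifbreakR: peaks are taken to be half-open, non-overlapping intervals (BED-style coordinates), the helper names (`build_index`, `in_peak`, `fold_enrichment`) are hypothetical, and the coordinates are made up.

```python
from bisect import bisect_right

def build_index(peaks):
    """Group (chrom, start, end) peaks by chromosome and sort them by start."""
    index = {}
    for chrom, start, end in peaks:
        index.setdefault(chrom, []).append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index

def in_peak(index, chrom, pos):
    """True if pos lies in a peak [start, end); assumes non-overlapping peaks."""
    intervals = index.get(chrom, [])
    # Locate the last interval whose start is <= pos
    i = bisect_right(intervals, (pos, float("inf"))) - 1
    return i >= 0 and intervals[i][0] <= pos < intervals[i][1]

def fold_enrichment(index, risk, control):
    """Ratio of the in-peak fraction of risk variants to that of controls."""
    risk_frac = sum(in_peak(index, c, p) for c, p in risk) / len(risk)
    ctrl_frac = sum(in_peak(index, c, p) for c, p in control) / len(control)
    if ctrl_frac == 0:
        return float("inf")
    return risk_frac / ctrl_frac

# Hypothetical H3K27Ac peaks and variant positions
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
index = build_index(peaks)
risk = [("chr1", 150), ("chr1", 550), ("chr1", 300), ("chr2", 10)]
control = [("chr1", 120), ("chr1", 250), ("chr1", 700), ("chr1", 900)]

enrichment = fold_enrichment(index, risk, control)
print(enrichment)
```

A fold enrichment on its own says nothing about significance; a permutation or matched-control test would be needed to attach a P-value like those reported above.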

