Motif signatures of transcribed enhancers

2017 ◽  
Author(s):  
Dimitrios Kleftogiannis ◽  
Haitham Ashoor ◽  
Nikolaos Zarokanellos ◽  
Vladimir B. Bajic

ABSTRACT: In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving space for exploring the enhancers' genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue specific motif signatures characterize TrEn. These signatures successfully discriminate a) TrEn from random controls, a proxy for non-enhancer activity, and b) cell type/tissue specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS codes and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.

2020 ◽  
Author(s):  
Yi-An Tung ◽  
Wen-Tse Yang ◽  
Tsung-Ting Hsieh ◽  
Yu-Chuan Chang ◽  
June-Tai Wu ◽  
...  

Abstract: Enhancers are one class of the regulatory elements that have been shown to act as key components to assist promoters in modulating the gene expression in living cells. At present, the number of enhancers as well as their activities in different cell types are still largely unclear. Previous studies have shown that enhancer activities are associated with various functional data, such as histone modifications, sequence motifs, and chromatin accessibilities. In this study, we utilized DNase data to build a deep learning model for predicting the H3K27ac peaks as the active enhancers in a target cell type. We propose joint training of multiple cell types to boost the model performance in predicting the enhancer activities of an unstudied cell type. The results demonstrated that by incorporating more datasets across different cell types, the complex regulatory patterns could be captured by deep learning models and the prediction accuracy can be largely improved. The analyses conducted in this study demonstrated that the cell type-specific enhancer activity can be predicted by joint learning of multiple cell type data using only DNase data and the primitive sequences as the input features. This reveals the importance of cross-cell type learning, and the constructed model can be applied to investigate potential active enhancers of a novel cell type which does not have the H3K27ac modification data yet. Availability: The accuEnhancer package can be freely accessed at: https://github.com/callsobing/accuEnhancer


2020 ◽  
Author(s):  
Timothy J. Durham ◽  
Riza M. Daza ◽  
Louis Gevirtzman ◽  
Darren A. Cusanovich ◽  
William Stafford Noble ◽  
...  

Abstract: Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.


2019 ◽  
Author(s):  
Sinisa Hrvatin ◽  
Christopher P. Tzeng ◽  
M. Aurel Nagy ◽  
Hume Stroud ◽  
Charalampia Koutsioumpa ◽  
...  

Abstract: Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research. One sentence summary: Highly paralleled functional evaluation of enhancer activity in single cells generates new cell-type-specific tools with broad medical and scientific applications.


2018 ◽  
Author(s):  
Deepti Vipin ◽  
Lingfei Wang ◽  
Guillaume Devailly ◽  
Tom Michoel ◽  
Anagha Joshi

Abstract: Motivation: Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data has been widely used to infer the cellular regulatory networks, the methods mainly infer correlations rather than causality. We propose that a causal inference framework successfully used for eQTL data can be extended to infer causal regulatory networks using enhancers as causal anchors and enhancer RNA expression as a readout of enhancer activity. Results: We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukemia overlapped significantly with experimentally validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types. Availability: Methods have been implemented in the Findr software, available at https://github.com/lingfeiwang/


Author(s):  
Isabella N. Grabski ◽  
Rafael A. Irizarry

Abstract: Single-cell RNA sequencing (scRNA-seq) quantifies gene expression for individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the annotation of cells into known cell-types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type annotation methods. We find limitations with current approaches due to the reliance on known marker genes or from overfitting because of systematic differences between studies or batch effects. Here, we present a statistical approach that leverages public datasets to combine information across thousands of genes, uses a latent variable model to define cell-type-specific barcodes and account for batch effect variation, and probabilistically annotates cell-type identity. The barcoding approach also provides a new way to discover marker genes. Using a range of datasets, including those generated to represent imperfect real-world reference data, we demonstrate that our approach substantially outperforms current reference-based methods, in particular when predicting across studies. Our approach also demonstrates that current approaches based on unsupervised clustering lead to false discoveries related to novel cell-types.


2021 ◽  
Author(s):  
Justin Miller ◽  
Taylor Meurs ◽  
Matthew Hodgman ◽  
Benjamin Song ◽  
Mark Ebbert ◽  
...  

Abstract: Translational ramp sequences are essential regulatory elements that have yet to be characterized in specific tissues. Ramp sequences increase gene expression by evenly spacing ribosomes and slowing initial translation. Therefore, the relative codon adaptiveness within different tissues changes the existence of a ramp sequence without altering the underlying genetic code. Here, we present the first comprehensive analysis of tissue and cell type-specific ramp sequences, and report 3,108 genes with ramp sequences that change between tissues and cell types. The Ramp Atlas (https://ramps.byu.edu/) is an accompanying web portal that allows researchers to query ramp sequences in 18,388 genes across 62 tissues and 66 cell types. We also identified seven SARS-CoV-2 genes and seven human SARS-CoV-2 entry factor genes with tissue-specific ramp sequences that may help explain viral proliferation within those tissues. We anticipate that The Ramp Atlas will facilitate future tissue-specific ramp sequence analyses to develop targeted therapeutics for human disease.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
John A. Halsall ◽  
Simon Andrews ◽  
Felix Krueger ◽  
Charlotte E. Rutledge ◽  
Gabriella Ficz ◽  
...  

Abstract: Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 Kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. In conclusion, modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hanyu Zhang ◽  
Ruoyi Cai ◽  
James Dai ◽  
Wei Sun

Abstract: We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells, where bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and tumor cell proportion can be estimated by copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.


1992 ◽  
Vol 12 (3) ◽  
pp. 1202-1208
Author(s):  
R A Graves ◽  
P Tontonoz ◽  
B M Spiegelman

The molecular basis of adipocyte-specific gene expression is not well understood. We have previously identified a 518-bp enhancer from the adipocyte P2 gene that stimulates adipose-specific gene expression in both cultured cells and transgenic mice. In this analysis of the enhancer, we have defined and characterized a 122-bp DNA fragment that directs differentiation-dependent gene expression in cultured preadipocytes and adipocytes. Several cis-acting elements have been identified and shown by mutational analysis to be important for full enhancer activity. One pair of sequences, ARE2 and ARE4, binds a nuclear factor (ARF2) present in extracts derived from many cell types. Multiple copies of these elements stimulate gene expression from a minimal promoter in preadipocytes, adipocytes, and several other cultured cell lines. A second pair of elements, ARE6 and ARE7, binds a separate factor (ARF6) that is detected only in nuclear extracts derived from adipocytes. The ability of multimers of ARE6 or ARE7 to stimulate promoter activity is strictly adipocyte specific. Mutations in the ARE6 sequence greatly reduce the activity of the 518-bp enhancer. These data demonstrate that several cis- and trans-acting components contribute to the activity of the adipocyte P2 enhancer and suggest that ARF6, a novel differentiation-dependent factor, may be a key regulator of adipogenic gene expression.


1985 ◽  
Vol 5 (2) ◽  
pp. 419-421
Author(s):  
K M Zezulak ◽  
H Green

During the differentiation of preadipose 3T3 cells into adipose cells, the mRNAs for three proteins increase strikingly in abundance. To determine the degree of cell-type specificity in the expression of these mRNAs, we estimated their abundances in several nonadipose tissues of the mouse. None of these mRNAs was strictly confined to adipocytes, but the ensemble of three mRNAs was rather specific to adipocytes. Insofar as is revealed by these three markers, the distinctive phenotype of adipocytes is the result of the enhanced expression of a number of genes, none of which is completely silent in all other cell types.

