Motif signatures of transcribed enhancers

Mapping Intimacies ◽

10.1101/188557 ◽

2017 ◽

Author(s):

Dimitrios Kleftogiannis ◽

Haitham Ashoor ◽

Nikolaos Zarokanellos ◽

Vladimir B. Bajic

Keyword(s):

Gene Expression ◽

Mammalian Cells ◽

Cell Types ◽

Computational Method ◽

Sequence Motifs ◽

Cell Type ◽

Distinct Cell Type ◽

Enhancer Activity ◽

Tissue Specific ◽

Enhancer Sequence

Abstract:

In mammalian cells, transcribed enhancers (TrEn) play important roles in the initiation of gene expression and the maintenance of gene expression levels in a spatiotemporal manner. One of the most challenging questions in biology today is how the genomic characteristics of enhancers relate to enhancer activities. This is particularly critical, as several recent studies have linked enhancer sequence motifs to specific functional roles. To date, only a limited number of enhancer sequence characteristics have been investigated, leaving room to explore the enhancer genomic code in a more systematic way. To address this problem, we developed a novel computational method, TELS, aimed at identifying predictive cell type/tissue-specific motif signatures. We used TELS to compile a comprehensive catalog of motif signatures for all known TrEn identified by the FANTOM5 consortium across 112 human primary cells and tissues. Our results confirm that distinct cell type/tissue-specific motif signatures characterize TrEn. These signatures successfully discriminate a) TrEn from random controls, a proxy for non-enhancer activity, and b) cell type/tissue-specific TrEn from enhancers expressed and transcribed in different cell types/tissues. TELS code and datasets are publicly available at http://www.cbrc.kaust.edu.sa/TELS.
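
As a rough illustration of what a motif signature could look like in practice, the sketch below counts exact motif occurrences in a sequence and collects them into a feature dictionary. This is not the TELS method, whose details are not given in the abstract; the function names, the example sequence, and the choice of exact string matching are all assumptions made for illustration.

```python
import re

def count_motif(sequence: str, motif: str) -> int:
    """Count overlapping exact occurrences of a motif in a DNA sequence."""
    # A zero-width lookahead lets matches overlap ("AAA" contains "AA" twice).
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return sum(1 for _ in pattern.finditer(sequence.upper()))

def motif_signature(sequence: str, motifs: list[str]) -> dict[str, int]:
    """Map each motif to its count: a toy stand-in for a motif signature."""
    return {motif: count_motif(sequence, motif) for motif in motifs}

# Hypothetical example sequence and motifs, chosen only to show the output shape.
signature = motif_signature("TGACTCATGACTCAAA", ["TGACTCA", "AA"])
```

Count vectors of this kind could then be fed to any classifier that separates enhancers from control sequences; real motif scanning would more likely use position weight matrices than exact matches.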

accuEnhancer: Accurate enhancer prediction by integration of multiple cell type data with deep learning

10.1101/2020.11.10.375717 ◽

2020 ◽

Author(s):

Yi-An Tung ◽

Wen-Tse Yang ◽

Tsung-Ting Hsieh ◽

Yu-Chuan Chang ◽

June-Tai Wu ◽

...

Keyword(s):

Deep Learning ◽

Cell Types ◽

Regulatory Elements ◽

Sequence Motifs ◽

Cell Type ◽

Enhancer Activity ◽

Multiple Cell ◽

Type Data ◽

Different Cell Types ◽

Multiple Cell Type

Abstract:

Enhancers are one class of regulatory elements that have been shown to act as key components assisting promoters in modulating gene expression in living cells. At present, the number of enhancers and their activities in different cell types remain largely unclear. Previous studies have shown that enhancer activities are associated with various functional data, such as histone modifications, sequence motifs, and chromatin accessibility. In this study, we utilized DNase data to build a deep learning model for predicting H3K27ac peaks as the active enhancers in a target cell type. We propose joint training of multiple cell types to boost model performance in predicting the enhancer activities of an unstudied cell type. The results demonstrate that, by incorporating more datasets across different cell types, deep learning models can capture complex regulatory patterns and prediction accuracy can be substantially improved. The analyses conducted in this study demonstrate that cell type-specific enhancer activity can be predicted by joint learning of multiple cell type data using only DNase data and the primitive sequences as input features. This reveals the importance of cross-cell type learning, and the constructed model can be applied to investigate potential active enhancers of a novel cell type that does not yet have H3K27ac modification data.

Availability: The accuEnhancer package can be freely accessed at: https://github.com/callsobing/accuEnhancer

Comprehensive characterization of tissue-specific chromatin accessibility in L2 Caenorhabditis elegans nematodes

10.1101/2020.09.15.299123 ◽

2020 ◽

Author(s):

Timothy J. Durham ◽

Riza M. Daza ◽

Louis Gevirtzman ◽

Darren A. Cusanovich ◽

William Stafford Noble ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Expression Patterns ◽

Cell Types ◽

Chromatin Accessibility ◽

Gene Expression Patterns ◽

Rna Seq ◽

Cell Type ◽

Tissue Specific ◽

C Elegans

Abstract:

Recently developed single cell technologies allow researchers to characterize cell states at ever greater resolution and scale. C. elegans is a particularly tractable system for studying development, and recent single cell RNA-seq studies characterized the gene expression patterns for nearly every cell type in the embryo and at the second larval stage (L2). Gene expression patterns are useful for learning about gene function and give insight into the biochemical state of different cell types; however, in order to understand these cell types, we must also determine how these gene expression levels are regulated. We present the first single cell ATAC-seq study in C. elegans. We collected data in L2 larvae to match the available single cell RNA-seq data set, and we identify tissue-specific chromatin accessibility patterns that align well with existing data, including the L2 single cell RNA-seq results. Using a novel implementation of the latent Dirichlet allocation algorithm, we leverage the single-cell resolution of the sci-ATAC-seq data to identify accessible loci at the level of individual cell types, providing new maps of putative cell type-specific gene regulatory sites, with promise for better understanding of cellular differentiation and gene regulation in the worm.

PESCA: A scalable platform for the development of cell-type-specific viral drivers

10.1101/570895 ◽

2019 ◽

Cited By ~ 3

Author(s):

Sinisa Hrvatin ◽

Christopher P. Tzeng ◽

M. Aurel Nagy ◽

Hume Stroud ◽

Charalampia Koutsioumpa ◽

...

Keyword(s):

Gene Expression ◽

Heterologous Gene Expression ◽

Single Cells ◽

Cell Types ◽

Regulatory Elements ◽

Functional Evaluation ◽

Cell Type ◽

Cell Type Specificity ◽

Enhancer Activity ◽

Cell Type Specific

Abstract:

Enhancers are the primary DNA regulatory elements that confer cell type specificity of gene expression. Recent studies characterizing individual enhancers have revealed their potential to direct heterologous gene expression in a highly cell-type-specific manner. However, it has not yet been possible to systematically identify and test the function of enhancers for each of the many cell types in an organism. We have developed PESCA, a scalable and generalizable method that leverages ATAC- and single-cell RNA-sequencing protocols, to characterize cell-type-specific enhancers that should enable genetic access and perturbation of gene function across mammalian cell types. Focusing on the highly heterogeneous mammalian cerebral cortex, we apply PESCA to find enhancers and generate viral reagents capable of accessing and manipulating a subset of somatostatin-expressing cortical interneurons with high specificity. This study demonstrates the utility of this platform for developing new cell-type-specific viral reagents, with significant implications for both basic and translational research.

One sentence summary: Highly paralleled functional evaluation of enhancer activity in single cells generates new cell-type-specific tools with broad medical and scientific applications.

Causal gene regulatory network inference using enhancer activity as a causal anchor

10.1101/311167 ◽

2018 ◽

Author(s):

Deepti Vipin ◽

Lingfei Wang ◽

Guillaume Devailly ◽

Tom Michoel ◽

Anagha Joshi

Keyword(s):

Gene Expression ◽

Regulatory Networks ◽

Cell Types ◽

Expression Data ◽

Cell Type ◽

Causal Gene ◽

High Confidence ◽

Enhancer Activity ◽

Enhancer Rna ◽

Gene Regulatory

Abstract:

Motivation: Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data has been widely used to infer the cellular regulatory networks, the methods mainly infer correlations rather than causality. We propose that a causal inference framework successfully used for eQTL data can be extended to infer causal regulatory networks using enhancers as causal anchors and enhancer RNA expression as a readout of enhancer activity.

Results: We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukemia overlapped significantly with experimentally validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types.

Availability: Methods have been implemented in the Findr software, available at https://github.com/lingfeiwang/[email protected], [email protected]

A probabilistic gene expression barcode for annotation of cell-types from single cell RNA-seq data

10.1101/2020.01.05.895441 ◽

2020 ◽

Cited By ~ 1

Author(s):

Isabella N. Grabski ◽

Rafael A. Irizarry

Keyword(s):

Gene Expression ◽

Single Cell ◽

Latent Variable ◽

Cell Types ◽

Marker Genes ◽

Cell Type ◽

Variable Model ◽

Distinct Cell Type ◽

Distinct Cell ◽

Public Datasets

Abstract:

Single-cell RNA sequencing (scRNA-seq) quantifies gene expression for individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the annotation of cells into known cell-types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type annotation methods. We find limitations in current approaches that arise from reliance on known marker genes or from overfitting caused by systematic differences between studies or batch effects. Here, we present a statistical approach that leverages public datasets to combine information across thousands of genes, uses a latent variable model to define cell-type-specific barcodes and account for batch effect variation, and probabilistically annotates cell-type identity. The barcoding approach also provides a new way to discover marker genes. Using a range of datasets, including those generated to represent imperfect real-world reference data, we demonstrate that our approach substantially outperforms current reference-based methods, in particular when predicting across studies. Our approach also demonstrates that current approaches based on unsupervised clustering lead to false discoveries related to novel cell-types.

Tissue-specific ramp sequences correspond with increased gene expression in humans and SARS-CoV-2

10.21203/rs.3.rs-738082/v1 ◽

2021 ◽

Author(s):

Justin Miller ◽

Taylor Meurs ◽

Matthew Hodgman ◽

Benjamin Song ◽

Mark Ebbert ◽

...

Keyword(s):

Gene Expression ◽

Cell Types ◽

Regulatory Elements ◽

Targeted Therapeutics ◽

Cell Type ◽

Sequence Analyses ◽

Tissue Specific ◽

Cell Type Specific ◽

Viral Proliferation ◽

Increase Gene Expression

Abstract:

Translational ramp sequences are essential regulatory elements that have yet to be characterized in specific tissues. Ramp sequences increase gene expression by evenly spacing ribosomes and slowing initial translation. Therefore, the relative codon adaptiveness within different tissues changes the existence of a ramp sequence without altering the underlying genetic code. Here, we present the first comprehensive analysis of tissue and cell type-specific ramp sequences, and report 3,108 genes with ramp sequences that change between tissues and cell types. The Ramp Atlas (https://ramps.byu.edu/) is an accompanying web portal that allows researchers to query ramp sequences in 18,388 genes across 62 tissues and 66 cell types. We also identified seven SARS-CoV-2 genes and seven human SARS-CoV-2 entry factor genes with tissue-specific ramp sequences that may help explain viral proliferation within those tissues. We anticipate that The Ramp Atlas will facilitate future tissue-specific ramp sequence analyses to develop targeted therapeutics for human disease.

Histone modifications form a cell-type-specific chromosomal bar code that persists through the cell cycle

Scientific Reports ◽

10.1038/s41598-021-82539-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

John A. Halsall ◽

Simon Andrews ◽

Felix Krueger ◽

Charlotte E. Rutledge ◽

Gabriella Ficz ◽

...

Keyword(s):

Gene Expression ◽

Cell Cycle ◽

Histone Modifications ◽

Expression Patterns ◽

Cell Types ◽

Cell Type ◽

Bar Code ◽

Genes Encoding ◽

Cell Type Specific ◽

Rolling Windows

Abstract:

Chromatin configuration influences gene expression in eukaryotes at multiple levels, from individual nucleosomes to chromatin domains several Mb long. Post-translational modifications (PTM) of core histones seem to be involved in chromatin structural transitions, but how remains unclear. To explore this, we used ChIP-seq and two cell types, HeLa and lymphoblastoid (LCL), to define how changes in chromatin packaging through the cell cycle influence the distributions of three transcription-associated histone modifications, H3K9ac, H3K4me3 and H3K27me3. We show that chromosome regions (bands) of 10–50 Mb, detectable by immunofluorescence microscopy of metaphase (M) chromosomes, are also present in G1 and G2. They comprise 1–5 Mb sub-bands that differ between HeLa and LCL but remain consistent through the cell cycle. The same sub-bands are defined by H3K9ac and H3K4me3, while H3K27me3 spreads more widely. We found little change between cell cycle phases, whether compared by 5 kb rolling windows or when analysis was restricted to functional elements such as transcription start sites and topologically associating domains. Only a small number of genes showed cell-cycle related changes: at genes encoding proteins involved in mitosis, H3K9 became highly acetylated in G2M, possibly because of ongoing transcription. In conclusion, modified histone isoforms H3K9ac, H3K4me3 and H3K27me3 exhibit a characteristic genomic distribution at resolutions of 1 Mb and below that differs between HeLa and lymphoblastoid cells but remains remarkably consistent through the cell cycle. We suggest that this cell-type-specific chromosomal bar-code is part of a homeostatic mechanism by which cells retain their characteristic gene expression patterns, and hence their identity, through multiple mitoses.
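
The rolling-window comparison mentioned in the abstract can be pictured with a small sketch that counts read positions in overlapping windows along a chromosome. The abstract gives only the 5 kb window size; the step size, the function name, the half-open window convention, and the toy positions are assumptions, and the authors' actual pipeline is not described here.

```python
def rolling_window_counts(positions, chrom_length, window=5000, step=1000):
    """Count read positions falling in each window [start, start + window)."""
    counts = []
    # The last window starts where a full window still fits on the chromosome.
    last_start = max(chrom_length - window, 0)
    for start in range(0, last_start + 1, step):
        end = start + window
        n_reads = sum(1 for p in positions if start <= p < end)
        counts.append((start, n_reads))
    return counts

# Hypothetical read start positions on a 10 kb toy chromosome.
toy_counts = rolling_window_counts(
    [100, 200, 4999, 5000, 7500], chrom_length=10000, window=5000, step=2500
)
```

Profiles computed this way for two cell cycle phases could then be compared window by window, for example by correlation; the linear scan over positions is kept for readability and would be replaced by sorted or binned data at genome scale.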

EMeth: An EM algorithm for cell type decomposition based on DNA methylation data

Scientific Reports ◽

10.1038/s41598-021-84864-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hanyu Zhang ◽

Ruoyi Cai ◽

James Dai ◽

Wei Sun

Keyword(s):

Dna Methylation ◽

Tumor Cells ◽

T Regulatory Cells ◽

Simulated Data ◽

Cell Types ◽

Computational Method ◽

Methylation Data ◽

Cell Type ◽

A Cell ◽

Type Decomposition

Abstract:

We introduce a new computational method named EMeth to estimate cell type proportions using DNA methylation data. EMeth is a reference-based method that requires cell type-specific DNA methylation data from relevant cell types. EMeth improves on the existing reference-based methods by detecting the CpGs whose DNA methylation is inconsistent with the deconvolution model and reducing their contributions to cell type decomposition. Another novel feature of EMeth is that it allows a cell type with known proportions but unknown reference and estimates its methylation. This is motivated by the case of studying methylation in tumor cells, where bulk tumor samples include tumor cells as well as other cell types such as infiltrating immune cells, and the tumor cell proportion can be estimated from copy number data. We demonstrate that EMeth delivers more accurate estimates of cell type proportions than several other methods using simulated data and in silico mixtures. Applications in cancer studies show that the proportions of T regulatory cells estimated by DNA methylation have expected associations with mutation load and survival time, while the estimates from gene expression miss such associations.
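
To make the idea of reference-based deconvolution concrete, the sketch below estimates the mixing proportion for the simplest case of two cell types by ordinary least squares. It is not EMeth: there is no EM step and no down-weighting of inconsistent CpGs, and the function name, the two-component restriction, and the toy methylation values are assumptions chosen for illustration.

```python
def estimate_proportion(bulk, ref_a, ref_b):
    """Least-squares estimate of p in bulk ≈ p * ref_a + (1 - p) * ref_b."""
    # Rearranged model: bulk - ref_b ≈ p * (ref_a - ref_b), a one-parameter regression.
    diff = [a - b for a, b in zip(ref_a, ref_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("reference profiles are identical; p is not identifiable")
    numer = sum((x - b) * d for x, b, d in zip(bulk, ref_b, diff))
    # Clip to [0, 1] because p is a proportion.
    return min(1.0, max(0.0, numer / denom))

# Toy methylation levels at three CpGs (hypothetical values).
ref_a = [0.9, 0.1, 0.5]
ref_b = [0.1, 0.8, 0.5]
bulk = [0.25 * a + 0.75 * b for a, b in zip(ref_a, ref_b)]
p_hat = estimate_proportion(bulk, ref_a, ref_b)
```

With noise-free input the estimate recovers the true proportion of 0.25. CpGs at which the two references agree, such as the third one here, carry no information about p, which is one reason CpG selection and weighting matter in practical methods.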

Analysis of a tissue-specific enhancer: ARF6 regulates adipogenic gene expression

Molecular and Cellular Biology ◽

10.1128/mcb.12.3.1202-1208.1992 ◽

1992 ◽

Vol 12 (3) ◽

pp. 1202-1208

Author(s):

R A Graves ◽

P Tontonoz ◽

B M Spiegelman

Keyword(s):

Gene Expression ◽

Cultured Cells ◽

Mutational Analysis ◽

Cell Types ◽

Specific Gene ◽

Enhancer Activity ◽

Cis Acting ◽

Specific Gene Expression ◽

Nuclear Extracts ◽

Adipogenic Gene

The molecular basis of adipocyte-specific gene expression is not well understood. We have previously identified a 518-bp enhancer from the adipocyte P2 gene that stimulates adipose-specific gene expression in both cultured cells and transgenic mice. In this analysis of the enhancer, we have defined and characterized a 122-bp DNA fragment that directs differentiation-dependent gene expression in cultured preadipocytes and adipocytes. Several cis-acting elements have been identified and shown by mutational analysis to be important for full enhancer activity. One pair of sequences, ARE2 and ARE4, binds a nuclear factor (ARF2) present in extracts derived from many cell types. Multiple copies of these elements stimulate gene expression from a minimal promoter in preadipocytes, adipocytes, and several other cultured cell lines. A second pair of elements, ARE6 and ARE7, binds a separate factor (ARF6) that is detected only in nuclear extracts derived from adipocytes. The ability of multimers of ARE6 or ARE7 to stimulate promoter activity is strictly adipocyte specific. Mutations in the ARE6 sequence greatly reduce the activity of the 518-bp enhancer. These data demonstrate that several cis- and trans-acting components contribute to the activity of the adipocyte P2 enhancer and suggest that ARF6, a novel differentiation-dependent factor, may be a key regulator of adipogenic gene expression.

Specificity of gene expression in adipocytes

Molecular and Cellular Biology ◽

10.1128/mcb.5.2.419-421.1985 ◽

1985 ◽

Vol 5 (2) ◽

pp. 419-421

Author(s):

K M Zezulak ◽

H Green

Keyword(s):

Gene Expression ◽

Cell Types ◽

Cell Type ◽

3T3 Cells ◽

Cell Type Specificity ◽

Number Of Genes ◽

Enhanced Expression ◽

Adipose Cells ◽

Distinctive Phenotype

During the differentiation of preadipose 3T3 cells into adipose cells, the mRNAs for three proteins increase strikingly in abundance. To determine the degree of cell-type specificity in the expression of these mRNAs, we estimated their abundances in several nonadipose tissues of the mouse. None of these mRNAs was strictly confined to adipocytes, but the ensemble of three mRNAs was rather specific to adipocytes. Insofar as is revealed by these three markers, the distinctive phenotype of adipocytes is the result of the enhanced expression of a number of genes, none of which is completely silent in all other cell types.
