Deconvolving sequence features that discriminate between overlapping regulatory annotations

2017 ◽  
Author(s):  
Akshay Kakumanu ◽  
Silvia Velasco ◽  
Esteban Mazzoni ◽  
Shaun Mahony

Abstract Genomic loci with regulatory potential can be identified and annotated with various properties. For example, genomic sites may be annotated as being bound by a given transcription factor (TF) in one or more cell types. The same sites may be further labeled as being proximal or distal to known promoters. Given such a collection of labeled sites, it is natural to ask what sequence features are associated with each annotation label. However, discovering such label-specific sequence features is often confounded by overlaps between annotation labels; e.g. if regulatory sites specific to a given cell type are also more likely to be promoter-proximal, it is difficult to assess whether motifs identified in that set of sites are associated with the cell type or associated with promoters. In order to meet this challenge, we developed SeqUnwinder, a principled approach to deconvolving interpretable discriminative sequence features associated with overlapping annotation labels. We demonstrate the novel analysis abilities of SeqUnwinder using three examples. Firstly, we show SeqUnwinder’s ability to unravel sequence features associated with the dynamic binding behavior of TFs during motor neuron programming from features associated with chromatin state in the initial embryonic stem cells. Secondly, we characterize distinct sequence properties of multi-condition and cell-specific TF binding sites after controlling for uneven associations with promoter proximity. Finally, we demonstrate the scalability of SeqUnwinder to discover cell-specific sequence features from over one hundred thousand genomic loci that display DNase I hypersensitivity in one or more ENCODE cell lines. Availability: https://github.com/seqcode/sequnwinder
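
The confounding described in this abstract can be made concrete with a small sketch. The code below is not SeqUnwinder's algorithm; it only illustrates the simpler idea of pooling sites by their exact label combination, so that k-mer frequencies can be compared between cell types while promoter proximity is held fixed. The function names, the toy sequences, and the labels `cellA`, `cellB`, `proximal`, and `distal` are all hypothetical.

```python
from collections import Counter, defaultdict

def kmer_counts(seq, k):
    """Count every overlapping k-mer in one sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def subclass_kmer_frequencies(sites, k=3):
    """Pool sites that share the same label combination, then return
    k-mer frequencies (count / total k-mers) for each combination."""
    pooled = defaultdict(Counter)
    for seq, labels in sites:
        pooled[frozenset(labels)].update(kmer_counts(seq, k))
    freqs = {}
    for combo, counts in pooled.items():
        total = sum(counts.values())
        freqs[combo] = {kmer: n / total for kmer, n in counts.items()}
    return freqs

# Toy, made-up sites: (sequence, set of annotation labels).
sites = [
    ("ACGTACGT", {"cellA", "proximal"}),
    ("ACGTTTTT", {"cellA", "distal"}),
    ("GGGCGGGC", {"cellB", "proximal"}),
    ("TTTTACGT", {"cellB", "distal"}),
]

freqs = subclass_kmer_frequencies(sites, k=3)
for combo, table in freqs.items():
    # Frequency of one example k-mer in each label combination.
    print(sorted(combo), round(table.get("ACG", 0.0), 3))
```

Comparing `cellA` against `cellB` within the `proximal` group, and again within the `distal` group, keeps a promoter-associated k-mer from being mistaken for a cell-type-associated one. A trained discriminative model, as the abstract describes, goes beyond this kind of stratified tabulation.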

2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.


2020 ◽  
Author(s):  
Yupeng Wang ◽  
Rosario Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract Objective Computational identification of cell type-specific regulatory elements on a genome-wide scale is very challenging. Results We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, sequential k-mer (k=5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers including gkm-SVM and DanQ, with regard to distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL is able to directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified according to their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.


2011 ◽  
Vol 2011 ◽  
pp. 1-8 ◽  
Author(s):  
Helen C. O'Neill ◽  
Kristin L. Griffiths ◽  
Pravin Periasamy ◽  
Rebecca A. Hinton ◽  
Ying-Ying Hey ◽  
...  

While spleen and other secondary tissue sites contribute to hematopoiesis, the nature of cells produced and the environment under which this happens are not fully defined. Evidence is reviewed here for hematopoiesis occurring in the spleen microenvironment leading to the production of tissue-specific antigen presenting cells. The novel dendritic-like cell identified in spleen is phenotypically and functionally distinct from other described antigen presenting cells. In order to identify these cells as distinct, it has been necessary to show that their lineage origin and progenitors differ from that of other known dendritic and myeloid cell types. The spleen therefore represents a distinct microenvironment for hematopoiesis of a novel myeloid cell arising from self-renewing hematopoietic stem cells (HSC) or progenitors endogenous to spleen.


Reproduction ◽  
2014 ◽  
Vol 147 (5) ◽  
pp. D1-D12 ◽  
Author(s):  
R Michael Roberts ◽  
Kyle M Loh ◽  
Mitsuyoshi Amita ◽  
Andreia S Bernardo ◽  
Katsuyuki Adachi ◽  
...  

It is imperative to unveil the full range of differentiated cell types into which human pluripotent stem cells (hPSCs) can develop. The need is twofold: it will delimit the therapeutic utility of these stem cells and is necessary to place their position accurately in the developmental hierarchy of lineage potential. Accumulated evidence suggested that hPSC could develop in vitro into an extraembryonic lineage (trophoblast (TB)) that is typically inaccessible to pluripotent embryonic cells during embryogenesis. However, whether these differentiated cells are truly authentic TB has been challenged. In this debate, we present a case for and a case against TB differentiation from hPSCs. By analogy to other differentiation systems, our debate is broadly applicable, as it articulates higher and more challenging standards for judging whether a given cell type has been genuinely produced from hPSC differentiation.


1996 ◽  
Vol 5 (2) ◽  
pp. 131-143 ◽  
Author(s):  
Jonathan Dinsmore ◽  
Judson Ratliff ◽  
Terry Deacon ◽  
Peyman Pakzaba ◽  
Douglas Jacoby ◽  
...  

The controlled differentiation of mouse embryonic stem (ES) cells into near homogeneous populations of both neurons and skeletal muscle cells that can survive and function in vivo after transplantation is reported. We show that treatment of pluripotent ES cells with retinoic acid (RA) and dimethylsulfoxide (DMSO) induces differentiation of these cells into highly enriched populations of γ-aminobutyric acid (GABA) expressing neurons and skeletal myoblasts, respectively. For neuronal differentiation, RA alone is sufficient to induce ES cells to differentiate into neuronal cells that show properties of postmitotic neurons both in vitro and in vivo. In vivo function of RA-induced neuronal cells was demonstrated by transplantation into the quinolinic acid lesioned striatum of rats (a rat model for Huntington's disease), where cells integrated and survived for up to 6 wk. The response of embryonic stem cells to DMSO to form muscle was less dramatic than that observed for RA. DMSO-induced ES cells formed mixed populations of muscle cells composed of cardiac, smooth, and skeletal muscle instead of homogeneous populations of a single muscle cell type. To determine whether the response of ES cells to DMSO induction could be further controlled, ES cells were stably transfected with a gene coding for the muscle-specific regulatory factor, MyoD. When induced with DMSO, ES cells constitutively expressing high levels of MyoD differentiated exclusively into skeletal myoblasts (no cardiac or smooth muscle cells) that fused to form myotubes capable of spontaneous contraction. Thus, the specific muscle cell type formed was controlled by the expression of MyoD. These results provided evidence that the specific cell type formed (whether it be muscle, neuronal, or other cell types) can be controlled in vitro. Further, these results demonstrated that ES cells can provide a source of multiple differentiated cell types that can be used for transplantation.


2018 ◽  
Vol 19 (11) ◽  
pp. 3609 ◽  
Author(s):  
Deepti Vipin ◽  
Lingfei Wang ◽  
Guillaume Devailly ◽  
Tom Michoel ◽  
Anagha Joshi

Transcription control plays a crucial role in establishing a unique gene expression signature for each of the hundreds of mammalian cell types. Though gene expression data have been widely used to infer cellular regulatory networks, existing methods mainly infer correlations rather than causality. We developed statistical models and likelihood-ratio tests to infer causal gene regulatory networks using enhancer RNA (eRNA) expression information as a causal anchor and applied the framework to eRNA and transcript expression data from the FANTOM Consortium. Predicted causal targets of transcription factors (TFs) in mouse embryonic stem cells, macrophages and erythroblastic leukaemia overlapped significantly with experimentally-validated targets from ChIP-seq and perturbation data. We further improved the model by taking into account that some TFs might act in a quantitative, dosage-dependent manner, whereas others might act predominantly in a binary on/off fashion. We predicted TF targets from concerted variation of eRNA and TF and target promoter expression levels within a single cell type, as well as across multiple cell types. Importantly, TFs with high-confidence predictions were largely different between these two analyses, demonstrating that variability within a cell type is highly relevant for target prediction of cell type-specific factors. Finally, we generated a compendium of high-confidence TF targets across diverse human cell and tissue types.


2014 ◽  
Vol 2014 ◽  
pp. 1-5 ◽  
Author(s):  
Anyou Wang ◽  
Yan Zhong ◽  
Yanhua Wang ◽  
Qianchuan He

Discriminating cell types is a daily task for stem cell biologists. However, no user-friendly system is currently available for public users to discriminate the common cell types: embryonic stem cells (ESCs), induced pluripotent stem cells (iPSCs), and somatic cells (SCs). Here, we develop WCTDS, a web server for cell type discrimination, to discriminate the three cell types and their subtypes, such as fetal versus adult SCs. WCTDS is developed as a top-layer application of our recent publication on cell type discrimination, which employs DNA methylation biomarkers and machine learning models to discriminate cell types. Implemented with Django, Python, R, and Linux shell programming, run under a Linux-Apache web server, and communicating through MySQL, WCTDS provides a friendly framework to efficiently receive user input, run mathematical models for analyzing data, and present results to users. This framework is flexible and easy to expand for other applications. Therefore, WCTDS works as a user-friendly framework to discriminate cell types and subtypes, and it can also be expanded to detect other cell types such as cancer cells.


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Yupeng Wang ◽  
Rosario B. Jaime-Lara ◽  
Abhrarup Roy ◽  
Ying Sun ◽  
Xinyue Liu ◽  
...  

Abstract Objective To address the challenge of computational identification of cell type-specific regulatory elements on a genome-wide scale. Results We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of “strong enhancer” chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, positional k-mer (k = 5, 7, 9 and 11) fold changes relative to randomly selected non-coding sequences across each nucleotide position were used as features for deep learning models. Three deep learning models were implemented, including multi-layer perceptron (MLP), Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate enhancers from different cell types, which has not been achieved by other enhancer classifiers. Our analysis suggests that both enhancers and their tissue-specificity can be accurately identified based on their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
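
The positional k-mer fold-change features mentioned in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SeqEnhDL implementation: it assumes the fold change of a k-mer is its frequency in the enhancer set divided by its frequency in the background set, and that each position in a sequence receives the fold change of the k-mer starting there. The pseudocount, the function names, and the toy sequences are hypothetical additions.

```python
from collections import Counter

def kmer_frequency_fn(seqs, k, pseudocount=1.0):
    """Return a function giving the smoothed frequency of any k-mer
    across a set of sequences (assumed add-one style smoothing)."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        counts.update(s[i:i + k] for i in range(len(s) - k + 1))
    total = sum(counts.values())
    vocab = 4 ** k  # number of possible DNA k-mers

    def freq(kmer):
        # Counter returns 0 for k-mers that were never observed.
        return (counts[kmer] + pseudocount) / (total + pseudocount * vocab)

    return freq

def positional_fold_changes(seq, fg_freq, bg_freq, k):
    """One feature per position: foreground / background frequency
    of the k-mer that starts at that position."""
    seq = seq.upper()
    return [
        fg_freq(seq[i:i + k]) / bg_freq(seq[i:i + k])
        for i in range(len(seq) - k + 1)
    ]

# Toy, made-up sequences standing in for enhancers and random background.
enhancers = ["ACGTACGT", "ACGTTACG"]
background = ["TTTTTTTT", "GGGGCCCC"]

k = 2
fg = kmer_frequency_fn(enhancers, k)
bg = kmer_frequency_fn(background, k)
features = positional_fold_changes("ACGTT", fg, bg, k)
print(features)
```

A vector like `features`, computed for each k in the abstract's list (5, 7, 9 and 11), is the kind of input that could then be passed to an MLP, CNN, or RNN classifier.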


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Weilong Guo ◽  
Michael Q. Zhang ◽  
Hong Wu

Abstract Although non-CG methylations are abundant in several mammalian cell types, their biological significance is sparsely characterized. We gathered 51 human and mouse DNA methylomes from brain neurons, embryonic stem cells and induced pluripotent stem cells, primordial germ cells and oocytes. We utilized an unbiased sub-motif prediction method and reported CW as the representative non-CG methylation context, which is distinct from CC methylation in terms of sequence context and genomic distribution. A two-dimensional comparison of non-CG methylations across cell types and species was performed. Unambiguous studies of sequence preferences and genomic region enrichment showed that CW methylation is cell-type specific and is also conserved between humans and mice. In brain neurons, it was found that active long interspersed nuclear element-1 (LINE-1) lacked CW methylations but not CG methylations. Coincidentally, both human Alu and mouse B1 elements preferred high CW methylations at specific loci during their respective evolutionary development. Last, the strand-specific distributions of CW methylations in introns and long interspersed nuclear elements are also cell-type specific and conserved. In summary, our results illustrate that CW methylations are highly conserved among species, are dynamically regulated in each cell type, and are potentially involved in the evolution of transposon elements.
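
The methylation contexts named in this abstract follow from the base immediately 3' of the cytosine: CG, CW (where W is the IUPAC code for A or T), and CC. The sketch below classifies cytosines on the forward strand only; the function names and the `"unknown"` fallback are hypothetical choices, and it does not reproduce the sub-motif prediction method the authors used.

```python
def cytosine_context(seq, pos):
    """Classify the cytosine at seq[pos] by the base that follows it."""
    seq = seq.upper()
    if seq[pos] != "C":
        raise ValueError("position does not hold a cytosine")
    if pos + 1 >= len(seq):
        return "unknown"  # no downstream base available
    nxt = seq[pos + 1]
    if nxt == "G":
        return "CG"
    if nxt in {"A", "T"}:  # IUPAC W = A or T
        return "CW"
    if nxt == "C":
        return "CC"
    return "unknown"  # e.g. an N in the sequence

def forward_strand_contexts(seq):
    """Return (position, context) for every cytosine on the forward strand."""
    return [
        (i, cytosine_context(seq, i))
        for i, base in enumerate(seq.upper())
        if base == "C"
    ]

contexts = forward_strand_contexts("ACGCATCCT")
print(contexts)
```

Cytosines on the reverse strand appear as G in the forward sequence, so a full analysis would repeat the classification on the reverse complement.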


2019 ◽  
Author(s):  
Peiyao A. Zhao ◽  
Takayo Sasaki ◽  
David M. Gilbert

ABSTRACT DNA replication in mammalian cells occurs in a defined temporal order during S phase, known as the replication timing (RT) programme. RT is developmentally regulated and correlated with chromatin conformation and local transcriptional potential. Here we present RT profiles of unprecedented temporal resolution in two human embryonic stem cell lines, human colon carcinoma line HCT116 as well as F1 subspecies hybrid mouse embryonic stem cells and their neural progenitor derivatives. Strong enrichment of nascent DNA in fine temporal windows reveals a remarkable degree of cell to cell conservation in replication timing and patterns of replication genome-wide. We identify 5 patterns of replication in all cell types, consistent with varying degrees of initiation efficiency. Zones of replication initiation were found throughout S phase and resolved to ~50kb precision. Temporal transition regions were resolved into segments of uni-directional replication punctuated with small zones of inefficient initiation. Small and large valleys of convergent replication were consistent with either termination or broadly distributed initiation, respectively. RT correlated with chromatin compartment across all cell types but correlations of initiation time to chromatin domain boundaries and histone marks were cell type specific. Haplotype phasing revealed previously unappreciated regions of allele-specific and allele-independent asynchronous replication. Allele-independent asynchrony was associated with large transcribed genes that resemble common fragile sites. Altogether, these data reveal a remarkably deterministic temporal choreography of DNA replication in mammalian cells.
Highly homogeneous replication landscape between cells in a population. Initiation zones resolved within constant timing and timing transition regions. Active histone marks enriched within early initiation zones while enrichment of repressive marks is cell type specific. Transcribed long genes replicate asynchronously.

