PrediTALE: A novel model learned from quantitative data allows for new perspectives on TALE targeting

Mapping Intimacies ◽

10.1101/522458 ◽

2019 ◽

Author(s):

Annett Erkes ◽

Stefanie Mücke ◽

Maik Reschke ◽

Jens Boch ◽

Jan Grau

Keyword(s):

Amino Acids ◽

Quantitative Data ◽

Target Genes ◽

Target Prediction ◽

Regulatory Function ◽

Rna Seq ◽

Transcription Activator ◽

Gene Annotations ◽

Novel Approach ◽

Genome Wide

Abstract:

Plant-pathogenic Xanthomonas bacteria secrete transcription activator-like effectors (TALEs) into host cells, where they act as transcriptional activators on plant target genes to support bacterial virulence. TALEs have a unique modular DNA-binding domain composed of tandem repeats. Two amino acids within each tandem repeat, termed repeat-variable diresidues (RVDs), bind to contiguous nucleotides on the DNA sequence and determine target specificity.

In this paper, we propose a novel approach for TALE target prediction to identify potential virulence targets. Our approach accounts for recent findings concerning TALE targeting, including frame-shift binding by repeats of aberrant lengths, and the flexible strand orientation of target boxes relative to the transcription start of the downstream target gene. The computational model can account for dependencies between adjacent RVD positions. Model parameters are learned from the wealth of quantitative data that have been generated over the last years.

We benchmark the novel approach, termed PrediTALE, using RNA-seq data after Xanthomonas infection in rice, and find an overall improvement of prediction performance compared with previous approaches. Using PrediTALE, we are able to predict several novel putative virulence targets. However, we also observe that no target genes are predicted by any prediction tool for several TALEs, which we term orphan TALEs for this reason. We postulate that one explanation for orphan TALEs is incomplete gene annotations and, hence, propose to replace promoterome-wide scans with genome-wide scans for target boxes. We demonstrate that known targets from promoterome-wide scans may be recovered by genome-wide scans, whereas the latter, combined with RNA-seq data, are able to detect putative targets independent of existing gene annotations.

Author summary:

Diseases caused by plant-pathogenic Xanthomonas bacteria are a serious threat to many important crop plants including rice. Efficiently protecting plants from these pathogens requires a deeper understanding of infection strategies. For many Xanthomonas strains, such infection strategies depend on a special class of effector proteins, termed transcription activator-like effectors (TALEs). TALEs may specifically activate genes of the host plant and, by this means, re-program the plant cell for the benefit of the pathogen. Target sequences and, consequently, target genes of a specific TALE may be predicted computationally from its amino acids. Here, we propose a novel approach for TALE target prediction that makes use of several insights into TALE biology but also of broad experimental data gained over the last years. We demonstrate that this approach yields a higher prediction accuracy than previous approaches. We further postulate that a strategy change from a restricted search only considering promoters of annotated genes to a broad genome-wide search is feasible and yields novel targets including previously neglected protein-coding genes but also non-coding RNAs of possibly regulatory function.
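The idea that each repeat-variable diresidue reads one nucleotide, and that target boxes may lie on either strand, can be illustrated with a minimal sketch. This is not the PrediTALE model: it uses only the widely cited simplified RVD preferences (HD for C, NG for T, NI for A, NN for G or A), assumes a T at position 0 of the box, and ignores aberrant repeats and dependencies between adjacent RVDs. The weights, the mismatch penalty, and the function names are illustrative assumptions.

```python
# Simplified RVD-to-base preferences; the numeric weights are illustrative
# assumptions, not parameters learned from quantitative data.
RVD_PREFS = {
    "HD": {"C": 1.0},
    "NG": {"T": 1.0},
    "NI": {"A": 1.0},
    "NN": {"G": 1.0, "A": 0.6},
}
MISMATCH = -1.0  # assumed penalty for a non-preferred base

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def score_box(rvds, box):
    """Score a candidate box; position 0 must be T (simplifying assumption)."""
    if len(box) != len(rvds) + 1 or box[0] != "T":
        return None
    return sum(
        RVD_PREFS.get(rvd, {}).get(base, MISMATCH)
        for rvd, base in zip(rvds, box[1:])
    )


def scan(rvds, sequence, threshold):
    """Return (forward_start, strand, score) for boxes on both strands."""
    width = len(rvds) + 1
    hits = []
    for strand, seq in (("+", sequence), ("-", reverse_complement(sequence))):
        for i in range(len(seq) - width + 1):
            score = score_box(rvds, seq[i:i + width])
            if score is not None and score >= threshold:
                # Report minus-strand hits in forward-strand coordinates.
                start = i if strand == "+" else len(sequence) - (i + width)
                hits.append((start, strand, score))
    return hits
```

Calling `scan` on a whole chromosome rather than on promoter windows corresponds, in spirit, to the genome-wide scan the abstract proposes; a real predictor would additionally rank hits against expression data.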


Poly(A)-ClickSeq: click-chemistry for next-generation 3´-end sequencing without RNA enrichment or fragmentation

10.1101/109272 ◽

2017 ◽

Cited By ~ 1

Author(s):

Andrew Routh ◽

Ping Ji ◽

Elizabeth Jaworski ◽

Zheng Xia ◽

Wei Li ◽

...

Keyword(s):

Click Chemistry ◽

Simple Procedure ◽

Alternative Polyadenylation ◽

Rna Seq ◽

Cdna Synthesis ◽

Novel Approach ◽

Genome Wide ◽

Recent Emergence ◽

Complex Sample ◽

A Site

Abstract:

The recent emergence of alternative polyadenylation (APA) as an engine driving transcriptomic diversity has stimulated the development of sequencing methodologies designed to assess genome-wide polyadenylation events. The goal of these approaches is to enrich, partition, capture, and ultimately sequence poly(A) site junctions. However, these methods often require poly(A) enrichment, 3´ linker ligation steps, and RNA fragmentation, which can necessitate higher levels of starting RNA, increase experimental error, and potentially introduce bias. We recently reported a click-chemistry based method for generating RNAseq libraries called “ClickSeq”. Here, we adapt this method to direct the cDNA synthesis specifically toward the 3´ UTR/poly(A) tail junction of cellular RNA. With this novel approach, we demonstrate sensitive and specific enrichment for poly(A) site junctions without the need for complex sample preparation, fragmentation or purification. Poly(A)-ClickSeq (PAC-seq) is therefore a simple procedure that generates high-quality RNA-seq poly(A) libraries. As a proof-of-principle, we utilized PAC-seq to explore the poly(A) landscape of both human and Drosophila cells in culture and observed outstanding overlap with existing poly(A) databases and also identified previously unannotated poly(A) sites. Moreover, we utilize PAC-seq to quantify and analyze APA events regulated by CFIm25 illustrating how this technology can be harnessed to identify alternatively polyadenylated RNA.
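To make the notion of a poly(A) site junction concrete, the following minimal sketch locates the point in a single read where the transcript body ends and the trailing run of A bases begins. It is an illustration under stated assumptions, not the PAC-seq analysis pipeline: the function name and the `min_tail` threshold are hypothetical, and reads are assumed to be upper-case, sense-strand, and already adapter-trimmed.

```python
def polya_junction(read, min_tail=10):
    """Return the index where the trailing poly(A) run starts, or None.

    The index equals the length of the read body that precedes the tail.
    A genomic A immediately upstream of the tail cannot be distinguished
    from the tail itself and is counted as part of it.
    """
    body = read.rstrip("A")
    tail_len = len(read) - len(body)
    if tail_len < min_tail or not body:
        return None  # tail too short, or the read is nothing but A's
    return len(body)
```

In practice the body would then be aligned to the genome, and the genomic coordinate of its last base taken as the candidate poly(A) site.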


Genome-wide identification of novel long non-coding RNAs and their possible roles in hypoxic zebrafish brain

10.1101/2020.07.10.181842 ◽

2020 ◽

Author(s):

Bodhisattwa Banerjee ◽

Debaprasad Koner ◽

David Karasik ◽

Nirmalendu Saha

Keyword(s):

Target Genes ◽

Neuronal Development ◽

Rna Seq ◽

Master Regulators ◽

Expression Studies ◽

Genome Wide ◽

Development And Differentiation ◽

Non Coding Rnas ◽

Differentiation Pathways ◽

Syntenic Regions

Abstract:

Long non-coding RNAs (lncRNAs) are the master regulators of numerous biological processes. Hypoxia causes oxidative stress with severe and detrimental effects on brain function and acts as a critical initiating factor in the pathogenesis of Alzheimer’s disease (AD). From the RNA-Seq in the forebrain (Fb), midbrain (Mb), and hindbrain (Hb) regions of hypoxic and normoxic zebrafish, we identified novel lncRNAs, whose potential cis targets showed involvement in neuronal development and differentiation pathways. Under hypoxia, several lncRNAs and mRNAs were differentially expressed. Co-expression studies indicated that the Fb and Hb regions’ potential lncRNA target genes were involved in the AD pathogenesis. In contrast, those in Mb (cry1b, per1a, cipca) were responsible for regulating circadian rhythm. We identified specific lncRNAs present in the syntenic regions between zebrafish and humans, possibly functionally conserved. We thus identified several conserved lncRNAs as the probable regulators of AD genes (adrb3b, cav1, stat3, bace2, apoeb, psen1, s100b).


SEQing: web-based visualization of iCLIP and RNA-seq data in an interactive python framework

10.1101/2019.12.17.865873 ◽

2019 ◽

Author(s):

Martin Lewinski ◽

Yannik Bramkamp ◽

Tino Köster ◽

Dorothee Staiger

Keyword(s):

Binding Sites ◽

Binding Proteins ◽

Target Genes ◽

Rna Binding ◽

Rna Binding Proteins ◽

Rna Seq ◽

Web Based ◽

Genome Wide ◽

Functional Relevance ◽

Nucleotide Resolution

Abstract:

Background: RNA-binding proteins interact with their target RNAs at specific sites. These binding sites can be determined genome-wide through individual nucleotide resolution crosslinking immunoprecipitation (iCLIP). Subsequently, the binding sites have to be visualized. So far, no visualization tool exists that is easily accessible but also supports restricted access so that data can be shared among collaborators.

Results: Here we present SEQing, a customizable interactive dashboard to visualize crosslink sites on target genes of RNA-binding proteins that have been obtained by iCLIP. Moreover, SEQing supports RNA-seq data that can be displayed in a different window tab. This allows, e.g., cross-referencing the iCLIP data with genes differentially expressed in mutants of the RBP and thus obtaining some insight into a potential functional relevance of the binding sites. Additionally, detailed information on the target genes can be incorporated in another tab.

Conclusion: SEQing is written in Python3 and runs on Linux. The web-based access makes iCLIP data easily accessible, even with mobile devices. SEQing is customizable in many ways and can also be secured by a password. The source code is available at https://github.com/malewins/SEQing.


Chromatin-based techniques map DNA interaction landscapes in psoriasis susceptibility loci and highlight KLF4 as a target gene in 9q31

10.1101/822304 ◽

2019 ◽

Author(s):

Helen Ray-Jones ◽

Kate Duffus ◽

Amanda McGovern ◽

Paul Martin ◽

Chenfu Shi ◽

...

Keyword(s):

Target Gene ◽

Target Genes ◽

Association Studies ◽

Dna Interaction ◽

Genome Wide Association Studies ◽

Susceptibility Loci ◽

Rna Seq ◽

Causal Gene ◽

Genome Wide ◽

Crispr Activation

Abstract:

Genome-wide association studies (GWAS) have uncovered many genetic risk loci for psoriasis, yet many remain uncharacterised in terms of the causal gene and their biological mechanism in disease. Here, we use a disease-focused Capture Hi-C experiment to link psoriasis-associated variants with their target genes in psoriasis-relevant cell lines (HaCaT keratinocytes and My-La CD8+ T cells). We confirm previously assigned genes, suggest novel candidates and provide evidence for complexity at psoriasis GWAS loci. In the 9q31 risk locus we combine further epigenomic evidence to demonstrate how the psoriasis association forms a functional interaction with the distant (>500 kb) KLF4 gene. We use CRISPR activation coupled with RNA-seq to demonstrate how activation of psoriasis-associated enhancers upregulates KLF4 in HaCaT cells. Our study design provides a robust pipeline for following up on GWAS disease-associated variants, paving the way for functional translation of genetic findings into clinical benefit.


Genome-Wide Identification of Long Non-Coding RNAs and Their Potential Functions in Poplar Growth and Phenylalanine Biosynthesis

Frontiers in Genetics ◽

10.3389/fgene.2021.762678 ◽

2021 ◽

Vol 12 ◽

Author(s):

Lei Zhang ◽

Xiaolan Ge ◽

Jiujun Du ◽

Xingqi Cheng ◽

Xiaopeng Peng ◽

...

Keyword(s):

Target Genes ◽

Expression Patterns ◽

Wood Quality ◽

Target Prediction ◽

Wood Formation ◽

Sequencing Analysis ◽

Oxidoreductase Activity ◽

Regulatory Processes ◽

Genome Wide ◽

Phenylpropanoid Biosynthesis

Poplar is an important bioenergy tree species. lncRNAs play important roles in various biological regulatory processes, and their expression pattern is more tissue-specific than that of mRNAs. In this study, P. deltoides “Danhong” (Pd) and P. simonii “Tongliao1” (Ps), which differ in growth rate and wood quality, were used as experimental materials, and the transcriptomes of their shoot apical meristem, xylem, and phloem were sequenced. High-throughput RNA sequencing analysis revealed that the expression patterns of genes and lncRNAs differ between the two genotypes. In total, 6,355 lncRNAs were identified. Based on target prediction, lncRNAs and their target genes were involved in ADP binding, oxidoreductase activity, phenylpropanoid biosynthesis, and cyanoamino acid metabolism. The differentially expressed lncRNAs (DElncRNAs) in the two poplars were co-expressed with transcription factors and structural genes of the lignin and flavonoid pathways. In addition, we found lncRNAs that are potential targets of miRNAs. These results provide basic evidence for a better understanding of the role of lncRNAs in regulating phenylalanine molecular pathways and wood formation.


Identification of microRNA-21 target genes associated with hair follicle development in sheep

PeerJ ◽

10.7717/peerj.7167 ◽

2019 ◽

Vol 7 ◽

pp. e7167 ◽

Cited By ~ 2

Author(s):

Bo Zhai ◽

Lichun Zhang ◽

Chunxin Wang ◽

Zhuo Zhao ◽

Mingxin Zhang ◽

...

Keyword(s):

Hair Follicle ◽

Target Genes ◽

Target Prediction ◽

Significant Negative Correlation ◽

Hair Follicles ◽

Regulatory Function ◽

Follicle Development ◽

Extreme Phenotypes ◽

Hair Follicle Development ◽

New Research

Aim: The target molecule regulatory function of microRNA-21 (miR-21) in multiple signalling pathways has become a main focus of genetic and pharmacological regulatory studies of various diseases. The identification of target genes for miR-21 in the development of hair follicles can provide new research pathways for the regulation of cell development.

Methods: In the present study, eight six-month-old ewes from Super Merino (SM) and Small Tailed Han (STH) sheep breeds were selected. Target prediction and dual-luciferase wild-type and mutant vectors were used to identify the target genes of miR-21. Quantitative reverse transcription polymerase chain reaction (RT-qPCR) and bioinformatics analysis were conducted to analyze the effects of miR-21.

Results: The expression of CNKSR2, KLF3 and TNPO1 was downregulated by miR-21 by 36%, 26% and 48%, respectively. Moreover, there was a significant negative correlation between the expression of miR-21 and the three target genes in sheep with two extreme phenotypes. The expression of miR-21 in October was significantly lower than that in January and February, while the expression of CNKSR2, KLF3 and TNPO1 in October was higher than that in January and February.

Conclusions: These results suggest that CNKSR2, KLF3 and TNPO1 are three newly discovered target genes of miR-21 and might be involved in the effects of miR-21 on hair follicle development.


LETR1 is a lymphatic endothelial-specific lncRNA that governs cell proliferation and migration through KLF4 and SEMA3C

10.1101/2020.05.25.114546 ◽

2020 ◽

Author(s):

Luca Ducoli ◽

Saumya Agrawal ◽

Eliane Sibler ◽

Tsukasa Kouno ◽

Carlotta Tacconi ◽

...

Keyword(s):

Transcriptional Control ◽

Target Genes ◽

Lymphatic Endothelial Cells ◽

Rna Seq ◽

Ample Evidence ◽

Genome Wide ◽

And Migration ◽

First Time ◽

New Evidence ◽

Proliferation And Migration

Abstract:

Recent studies have revealed the importance of long noncoding RNAs (lncRNAs) as tissue-specific regulators of gene expression. There is ample evidence that distinct types of vasculature undergo tight transcriptional control to preserve their structure, identity, and functions. We determined, for the first time, the global lineage-specific lncRNAome of human dermal blood and lymphatic endothelial cells (BECs and LECs), combining RNA-Seq and CAGE-Seq. A subsequent genome-wide antisense oligonucleotide-knockdown screen of a robust set of BEC- and LEC-specific lncRNAs identified LETR1 as a critical gatekeeper of the global LEC transcriptome. Deep RNA-DNA, RNA-protein, and phenotype rescue analyses revealed that LETR1 acts as a nuclear trans-acting lncRNA modulating, via key epigenetic factors, the expression of essential target genes, including KLF4 and SEMA3C, governing the growth and migratory ability of LECs. Together, our study provides new evidence supporting the intriguing concept that every cell type expresses precise lncRNA signatures to control lineage-specific regulatory programs.


Sp8 regulatory function in the limb bud ectoderm

10.1101/2020.02.26.965178 ◽

2020 ◽

Author(s):

Rocío Pérez-Gómez ◽

Marc Fernández-Guerrero ◽

Víctor Campa ◽

Juan F. Lopez-Gimenez ◽

Alvaro Rada-Iglesias ◽

...

Keyword(s):

Target Genes ◽

Mouse Genetics ◽

Limb Bud ◽

Regulatory Function ◽

Dependent Manner ◽

Consensus Sequences ◽

Genome Wide ◽

Regulatory Functions ◽

Dual Mechanism ◽

Dose Dependent

Abstract:

Sp8 and Sp6 are two closely related Sp genes expressed in the limb ectoderm where they regulate proximo-distal and dorso-ventral patterning. Mouse genetics revealed that they act together in a dose-dependent manner but with Sp8 making a much greater contribution. Here, we combine ChIP-seq and RNA-seq genome-wide analyses to investigate the Sp8 regulatory network and mechanism of action. We find that Sp8 predominantly binds to putative distal enhancers to activate crucial limb patterning genes, including Fgf8, En1, Sp6 and Rspo2. Sp8 exerts its regulatory function by directly binding DNA at Sp consensus sequences or indirectly through Dlx5 interaction. Overall, our work underscores Sp8 master regulatory functions and supports a model in which it cooperates with other Dlx and Sp cofactors to regulate target genes. We believe that this model could help to properly understand the molecular basis of congenital malformations.

Impact Sentence:

In the limb ectoderm, Sp8 regulates master genes through a dual mechanism: directly binding DNA at Sp consensus sequences and indirectly engaging through Dlx5 interaction.


The Scleraxis Transcription Factor Directly Regulates Multiple Distinct Molecular and Cellular Processes During Early Tendon Cell Differentiation

Frontiers in Cell and Developmental Biology ◽

10.3389/fcell.2021.654397 ◽

2021 ◽

Vol 9 ◽

Author(s):

Han Liu ◽

Jingyue Xu ◽

Yu Lan ◽

Hee-Woong Lim ◽

Rulang Jiang

Keyword(s):

Transcription Factor ◽

Cell Differentiation ◽

Molecular Mechanisms ◽

Target Genes ◽

Rna Seq ◽

Tendon Cell ◽

Tendon Development ◽

Genome Wide ◽

Direct Target ◽

Embryonic Tendon

Proper development of tendons is crucial for the integration and function of the musculoskeletal system. Currently little is known about the molecular mechanisms controlling tendon development and tendon cell differentiation. The transcription factor Scleraxis (Scx) is expressed throughout tendon development and plays essential roles in both embryonic tendon development and adult tendon healing, but few direct target genes of Scx in tendon development have been reported and genome-wide identification of Scx direct target genes in vivo has been lacking. In this study, we have generated a ScxFlag knockin mouse strain, which produces fully functional endogenous Scx proteins containing a 2xFLAG epitope tag at the carboxy terminus. We mapped the genome-wide Scx binding sites in the developing limb tendon tissues, identifying 12,097 high-quality Scx regulatory cis-elements in or around 7,520 genes. Comparative analysis with previously reported embryonic tendon cell RNA-seq data identified 490 candidate Scx direct target genes in early tendon development. Furthermore, we characterized a new Scx gene-knockout mouse line and performed whole transcriptome RNA sequencing analysis of E15.5 forelimb tendon cells from Scx–/– embryos and control littermates, identifying 68 genes whose expression in the developing tendon tissues significantly depended on Scx function. Combined analysis of the ChIP-seq and RNA-seq data yielded 32 direct target genes that required Scx for activation and an additional 17 target genes whose expression was suppressed by Scx during early tendon development. We further analyzed and validated Scx-dependent tendon-specific expression patterns of a subset of the target genes, including Fmod, Kera, Htra3, Ssc5d, Tnmd, and Zfp185, by in situ hybridization and real-time quantitative polymerase chain reaction assays. These results provide novel insights into the molecular mechanisms mediating Scx function in tendon development and homeostasis. The ChIP-seq and RNA-seq data provide a rich resource for aiding design of further studies of the mechanisms regulating tendon cell differentiation and tendon tissue regeneration. The ScxFlag mice provide a valuable new tool for unraveling the molecular mechanisms involving Scx in the protein interaction and gene-regulatory networks underlying many developmental and disease processes.
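The combined ChIP-seq and RNA-seq analysis described above amounts to intersecting the set of genes with nearby binding sites with the sets of genes whose expression changes in the knockout. A minimal sketch of that logic follows; the function name and the placeholder gene identifiers in the usage note are hypothetical, and the study's actual peak-to-gene assignment and significance thresholds are not reproduced here.

```python
def classify_direct_targets(bound_genes, down_in_knockout, up_in_knockout):
    """Split bound genes into candidate activated and repressed targets.

    A bound gene that loses expression in the knockout is a candidate
    target requiring the factor for activation; a bound gene that gains
    expression in the knockout is a candidate repressed target.
    """
    bound = set(bound_genes)
    return {
        "activated": sorted(bound & set(down_in_knockout)),
        "repressed": sorted(bound & set(up_in_knockout)),
    }
```

For example, with bound genes `{"geneA", "geneB", "geneC"}`, knockout-downregulated genes `{"geneA", "geneD"}` and knockout-upregulated genes `{"geneC"}`, the function reports `geneA` as an activated and `geneC` as a repressed candidate, while `geneB` (bound but unchanged) and `geneD` (changed but unbound) are excluded.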


EPIGENE: genome wide transcription unit annotation using a multivariate probabilistic model of histone modifications

10.1101/2019.12.17.878454 ◽

2019 ◽

Author(s):

Anshupa Sahu ◽

Na Li ◽

Ilona Dunkel ◽

Ho-Ryun Chung

Keyword(s):

Rna Polymerase ◽

Rna Polymerase Ii ◽

Histone Modifications ◽

Transcription Unit ◽

Rna Seq ◽

Segmentation Method ◽

Bernoulli Random Variables ◽

Gene Annotations ◽

Genome Wide ◽

Genomic Regions

Abstract:

Background: Understanding the transcriptome is critical for explaining the functional as well as regulatory roles of genomic regions. Current methods for the identification of transcription units (TUs) use RNA-seq, which, however, requires large quantities of mRNA, limiting the identification of inherently unstable TUs, e.g. miRNA precursors. This problem can be resolved by chromatin-based approaches due to a correlation between histone modifications and transcription.

Results: Here we introduce EPIGENE, a novel chromatin segmentation method for the identification of active TUs using transcription-associated histone modifications. Unlike existing chromatin segmentation approaches, EPIGENE uses a constrained, semi-supervised multivariate hidden Markov model (HMM) that models the observed combination of histone modifications using a product of independent Bernoulli random variables, to identify active TUs. Our results show that EPIGENE can identify genome-wide TUs in an unbiased manner. EPIGENE-predicted TUs showed an enrichment of RNA Polymerase II at the transcription start site and in the gene body, indicating that they have been transcribed. Comprehensive validation with existing annotations revealed that 93% of EPIGENE TUs can be explained by existing gene annotations and 5% of EPIGENE TUs in HepG2 can be explained by microRNA annotations. EPIGENE outperforms existing RNA-Seq based approaches in TU prediction precision across human cell lines. Finally, we identify 381 novel TUs in K562 and 43 novel cell-specific TUs, all of which are supported by RNA Polymerase II data.

Conclusions: We demonstrate the applicability of HMMs to identify genome-wide active TUs and provide valuable information about unannotated TUs. EPIGENE is an open-source method and is freely available at: https://github.com/imbeLab/EPIGENE.
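The emission model named in the abstract, a product of independent Bernoulli random variables over binarized histone marks, can be sketched together with a plain Viterbi decoder. This is a generic two-state illustration and not EPIGENE itself: the constrained, semi-supervised state topology is not reproduced, and all state names, probabilities, and function names are assumptions chosen for the example.

```python
import math


def bernoulli_log_emission(obs, probs):
    """Log-probability of a binary mark vector under independent Bernoullis.

    Each entry of probs must lie strictly between 0 and 1.
    """
    return sum(math.log(p if x else 1.0 - p) for x, p in zip(obs, probs))


def viterbi(observations, states, log_start, log_trans, emit_probs):
    """Most likely state path for a sequence of binary mark vectors."""
    first = observations[0]
    scores = [{
        s: log_start[s] + bernoulli_log_emission(first, emit_probs[s])
        for s in states
    }]
    back = []
    for obs in observations[1:]:
        prev, col, ptr = scores[-1], {}, {}
        for s in states:
            # Best predecessor state for s at this genomic bin.
            best = max(states, key=lambda r: prev[r] + log_trans[r][s])
            col[s] = (prev[best] + log_trans[best][s]
                      + bernoulli_log_emission(obs, emit_probs[s]))
            ptr[s] = best
        scores.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

Each observation is one genomic bin with one 0/1 entry per histone mark; contiguous runs of the active state in the decoded path would then be reported as candidate TUs.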
