Inference of the Human Polyadenylation Code

2017 ◽  
Author(s):  
Michael K. K. Leung ◽  
Andrew Delong ◽  
Brendan J. Frey

Abstract Processing of transcripts at the 3′ end involves cleavage at a polyadenylation site followed by the addition of a poly(A) tail. By selecting which polyadenylation site is cleaved, alternative polyadenylation enables genes to produce transcript isoforms with different 3′ ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation, and to understand the underlying regulatory processes, a computational model that can accurately predict polyadenylation patterns from genomic features is desirable. Previous work has focused on identifying candidate polyadenylation sites and on classifying sites that may be tissue-specific. What is lacking is a predictive model of the underlying mechanisms of site selection, competition, and processing efficiency in a tissue-specific manner. We develop a deep learning model that trains on 3′-end sequencing data and predicts tissue-specific site selection among competing polyadenylation sites in the 3′ untranslated regions of the human genome.
Two neural network architectures are evaluated: one built on hand-engineered features, and another that learns directly from the genomic sequence. The hand-engineered features include polyadenylation signals, cis-regulatory elements, n-mer counts, nucleosome occupancy, and RNA-binding protein motifs. The direct-from-sequence model is inferred without prior knowledge of polyadenylation; it is a convolutional neural network trained on the genomic sequence surrounding each polyadenylation site. Both models are trained using the TensorFlow library.
The proposed polyadenylation code can predict site selection among competing polyadenylation sites in different tissues. Importantly, it does so without relying on evolutionary conservation. The model can distinguish pathogenic from benign variants that appear near annotated polyadenylation sites in ClinVar, and it can scan the genome to find candidate polyadenylation sites. We also provide an analysis of how different features affect the model’s performance.
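The two kinds of input described above (hand-engineered features such as polyadenylation signals and n-mer counts, and raw sequence for a convolutional network) can be illustrated with a minimal Python sketch. It is not the authors' TensorFlow implementation: the two PAS hexamers (the canonical AATAAA and its common ATTAAA variant), the helper names, and the use of a softmax to turn per-site scores into predicted usage fractions across competing sites are all illustrative assumptions.

```python
import math
from itertools import product

# Illustrative assumption: the canonical polyadenylation signal (PAS) hexamer
# and its most common variant, written as DNA.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")

def kmer_counts(seq: str, k: int) -> dict:
    """Counts of every overlapping k-mer over A, C, G, T (absent k-mers are 0)."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips k-mers containing N or other symbols
            counts[kmer] += 1
    return counts

def pas_features(seq: str) -> list:
    """Occurrences of each PAS hexamer in a window around a candidate site."""
    seq = seq.upper()
    return [
        sum(1 for i in range(len(seq) - 5) if seq[i:i + 6] == hexamer)
        for hexamer in PAS_HEXAMERS
    ]

def one_hot(seq: str) -> list:
    """One-hot encoding (positions x [A, C, G, T]), a typical CNN input; N -> all zeros."""
    index = {"A": 0, "C": 1, "G": 2, "T": 3}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows

def site_usage(scores: list) -> list:
    """Softmax over the scores of competing sites -> predicted usage fractions summing to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `pas_features("TTTAATAAAGCATTAAACC")` returns `[1, 1]`, and `site_usage([0.0, 0.0])` splits usage evenly between two competing sites. A softmax is used here only because it guarantees that the predicted usage over a set of competing sites sums to one.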

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ankeeta Shah ◽  
Briana E. Mittleman ◽  
Yoav Gilad ◽  
Yang I. Li

Abstract Background Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3′ ends. Most APA occurs within 3′ UTRs, which harbor regulatory elements that can affect mRNA stability, translation, and localization.
Results APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard short-read RNA-seq datasets. Here, we benchmarked several such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) on their ability to identify polyadenylation sites and quantify polyadenylation site usage, comparing them against 3′-Seq, a specialized RNA-seq protocol that enriches for reads at the 3′ ends of genes, and Iso-Seq, a Pacific Biosciences (PacBio) single-molecule full-length RNA-seq method. We demonstrate that 3′-Seq and Iso-Seq identify and quantify the usage of polyadenylation sites more reliably than computational tools that take short-read RNA-seq as input. However, we find that running one such tool, QAPA, with a set of polyadenylation site annotations derived from small quantities of 3′-Seq or Iso-Seq data can reliably quantify variation in APA across conditions, such as across genotypes, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTLs).
Conclusions We envisage that our analyses will shed light on the advantages of studying APA with more specialized sequencing protocols, such as 3′-Seq or Iso-Seq, and on the limitations of studying APA with short-read RNA-seq. We provide a computational pipeline to aid in identifying polyadenylation sites and quantifying polyadenylation site usage using Iso-Seq data as input.
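Polyadenylation site usage, as quantified by the methods compared above, is often expressed as the fraction of a gene's 3′-end reads assigned to each site. The sketch below assumes reads have already been assigned to sites; the gene and site identifiers and the input format are hypothetical, and this simple count-based definition is not the implementation used by QAPA or any other benchmarked tool.

```python
from collections import defaultdict

def pas_usage(read_counts: dict) -> dict:
    """Relative usage of each polyadenylation site within its gene.

    read_counts maps (gene, site) -> number of 3'-end reads assigned to that site
    (hypothetical input format). Returns (gene, site) -> fraction of the gene's reads.
    Genes with no reads get usage 0.0 for every site.
    """
    totals = defaultdict(int)
    for (gene, _site), n in read_counts.items():
        totals[gene] += n
    return {
        (gene, site): (n / totals[gene] if totals[gene] else 0.0)
        for (gene, site), n in read_counts.items()
    }

def usage_shift(usage_a: dict, usage_b: dict) -> dict:
    """Change in usage (b minus a) for sites present in both conditions."""
    return {key: usage_b[key] - usage_a[key] for key in usage_a.keys() & usage_b.keys()}

if __name__ == "__main__":
    counts = {("GENE1", "proximal"): 30, ("GENE1", "distal"): 10, ("GENE2", "only"): 0}
    print(pas_usage(counts))
```

Here GENE1's proximal site receives a usage of 0.75 and its distal site 0.25. Comparing usage between two conditions, as `usage_shift` does, is the simplest form of across-condition comparison; apaQTL mapping instead tests for association between usage and genotype across many individuals.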


Author(s):  
Feng Xiong ◽  
Ruoyu Wang ◽  
Joo-Hyung Lee ◽  
Shenglan Li ◽  
Shin-Fu Chen ◽  
...  

Abstract The molecular basis of the interaction between retrotransposable elements (RTEs) and the human genome remains poorly understood. Here, we profiled N6-methyladenosine (m6A) deposition on nascent RNAs in human cells by developing a new method, MINT-Seq, which revealed that many classes of RTE RNAs, particularly intronic LINE-1s (L1s), are strongly methylated. These m6A-marked intronic L1s (MILs) are evolutionarily young, are in sense orientation relative to their hosting genes, and are bound by a dozen RNA binding proteins (RBPs) that are putative novel readers of m6A-modified RNAs, including the nuclear matrix protein SAFB. Notably, m6A positively controls the expression of both autonomous L1s and co-transcribed L1 relics, promoting L1 retrotransposition. We showed that MILs preferentially reside in long genes with critical roles in DNA damage repair, and sometimes in L1 suppression per se, where they act as transcriptional “roadblocks” that impede expression of the hosting gene, revealing a novel host-weakening strategy by L1s. To counteract this, the host uses the SAFB reader complex to bind m6A-marked L1s, reducing their levels and safeguarding hosting gene transcription. Remarkably, our analysis identified thousands of MILs in multiple human fetal tissues, establishing them as a novel category of cell-type-specific regulatory elements that often compromise transcription of long genes and render these genes vulnerable in neurodevelopmental disorders. We propose that this m6A-orchestrated L1–host interaction plays widespread roles in gene regulation, genome integrity, human development, and disease.


2007 ◽  
Vol 3 (1) ◽  
pp. 73 ◽  
Author(s):  
Andrew D Smith ◽  
Pavel Sumazin ◽  
Michael Q Zhang

1984 ◽  
Vol 4 (10) ◽  
pp. 2151-2160
Author(s):  
S G Amara ◽  
R M Evans ◽  
M G Rosenfeld

Different 3' coding exons in the rat calcitonin gene are used to generate distinct mRNAs encoding either the hormone calcitonin in thyroidal C-cells or a new neuropeptide, referred to as calcitonin gene-related peptide, in neuronal tissue, indicating that regulation of RNA processing is one strategy used in the tissue-specific regulation of gene expression in the brain. Although the two mRNAs use the same transcription initiation site and have identical 5'-terminal sequences, their 3' termini are distinct. The polyadenylation sites for calcitonin and calcitonin gene-related peptide mRNAs are located at the ends of exons 4 and 6, respectively. Termination of transcription after the calcitonin exon does not dictate the production of calcitonin mRNA, because transcription proceeds through both the calcitonin and the calcitonin gene-related peptide exons irrespective of which mRNA is ultimately produced. In isolated nuclei, both polyadenylation sites appear to be used; however, the proximal (calcitonin) site is preferentially used in nuclei from tissues producing calcitonin mRNA. These data suggest that the mechanism dictating production of each mRNA involves the selective use of alternative polyadenylation sites.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e5062 ◽  
Author(s):  
Liam J. Hawkins ◽  
Rasha Al-attar ◽  
Kenneth B. Storey

Every cell in an individual has largely the same genomic sequence, yet cells in different tissues can present widely different phenotypes. This variation arises because each cell expresses a specific subset of its genomic instructions. Which instructions, or genes, are expressed is largely determined by transcriptional regulatory pathways. Each cell must integrate a huge amount of environmental input, so it is no surprise that transcription is regulated by many intertwining mechanisms. This large regulatory landscape offers ample opportunity for problems to arise, which in a medical context means the development of disease states. Metabolism, both within the cell and more broadly, affects and is affected by transcriptional regulation. Metabolism can therefore contribute to improper transcriptional programming, or pathogenic metabolism can result from transcriptional dysregulation. Here, we discuss the established and emerging mechanisms for controlling transcription and how they affect metabolism in the context of pathogenesis. Cis- and trans-regulatory elements, microRNAs, and epigenetic mechanisms such as DNA and histone methylation all influence which genes are transcribed. Each has also been implicated in diseases such as metabolic syndrome, various forms of diabetes, and cancer. In this review, we summarize the current understanding of these areas and highlight some natural models that may inspire future therapeutics.


2021 ◽  
Author(s):  
Nicolai von Kuegelgen ◽  
Samantha Mendonsa ◽  
Sayaka Dantsuji ◽  
Maya Ron ◽  
Marieluise Kirchner ◽  
...  

Cells adopt highly polarized shapes and form distinct subcellular compartments largely because many mRNAs are localized to specific areas, where they are translated into proteins with local functions. This mRNA localization is mediated by specific cis-regulatory elements in mRNAs, commonly called "zipcodes." Their recognition by RNA-binding proteins (RBPs) leads to the integration of the mRNAs into macromolecular complexes and to their localization. Although there are hundreds of localized mRNAs, only a few zipcodes have been characterized. Here, we describe a novel neuronal zipcode identification protocol (N-zip) that can identify zipcodes across hundreds of 3'UTRs. This approach combines a method for separating the principal subcellular compartments of neurons (cell bodies and neurites) with a massively parallel reporter assay. Our analysis identifies the let-7 binding site and the (AU)n motif as de novo zipcodes in mouse primary cortical neurons and suggests a strategy for detecting many more.
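One way to turn a compartment-separated reporter assay like the one described above into a per-reporter localization readout is a log2 ratio of normalized neurite counts to normalized cell-body counts. This sketch is an assumption for illustration, not the N-zip analysis pipeline; the reporter names, the pseudocount, and the simple library-size normalization are hypothetical choices.

```python
import math

def localization_scores(neurite_counts: dict, soma_counts: dict, pseudocount: float = 1.0) -> dict:
    """log2(neurite fraction / cell-body fraction) for reporters seen in both compartments.

    Counts are divided by each library's total so that sequencing depth cancels out;
    the pseudocount avoids log2(0) and division by zero when a reporter has no reads
    (library totals are assumed to be non-zero).
    Positive scores suggest enrichment in neurites, negative scores in cell bodies.
    """
    neurite_total = sum(neurite_counts.values())
    soma_total = sum(soma_counts.values())
    scores = {}
    for reporter in neurite_counts.keys() & soma_counts.keys():
        neurite_frac = (neurite_counts[reporter] + pseudocount) / neurite_total
        soma_frac = (soma_counts[reporter] + pseudocount) / soma_total
        scores[reporter] = math.log2(neurite_frac / soma_frac)
    return scores
```

With replicate libraries, a count-based differential-abundance model would usually be preferred over a single ratio; the ratio is shown only because it makes the neurite-versus-cell-body comparison explicit.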


2021 ◽  
Author(s):  
Hyobin Jeong ◽  
Karen Grimes ◽  
Peter-Martin Bruch ◽  
Tobias Rausch ◽  
Patrick Hasenfeld ◽  
...  

Somatic structural variants (SVs) are widespread in cancer genomes; however, their impact on tumorigenesis and intra-tumour heterogeneity is incompletely understood, because methods to functionally characterize the broad spectrum of SVs arising in single cancer cells are lacking. We present a computational method, scNOVA, that couples SV discovery with nucleosome occupancy analysis by haplotype-resolved single-cell sequencing to systematically uncover SV effects on cis-regulatory elements and gene activity. Application to leukemias and cell lines uncovered SV outcomes at several loci, including dysregulated cancer-related pathways and mono-allelic oncogene expression near SV breakpoints. At the intra-patient level, we identified distinct yet overlapping subclonal SVs that converge on aberrant Wnt signaling. We also deconvoluted the effects of catastrophic chromosomal rearrangements that result in dysregulation of oncogenic transcription factors. scNOVA directly links SVs to their functional consequences, opening the door to single-cell multiomics of SVs in heterogeneous cell populations.


2021 ◽  
Author(s):  
Scott I Adamson ◽  
Lijun Zhan ◽  
Brenton R Graveley

Background: Interactions between RNA binding proteins (RBPs) and RNA mediate a variety of processes, including pre-mRNA splicing, translation, decay, polyadenylation, and many others. Previous high-throughput studies have characterized general sequence features associated with increased and decreased splicing of certain exons, but these studies are limited because the mechanisms underlying these associations, and in particular the mediating RNA binding proteins, are unknown.
Results: Here we use ENCODE data from diverse data modalities to identify functional splicing regulatory elements and their associated RNA binding proteins. We identify features that make splicing events more sensitive to depletion of RNA binding proteins, as well as which RNA binding proteins act as splicing regulators whose depletion affects splicing. To analyze the sequence determinants underlying RBP-RNA interactions that affect splicing, we assay tens of thousands of sequence variants in a high-throughput splicing reporter assay called Vex-seq and confirm a small subset at their endogenous loci using CRISPR base editors. Finally, we leverage other large transcriptomic datasets to confirm the importance of the RNA binding proteins around which we designed experiments and to identify additional RBPs that may regulate splicing of the exons studied.
Conclusions: This study identifies sequence and other features underlying splicing regulation mediated by specific RNA binding proteins, and it validates and identifies other potentially important regulators of splicing in other large transcriptomic datasets.
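Splicing reporter assays such as Vex-seq are commonly summarized by the percent spliced in (PSI) of the reporter exon and by the change in PSI (ΔPSI) that a variant causes relative to the reference sequence. The sketch below assumes reads have already been classified as exon-inclusion or exon-skipping; the function names and input format are hypothetical, and it is not the authors' pipeline.

```python
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Fraction of informative reads that include the exon (0.0 to 1.0)."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("no informative reads: PSI is undefined")
    return inclusion_reads / total

def delta_psi(variant: tuple, reference: tuple) -> float:
    """PSI(variant) - PSI(reference); each argument is (inclusion_reads, exclusion_reads).

    Negative values mean the variant reduces exon inclusion.
    """
    return psi(*variant) - psi(*reference)

if __name__ == "__main__":
    print(delta_psi((1, 3), (3, 1)))  # variant 25% inclusion vs reference 75%
```

Here `delta_psi((1, 3), (3, 1))` returns `-0.5`: the variant lowers exon inclusion from 75% to 25%. Raising an error when there are no informative reads keeps an undefined PSI from being silently reported as zero.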

