Inference of the Human Polyadenylation Code

Mapping Intimacies ◽

10.1101/130591 ◽

2017 ◽

Cited By ~ 1

Author(s):

Michael K. K. Leung ◽

Andrew Delong ◽

Brendan J. Frey

Keyword(s):

Neural Network ◽

Site Selection ◽

Rna Binding ◽

Genomic Sequence ◽

Nucleosome Occupancy ◽

Regulatory Elements ◽

Polyadenylation Site ◽

Tissue Specific ◽

Protein Motifs ◽

Polyadenylation Sites

Abstract: Processing of transcripts at the 3’-end involves cleavage at a polyadenylation site followed by the addition of a poly(A)-tail. By selecting which polyadenylation site is cleaved, alternative polyadenylation enables genes to produce transcript isoforms with different 3’-ends. To facilitate the identification and treatment of disease-causing mutations that affect polyadenylation and to understand the underlying regulatory processes, a computational model that can accurately predict polyadenylation patterns based on genomic features is desirable. Previous works have focused on identifying candidate polyadenylation sites and classifying sites which may be tissue-specific. What is lacking is a predictive model of the underlying mechanism of site selection, competition, and processing efficiency in a tissue-specific manner. We develop a deep learning model that trains on 3’-end sequencing data and predicts tissue-specific site selection among competing polyadenylation sites in the 3’ untranslated region of the human genome. Two neural network architectures are evaluated: one built on hand-engineered features, and another that directly learns from the genomic sequence. The hand-engineered features include polyadenylation signals, cis-regulatory elements, n-mer counts, nucleosome occupancy, and RNA-binding protein motifs. The direct-from-sequence model is inferred without prior knowledge of polyadenylation, based on a convolutional neural network trained with genomic sequences surrounding each polyadenylation site as input. Both models are trained using the TensorFlow library. The proposed polyadenylation code can predict site selection among competing polyadenylation sites in different tissues. Importantly, it does so without relying on evolutionary conservation. The model can distinguish pathogenic from benign variants that appear near annotated polyadenylation sites in ClinVar and inspect the genome to find candidate polyadenylation sites. We also provide an analysis of how different features affect the model’s performance.
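
The idea of choosing among competing polyadenylation sites can be illustrated with a small sketch. This is not the authors' model: the abstract describes a convolutional network trained with TensorFlow, whereas the code below substitutes a single fixed motif filter (`AATAAA`, a hypothetical stand-in for learned filters) and assumes, purely for illustration, that per-site scores are turned into relative usage with a softmax. The function names and example sequences are invented.

```python
import math

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); unknown bases become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def motif_score(seq, motif="AATAAA"):
    """Hypothetical stand-in for a learned filter: best sliding-window match count for one motif."""
    encoded = one_hot(seq)
    kernel = one_hot(motif)
    best = 0.0
    for start in range(len(encoded) - len(kernel) + 1):
        match = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(len(kernel))
            for j in range(4)
        )
        best = max(best, match)
    return best

def site_selection(scores):
    """Softmax over the scores of competing sites (an assumption), so the outputs sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Two invented sequences standing in for the regions around competing sites.
sites = ["GGCAATAAAGTCT", "GGCATTCAAGTCT"]
usage = site_selection([motif_score(s) for s in sites])
```

In this toy setting the first sequence contains the full motif, so it receives the larger share of predicted usage; a trained network would replace `motif_score` with many learned filters and tissue-specific outputs.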

Benchmarking sequencing methods and tools that facilitate the study of alternative polyadenylation

Genome Biology ◽

10.1186/s13059-021-02502-z ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Ankeeta Shah ◽

Briana E. Mittleman ◽

Yoav Gilad ◽

Yang I. Li

Keyword(s):

Single Molecule ◽

Alternative Polyadenylation ◽

Regulatory Elements ◽

Polyadenylation Site ◽

Rna Seq ◽

Computational Tools ◽

Protein Coding ◽

Short Read ◽

Processing Event ◽

Polyadenylation Sites

Abstract: Background: Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3′ ends. Most APA occurs within 3′ UTRs, which harbor regulatory elements that can impact mRNA stability, translation, and localization. Results: APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard, short-read RNA-seq datasets. Here, we benchmarked a number of such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) against 3′-Seq, a specialized RNA-seq protocol that enriches for reads at the 3′ ends of genes, and Iso-Seq, a Pacific Biosciences (PacBio) single-molecule full-length RNA-seq method, in their ability to identify polyadenylation sites and quantify polyadenylation site usage. We demonstrate that 3′-Seq and Iso-Seq are able to identify and quantify the usage of polyadenylation sites more reliably than computational tools that take short-read RNA-seq as input. However, we find that running one such tool, QAPA, with a set of polyadenylation site annotations derived from small quantities of 3′-Seq or Iso-Seq can reliably quantify variation in APA across conditions, such as across genotypes, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTL). Conclusions: We envisage that our analyses will shed light on the advantages of studying APA with more specialized sequencing protocols, such as 3′-Seq or Iso-Seq, and the limitations of studying APA with short-read RNA-seq. We provide a computational pipeline to aid in the identification of polyadenylation sites and quantification of polyadenylation site usage using Iso-Seq data as input.
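
As a rough illustration of what "quantifying polyadenylation site usage" can mean, the sketch below computes, for one gene, the fraction of 3′-end reads assigned to each site. This is a common simplification and an assumption here, not the pipeline the authors provide; the function name and the read counts are hypothetical.

```python
def site_usage(read_counts):
    """Fraction of a gene's 3' end reads assigned to each site; empty dict if there are no reads."""
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {site: count / total for site, count in read_counts.items()}

# Hypothetical read counts for one gene with a proximal and a distal site.
counts = {"proximal": 30, "distal": 70}
usage = site_usage(counts)
```

Comparing such fractions between conditions or genotypes is the kind of quantity an apaQTL analysis would test, although real pipelines also handle site annotation, read assignment, and normalization.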

RNA m6A modification orchestrates a LINE-1–host interaction that facilitates retrotransposition and contributes to long gene vulnerability

Cell Research ◽

10.1038/s41422-021-00515-8 ◽

2021 ◽

Cited By ~ 1

Author(s):

Feng Xiong ◽

Ruoyu Wang ◽

Joo-Hyung Lee ◽

Shenglan Li ◽

Shin-Fu Chen ◽

...

Keyword(s):

Matrix Protein ◽

Rna Binding ◽

Rna Binding Proteins ◽

Regulatory Elements ◽

Host Interaction ◽

Damage Repair ◽

Novel Host ◽

Cell Type Specific ◽

Human Fetal Tissues ◽

Long Genes

Abstract: The molecular basis underlying the interaction between retrotransposable elements (RTEs) and the human genome remains poorly understood. Here, we profiled N6-methyladenosine (m6A) deposition on nascent RNAs in human cells by developing a new method MINT-Seq, which revealed that many classes of RTE RNAs, particularly intronic LINE-1s (L1s), are strongly methylated. These m6A-marked intronic L1s (MILs) are evolutionarily young, sense-oriented to hosting genes, and are bound by a dozen RNA binding proteins (RBPs) that are putative novel readers of m6A-modified RNAs, including a nuclear matrix protein SAFB. Notably, m6A positively controls the expression of both autonomous L1s and co-transcribed L1 relics, promoting L1 retrotransposition. We showed that MILs preferentially reside in long genes with critical roles in DNA damage repair and sometimes in L1 suppression per se, where they act as transcriptional “roadblocks” to impede the hosting gene expression, revealing a novel host-weakening strategy by the L1s. In counteraction, the host uses the SAFB reader complex to bind m6A-L1s to reduce their levels, and to safeguard hosting gene transcription. Remarkably, our analysis identified thousands of MILs in multiple human fetal tissues, enlisting them as a novel category of cell-type-specific regulatory elements that often compromise transcription of long genes and confer their vulnerability in neurodevelopmental disorders. We propose that this m6A-orchestrated L1–host interaction plays widespread roles in gene regulation, genome integrity, human development and diseases.

Tissue‐specific regulatory elements in mammalian promoters

Molecular Systems Biology ◽

10.1038/msb4100114 ◽

2007 ◽

Vol 3 (1) ◽

pp. 73 ◽

Cited By ~ 34

Author(s):

Andrew D Smith ◽

Pavel Sumazin ◽

Michael Q Zhang

Keyword(s):

Regulatory Elements ◽

Tissue Specific

Calcitonin/calcitonin gene-related peptide transcription unit: tissue-specific expression involves selective use of alternative polyadenylation sites

Molecular and Cellular Biology ◽

10.1128/mcb.4.10.2151-2160.1984 ◽

1984 ◽

Vol 4 (10) ◽

pp. 2151-2160

Author(s):

S G Amara ◽

R M Evans ◽

M G Rosenfeld

Keyword(s):

Regulation Of Gene Expression ◽

Alternative Polyadenylation ◽

Calcitonin Gene Related Peptide ◽

Transcription Unit ◽

Specific Expression ◽

Tissue Specific ◽

Related Peptide ◽

Calcitonin Gene ◽

Polyadenylation Sites ◽

Specific Regulation

Different 3' coding exons in the rat calcitonin gene are used to generate distinct mRNAs encoding either the hormone calcitonin in thyroidal C-cells or a new neuropeptide referred to as calcitonin gene-related peptide in neuronal tissue, indicating that RNA processing regulation is one strategy used in tissue-specific regulation of gene expression in the brain. Although the two mRNAs use the same transcriptional initiation site and have identical 5' terminal sequences, their 3' termini are distinct. The polyadenylation sites for calcitonin and calcitonin gene-related peptide mRNAs are located at the ends of exons 4 and 6, respectively. Termination of transcription after the calcitonin exon does not dictate the production of calcitonin mRNA, because transcription proceeds through both calcitonin and calcitonin gene-related peptide exons irrespective of which mRNA is ultimately produced. In isolated nuclei, both polyadenylation sites appear to be utilized; however, the proximal (calcitonin) site is preferentially used in nuclei from tissues producing calcitonin mRNA. These data suggest that the mechanism dictating production of each mRNA involves the selective use of alternative polyadenylation sites.

294. De Novo Design of Tissue-Specific Regulatory Elements Results in Robust Transduction in Heart and Liver: Implications for Cardiovascular Disease and Hemophilia

Molecular Therapy ◽

10.1016/s1525-0016(16)36098-1 ◽

2012 ◽

Vol 20 ◽

pp. S116

Keyword(s):

Cardiovascular Disease ◽

De Novo ◽

Regulatory Elements ◽

De Novo Design ◽

Tissue Specific

Diverse Genetic Regulatory Elements are Required to Direct the Proper Tissue-Specific and Developmental Expression of the Murine Adenosine Deaminase Gene

Purine and Pyrimidine Metabolism in Man VIII - Advances in Experimental Medicine and Biology ◽

10.1007/978-1-4615-2584-4_121 ◽

1995 ◽

pp. 579-584 ◽

Cited By ~ 1

Author(s):

John H. Winston ◽

Lyhna Hong ◽

Simon Akroyd ◽

Gerri Hanten ◽

Katrina Waymire ◽

...

Keyword(s):

Adenosine Deaminase ◽

Regulatory Elements ◽

Developmental Expression ◽

Tissue Specific

Transcriptional regulation of metabolism in disease: From transcription factors to epigenetics

PeerJ ◽

10.7717/peerj.5062 ◽

2018 ◽

Vol 6 ◽

pp. e5062 ◽

Cited By ~ 4

Author(s):

Liam J. Hawkins ◽

Rasha Al-attar ◽

Kenneth B. Storey

Keyword(s):

Transcriptional Regulation ◽

Genomic Sequence ◽

Regulatory Elements ◽

Regulatory Pathways ◽

Transcriptional Dysregulation ◽

Disease States ◽

Specific Subset ◽

Transcriptional Regulatory ◽

Cis And Trans ◽

Or Genes

Every cell in an individual has largely the same genomic sequence and yet cells in different tissues can present widely different phenotypes. This variation arises because each cell expresses a specific subset of genomic instructions. Control over which instructions, or genes, are expressed is largely exerted by transcriptional regulatory pathways. Each cell must assimilate a huge amount of environmental input, and thus it is no surprise that transcription is regulated by many intertwining mechanisms. This large regulatory landscape means there are ample possibilities for problems to arise, which in a medical context means the development of disease states. Metabolism within the cell, and more broadly, affects and is affected by transcriptional regulation. Metabolism can therefore contribute to improper transcriptional programming, or pathogenic metabolism can be the result of transcriptional dysregulation. Here, we discuss the established and emerging mechanisms for controlling transcription and how they affect metabolism in the context of pathogenesis. Cis- and trans-regulatory elements, microRNA and epigenetic mechanisms such as DNA and histone methylation, all have input into what genes are transcribed. Each has also been implicated in diseases such as metabolic syndrome, various forms of diabetes, and cancer. In this review, we discuss the current understanding of these areas and highlight some natural models that may inspire future therapeutics.

Massively parallel identification of zipcodes in primary cortical neurons

10.1101/2021.10.21.465275 ◽

2021 ◽

Author(s):

Nicolai von Kuegelgen ◽

Samantha Mendonsa ◽

Sayaka Dantsuji ◽

Maya Ron ◽

Marieluise Kirchner ◽

...

Keyword(s):

Cortical Neurons ◽

Rna Binding ◽

De Novo ◽

Rna Binding Proteins ◽

Regulatory Elements ◽

Massively Parallel ◽

Primary Cortical Neurons ◽

Subcellular Compartments ◽

Local Functions ◽

Massively Parallel Reporter Assay

Cells adopt highly polarized shapes and form distinct subcellular compartments largely due to the localization of many mRNAs to specific areas, where they are translated into proteins with local functions. This mRNA localization is mediated by specific cis-regulatory elements in mRNAs, commonly called "zipcodes." Their recognition by RNA-binding proteins (RBPs) leads to the integration of the mRNAs into macromolecular complexes and their localization. While there are hundreds of localized mRNAs, only a few zipcodes have been characterized. Here, we describe a novel neuronal zipcode identification protocol (N-zip) that can identify zipcodes across hundreds of 3'UTRs. This approach combines a method of separating the principal subcellular compartments of neurons - cell bodies and neurites - with a massively parallel reporter assay. Our analysis identifies the let-7 binding site and (AU)n motif as de novo zipcodes in mouse primary cortical neurons and suggests a strategy for detecting many more.

Haplotype-aware single-cell multiomics uncovers functional effects of somatic structural variation

10.1101/2021.11.11.468039 ◽

2021 ◽

Author(s):

Hyobin Jeong ◽

Karen Grimes ◽

Peter-Martin Bruch ◽

Tobias Rausch ◽

Patrick Hasenfeld ◽

...

Keyword(s):

Single Cell ◽

Nucleosome Occupancy ◽

Single Cells ◽

Chromosomal Rearrangements ◽

Regulatory Elements ◽

Computational Method ◽

Tumour Heterogeneity ◽

Cancer Genomes ◽

Functional Consequences ◽

Oncogenic Transcription Factor

Somatic structural variants (SVs) are widespread in cancer genomes; however, their impact on tumorigenesis and intra-tumour heterogeneity is incompletely understood, since methods to functionally characterize the broad spectrum of SVs arising in cancerous single cells are lacking. We present a computational method, scNOVA, that couples SV discovery with nucleosome occupancy analysis by haplotype-resolved single-cell sequencing, to systematically uncover SV effects on cis-regulatory elements and gene activity. Application to leukemias and cell lines uncovered SV outcomes at several loci, including dysregulated cancer-related pathways and mono-allelic oncogene expression near SV breakpoints. At the intra-patient level, we identified different yet overlapping subclonal SVs that converge on aberrant Wnt signaling. We also deconvoluted the effects of catastrophic chromosomal rearrangements resulting in oncogenic transcription factor dysregulation. scNOVA directly links SVs to their functional consequences, opening the door for single-cell multiomics of SVs in heterogeneous cell populations.

Functional characterization of splicing regulatory elements

10.1101/2021.05.14.444228 ◽

2021 ◽

Author(s):

Scott I Adamson ◽

Lijun Zhan ◽

Brenton R Graveley

Keyword(s):

High Throughput ◽

Binding Proteins ◽

Rna Binding ◽

Rna Binding Proteins ◽

Functional Characterization ◽

Regulatory Elements ◽

Small Subset ◽

General Sequence ◽

Splicing Regulators ◽

Splicing Regulatory Elements

Background: RNA binding protein-RNA interactions mediate a variety of processes including pre-mRNA splicing, translation, decay, polyadenylation and many others. Previous high-throughput studies have characterized general sequence features associated with increased and decreased splicing of certain exons, but these studies are limited by not knowing the mechanisms, and in particular, the mediating RNA binding proteins, underlying these associations. Results: Here we utilize ENCODE data from diverse data modalities to identify functional splicing regulatory elements and their associated RNA binding proteins. We identify features which make splicing events more sensitive to depletion of RNA binding proteins, as well as which RNA binding proteins act as splicing regulators sensitive to depletion. To analyze the sequence determinants underlying RBP-RNA interactions impacting splicing, we assay tens of thousands of sequence variants in a high-throughput splicing reporter called Vex-seq and confirm a small subset in their endogenous loci using CRISPR base editors. Finally, we leverage other large transcriptomic datasets to confirm the importance of RNA binding proteins which we designed experiments around and identify additional RBPs which may act as additional splicing regulators of the exons studied. Conclusions: This study identifies sequence and other features underlying splicing regulation mediated by specific RNA binding proteins, as well as validates and identifies other potentially important regulators of splicing in other large transcriptomic datasets.
