Integrative analysis reveals RNA G-Quadruplexes in UTRs are selectively constrained and enriched for functional associations

Mapping Intimacies ◽

10.1101/666842 ◽

2019 ◽

Cited By ~ 2

Author(s):

David S.M. Lee ◽

Louis R. Ghanem ◽

Yoseph Barash

Keyword(s):

Regulatory Networks ◽

RNA Binding ◽

Purifying Selection ◽

Regulatory Elements ◽

Messenger RNAs ◽

Multiple Sources ◽

Protein Coding ◽

G-Quadruplex ◽

Preferential Binding ◽

Missense Variation

Abstract: Identifying regulatory elements in the noncoding genome is a fundamental challenge in biology. G-quadruplex (G4) sequences are abundant in the untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex-forming sequences (pG4s) in 5’ and 3’ UTRs are selectively constrained and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we uncover a degree of negative (purifying) selection in UTR pG4s comparable to that of missense variation in protein-coding sequences. In parallel, we identify new proteins with evidence for preferential binding at pG4s from ENCODE annotations, and delineate putative regulatory networks composed of shared binding targets. Finally, by mapping variants in the NIH GWAS Catalogue and ClinVar, we find enrichment for disease-associated variation in 3’ UTR pG4s. At a GWAS pG4 variant associated with hypertension in HSPB7, we uncover robust allelic imbalance in GTEx RNA-seq data across multiple tissues, suggesting that changes in gene expression associated with pG4 disruption underlie the observed phenotypic association. Taken together, our results establish UTR G-quadruplexes as important cis-regulatory features and point to a putative link between disruption of UTR pG4s and susceptibility to human disease.
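To make "putative G-quadruplex-forming sequence" concrete, the sketch below scans a sequence for pG4s. It assumes the widely used "G3+ N1-7" motif rule (four runs of at least three guanines separated by loops of one to seven nucleotides); the abstract does not state the authors' exact pG4 definition, so treat the pattern, the name `find_pg4`, and the toy sequence as illustrative assumptions.

```python
import re

# Assumption: the common "G3+ N1-7" pG4 rule (four runs of >= 3 guanines
# separated by loops of 1-7 nucleotides). The authors' exact rule may differ.
PG4 = re.compile(r"G{3,}(?:[ACGU]{1,7}G{3,}){3}")


def find_pg4(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for non-overlapping pG4 matches (0-based, end exclusive)."""
    rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [(m.start(), m.end(), m.group()) for m in PG4.finditer(rna)]


if __name__ == "__main__":
    utr = "AAGGGAGGGAGGGAGGGAA"  # toy sequence, not a real UTR
    print(find_pg4(utr))  # [(2, 17, 'GGGAGGGAGGGAGGG')]
```

A regular expression keeps the rule transparent but reports only non-overlapping hits on one strand; dedicated G4 predictors also score structural stability, which this sketch does not attempt.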

Developmental constraints on genome evolution in four bilaterian model species

10.1101/161679 ◽

2017 ◽

Author(s):

Jialin Liu ◽

Marc Robinson-Rechavi

Keyword(s):

Genome Evolution ◽

Purifying Selection ◽

Regulatory Elements ◽

Sequence Evolution ◽

Late Development ◽

Developmental Constraints ◽

Protein Coding ◽

New Genes ◽

Hourglass Model ◽

Conservation Model

Abstract: Developmental constraints on genome evolution have been suggested to follow either an early conservation model or an “hourglass” model. Both models agree that late development strongly diverges between species, but they disagree on which developmental period is the most conserved. Here, based on a modified “Transcriptome Age Index” approach, i.e., weighting trait measures by expression level, we analyzed the constraints acting on three evolutionary traits of protein-coding genes (strength of purifying selection on protein sequences, phyletic age, and duplicability) in four species: the nematode Caenorhabditis elegans, the fly Drosophila melanogaster, the zebrafish Danio rerio, and the mouse Mus musculus. In general, we found that each model is supported by different genomic properties: sequence evolution follows an hourglass model, whereas the evolution of phyletic age and of duplicability follows an early conservation model. Further analyses indicate that the stronger purifying selection on the sequences of genes expressed in mid-development is driven by their temporal pleiotropy. In addition, we report evidence that expression in late development is enriched for retrogenes, which usually lack efficient regulatory elements. This implies that expression in late development could facilitate the transcription of new genes and provide opportunities for them to acquire functions. Finally, in C. elegans, we suggest that dosage imbalance could be one of the main factors causing the depleted expression of highly duplicable genes in early development.
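The "weighting trait measures by expression level" idea can be sketched as an expression-weighted mean per developmental stage, in the spirit of the Transcriptome Age Index: index_s = Σ_i t_i·e_is / Σ_i e_is, where t_i is a per-gene trait (e.g., dN/dS) and e_is is the expression of gene i at stage s. This is a minimal sketch assuming raw, non-negative expression values as weights; the paper's "modified" variant may transform expression (e.g., log scale), and the function names and toy numbers below are hypothetical.

```python
from typing import Mapping, Sequence


def weighted_trait_index(trait: Sequence[float], expression: Sequence[float]) -> float:
    """Expression-weighted mean of a per-gene trait at one developmental stage."""
    if len(trait) != len(expression):
        raise ValueError("trait and expression must have one value per gene")
    total = sum(expression)
    if total <= 0:
        raise ValueError("total expression must be positive")
    return sum(t * e for t, e in zip(trait, expression)) / total


def stage_profile(
    trait: Sequence[float], expression_by_stage: Mapping[str, Sequence[float]]
) -> dict[str, float]:
    """Apply the weighted index to every stage; stages keep their input order."""
    return {stage: weighted_trait_index(trait, expr) for stage, expr in expression_by_stage.items()}


if __name__ == "__main__":
    dn_ds = [0.1, 0.5]  # hypothetical per-gene dN/dS values
    expr = {"early": [3.0, 1.0], "late": [1.0, 3.0]}  # hypothetical expression
    print(stage_profile(dn_ds, expr))  # early ~0.2, late ~0.4
```

With dN/dS as the trait, a stage with a lower index is dominated by more constrained genes; plotting the profile across stages is what distinguishes an hourglass (minimum in mid-development) from early conservation (minimum at the start).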

ENDOGENOUS RETROVIRUSES AS GENETIC MODULES THAT SHAPE THE GENOME REGULATORY NETWORKS DURING EVOLUTION

The Journal of V. N. Karazin Kharkiv National University, Series "Medicine" ◽

10.26565/2313-6693-2018-36-12 ◽

2018 ◽

Keyword(s):

Regulatory Networks ◽

Gene Activation ◽

Cell Types ◽

Regulatory Elements ◽

Endogenous Retroviruses ◽

Protein Coding ◽

Transcriptional Enhancers ◽

Wide Range ◽

Gene Regulatory Elements ◽

Gene Regulatory

Endogenous retroviruses (ERVs) are the descendants of exogenous retroviruses that integrated into the germline genome, became fixed, and are now inherited. ERVs have evolved transcriptional enhancers and promoters that allow their replication in a wide range of tissues. Because ERVs contain regulatory elements, it can be assumed that they are capable of shaping and reshaping genomic regulatory networks by inserting their promoters and enhancers into new genomic loci upon retrotransposition. Thus, retrotransposition events can create new regulatory regions and lead to new patterns of gene activation in the cell. In this review, we summarize evidence that ERVs provide a plethora of novel gene regulatory elements, including tissue-specific promoters and enhancers for protein-coding genes and long noncoding RNAs, in a wide range of cell types. The accumulated findings support the hypothesis that ERVs have rewired gene regulatory networks and acted as a major source of genomic regulatory innovation during evolution.

MTR-Viewer: identifying regions within genes under purifying selection

Nucleic Acids Research ◽

10.1093/nar/gkz457 ◽

2019 ◽

Vol 47 (W1) ◽

pp. W121-W126 ◽

Cited By ~ 12

Author(s):

Michael Silk ◽

Slavé Petrovski ◽

David B Ascher

Keyword(s):

Selective Pressure ◽

Web Server ◽

Purifying Selection ◽

Genome Sequences ◽

Protein Coding ◽

Sliding Windows ◽

Alternative Transcripts ◽

Missense Variation ◽

User Friendly ◽

Tolerance Ratio

Abstract: Advances in genomic sequencing have enormous potential to revolutionize personalized medicine; however, distinguishing disease-causing from benign variants remains a challenge. The increasing number of available human genome and exome sequences has revealed regions where unfavourable variation is removed through purifying selection. Here, we present the MTR-Viewer, a web server enabling easy visualization, at the gene or variant level, of the Missense Tolerance Ratio (MTR), a measure of regional intolerance to missense variation calculated using variation from 240 000 exome and genome sequences. The MTR-Viewer enables exploration of MTR calculations, using different sliding windows, for over 18 000 human protein-coding genes and 85 000 alternative transcripts. Users can also view MTR scores calculated for specific ethnicities, to enable easy exploration of regions that may be under different selective pressures. The spatial distribution of population and known disease variants is also displayed on the protein's domain structure. Intolerant regions were found to be highly enriched for ClinVar pathogenic and COSMIC somatic missense variants (Mann–Whitney U test P < 2.2 × 10⁻¹⁶). As the MTR is not biased by known domains and protein features, it can highlight functionally important regions within genes that are overlooked by, or inaccessible to, traditional methods. MTR-Viewer is freely available via a user-friendly web server at http://biosig.unimelb.edu.au/mtr-viewer/.
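A sliding-window MTR can be sketched as the observed fraction of missense variants divided by the fraction expected under neutrality: MTR = [obs_mis / (obs_mis + obs_syn)] / [exp_mis / (exp_mis + exp_syn)], so values below 1 flag windows depleted of missense variation. This sketch assumes per-codon observed and expected (missense, synonymous) counts are already available; how the expected counts are derived, and the window sizes the server uses, are outside the abstract, and the function names here are hypothetical.

```python
import math
from typing import Sequence

Counts = tuple[float, float]  # (missense, synonymous)


def mtr(observed: Counts, expected: Counts) -> float:
    """Missense Tolerance Ratio: observed missense fraction / expected missense fraction."""
    obs_m, obs_s = observed
    exp_m, exp_s = expected
    if obs_m + obs_s == 0 or exp_m == 0:
        return math.nan  # undefined without observed variants or expected missense sites
    return (obs_m / (obs_m + obs_s)) / (exp_m / (exp_m + exp_s))


def sliding_mtr(observed: Sequence[Counts], expected: Sequence[Counts], window: int) -> list[float]:
    """MTR for every window of `window` consecutive codons (step of one codon)."""
    if len(observed) != len(expected):
        raise ValueError("need per-codon counts for the same codons")
    if not 1 <= window <= len(observed):
        raise ValueError("window must be between 1 and the number of codons")
    scores = []
    for start in range(len(observed) - window + 1):
        obs = observed[start:start + window]
        exp = expected[start:start + window]
        scores.append(
            mtr(
                (sum(o[0] for o in obs), sum(o[1] for o in obs)),
                (sum(e[0] for e in exp), sum(e[1] for e in exp)),
            )
        )
    return scores


if __name__ == "__main__":
    obs = [(1, 1), (0, 2), (3, 1)]  # hypothetical per-codon counts
    exp = [(2, 1)] * 3
    print(sliding_mtr(obs, exp, window=2))  # ~[0.375, 0.75]
```

Summing counts within the window before taking the ratio (rather than averaging per-codon ratios) keeps sparse codons with zero variants from producing undefined values.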

The snoGloBe interaction predictor enables a broader study of box C/D snoRNA functions and mechanisms

10.1101/2021.09.14.460265 ◽

2021 ◽

Author(s):

Gabrielle Deschamps-Francoeur ◽

Sonia Couture ◽

Sherif Abou Elela ◽

Michelle S Scott

Keyword(s):

Noncoding RNA ◽

RNA Binding ◽

RNA Binding Proteins ◽

Regulatory Elements ◽

Gradient Boosting ◽

Target Sequence ◽

Messenger RNAs ◽

Small Nuclear RNA ◽

Transcript Stability ◽

RNA Interaction

Box C/D small nucleolar RNAs (snoRNAs) are a conserved class of noncoding RNA known to serve as guides for the site-specific 2'-O-ribose methylation of ribosomal RNAs and the U6 small nuclear RNA, through direct base pairing with the target. In recent years, however, several examples of box C/D snoRNAs regulating different levels of gene expression, including transcript stability and splicing, have been reported. These regulatory interactions typically require direct binding of the target but do not always involve the guide region. Supporting these new box C/D snoRNA functions, high-throughput RNA-RNA interaction datasets detect many interactions between box C/D snoRNAs and messenger RNAs. To facilitate the study of box C/D snoRNA functionality, we created snoGloBe, a machine learning target predictor for box C/D snoRNAs based on a gradient boosting classifier that considers the snoRNA and target sequences, the target position, and the target type. SnoGloBe convincingly outperforms general RNA duplex predictors and PLEXY, the only box C/D snoRNA-specific target predictor available. Analysis of snoGloBe's human transcriptome-wide predictions identifies an enrichment of snoRNA interactions in exons and at exon-intron junctions. Some specific snoRNAs are predicted to target groups of functionally related transcripts at common regulatory elements, and the exact positions of the predicted targets strongly overlap the binding sites of RNA-binding proteins involved in relevant molecular functions. SnoGloBe was also applied to predict interactions between human box C/D snoRNAs and the SARS-CoV-2 transcriptome, identifying known and novel interactions. Overall, snoGloBe is a timely new tool that will accelerate our understanding of box C/D snoRNA targets and function.

A G-quadruplex RNA-binding protein which regulates the translation of genes related to Parkinson's disease

10.26226/morressier.5ebd45acffea6f735881b12b ◽

2020 ◽

Author(s):

Marc-Antoine Turcotte

Keyword(s):

Parkinson's Disease ◽

RNA Binding ◽

Binding Protein ◽

RNA Binding Protein ◽

G-Quadruplex

Faculty Opinions recommendation of Widespread purifying selection at polymorphic sites in human protein-coding loci.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1016701.201233 ◽

2003 ◽

Cited By ~ 1

Author(s):

Thomas Mitchell-Olds

Keyword(s):

Purifying Selection ◽

Human Protein ◽

Protein Coding

PlncRNADB: A Repository of Plant lncRNAs and lncRNA-RBP Protein Interactions

Current Bioinformatics ◽

10.2174/1574893614666190131161002 ◽

2019 ◽

Vol 14 (7) ◽

pp. 621-627 ◽

Cited By ~ 3

Author(s):

Youhuang Bai ◽

Xiaozhuan Dai ◽

Tiantian Ye ◽

Peijing Zhang ◽

Xu Yan ◽

...

Keyword(s):

Protein Interactions ◽

Binding Proteins ◽

RNA Binding ◽

RNA Binding Proteins ◽

Populus Trichocarpa ◽

Noncoding RNAs ◽

Reference Database ◽

Protein Coding ◽

Arabidopsis Lyrata ◽

User Friendly

Background: Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, defined by an arbitrary length cutoff of more than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs are found in genomes ranging from animals to plants. Objective: PlncRNADB is a searchable database of lncRNA sequences and annotations in plants. Methods: We built a pipeline for lncRNA prediction in plants that provides a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. Results: PlncRNADB collects more than five thousand lncRNAs from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays). Moreover, the database provides the relationships between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface. Conclusion: PlncRNADB can serve as a reference database for investigating plant lncRNAs and their interactions with RNA-binding proteins. PlncRNADB is freely available at http://bis.zju.edu.cn/PlncRNADB/.
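The first filtering step of an lncRNA prediction pipeline can be sketched as a length cutoff plus an open-reading-frame check. The 200-nucleotide threshold comes from the definition above; the 100-codon ORF ceiling, the forward-strand-only scan, and the function names are assumptions for illustration. The abstract does not describe PlncRNADB's actual criteria, and production pipelines typically add dedicated coding-potential classifiers.

```python
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def longest_orf_codons(seq: str) -> int:
    """Longest ATG-initiated ORF on the forward strand, in codons (stop excluded).

    An ORF with no downstream stop is counted up to the end of the sequence.
    """
    dna = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                j = i
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOP_CODONS:
                    j += 3
                best = max(best, (j - i) // 3)
                i = j + 3  # resume scanning after the stop codon
            else:
                i += 3
    return best


def is_candidate_lncrna(seq: str, min_length: int = 200, max_orf_codons: int = 100) -> bool:
    """Keep transcripts longer than `min_length` nt whose longest ORF is short (assumed cutoff)."""
    return len(seq) > min_length and longest_orf_codons(seq) < max_orf_codons


if __name__ == "__main__":
    print(is_candidate_lncrna("A" * 250))  # True: long, no ORF
    print(is_candidate_lncrna("ATG" + "GCT" * 150 + "TAA"))  # False: 151-codon ORF
```

An ORF-length cutoff is cheap but crude: it misses small functional ORFs and can reject noncoding transcripts that happen to contain a long ORF by chance, which is why it is usually only a pre-filter.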

Integrated whole transcriptome and small RNA analysis revealed multiple regulatory networks in colorectal cancer

Scientific Reports ◽

10.1038/s41598-021-93531-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Hibah Shaath ◽

Salman M. Toor ◽

Mohamed Abu Nada ◽

Eyad Elkord ◽

Nehad M. Alajez

Keyword(s):

Colorectal Cancer ◽

Messenger RNA ◽

Regulatory Networks ◽

Potential Interaction ◽

Therapeutic Interventions ◽

Signaling Networks ◽

Protein Coding ◽

Paired Samples ◽

And Migration ◽

Whole Transcriptome

Abstract: Colorectal cancer (CRC) remains a global disease burden and a leading cause of cancer-related deaths worldwide. The identification of aberrantly expressed messenger RNA (mRNA), long non-coding RNA (lncRNA), and microRNA (miRNA), and of the resulting molecular interactions and signaling networks, is essential for a better understanding of CRC, the identification of novel diagnostic biomarkers, and the potential development of therapeutic interventions. Herein, we performed miRNA sequencing on fifteen CRC tissues and their adjacent non-tumor tissues, and whole transcriptome RNA-Seq on six paired samples from the same cohort, and identified alterations in miRNA, mRNA, and lncRNA expression. Computational analyses using Ingenuity Pathway Analysis (IPA) identified multiple activated signaling networks in CRC, including the ERBB2, RABL6, FOXM1, and NFKB networks, while functional annotation highlighted activation of cell proliferation and migration as the hallmark of CRC. IPA, in combination with in silico prediction algorithms and experimentally validated databases, gave insight into the complex associations and interactions between downregulated miRNAs and upregulated mRNAs in CRC, and vice versa. Additionally, potential interactions between differentially expressed lncRNAs, such as H19, SNHG5, and GATA2-AS1, and multiple miRNAs were revealed. Taken together, our data provide a thorough analysis of dysregulated protein-coding and non-coding RNAs in CRC, highlighting numerous associations and regulatory networks and thus providing a better understanding of CRC.
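The idea of pairing downregulated miRNAs with upregulated mRNAs (and vice versa) can be sketched as a filter over predicted miRNA-target pairs that keeps only those whose log2 fold changes have opposite signs. This is not IPA's method; it is a minimal sketch assuming fold changes and a target map are already available, with a hypothetical `min_abs_lfc` threshold and placeholder identifiers rather than data from the study.

```python
from typing import Iterable, Mapping


def anticorrelated_pairs(
    mirna_lfc: Mapping[str, float],
    mrna_lfc: Mapping[str, float],
    predicted_targets: Mapping[str, Iterable[str]],
    min_abs_lfc: float = 1.0,
) -> list[tuple[str, str, float, float]]:
    """Return (miRNA, target, miRNA log2FC, target log2FC) pairs changing in opposite directions."""
    pairs = []
    for mir, targets in predicted_targets.items():
        m_lfc = mirna_lfc.get(mir)
        if m_lfc is None or abs(m_lfc) < min_abs_lfc:
            continue  # miRNA not (strongly) differentially expressed
        for gene in sorted(targets):
            g_lfc = mrna_lfc.get(gene)
            if g_lfc is None or abs(g_lfc) < min_abs_lfc:
                continue
            if m_lfc * g_lfc < 0:  # opposite signs: down-miRNA/up-mRNA or vice versa
                pairs.append((mir, gene, m_lfc, g_lfc))
    return pairs


if __name__ == "__main__":
    mirs = {"miR-A": -2.0, "miR-B": 1.5}  # placeholder log2FC values
    genes = {"G1": 3.0, "G2": -1.2, "G3": 0.2}
    targets = {"miR-A": {"G1", "G2"}, "miR-B": {"G2", "G3"}}
    print(anticorrelated_pairs(mirs, genes, targets))
```

Requiring opposite directions reflects the expectation that a miRNA represses its targets; it narrows the candidate list but cannot by itself establish a direct interaction.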

Analysis of Stop Codons within Prokaryotic Protein-Coding Genes Suggests Frequent Readthrough Events

International Journal of Molecular Sciences ◽

10.3390/ijms22041876 ◽

2021 ◽

Vol 22 (4) ◽

pp. 1876

Author(s):

Frida Belinky ◽

Ishan Ganguly ◽

Eugenia Poliakov ◽

Vyacheslav Yurchenko ◽

Igor B. Rogozin

Keyword(s):

Stop Codon ◽

Purifying Selection ◽

Protein Product ◽

Intermediate Step ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Synonymous Sites ◽

Prokaryotic Protein ◽

Sense Codon

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of them are not pseudogenes, since these genes maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1, a signature of purifying selection in protein-coding regions. We also found that double substitutions in codons, where the intermediate step is a nonsense substitution, show a higher rate of evolution than expected under null models, indicating that a stop codon was introduced and then changed back to a sense codon via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: such codons are likely to cause a substantial decrease in protein expression levels.
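Locating the in-frame stop codons discussed above is the first step of such an analysis. The sketch assumes the coding sequence starts in frame at its first nucleotide and that the final codon is the expected terminal stop; it ignores any trailing partial codon and does not estimate dN/dS. The function name is hypothetical.

```python
# Stop codons of the standard code; the bacterial code (NCBI table 11) uses the same three.
STOP_CODONS = frozenset({"TAA", "TAG", "TGA"})


def internal_stop_codons(cds: str) -> list[tuple[int, str]]:
    """Return (codon index, codon) for in-frame stops before the final codon of a CDS."""
    dna = cds.upper().replace("U", "T")
    n_codons = len(dna) // 3
    return [
        (i, dna[3 * i:3 * i + 3])
        for i in range(n_codons - 1)  # skip the last codon, the expected terminal stop
        if dna[3 * i:3 * i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    print(internal_stop_codons("ATGTAAGCTTGA"))  # [(1, 'TAA')]
```

Genes flagged this way would then be compared with orthologs to compute dN/dS; under the reasoning above, a ratio well below 1 argues against pseudogenization despite the internal stop.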

RNA-Centric Approaches to Profile the RNA–Protein Interaction Landscape on Selected RNAs

Non-Coding RNA ◽

10.3390/ncrna7010011 ◽

2021 ◽

Vol 7 (1) ◽

pp. 11 ◽

Cited By ~ 1

Author(s):

André P. Gerber

Keyword(s):

Mass Spectrometry ◽

Protein Interactions ◽

Regulatory Networks ◽

RNA Binding ◽

RNA Binding Proteins ◽

Protein Complexes ◽

Cell Protein ◽

Transcriptional Regulatory Networks ◽

Technological Advances

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While technological advances in RNA sequencing have significantly expanded the repertoire of known RNAs, recently developed biochemical approaches combined with sensitive mass spectrometry have revealed hundreds of previously unrecognized and potentially novel RNA-binding proteins. Nevertheless, a major challenge remains: understanding how the thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances that approach this problem through the systematic identification of proteins that interact with particular RNAs in living cells. Specific focus is given to in vivo approaches that involve crosslinking of RNA–protein interactions through ultraviolet irradiation or treatment of cells with chemicals, followed by capture of the RNA under study with antisense oligonucleotides and identification of the bound proteins by mass spectrometry. Several recent studies defining the interactomes of long non-coding RNAs, viral RNAs, and mRNAs are highlighted, and brief reference is made to recent in-cell protein labeling techniques. These experimental improvements could open the door to broader applications and to studies of the remodeling of RNA–protein complexes in response to different environmental cues and in disease.