Integrative analysis reveals RNA G-Quadruplexes in UTRs are selectively constrained and enriched for functional associations

2019 ◽  
Author(s):  
David S.M. Lee ◽  
Louis R. Ghanem ◽  
Yoseph Barash

Abstract: Identifying regulatory elements in the noncoding genome is a fundamental challenge in biology. G-quadruplex (G4) sequences are abundant in untranslated regions (UTRs) of human messenger RNAs, but their functional importance remains unclear. By integrating multiple sources of genetic and genomic data, we show that putative G-quadruplex forming sequences (pG4s) in 5’ and 3’ UTRs are selectively constrained and enriched for cis-eQTLs and RNA-binding protein (RBP) interactions. Using over 15,000 whole-genome sequences, we uncover a degree of negative (purifying) selection in UTR pG4s comparable to that of missense variation in protein-coding sequences. In parallel, we identify new proteins with evidence for preferential binding at pG4s from ENCODE annotations, and delineate putative regulatory networks composed of shared binding targets. Finally, by mapping variants in the NIH GWAS Catalogue and ClinVar, we find enrichment for disease-associated variation in 3’ UTR pG4s. At a GWAS pG4 variant associated with hypertension in HSPB7, we uncover robust allelic imbalance in GTEx RNA-seq across multiple tissues, suggesting that changes in gene expression associated with pG4 disruption underlie the observed phenotypic association. Taken together, our results establish UTR G-quadruplexes as important cis-regulatory features, and point to a putative link between disruption of UTR pG4s and susceptibility to human disease.
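As a rough illustration of how putative G-quadruplex sequences can be located, the sketch below scans a sequence for a commonly used canonical motif: four runs of three or more guanines separated by loops of 1 to 7 nucleotides. The abstract does not state which pG4 definition the authors used, so the motif, the function name `find_pg4`, and the example sequence are all assumptions made for illustration.

```python
import re

# Assumed canonical motif (not taken from the abstract):
# G{3,} loop{1,7} G{3,} loop{1,7} G{3,} loop{1,7} G{3,}
PG4_PATTERN = re.compile(r"G{3,}(?:[ACGTU]{1,7}G{3,}){3}")


def find_pg4(seq):
    """Return (start, end, matched_sequence) for non-overlapping motif hits."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in PG4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical UTR fragment containing one canonical motif.
    example = "ttGGGAGGGAGGGAGGGtt"
    for start, end, hit in find_pg4(example):
        print(start, end, hit)
```

Because `finditer` reports non-overlapping matches only, nested or overlapping candidates would need a different search strategy.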

2017 ◽  
Author(s):  
Jialin Liu ◽  
Marc Robinson-Rechavi

Abstract: Developmental constraints on genome evolution have been suggested to follow either an early conservation model or an “hourglass” model. Both models agree that late development strongly diverges between species, but they disagree on which developmental period is the most conserved. Here, based on a modified “Transcriptome Age Index” approach, i.e. weighting trait measures by expression level, we analyzed the constraints acting on three evolutionary traits of protein-coding genes (strength of purifying selection on protein sequences, phyletic age, and duplicability) in four species: nematode worm Caenorhabditis elegans, fly Drosophila melanogaster, zebrafish Danio rerio, and mouse Mus musculus. In general, we found that both models can be supported by different genomic properties. Sequence evolution follows an hourglass model, but the evolution of phyletic age and of duplicability follows an early conservation model. Further analyses indicate that stronger purifying selection on sequences in mid-development is driven by temporal pleiotropy of these genes. In addition, we report evidence that expression in late development is enriched with retrogenes, which usually lack efficient regulatory elements. This implies that expression in late development could facilitate transcription of new genes, and provide opportunities for acquisition of function. Finally, in C. elegans, we suggest that dosage imbalance could be one of the main factors that cause depleted expression of high-duplicability genes in early development.
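The idea of weighting a trait measure by expression level can be sketched as an expression-weighted mean computed per developmental stage: the sum of trait value times expression over genes, divided by total expression. This is only the general form of a Transcriptome Age Index style calculation; the abstract does not describe the authors' specific modifications (for example any expression transformation), and the function name and toy numbers below are hypothetical.

```python
def expression_weighted_index(trait_values, expression):
    """Expression-weighted mean of a per-gene trait for one stage.

    trait_values: per-gene trait measure (e.g. phyletic age rank).
    expression:   per-gene expression level at this stage, same order.
    """
    if len(trait_values) != len(expression):
        raise ValueError("trait_values and expression must have equal length")
    total_expression = sum(expression)
    if total_expression == 0:
        raise ValueError("total expression is zero; index is undefined")
    weighted_sum = sum(t * e for t, e in zip(trait_values, expression))
    return weighted_sum / total_expression


if __name__ == "__main__":
    # Toy data: two genes, two stages (values are illustrative only).
    trait = [1.0, 3.0]
    stage_expression = {"early": [3.0, 1.0], "late": [1.0, 3.0]}
    for stage, expr in stage_expression.items():
        print(stage, expression_weighted_index(trait, expr))
```

A stage dominated by expression of genes with high trait values yields a higher index, which is what allows stages to be compared against each other.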


Endogenous retroviruses (ERVs) are the descendants of exogenous retroviruses that integrated into the germ-cell genome, became fixed, and became heritable. ERVs have evolved transcriptional enhancers and promoters that allow their replication in a wide range of tissues. Because ERVs carry these regulatory elements, it can be assumed that they are able to shape and reshape genomic regulatory networks by inserting their promoters and enhancers at new genomic loci upon retrotransposition. Thus, retrotransposition events can build new regulatory regions and lead to new patterns of gene activation in the cell. In this review, we summarize evidence that ERVs provide a plethora of novel gene regulatory elements, including tissue-specific promoters and enhancers for protein-coding genes or long noncoding RNAs in a wide range of cell types. The accumulated findings support the hypothesis that ERVs have rewired gene regulatory networks and act as a major source of genomic regulatory innovation during evolution.


2019 ◽  
Vol 47 (W1) ◽  
pp. W121-W126 ◽  
Author(s):  
Michael Silk ◽  
Slavé Petrovski ◽  
David B Ascher

Abstract: Advances in genomic sequencing have enormous potential to revolutionize personalized medicine; however, distinguishing disease-causing from benign variants remains a challenge. The increasing number of human genome and exome sequences available has revealed areas where unfavourable variation is removed through purifying selection. Here, we present the MTR-Viewer, a web server enabling easy visualization at the gene or variant level of the Missense Tolerance Ratio (MTR), a measure of regional intolerance to missense variation calculated using variation from 240 000 exome and genome sequences. The MTR-Viewer enables exploration of MTR calculations, using different sliding windows, for over 18 000 human protein-coding genes and 85 000 alternative transcripts. Users can also view MTR scores calculated for specific ethnicities, to enable easy exploration of regions that may be under different selective pressure. The spatial distribution of population and known disease variants is also displayed on the protein's domain structure. Intolerant regions were found to be highly enriched for ClinVar pathogenic and COSMIC somatic missense variants (Mann–Whitney U test P < 2.2 × 10⁻¹⁶). As the MTR is not biased by known domains and protein features, it can highlight functionally important regions within genes that are overlooked by or inaccessible to traditional methods. MTR-Viewer is freely available via a user-friendly web server at http://biosig.unimelb.edu.au/mtr-viewer/.
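The abstract does not give the MTR formula. As a hedged sketch, the code below assumes the ratio is the observed proportion of missense variants in a window (missense over missense plus synonymous) divided by the proportion expected from all possible single-nucleotide changes in the same window. That definition, the function name `missense_tolerance_ratio`, and the counts in the example are assumptions for illustration, not a description of the MTR-Viewer implementation.

```python
def missense_tolerance_ratio(obs_missense, obs_synonymous,
                             possible_missense, possible_synonymous):
    """Assumed MTR for one sliding window.

    Returns None when the ratio is undefined (no observed variants,
    or no possible missense changes in the window).
    """
    observed_total = obs_missense + obs_synonymous
    possible_total = possible_missense + possible_synonymous
    if observed_total == 0 or possible_total == 0 or possible_missense == 0:
        return None
    observed_fraction = obs_missense / observed_total
    expected_fraction = possible_missense / possible_total
    return observed_fraction / expected_fraction


if __name__ == "__main__":
    # Illustrative counts only: 2 of 10 observed variants are missense,
    # while 70 of 100 possible changes would be missense.
    print(missense_tolerance_ratio(2, 8, 70, 30))
```

Under this assumed definition, values well below 1 indicate fewer missense variants than expected in the window, consistent with the abstract's description of regional intolerance.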


2021 ◽  
Author(s):  
Gabrielle Deschamps-Francoeur ◽  
Sonia Couture ◽  
Sherif Abou Elela ◽  
Michelle S Scott

Box C/D small nucleolar RNAs (snoRNAs) are a conserved class of noncoding RNA known to serve as guides for the site-specific 2'-O-ribose methylation of ribosomal RNAs and the U6 small nuclear RNA, through direct base pairing with the target. In recent years, however, several examples of box C/D snoRNAs regulating different levels of gene expression, including transcript stability and splicing, have been reported. These regulatory interactions typically require direct binding of the target but do not always involve the guide region. Supporting these new box C/D snoRNA functions, high-throughput RNA-RNA interaction datasets detect many interactions between box C/D snoRNAs and messenger RNAs. To facilitate the study of box C/D snoRNA functionality, we created snoGloBe, a box C/D snoRNA machine learning target predictor based on a gradient boosting classifier that considers snoRNA and target sequence and position as well as target type. SnoGloBe convincingly outperforms general RNA duplex predictors and PLEXY, the only box C/D snoRNA-specific target predictor available. The study of snoGloBe human transcriptome-wide predictions identifies enrichment of snoRNA interactions in exons and on exon-intron junctions. Some specific snoRNAs are predicted to target groups of functionally related transcripts on common regulatory elements, and the exact position of the predicted targets strongly overlaps binding sites of RNA-binding proteins involved in relevant molecular functions. SnoGloBe was also applied to predicting interactions between human box C/D snoRNAs and the SARS-CoV-2 transcriptome, identifying known and novel interactions. Overall, snoGloBe is a timely new tool that will accelerate our understanding of C/D snoRNA targets and function.


2019 ◽  
Vol 14 (7) ◽  
pp. 621-627 ◽  
Author(s):  
Youhuang Bai ◽  
Xiaozhuan Dai ◽  
Tiantian Ye ◽  
Peijing Zhang ◽  
Xu Yan ◽  
...  

Background: Long noncoding RNAs (lncRNAs) are endogenous noncoding RNAs, arbitrarily defined as longer than 200 nucleotides, that play critical roles in diverse biological processes. LncRNAs exist in different genomes ranging from animals to plants. Objective: PlncRNADB is a searchable database of lncRNA sequences and annotation in plants. Methods: We built a pipeline for lncRNA prediction in plants, providing a convenient utility for users to quickly distinguish potential noncoding RNAs from protein-coding transcripts. Results: More than five thousand lncRNAs from four plant species (Arabidopsis thaliana, Arabidopsis lyrata, Populus trichocarpa and Zea mays) are collected in PlncRNADB. Moreover, our database provides the relationships between lncRNAs and various RNA-binding proteins (RBPs), which can be displayed through a user-friendly web interface. Conclusion: PlncRNADB can serve as a reference database to investigate lncRNAs and their interactions with RNA-binding proteins in plants. PlncRNADB is freely available at http://bis.zju.edu.cn/PlncRNADB/.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hibah Shaath ◽  
Salman M. Toor ◽  
Mohamed Abu Nada ◽  
Eyad Elkord ◽  
Nehad M. Alajez

Abstract: Colorectal cancer (CRC) remains a global disease burden and a leading cause of cancer-related deaths worldwide. The identification of aberrantly expressed messenger RNA (mRNA), long non-coding RNA (lncRNA), and microRNA (miRNA), and the resulting molecular interactions and signaling networks, is essential for better understanding of CRC, identification of novel diagnostic biomarkers, and potential development of therapeutic interventions. Herein, we performed miRNA sequencing on fifteen CRC and their non-tumor adjacent tissues and whole-transcriptome RNA-Seq on six paired samples from the same cohort, and identified alterations in miRNA, mRNA, and lncRNA expression. Computational analyses using Ingenuity Pathway Analysis (IPA) identified multiple activated signaling networks in CRC, including ERBB2, RABL6, FOXM1, and NFKB networks, while functional annotation highlighted activation of cell proliferation and migration as the hallmark of CRC. IPA in combination with in silico prediction algorithms and experimentally validated databases gave insight into the complex associations and interactions between downregulated miRNAs and upregulated mRNAs in CRC and vice versa. Additionally, potential interactions between differentially expressed lncRNAs such as H19, SNHG5, and GATA2-AS1 and multiple miRNAs have been revealed. Taken together, our data provide a thorough analysis of dysregulated protein-coding and non-coding RNAs in CRC, highlighting numerous associations and regulatory networks and thus providing a better understanding of CRC.


2021 ◽  
Vol 22 (4) ◽  
pp. 1876
Author(s):  
Frida Belinky ◽  
Ishan Ganguly ◽  
Eugenia Poliakov ◽  
Vyacheslav Yurchenko ◽  
Igor B. Rogozin

Nonsense mutations turn a coding (sense) codon into an in-frame stop codon that is assumed to result in a truncated protein product. Thus, nonsense substitutions are the hallmark of pseudogenes and are used to identify them. Here we show that in-frame stop codons within bacterial protein-coding genes are widespread. Their evolutionary conservation suggests that many of the genes carrying them are not pseudogenes, since they maintain dN/dS values (ratios of substitution rates at non-synonymous and synonymous sites) significantly lower than 1, a signature of purifying selection in protein-coding regions. We also found that double substitutions in codons, where an intermediate step is a nonsense substitution, show a higher rate of evolution compared to null models, indicating that a stop codon was introduced and then changed back to sense via positive selection. This further supports the notion that nonsense substitutions in bacteria are relatively common and do not necessarily cause pseudogenization. In-frame stop codons may be an important mechanism of regulation: such codons are likely to cause a substantial decrease in protein expression levels.


2021 ◽  
Vol 7 (1) ◽  
pp. 11 ◽  
Author(s):  
André P. Gerber

RNA–protein interactions frame post-transcriptional regulatory networks and modulate transcription and epigenetics. While technological advances in RNA sequencing have significantly expanded the repertoire of RNAs, recently developed biochemical approaches combined with sensitive mass spectrometry have revealed hundreds of previously unrecognized and potentially novel RNA-binding proteins. Nevertheless, a major challenge remains to understand how the thousands of RNA molecules and their interacting proteins assemble and control the fate of each individual RNA in a cell. Here, I review recent methodological advances to approach this problem through systematic identification of proteins that interact with particular RNAs in living cells. A specific focus is given to in vivo approaches that involve crosslinking of RNA–protein interactions through ultraviolet irradiation or treatment of cells with chemicals, followed by capture of the RNA under study with antisense oligonucleotides and identification of bound proteins with mass spectrometry. Several recent studies defining interactomes of long non-coding RNAs, viral RNAs, and mRNAs are highlighted, and brief reference is made to recent in-cell protein labeling techniques. These recent experimental improvements could open the door to broader applications and to studying the remodeling of RNA–protein complexes upon different environmental cues and in disease.

