Evolutionary Analyses of Base-Pairing Interactions in DNA and RNA Secondary Structures

2019 ◽  
Vol 37 (2) ◽  
pp. 576-592 ◽  
Author(s):  
Michael Golden ◽  
Benjamin Murrell ◽  
Darren Martin ◽  
Oliver G Pybus ◽  
Jotun Hein

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure. MESSI can also use graphics processing unit parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in noncoding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and noncoding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses. A potential biophysical explanation is that GT pairs do not stabilize DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than three nonevolutionary measures of base-pairing covariation. To assist researchers in prioritizing substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, among several uncharacterized top-ranking substructures.
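The abstract does not name the three nonevolutionary covariation measures used for comparison. Purely as an illustration of what such a measure can look like, the sketch below computes mutual information between two alignment columns, a commonly used covariation statistic. The function name and the toy alignment are hypothetical, and this is not MESSI's evolutionary model.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Sequences with a gap ('-') in either column are skipped.
    """
    pairs = [(seq[i], seq[j]) for seq in alignment
             if seq[i] != "-" and seq[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                   # counts of (base_i, base_j)
    left = Counter(a for a, _ in pairs)      # marginal counts, column i
    right = Counter(b for _, b in pairs)     # marginal counts, column j
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical toy alignment: columns 0 and 4 covary (G-C, A-U, C-G, U-A),
# while column 2 is fully conserved.
toy_alignment = ["GAAAC", "AAAAU", "CAAAG", "UAAAA"]
mi_paired = column_mutual_information(toy_alignment, 0, 4)
mi_conserved = column_mutual_information(toy_alignment, 0, 2)
```

A statistic of this kind treats the sequences as independent samples and ignores the phylogeny relating them, which is the limitation an evolutionary model such as the one described above is meant to address.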

2018 ◽  
Author(s):  
Michael Golden ◽  
Ben Murrell ◽  
Oliver G. Pybus ◽  
Darren Martin ◽  
Jotun Hein

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here we introduce a sequence evolution model, MESSI, that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution whilst accounting for an unknown secondary structure. MESSI can also use GPU parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in non-coding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and non-coding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses, suggesting that GT pairs do not stabilise DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure and two corresponding alignments. We found that estimates of coevolution were more strongly correlated with experimentally-determined SHAPE-MaP pairing scores than three non-evolutionary measures of base-pairing covariation. To assist researchers in prioritising substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, amongst several uncharacterised top-ranking substructures.


2001 ◽  
Vol 75 (24) ◽  
pp. 12105-12113 ◽  
Author(s):  
Qi Liu ◽  
Reed F. Johnson ◽  
Julian L. Leibowitz

ABSTRACT Previously, we characterized two host protein binding elements located within the 3′-terminal 166 nucleotides of the mouse hepatitis virus (MHV) genome and assessed their functions in defective-interfering (DI) RNA replication. To determine the role of RNA secondary structures within these two host protein binding elements in viral replication, we explored the secondary structure of the 3′-terminal 166 nucleotides of the MHV strain JHM genome using limited RNase digestion assays. Our data indicate that multiple stem-loop and hairpin-loop structures exist within this region. Mutant and wild-type DIssEs were employed to test the function of secondary structure elements in DI RNA replication. Three stem structures were chosen as targets for the introduction of transversion mutations designed to destroy base pairing structures. Mutations predicted to destroy the base pairing of nucleotides 142 to 136 with nucleotides 68 to 74 exhibited a deleterious effect on DIssE replication. Destruction of base pairing between positions 96 to 99 and 116 to 113 also decreased DI RNA replication. Mutations interfering with the pairing of nucleotides 67 to 63 with nucleotides 52 to 56 had only minor effects on DIssE replication. The introduction of second complementary mutations which restored the predicted base pairing of positions 142 to 136 with 68 to 74 and nucleotides 96 to 99 with 116 to 113 largely ameliorated defects in replication ability, restoring DI RNA replication to levels comparable to that of wild-type DIssE RNA, suggesting that these secondary structures are important for efficient MHV replication. We also identified a conserved 23-nucleotide stem-loop structure involving nucleotides 142 to 132 and nucleotides 68 to 79. The upstream side of this conserved stem-loop is contained within a host protein binding element (nucleotides 166 to 129).


2019 ◽  
Author(s):  
Masaki Tagashira ◽  
Kiyoshi Asai

Abstract Motivation: The simultaneous optimization of sequence alignment and secondary structures among RNAs, known as structural alignment, is required for a more appropriate comparison of functional ncRNAs than sequence alignment alone provides. Pseudo-probabilities given RNA sequences on structural alignment have been desired for more accurate secondary structures, sequence alignments, consensus secondary structures, and structural alignments. However, no algorithm has been proposed for these pseudo-probabilities. Results: We invented RNAfamProb, an algorithm for estimating these pseudo-probabilities. We applied these pseudo-probabilities to two biological problems: visualization and maximum-expected-accuracy secondary-structure estimation. The RNAfamProb program, an implementation of this algorithm, together with the NeoFold program, a maximum-expected-accuracy secondary-structure program that uses these pseudo-probabilities, demonstrated better prediction accuracy than three state-of-the-art maximum-expected-accuracy secondary-structure programs, while demanding far longer running time than these three programs, as expected given the intrinsically greater complexity of structural alignment compared with independent secondary-structure prediction and sequence alignment. Both the RNAfamProb and NeoFold programs produce more accurate estimates when homologous RNA sequences are incorporated. Availability: The source code of the two programs is available at “https://github.com/heartsh/rnafamprob” and “https://github.com/heartsh/neofold”, respectively. Contact: “[email protected]” and “[email protected]”. Supplementary information: Supplementary data are available at Bioinformatics online.


10.29007/bhsr ◽  
2020 ◽  
Author(s):  
Mutlu Mete ◽  
Abdullah Arslan

This study is part of our ongoing effort to develop improved RNA secondary structure analysis tools and databases. In this work we present a new Graphics Processing Unit (GPU)-based RNA structural analysis framework that supports fast multiple RNA secondary structure comparison for very large databases. A search-based secondary structure comparison algorithm deployed on the RNASSAC website helps bioinformaticians find common RNA substructures in the underlying database. The algorithm performs two levels of binary searches on the database, and its running time depends on the database size. Experiments on the RNASSAC website show that the algorithm takes seconds for a database of 4,666 RNAs. For example, it takes about 4.4 sec to compare 25 RNAs from this database. In another case, when many non-overlapping common substructures are desired, a heuristic approach requires as long as 85 sec to compare 40 RNAs from the same database. Comparisons by this sequential algorithm take at least 50% more time when the RNAs are drawn from a database of several million RNAs. The most recently curated databases already contain millions of RNA secondary structures, so improving the run-time performance of comparison algorithms is necessary. This study presents a GPU-based RNA substructure comparison algorithm with which the running time for multiple RNA secondary structures remains feasible for large databases. Our new parallel algorithm is 12 times faster than the sequential CPU version of the comparison algorithm on the RNASSAC website. The significantly reduced response time is a step towards a real-time RNA comparison web service for the bioinformatics community.


2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
J Fonager ◽  
T K Fischer

Abstract Transmission of HIV-1 resistance mutations among therapy-naïve patients impairs the efficiency of antiretroviral therapy (ART). Therefore, genotypic resistance testing of patients is recommended at baseline, as this allows both for the selection of the correct ART regimen and for surveillance of transmitted drug resistance mutations (TDRM) among therapy-naïve HIV-1 patients. In Denmark, the occurrence of TDRM in newly diagnosed and therapy-naïve HIV-1 patients is monitored through the SERO project. Here, we investigated whether the prevalence of TDRM differed between patients within and outside of phylogenetically identified transmission clusters. Samples from 1,227 newly diagnosed HIV-1 patients were sent along with epidemiological information to the Virological Surveillance and Research group at Statens Serum Institut. HIV-1 RNA extraction, RT-PCR, and Sanger sequencing of the pol gene were performed using an in-house assay. The sequences were analyzed using BioNumerics v. 6.6, manually checked for the presence of mixed mutations, and analyzed for mutations using the HIVDB 8.4 algorithm implemented at the Stanford database. Sequence alignments were performed in Mafft, and phylogenetic analysis was performed in Mega 6.0 using the maximum likelihood general time reversible model with 100 bootstrap replicates. Clusters were identified with ClusterPicker at default settings (cluster support = 90%, genetic distance 4.5%). Active clusters contained newly diagnosed patients from the 2015 to 2017 period. HIV-1 sequences from 588 patients belonged to one of 154 clusters, and sequences from 639 patients did not belong to a cluster. Patients in clusters were significantly more likely to be men who have sex with men and subtype B and significantly less likely to be late presenters (Fisher’s test P < 0.05). The TDRM prevalence was significantly higher for patients outside of clusters than within clusters, 16.6 per cent versus 12.1 per cent, respectively (Fisher’s test P < 0.05); however, no significant differences were found in the TDRM prevalence between the 75 active and 79 inactive clusters, nor between small (<3 patients) and large (≥3 patients) clusters. E138A, V179D, and K103N were the three most prevalent TDRMs for both patient groups, whereas M41L differed between them. In Denmark, the TDRM prevalence is lower within clusters than outside, indicating that TDRM cases are imported and/or belong to as yet unidentified clusters.


2018 ◽  
Author(s):  
Manato Akiyama ◽  
Yasubumi Sakakibara ◽  
Kengo Sato

Abstract Motivation: Existing approaches for predicting RNA secondary structures depend on how a secondary structure is decomposed into substructures, the so-called architecture, to define their parameter space. However, the architecture has not been sufficiently investigated, especially for pseudoknotted secondary structures. Results: In this paper, we propose a novel algorithm that directly infers base-pairing probabilities with neural networks and does not depend on the architecture of RNA secondary structures, followed by maximum expected accuracy (MEA) based decoding algorithms: Nussinov-style decoding for pseudoknot-free structures, and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base-pair, we adopt a max-margin framework, called structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm achieves the best prediction accuracy compared with existing methods. Availability: The source code is available at https://github.com/keio-bioinformatics/neuralfold/. Contact: [email protected]
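To make the Nussinov-style decoding step concrete, here is a minimal sketch of decoding a pseudoknot-free structure from a base-pairing probability matrix by dynamic programming. It is not the authors' implementation: the pair score `(gamma + 1) * p - 1` follows the gamma-centroid convention and is an assumption, as are the function name, the `min_loop` constraint, and the toy probabilities.

```python
def mea_decode(bpp, gamma=1.0, min_loop=3):
    """Nussinov-style decoding of a pseudoknot-free structure.

    bpp[i][j] (i < j) is the probability that positions i and j pair.
    A pair contributes (gamma + 1) * bpp[i][j] - 1 (assumed scoring), so
    only pairs with probability above 1 / (gamma + 1) can improve the total.
    Returns a sorted list of 0-based (i, j) pairs.
    """
    n = len(bpp)

    def score(i, j):
        return (gamma + 1.0) * bpp[i][j] - 1.0

    # best[i][j] = best total score for the subsequence i..j (0 if too short).
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = max(best[i + 1][j], best[i][j - 1])           # i or j unpaired
            value = max(value, best[i + 1][j - 1] + score(i, j))  # i pairs with j
            for k in range(i + 1, j):                             # bifurcation
                value = max(value, best[i][k] + best[k + 1][j])
            best[i][j] = value

    # Traceback: replay the same cases to recover one optimal set of pairs.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif best[i][j] == best[i + 1][j - 1] + score(i, j):
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return sorted(pairs)


# Hypothetical probabilities for an 8-nucleotide hairpin.
toy_bpp = [[0.0] * 8 for _ in range(8)]
toy_bpp[0][7] = 0.9
toy_bpp[1][6] = 0.8
toy_bpp[2][5] = 0.3
structure = mea_decode(toy_bpp, gamma=1.0)
```

In this sketch `gamma` trades sensitivity against precision: larger values admit lower-probability pairs, while values near zero yield few or no pairs.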


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-15
Author(s):  
Lina Yang ◽  
Yang Liu ◽  
Xiaochun Hu ◽  
Patrick Wang ◽  
Xichun Li ◽  
...  

Ribonucleic acid (RNA) plays an essential role in organisms, and more and more of its functions are being discovered. Due to the conserved nature of RNA sequences, its function mainly depends on the RNA secondary structure. The discovery of an approximate relationship between two RNA secondary structures helps to understand their functional relationship better. It is an important and urgent task to explore structural similarities from the graphical representation of RNA secondary structures. In this paper, a novel graphical analysis method based on the triple vector curve representation of RNA secondary structures is proposed. A combined method involving a discrete wavelet transform (DWT) and fractal dimension with a sliding window is introduced to analyze and compare the graphs derived from feature extraction; after that, the distance matrix is generated. Then, the distance matrix is analyzed by clustering and visualized as a clustering tree. Experiments are performed on RNA virus and noncoding RNA datasets, and the resulting clustering trees are analyzed. The results show that the proposed method yields more accurate results in the comparison of RNA secondary structures.
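As a rough illustration of the pipeline from wavelet features to a distance matrix, the sketch below applies a Haar wavelet transform to numeric curves and compares the resulting coefficient vectors. The abstract does not specify which wavelet is used, so the choice of Haar is an assumption. The fractal-dimension and sliding-window steps are omitted, the input curves are hypothetical stand-ins for the triple vector curve representation, and all curves are assumed to have equal length.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:
        signal.append(signal[-1])  # pad odd-length input by repeating the last value
    root2 = math.sqrt(2.0)
    approx = [(signal[k] + signal[k + 1]) / root2 for k in range(0, len(signal), 2)]
    detail = [(signal[k] - signal[k + 1]) / root2 for k in range(0, len(signal), 2)]
    return approx, detail


def haar_features(signal, levels=2):
    """Approximation coefficients after the given number of Haar levels."""
    approx = list(signal)
    for _ in range(levels):
        approx, _detail = haar_step(approx)
    return approx


def distance_matrix(curves, levels=2):
    """Pairwise Euclidean distances between Haar feature vectors (Python 3.8+)."""
    feats = [haar_features(curve, levels) for curve in curves]
    return [[math.dist(a, b) for b in feats] for a in feats]


# Hypothetical equal-length curves standing in for structure-derived curves.
toy_curves = [[1, 1, 1, 1], [1, 1, 1, 1], [2, 2, 2, 2]]
toy_distances = distance_matrix(toy_curves)
```

The resulting matrix could then be passed to any hierarchical clustering routine to draw a clustering tree.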


2017 ◽  
Vol 13 ◽  
pp. 117693431772476 ◽  
Author(s):  
Jyh-Da Wei ◽  
Hui-Jun Cheng ◽  
Chun-Yuan Lin ◽  
Jin Ye ◽  
Kuan-Yu Yeh
