Evolutionary analyses of base-pairing interactions in DNA and RNA secondary structures

2018 ◽  
Author(s):  
Michael Golden ◽  
Ben Murrell ◽  
Oliver G. Pybus ◽  
Darren Martin ◽  
Jotun Hein

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here we introduce a sequence evolution model, MESSI, that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution whilst accounting for an unknown secondary structure. MESSI can also use GPU parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in non-coding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and non-coding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses, suggesting that GT pairs do not stabilise DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure and two corresponding alignments. We found that estimates of coevolution were more strongly correlated with experimentally-determined SHAPE-MaP pairing scores than three non-evolutionary measures of base-pairing covariation. To assist researchers in prioritising substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, amongst several uncharacterised top-ranking substructures.

2019 ◽  
Vol 37 (2) ◽  
pp. 576-592 ◽  
Author(s):  
Michael Golden ◽  
Benjamin Murrell ◽  
Darren Martin ◽  
Oliver G Pybus ◽  
Jotun Hein

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure. MESSI can also use graphics processing unit parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), GU (GT in DNA) pairs in noncoding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and noncoding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses. A potential biophysical explanation is that GT pairs do not stabilize DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degrees of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than three nonevolutionary measures of base-pairing covariation. To assist researchers in prioritizing substructures with potential functionality, MESSI automatically ranks substructures by degrees of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures that have been previously identified as having structure-related functional importance, among several uncharacterized top-ranking substructures.
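The abstract does not name the three nonevolutionary covariation measures it compares against. Purely as an illustration, mutual information between two alignment columns is one widely used measure of this kind. The sketch below is not MESSI and is not taken from the paper; the function name and the gap-handling rule (sequences with a gap in either column are skipped) are assumptions made for the example.

```python
from collections import Counter
from math import log2


def mutual_information(col_i, col_j):
    """Mutual information (bits) between two alignment columns.

    col_i and col_j are equal-length strings, one character per sequence.
    Sequences with a gap ('-') in either column are ignored (an assumption
    made for this sketch).
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)   # single-column counts
    freq_j = Counter(b for _, b in pairs)
    freq_ij = Counter(pairs)                # joint counts
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Two columns that covary perfectly across four canonical pair types.
covarying = mutual_information("GGCCAAUU", "CCGGUUAA")
# A perfectly conserved G-C pair carries no covariation signal at all.
conserved = mutual_information("GGGG", "CCCC")
```

A conserved pair scores zero here even though it may well be base-paired, which is one reason measures that ignore the phylogeny can behave differently from a model-based estimate of coevolution.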


2001 ◽  
Vol 75 (24) ◽  
pp. 12105-12113 ◽  
Author(s):  
Qi Liu ◽  
Reed F. Johnson ◽  
Julian L. Leibowitz

ABSTRACT Previously, we characterized two host protein binding elements located within the 3′-terminal 166 nucleotides of the mouse hepatitis virus (MHV) genome and assessed their functions in defective-interfering (DI) RNA replication. To determine the role of RNA secondary structures within these two host protein binding elements in viral replication, we explored the secondary structure of the 3′-terminal 166 nucleotides of the MHV strain JHM genome using limited RNase digestion assays. Our data indicate that multiple stem-loop and hairpin-loop structures exist within this region. Mutant and wild-type DIssEs were employed to test the function of secondary structure elements in DI RNA replication. Three stem structures were chosen as targets for the introduction of transversion mutations designed to destroy base pairing structures. Mutations predicted to destroy the base pairing of nucleotides 142 to 136 with nucleotides 68 to 74 exhibited a deleterious effect on DIssE replication. Destruction of base pairing between positions 96 to 99 and 116 to 113 also decreased DI RNA replication. Mutations interfering with the pairing of nucleotides 67 to 63 with nucleotides 52 to 56 had only minor effects on DIssE replication. The introduction of second complementary mutations which restored the predicted base pairing of positions 142 to 136 with 68 to 74 and nucleotides 96 to 99 with 116 to 113 largely ameliorated defects in replication ability, restoring DI RNA replication to levels comparable to that of wild-type DIssE RNA, suggesting that these secondary structures are important for efficient MHV replication. We also identified a conserved 23-nucleotide stem-loop structure involving nucleotides 142 to 132 and nucleotides 68 to 79. The upstream side of this conserved stem-loop is contained within a host protein binding element (nucleotides 166 to 129).


2015 ◽  
Vol 13 (7) ◽  
pp. 2022-2025 ◽  
Author(s):  
Kenji Usui ◽  
Arisa Okada ◽  
Keita Kobayashi ◽  
Naoki Sugimoto

A system for regulating DNA secondary structure formation in G-rich sequences was constructed using a designed PNA peptide with enzyme-responsive functionality that depends on protease activity.


2021 ◽  
Author(s):  
Jiaming Li ◽  
Jin Bae ◽  
Boyan Yordanov ◽  
Michael X Wang ◽  
Javier Gonzalez ◽  
...  

The prediction of DNA secondary structures from DNA sequences using thermodynamic models is imperfect for many biological sequences, both due to insufficient experimental data for training and to the kinetics of folding that lead to metastable structures. Here, we developed low-yield bisulfite sequencing (LYB-seq) to query the secondary structure states of cytosine (C) nucleotides in thousands of different DNA oligonucleotides on a single-molecule level. We observed that the reaction kinetics between bisulfite and C nucleotides is highly dependent on the secondary structure state of the C nucleotides, with the most accessible C nucleotides (those in small hairpin loops) reacting 70-fold faster than those in stable duplexes. Next, we developed a statistical model to evaluate the likelihood of an NGS read being consistent with a particular proposed secondary structure. By analyzing thousands of NGS reads for each DNA species, we can infer the distribution of secondary structures adopted by each species in solution. We find that 84% of 1,057 human genome subsequences studied here adopt 2 or more stable secondary structures in solution.
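The abstract does not specify the form of the statistical model. A minimal sketch of the general idea, assuming each cytosine is converted independently with a probability that depends only on whether it is paired in the proposed structure, might look like the following. The conversion probabilities are hypothetical placeholders chosen to differ 70-fold (echoing the reported rate difference); they are not values from the paper, and the function name and dot-bracket input format are also assumptions.

```python
from math import log


def read_log_likelihood(sequence, structure, converted,
                        p_unpaired=0.35, p_paired=0.005):
    """Log-likelihood of one bisulfite read under a proposed structure.

    sequence  : DNA string
    structure : dot-bracket string of the same length ('.' = unpaired)
    converted : set of 0-based positions where a C was read as converted
    p_unpaired, p_paired : hypothetical per-read conversion probabilities
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    total = 0.0
    for idx, (base, state) in enumerate(zip(sequence, structure)):
        if base != "C":
            continue  # only cytosines are informative in this sketch
        p = p_unpaired if state == "." else p_paired
        total += log(p) if idx in converted else log(1.0 - p)
    return total


seq = "GGCCTTCTGGCC"
hairpin = "((((....))))"
unfolded = "............"
read = {6}  # only the loop cytosine was converted in this read

ll_hairpin = read_log_likelihood(seq, hairpin, read)
ll_unfolded = read_log_likelihood(seq, unfolded, read)
```

Under these assumptions a read in which only the loop cytosine is converted is better explained by the hairpin than by the unfolded state; summing such log-likelihoods over many reads is one way a distribution over candidate structures could be compared.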


2019 ◽  
Author(s):  
Masaki Tagashira ◽  
Kiyoshi Asai

Abstract Motivation: The simultaneous optimization of sequence alignment and secondary structures among RNAs, known as structural alignment, is needed to compare functional ncRNAs more appropriately than sequence alignment alone allows. Pseudo-probabilities given RNA sequences on structural alignment have been desired for more accurate secondary structures, sequence alignments, consensus secondary structures, and structural alignments. However, no algorithms have been proposed for these pseudo-probabilities. Results: We invented RNAfamProb, an algorithm for estimating these pseudo-probabilities. We applied these pseudo-probabilities to two biological problems: visualization and maximum-expected-accuracy secondary-structure estimation. The RNAfamProb program, an implementation of this algorithm, together with the NeoFold program, a maximum-expected-accuracy secondary-structure program that uses these pseudo-probabilities, demonstrated better prediction accuracy than three state-of-the-art maximum-expected-accuracy secondary-structure programs, while requiring far longer running times, as expected given the intrinsically greater complexity of structural alignment compared with independent secondary structure prediction and sequence alignment. Both programs estimate more accurately when homologous RNA sequences are incorporated. Availability: The source code of the two programs is available at “https://github.com/heartsh/rnafamprob” and “https://github.com/heartsh/neofold”, respectively. Contact: “[email protected]” and “[email protected]”. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 5 (Supplement_1) ◽  
Author(s):  
J Fonager ◽  
T K Fischer

Abstract Transmission of HIV-1 resistance mutations among therapy-naïve patients impairs the efficiency of antiretroviral therapy (ART). Therefore, genotypic resistance testing of patients is recommended at baseline, as this allows both for the selection of the correct ART regimen and for surveillance of transmitted drug resistance mutations (TDRM) among therapy-naïve HIV-1 patients. In Denmark, the occurrence of TDRM in newly diagnosed and therapy-naïve HIV-1 patients is monitored through the SERO project. Here, we investigated whether the prevalence of TDRM differed between patients within and outside of phylogenetically identified transmission clusters. Samples from 1,227 newly diagnosed HIV-1 patients were sent along with epidemiological information to the Virological Surveillance and Research group at Statens Serum Institut. HIV-1 RNA extraction, RT-PCR, and Sanger sequencing of the pol gene were performed using an in-house assay. The sequences were analyzed using BioNumerics v. 6.6, manually checked for the presence of mixed mutations, and analyzed for mutations using the HIVDB 8.4 algorithm implemented at the Stanford database. Sequence alignments were performed in Mafft, and phylogenetic analysis was performed in Mega 6.0 using the maximum likelihood general time reversible model with 100 bootstrap replicates. Clusters were identified with ClusterPicker at default settings (cluster support = 90%, genetic distance 4.5%). Active clusters contained newly diagnosed patients from the 2015 to 2017 period. HIV-1 sequences from 588 patients belonged to one of 154 clusters, and sequences from 639 patients did not belong to a cluster. Patients in clusters were significantly more likely to be men who have sex with men and subtype B and significantly less likely to be late presenters (Fisher’s test P < 0.05). The TDRM prevalence was significantly higher for patients outside of clusters than within clusters, 16.6 per cent versus 12.1 per cent, respectively (Fisher’s test P < 0.05); however, no significant differences were found in the TDRM prevalence between the 75 active and 79 inactive clusters, nor between small (<3 patients) and large (≥3 patients) clusters. E138A, V179D, and K103N were the three most prevalent TDRMs for both patient groups, whereas M41L differed between them. In Denmark, the TDRM prevalence is lower within clusters than outside, indicating that TDRM cases are imported and/or belong to as yet unidentified clusters.
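The comparison of TDRM prevalence inside and outside clusters can be illustrated with a two-sided Fisher's exact test on a 2×2 table. The abstract reports only percentages and group sizes, so the counts below (106 of 639 and 71 of 588) are reconstructed by rounding and should be read as assumptions rather than the study's data. The implementation is a plain standard-library sketch, not the software the authors used.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, row1)

    def pmf(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / denom

    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    p_observed = pmf(a)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    p_value = sum(p for p in (pmf(k) for k in range(k_min, k_max + 1))
                  if p <= p_observed * tolerance)
    return min(1.0, p_value)


# Reconstructed (assumed) counts: rows = outside / within clusters,
# columns = with TDRM / without TDRM.
outside_tdrm, outside_total = 106, 639   # about 16.6 per cent
within_tdrm, within_total = 71, 588      # about 12.1 per cent

p_value = fisher_exact_two_sided(
    outside_tdrm, outside_total - outside_tdrm,
    within_tdrm, within_total - within_tdrm,
)
```

Because the counts are rounded reconstructions, the resulting p-value only approximates what the original analysis would have produced.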


2018 ◽  
Author(s):  
Manato Akiyama ◽  
Yasubumi Sakakibara ◽  
Kengo Sato

Abstract Motivation: Existing approaches for predicting RNA secondary structures depend on how a secondary structure is decomposed into substructures, the so-called architecture, to define their parameter space. However, the architecture has not been sufficiently investigated, especially for pseudoknotted secondary structures. Results: In this paper, we propose a novel algorithm that uses neural networks to directly infer base-pairing probabilities without depending on the architecture of RNA secondary structures, followed by maximum expected accuracy (MEA) based decoding algorithms: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, called structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm achieves the best prediction accuracy compared with existing methods. Availability: The source code is available at https://github.com/keio-bioinformatics/neuralfold/ Contact: [email protected]
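To make the Nussinov-style decoding step concrete, the following is a minimal sketch of a dynamic program that selects a pseudoknot-free set of base pairs from a matrix of base-pairing probabilities. It is not the neuralfold implementation: the scoring rule (each pair contributes its probability minus a fixed threshold), the minimum hairpin-loop length, and all names are assumptions chosen for illustration.

```python
def nussinov_decode(probs, threshold=0.5, min_loop=3):
    """Pick a nested (pseudoknot-free) set of pairs maximizing total score.

    probs[i][j] is the pairing probability of positions i < j.
    A pair scores probs[i][j] - threshold (a hypothetical simplification);
    pairs closer than min_loop unpaired bases apart are disallowed.
    """
    n = len(probs)
    dp = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j unpaired
            gain = probs[i][j] - threshold
            if gain > 0:
                best = max(best, dp[i + 1][j - 1] + gain)  # i pairs with j
            for k in range(i + 1, j):                   # split into two parts
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Traceback: replay the same choices to recover the chosen pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        gain = probs[i][j] - threshold
        if gain > 0 and dp[i][j] == dp[i + 1][j - 1] + gain:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
            continue
        for k in range(i + 1, j):
            if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                stack.append((i, k))
                stack.append((k + 1, j))
                break
    return sorted(pairs)


# Toy 8-nucleotide example with two confidently predicted stacked pairs.
n = 8
probs = [[0.0] * n for _ in range(n)]
probs[0][7] = 0.9
probs[1][6] = 0.8
probs[2][5] = 0.9   # too close together: excluded by min_loop
decoded = nussinov_decode(probs)
```

The cubic-time recursion can only produce nested pairs, which is why the abstract pairs it with a separate IPknot-style decoder for pseudoknotted structures.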


2020 ◽  
Vol 48 (12) ◽  
pp. 6654-6671
Author(s):  
Karol Szlachta ◽  
Arkadi Manukyan ◽  
Heather M Raimer ◽  
Sandeep Singh ◽  
Anita Salamon ◽  
...  

Abstract DNA double-stranded breaks (DSBs) trigger human genome instability; therefore, identifying what factors contribute to DSB induction is critical for our understanding of human disease etiology. Using an unbiased, genome-wide approach, we found that genomic regions with the ability to form highly stable DNA secondary structures are enriched for endogenous DSBs in human cells. Human genomic regions predicted to form non-B-form DNA induced gross chromosomal rearrangements in yeast and displayed high indel frequency in human genomes. The extent of instability in both analyses is in concordance with the structure-forming ability of these regions. We also observed an enrichment of DNA secondary structure-prone sites overlapping transcription start sites (TSSs) and CCCTC-binding factor (CTCF) binding sites, and uncovered an increase in DSBs at highly stable DNA secondary structure regions in response to etoposide, an inhibitor of topoisomerase II (TOP2) re-ligation activity. Importantly, we found that TOP2 deficiency in both yeast and human leads to a significant reduction in DSBs at structure-prone loci, and that sites of TOP2 cleavage have a greater ability to form highly stable DNA secondary structures. This study reveals a direct role for TOP2 in generating secondary structure-mediated DNA fragility, advancing our understanding of mechanisms underlying human genome instability.

