Evolutionary analyses of base-pairing interactions in DNA and RNA secondary structures

Mapping Intimacies

10.1101/419341

2018

Author(s):

Michael Golden

Ben Murrell

Oliver G. Pybus

Darren Martin

Jotun Hein

Keyword(s):

Secondary Structure

Secondary Structures

Sequence Evolution

Base Pairing

Sequence Alignments

DNA Viruses

DNA and RNA

Subtype B

DNA Secondary Structures

HIV-1

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here we introduce a sequence evolution model, MESSI, that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution whilst accounting for an unknown secondary structure. MESSI can also use GPU parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), and GU (GT in DNA) pairs in non-coding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and non-coding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses, suggesting that GT pairs do not stabilise DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degree of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure and two corresponding alignments. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than were three non-evolutionary measures of base-pairing covariation. To assist researchers in prioritising substructures with potential functionality, MESSI automatically ranks substructures by the degree of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures previously identified as having structure-related functional importance, alongside several uncharacterised top-ranking substructures.


Evolutionary Analyses of Base-Pairing Interactions in DNA and RNA Secondary Structures

Molecular Biology and Evolution

10.1093/molbev/msz243

2019

Vol 37 (2)

pp. 576-592

Cited By ~ 2

Author(s):

Michael Golden

Benjamin Murrell

Darren Martin

Oliver G Pybus

Jotun Hein

Keyword(s):

Secondary Structure

Noncoding RNA

Graphics Processing Unit

Secondary Structures

Sequence Evolution

Processing Unit

Base Pairing

Sequence Alignments

Subtype B

HIV-1

Abstract Pairs of nucleotides within functional nucleic acid secondary structures often display evidence of coevolution that is consistent with the maintenance of base-pairing. Here, we introduce a sequence evolution model, MESSI (Modeling the Evolution of Secondary Structure Interactions), that infers coevolution associated with base-paired sites in DNA or RNA sequence alignments. MESSI can estimate coevolution while accounting for an unknown secondary structure. MESSI can also use graphics processing unit parallelism to increase computational speed. We used MESSI to infer coevolution associated with GC, AU (AT in DNA), and GU (GT in DNA) pairs in noncoding RNA alignments, and in single-stranded RNA and DNA virus alignments. Estimates of GU pair coevolution were found to be higher at base-paired sites in single-stranded RNA viruses and noncoding RNAs than estimates of GT pair coevolution in single-stranded DNA viruses. A potential biophysical explanation is that GT pairs do not stabilize DNA secondary structures to the same extent that GU pairs do in RNA. Additionally, MESSI estimates the degree of coevolution at individual base-paired sites in an alignment. These estimates were computed for a SHAPE-MaP-determined HIV-1 NL4-3 RNA secondary structure. We found that estimates of coevolution were more strongly correlated with experimentally determined SHAPE-MaP pairing scores than were three nonevolutionary measures of base-pairing covariation. To assist researchers in prioritizing substructures with potential functionality, MESSI automatically ranks substructures by the degree of coevolution at base-paired sites within them. Such a ranking was created for an HIV-1 subtype B alignment, revealing an excess of top-ranking substructures previously identified as having structure-related functional importance, alongside several uncharacterized top-ranking substructures.
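The abstract does not say which three nonevolutionary covariation measures were used. Mutual information between the two alignment columns of a putative base pair is one common measure of that kind, and the minimal sketch below computes it, assuming gapped positions are skipped pairwise. The function name and example columns are hypothetical, and this is not MESSI's phylogenetic model.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information (in bits) between two alignment columns.

    Rows with a gap ("-") in either column are skipped. This is a plain,
    non-evolutionary covariation score: it ignores the tree relating the rows.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


# Hypothetical columns in which the pair switches between GC and AU.
col_5p = "GGAAG"
col_3p = "CCUUC"
print(round(column_mutual_information(col_5p, col_3p), 3))  # prints 0.971
```

Because this score ignores shared ancestry, closely related sequences can inflate it, which is one motivation for phylogeny-aware models of paired-site coevolution.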


Secondary Structural Elements within the 3′ Untranslated Region of Mouse Hepatitis Virus Strain JHM Genomic RNA

Journal of Virology

10.1128/jvi.75.24.12105-12113.2001

2001

Vol 75 (24)

pp. 12105-12113

Cited By ~ 35

Author(s):

Qi Liu

Reed F. Johnson

Julian L. Leibowitz

Keyword(s):

Secondary Structure

Protein Binding

Hepatitis Virus

Mouse Hepatitis Virus

RNA Replication

Secondary Structures

Host Protein

Wild Type

Base Pairing

Stem Loop

ABSTRACT Previously, we characterized two host protein binding elements located within the 3′-terminal 166 nucleotides of the mouse hepatitis virus (MHV) genome and assessed their functions in defective-interfering (DI) RNA replication. To determine the role of RNA secondary structures within these two host protein binding elements in viral replication, we explored the secondary structure of the 3′-terminal 166 nucleotides of the MHV strain JHM genome using limited RNase digestion assays. Our data indicate that multiple stem-loop and hairpin-loop structures exist within this region. Mutant and wild-type DIssEs were employed to test the function of secondary structure elements in DI RNA replication. Three stem structures were chosen as targets for the introduction of transversion mutations designed to destroy base pairing structures. Mutations predicted to destroy the base pairing of nucleotides 142 to 136 with nucleotides 68 to 74 exhibited a deleterious effect on DIssE replication. Destruction of base pairing between positions 96 to 99 and 116 to 113 also decreased DI RNA replication. Mutations interfering with the pairing of nucleotides 67 to 63 with nucleotides 52 to 56 had only minor effects on DIssE replication. The introduction of second complementary mutations which restored the predicted base pairing of positions 142 to 136 with 68 to 74 and nucleotides 96 to 99 with 116 to 113 largely ameliorated defects in replication ability, restoring DI RNA replication to levels comparable to that of wild-type DIssE RNA, suggesting that these secondary structures are important for efficient MHV replication. We also identified a conserved 23-nucleotide stem-loop structure involving nucleotides 142 to 132 and nucleotides 68 to 79. The upstream side of this conserved stem-loop is contained within a host protein binding element (nucleotides 166 to 129).
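The mutate-then-compensate logic in this abstract can be checked mechanically: a stem is intact only if each base on one arm is complementary to the opposite base on the antiparallel arm. The sketch below uses a hypothetical 7-bp stem (not the MHV sequence) and assumes one particular transversion per base (A to U, U to A, G to C, C to G); the abstract does not specify which substitutions were introduced. Only canonical Watson-Crick pairs are counted, and the helper names are illustrative.

```python
# Canonical Watson-Crick pairs only; G-U wobble pairs are deliberately excluded.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
# Assumed substitution: each base becomes its Watson-Crick partner,
# which is always a transversion (purine <-> pyrimidine).
TRANSVERSION = {"A": "U", "U": "A", "G": "C", "C": "G"}


def stem_pairs(arm_5p: str, arm_3p: str) -> list[bool]:
    """Pairing status of each position in a stem.

    Both arms are written 5'->3'; because the strands are antiparallel, the
    first base of arm_5p pairs with the last base of arm_3p, and so on.
    """
    return [(a, b) in WATSON_CRICK for a, b in zip(arm_5p, reversed(arm_3p))]


def transvert(arm: str) -> str:
    """Apply the assumed transversion to every base of one arm."""
    return "".join(TRANSVERSION[base] for base in arm)


# Hypothetical 7-bp stem, not the MHV sequence.
arm_5p, arm_3p = "GGACUCG", "CGAGUCC"
print(all(stem_pairs(arm_5p, arm_3p)))                        # True: wild type paired
print(any(stem_pairs(transvert(arm_5p), arm_3p)))             # False: one arm mutated
print(all(stem_pairs(transvert(arm_5p), transvert(arm_3p))))  # True: compensated
```

Replacing each base with its Watson-Crick partner is always a transversion, so applying it to one arm disrupts every pair, while applying it to both arms restores every pair with the partners exchanged.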


Control of guanine-rich DNA secondary structures depending on the protease activity using a designed PNA peptide

Organic & Biomolecular Chemistry

10.1039/c4ob02535k

2015

Vol 13 (7)

pp. 2022-2025

Cited By ~ 9

Author(s):

Kenji Usui

Arisa Okada

Keita Kobayashi

Naoki Sugimoto

Keyword(s):

Secondary Structure

Structure Formation

Protease Activity

Secondary Structures

Regulation System

DNA Secondary Structure

Secondary Structure Formation

DNA Secondary Structures

We constructed a system that regulates DNA secondary-structure formation by G-rich sequences according to protease activity, using a designed PNA peptide with enzyme-responsive functionality.


Experimental Confirmation of Multiple Co-Existent DNA Secondary Structures using Low-Yield Bisulfite Sequencing

10.1101/2021.05.21.445174

2021

Author(s):

Jiaming Li

Jin Bae

Boyan Yordanov

Michael X Wang

Javier Gonzalez

...

Keyword(s):

Secondary Structure

Single Molecule

DNA Sequences

Bisulfite Sequencing

Secondary Structures

Single Molecule Level

DNA Oligonucleotides

Metastable Structures

Structure State

DNA Secondary Structures

The prediction of DNA secondary structures from DNA sequences using thermodynamic models is imperfect for many biological sequences, due both to insufficient experimental data for training and to folding kinetics that lead to metastable structures. Here, we developed low-yield bisulfite sequencing (LYB-seq) to query the secondary structure states of cytosine (C) nucleotides in thousands of different DNA oligonucleotides at the single-molecule level. We observed that the kinetics of the reaction between bisulfite and C nucleotides are highly dependent on the secondary structure state of the C nucleotides, with the most accessible C nucleotides (those in small hairpin loops) reacting 70-fold faster than those in stable duplexes. Next, we developed a statistical model to evaluate the likelihood of a next-generation sequencing (NGS) read being consistent with a particular proposed secondary structure. By analyzing thousands of NGS reads for each DNA species, we can infer the distribution of secondary structures adopted by each species in solution. We find that 84% of the 1,057 human genome subsequences studied here adopt two or more stable secondary structures in solution.
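The abstract does not give the form of its statistical model. One simple model of this kind treats each queried C in a read as an independent Bernoulli trial whose conversion probability depends on whether the proposed structure pairs that C, then estimates the mixture of candidate structures by expectation-maximization (EM). The sketch below assumes that model; the probabilities, function names, and reads are hypothetical, and only the roughly 70-fold unpaired-to-paired ratio echoes the abstract.

```python
import math

# Hypothetical per-C conversion probabilities under low-yield conditions.
# The 70-fold ratio echoes the rate difference reported in the abstract;
# the absolute values are assumptions, not measured parameters.
P_CONVERT = {"unpaired": 0.07, "paired": 0.001}


def read_log_likelihood(converted: list[bool], states: list[str]) -> float:
    """Log-probability of one read's C-to-T conversion pattern under one structure.

    converted[k] is True if the k-th queried C was read as T; states[k] is that
    C's state ("paired" or "unpaired") in the proposed structure. Each C is
    treated as an independent Bernoulli trial.
    """
    total = 0.0
    for hit, state in zip(converted, states):
        p = P_CONVERT[state]
        total += math.log(p if hit else 1.0 - p)
    return total


def estimate_structure_weights(reads: list[list[bool]],
                               structures: list[list[str]],
                               n_iter: int = 200) -> list[float]:
    """EM estimate of the fraction of molecules adopting each candidate structure."""
    k = len(structures)
    weights = [1.0 / k] * k
    # Per-read likelihoods; long reads would need log-space arithmetic instead.
    lik = [[math.exp(read_log_likelihood(r, s)) for s in structures] for r in reads]
    for _ in range(n_iter):
        totals = [0.0] * k
        for row in lik:
            joint = [w * lk for w, lk in zip(weights, row)]
            z = sum(joint)
            for j in range(k):
                totals[j] += joint[j] / z  # posterior responsibility of structure j
        weights = [t / len(reads) for t in totals]
    return weights


# Two hypothetical structures for a molecule with three queried Cs.
structure_a = ["unpaired", "unpaired", "paired"]
structure_b = ["paired", "paired", "unpaired"]
reads = ([[True, False, False]] * 6 + [[False, False, True]] * 2
         + [[False, False, False]] * 12)
print([round(w, 2) for w in estimate_structure_weights(reads, [structure_a, structure_b])])
```

Reads with no conversions are uninformative about which structure they came from, so under low-yield conditions many reads per species are needed before the mixture weights become stable.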


RNAfamProb Plus NeoFold: Estimations of Posterior Probabilities on RNA Structural Alignment and RNA Secondary Structures with Incorporating Homologous-RNA Sequences

10.1101/812891

2019

Author(s):

Masaki Tagashira

Kiyoshi Asai

Keyword(s):

Secondary Structure

Sequence Alignment

Structural Alignment

Secondary Structures

Simultaneous Optimization

Supplementary Information

Sequence Alignments

RNA Sequences

Link Type

RNA Structural Alignment

Abstract Motivation: Structural alignment, the simultaneous optimization of the sequence alignment and secondary structures of RNAs, is needed to compare functional ncRNAs more appropriately than sequence alignment alone allows. Pseudo-probabilities on structural alignments, given RNA sequences, are desirable for more accurate secondary structures, sequence alignments, consensus secondary structures, and structural alignments. However, no algorithm has been proposed for estimating these pseudo-probabilities.

Results: We developed RNAfamProb, an algorithm for estimating these pseudo-probabilities, and applied them to two biological problems: visualization and maximum-expected-accuracy (MEA) secondary-structure estimation. The RNAfamProb program, an implementation of this algorithm, together with the NeoFold program, an MEA secondary-structure predictor that uses these pseudo-probabilities, achieved better prediction accuracy than three state-of-the-art MEA secondary-structure programs. As expected, however, they required far longer running times than those three programs, owing to the intrinsically high computational complexity of structural alignment compared with independent secondary-structure prediction and sequence alignment. Both RNAfamProb and NeoFold produce more accurate estimates when homologous RNA sequences are incorporated.

Availability: The source code of the two programs is available at https://github.com/heartsh/rnafamprob and https://github.com/heartsh/neofold, respectively.

Supplementary information: Supplementary data are available at Bioinformatics online.


A2 Phylogenetic investigation of transmitted HIV-1 drug resistance mutations in Denmark, 2009–17

Virus Evolution

10.1093/ve/vez002.001

2019

Vol 5 (Supplement_1)

Author(s):

J Fonager

T K Fischer

Keyword(s):

Drug Resistance

Resistance Testing

Transmitted Drug Resistance

Newly Diagnosed

Resistance Mutations

Sequence Alignments

Drug Resistance Mutations

Subtype B

Genotypic Resistance Testing

HIV-1

Abstract Transmission of HIV-1 resistance mutations among therapy-naïve patients impairs the efficiency of antiretroviral therapy (ART). Therefore, baseline genotypic resistance testing is recommended, as it allows both the selection of the correct ART regimen and surveillance of transmitted drug resistance mutations (TDRM) among therapy-naïve HIV-1 patients. In Denmark, the occurrence of TDRM in newly diagnosed, therapy-naïve HIV-1 patients is monitored through the SERO project. Here, we investigated whether the prevalence of TDRM differed between patients within and outside phylogenetically identified transmission clusters. Samples from 1,227 newly diagnosed HIV-1 patients were sent, along with epidemiological information, to the Virological Surveillance and Research group at Statens Serum Institut. HIV-1 RNA extraction, RT-PCR, and Sanger sequencing of the pol gene were performed using an in-house assay. The sequences were analyzed using BioNumerics v. 6.6, manually checked for the presence of mixed mutations, and analyzed for mutations using the HIVDB 8.4 algorithm implemented in the Stanford database. Sequence alignments were performed in MAFFT, and phylogenetic analysis was performed in MEGA 6.0 using maximum likelihood under the general time-reversible model with 100 bootstrap replicates. Clusters were identified with ClusterPicker at default settings (cluster support = 90%, genetic distance = 4.5%). Active clusters contained newly diagnosed patients from the 2015 to 2017 period. HIV-1 sequences from 588 patients belonged to one of 154 clusters, and sequences from 639 patients did not belong to a cluster. Patients in clusters were significantly more likely to be men who have sex with men and to carry subtype B, and significantly less likely to be late presenters (Fisher’s test P < 0.05).
The TDRM prevalence was significantly higher for patients outside clusters than within clusters (16.6 per cent versus 12.1 per cent; Fisher’s test P < 0.05); however, no significant differences in TDRM prevalence were found between the 75 active and 79 inactive clusters, nor between small (<3 patients) and large (≥3 patients) clusters. E138A, V179D, and K103N were the three most prevalent TDRMs in both patient groups, whereas M41L differed between them. In Denmark, TDRM prevalence is lower within clusters than outside them, indicating that TDRM cases are imported and/or belong to as-yet-unidentified clusters.
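ClusterPicker's two thresholds act together: a clade must have enough node support and its sequences must be genetically close. The sketch below checks one already-chosen candidate clade against the settings quoted above (support 90%, written as 0.90; distance 4.5%), assuming an uncorrected p-distance and that the distance threshold applies to the maximum pairwise distance within the clade. ClusterPicker's own distance calculation, ambiguity-code handling, and tree traversal are not reproduced, and the function names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping positions with a gap or N in either sequence."""
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def passes_cluster_thresholds(seqs: list[str], support: float,
                              min_support: float = 0.90,
                              max_distance: float = 0.045) -> bool:
    """Hypothetical ClusterPicker-style check for one candidate clade.

    The clade qualifies if its node support is at least min_support and the
    maximum pairwise p-distance among its sequences is at most max_distance.
    """
    if support < min_support:
        return False
    largest = max((p_distance(a, b) for a, b in combinations(seqs, 2)), default=0.0)
    return largest <= max_distance


base = "ACGT" * 10          # hypothetical 40-site alignment
one_diff = "T" + base[1:]   # 1/40 = 2.5% from base
two_diff = "TT" + base[2:]  # 2/40 = 5.0% from base
print(passes_cluster_thresholds([base, one_diff], support=0.95))  # True
print(passes_cluster_thresholds([base, two_diff], support=0.95))  # False: 5.0% > 4.5%
print(passes_cluster_thresholds([base, one_diff], support=0.80))  # False: support too low
```

Using the maximum rather than the mean pairwise distance makes the criterion conservative: a single divergent sequence is enough to reject a clade.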


Direct inference of base-pairing probabilities with neural networks improves RNA secondary structure prediction with pseudoknots

10.1101/303172

2018

Author(s):

Manato Akiyama

Yasubumi Sakakibara

Kengo Sato

Keyword(s):

Neural Networks

Secondary Structure

Structure Prediction

Secondary Structure Prediction

Secondary Structures

Support Vector

Base Pairing

RNA Secondary Structures

Decoding Algorithms

The Neural Networks

Abstract Motivation: Existing approaches for predicting RNA secondary structures depend on how a secondary structure is decomposed into substructures, the so-called architecture, which defines their parameter space. However, the architecture has not been sufficiently investigated, especially for pseudoknotted secondary structures.

Results: In this paper, we propose a novel algorithm that directly infers base-pairing probabilities with neural networks, independently of the architecture of RNA secondary structures, followed by maximum expected accuracy (MEA)-based decoding: Nussinov-style decoding for pseudoknot-free structures and IPknot-style decoding for pseudoknotted structures. To train the neural networks connected to each base pair, we adopt a max-margin framework, structured support vector machines (SSVM), as the output layer. Our benchmarks for predicting RNA secondary structures with and without pseudoknots show that our algorithm achieves the best prediction accuracy compared with existing methods.

Availability: The source code is available at https://github.com/keio-bioinformatics/neuralfold/.
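The decoding step can be illustrated with a minimal Nussinov-style maximum expected accuracy decoder for the pseudoknot-free case. It assumes a gamma-weighted objective in which each selected pair (i, k) contributes (γ + 1)·p_ik − 1 (a gamma-centroid-style choice; the paper's exact objective may differ), a minimum hairpin loop of three unpaired bases, and a base-pairing probability matrix supplied from elsewhere. The name `mea_decode` and the example matrix are hypothetical; IPknot-style decoding and the neural network itself are not shown.

```python
def mea_decode(bpp: list[list[float]], gamma: float = 1.0,
               min_loop: int = 3) -> list[tuple[int, int]]:
    """Nussinov-style MEA decoding of a pseudoknot-free structure.

    bpp[i][k] (i < k) is the probability that bases i and k pair. A pair
    contributes (gamma + 1) * p - 1, so only pairs with p > 1 / (gamma + 1)
    can be selected. S[i][j] is the best score on the half-open interval [i, j).
    """
    n = len(bpp)

    def gain(i: int, k: int) -> float:
        return (gamma + 1.0) * bpp[i][k] - 1.0

    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            best = S[i + 1][j]  # base i left unpaired
            for k in range(i + min_loop + 1, j):  # base i pairs with base k
                g = gain(i, k)
                if g > 0:
                    best = max(best, S[i + 1][k] + g + S[k + 1][j])
            S[i][j] = best

    # Trace back through the same recursion to recover the chosen pairs.
    pairs, stack = [], [(0, n)]
    while stack:
        i, j = stack.pop()
        if j - i < 2:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j):
            g = gain(i, k)
            if g > 0 and S[i][j] == S[i + 1][k] + g + S[k + 1][j]:
                pairs.append((i, k))
                stack += [(i + 1, k), (k + 1, j)]
                break
    return sorted(pairs)


# Hypothetical 10-nt probability matrix (not output from the paper's networks).
n = 10
bpp = [[0.0] * n for _ in range(n)]
bpp[0][9] = bpp[1][8] = 0.9
bpp[2][7] = 0.4
print(mea_decode(bpp, gamma=1.0))  # [(0, 9), (1, 8)]
print(mea_decode(bpp, gamma=4.0))  # [(0, 9), (1, 8), (2, 7)]
```

Raising γ lowers the probability 1/(γ + 1) that a pair must exceed, trading precision for sensitivity, which is why the weaker (2, 7) pair appears only in the second call.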


Topoisomerase II contributes to DNA secondary structure-mediated double-stranded breaks

Nucleic Acids Research

10.1093/nar/gkaa483

2020

Vol 48 (12)

pp. 6654-6671

Author(s):

Karol Szlachta

Arkadi Manukyan

Heather M Raimer

Sandeep Singh

Anita Salamon

...

Keyword(s):

Secondary Structure

Human Genome

Topoisomerase II

Genome Instability

Secondary Structures

CTCF Binding

DNA Secondary Structure

Disease Etiology

DNA Secondary Structures

Genomic Regions

Abstract DNA double-stranded breaks (DSBs) trigger human genome instability; therefore, identifying the factors that contribute to DSB induction is critical for our understanding of human disease etiology. Using an unbiased, genome-wide approach, we found that genomic regions with the ability to form highly stable DNA secondary structures are enriched for endogenous DSBs in human cells. Human genomic regions predicted to form non-B-form DNA induced gross chromosomal rearrangements in yeast and displayed high indel frequencies in human genomes. The extent of instability in both analyses is concordant with the structure-forming ability of these regions. We also observed an enrichment of DNA secondary structure-prone sites overlapping transcription start sites (TSSs) and CCCTC-binding factor (CTCF) binding sites, and uncovered an increase in DSBs at highly stable DNA secondary structure regions in response to etoposide, an inhibitor of topoisomerase II (TOP2) re-ligation activity. Importantly, we found that TOP2 deficiency in both yeast and human cells leads to a significant reduction in DSBs at structure-prone loci, and that sites of TOP2 cleavage have a greater ability to form highly stable DNA secondary structures. This study reveals a direct role for TOP2 in generating secondary structure-mediated DNA fragility, advancing our understanding of the mechanisms underlying human genome instability.
