ScanFold: an approach for genome-wide discovery of local RNA structural elements—applications to Zika virus and HIV

PeerJ, Vol 6, pp. e6136, 2018. DOI: 10.7717/peerj.6136. Cited By ~ 15

Author(s): Ryan J. Andrews, Julien Roche, Walter N. Moss

Keyword(s): Zika Virus, Genome Replication, RNA Structures, Step Size, Base Pairs, RNA Motifs, Tertiary Structures, Local Structures, Genome Wide, Functional RNA

In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for identifying functional RNA structures in genomes typically rely on scanning analysis windows. Multiple partially overlapping windows are used to predict RNA structures and folding metrics, and these are used to deduce regions likely to form functional structure. A separate structural model is produced for each window, and the step size can greatly affect the returned model. This makes deducing unique local structures challenging, because the same nucleotides can be base paired differently in different windows. Here we present a new approach in which all base pairs from all analysis windows are considered and weighted by favorable folding. This yields unique base pairing throughout the genome and generates local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds. We applied this approach to the Zika virus (ZIKV) and HIV-1 genomes. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain–Barré syndrome. Its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. HIV, the cause of AIDS, contains multiple extensively studied functional RNA motifs in its genome. Our approach successfully identifies and models the structures of known functional motifs in both viruses, while also finding additional regions likely to form functional structures. All data have been archived at the RNAStructuromeDB (www.structurome.bb.iastate.edu), a repository of RNA folding data for humans and their pathogens.
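The abstract describes the pairing-consensus idea only at a high level. The sketch below is one reading of it, loosely modeled on that description and not taken from the published ScanFold code.

Each window's predicted pairs (and its unpaired positions) are tallied with that window's z-score. Each nucleotide then keeps the pairing state with the lowest mean z-score, and only reciprocal choices survive. The input format `(start, end, z_score, pairs)` and the name `consensus_pairs` are hypothetical. Folding each window is assumed to have happened upstream.

```python
from collections import defaultdict
from statistics import mean


def consensus_pairs(windows):
    """Choose one pairing state per nucleotide from overlapping windows.

    windows: iterable of (start, end, z_score, pairs), where [start, end) is
    the window span in genome coordinates and pairs lists its predicted
    (i, j) base pairs. This input format is a hypothetical assumption.
    """
    # (position, partner) -> z-scores of every window proposing that state;
    # partner None means "unpaired in that window".
    votes = defaultdict(list)
    for start, end, z_score, pairs in windows:
        partner = {}
        for i, j in pairs:
            partner[i] = j
            partner[j] = i
        for pos in range(start, end):
            votes[(pos, partner.get(pos))].append(z_score)

    # Per position, keep the state with the most negative mean z-score,
    # i.e. the state backed by the most unusually stable windows.
    best = {}
    for (pos, mate), z_scores in votes.items():
        score = mean(z_scores)
        if pos not in best or score < best[pos][1]:
            best[pos] = (mate, score)

    # Keep only reciprocal choices so each nucleotide pairs at most once.
    return sorted(
        (i, mate)
        for i, (mate, _) in best.items()
        if mate is not None and i < mate and best.get(mate, (None, 0.0))[0] == i
    )


windows = [
    (0, 6, -2.0, [(0, 5), (1, 4)]),  # strongly stable window
    (2, 8, -0.5, [(2, 7)]),          # weakly stable, overlapping window
]
print(consensus_pairs(windows))  # [(0, 5), (1, 4)]
```

In this toy input, position 2 was unpaired in the more stable window. It therefore prefers the unpaired state, and the weakly supported pair (2, 7) is dropped.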

Genome-wide discovery of local RNA structural elements in Zika virus

2018. DOI: 10.7287/peerj.preprints.27101

Author(s): Ryan J Andrews, Julien Roche, Walter N Moss

Keyword(s): Zika Virus, Genome Replication, RNA Structures, Coding Region, Step Size, Base Pairs, Local Structures, Genome Wide, Viral Polyprotein, Functional RNA

In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for identifying functional RNA structures in genomes typically rely on scanning analysis windows. Multiple partially overlapping windows are used to predict RNA structures and folding metrics, and these are used to deduce regions likely to form functional structure. A separate structural model is produced for each window, and the step size can greatly affect the returned model. This makes deducing unique local structures challenging, because the same nucleotides can be base paired differently in different windows. In the presented approach, all base pairs from all analysis windows are considered and weighted by favorable folding metrics across all windows. This yields unique base pairing throughout the genome and generates local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds. This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain–Barré syndrome. Its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. Our approach successfully identifies and models the structures of these regions, while also finding additional regions likely to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.
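"Unusually thermodynamically stable" is usually quantified with a thermodynamic z-score in this line of work. The z-score compares a window's minimum free energy (MFE) with the MFEs of shuffled versions of the same sequence.

The minimal sketch below assumes the native and shuffled MFEs have already been computed elsewhere. The abstract does not specify the shuffling scheme or the number of shuffles. `folding_z_score` is a hypothetical name.

```python
from statistics import mean, stdev


def folding_z_score(native_mfe, shuffled_mfes):
    """Thermodynamic z-score of one analysis window.

    z = (native MFE - mean of shuffled MFEs) / sample standard deviation of
    shuffled MFEs. Negative values mean the native sequence folds more stably
    than its shuffled versions (energies in kcal/mol).
    """
    if len(shuffled_mfes) < 2:
        raise ValueError("need at least two shuffled MFEs")
    spread = stdev(shuffled_mfes)
    if spread == 0:
        raise ValueError("shuffled MFEs have zero spread")
    return (native_mfe - mean(shuffled_mfes)) / spread


print(folding_z_score(-23.0, [-19.0, -21.0, -20.0]))  # -3.0
```

Ranking windows (or the regions built from them) by this value puts the most unexpectedly stable folds first.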

Comparative analysis of protein evolution and RNA structural changes in the genome of pre-epidemic and epidemic Zika virus

2016. DOI: 10.1101/050278

Author(s): Arunachalam Ramaiah, Lei Dai, Deisy Contreras, Sanjeev Sinha, Ren Sun, ...

Keyword(s): Virus Replication, Protein Evolution, Zika Virus, Structural Changes, Yellow Fever Virus, RNA Structures, Stem Loop, Human Host, Genome Wide, Poor Pregnancy Outcome

Zika virus (ZIKV) infection is associated with microcephaly, neurological disorders and poor pregnancy outcome [1–3], and no vaccine is available. Although ZIKV was first discovered in 1947, the exact mechanism of virus replication and pathogenesis remains unknown. Recent outbreaks of Zika virus in the Americas clearly suggest better adaptation of viral strains to the human host. Understanding the conserved and adaptive features in the evolution of the ZIKV genome will reveal the molecular mechanisms of virus replication and host adaptation. Here, we present a comprehensive analysis of protein evolution and changes in RNA secondary structures of ZIKV strains, including those of the current 2015–16 outbreak. To identify the constraints on ZIKV evolution, we analyzed selection pressure at individual codons, immune epitopes, co-evolving sites, and RNA structures. The proteome of the current 2015/16 epidemic ZIKV strains of the Asian genotype is genetically conserved owing to genome-wide negative selection on codons, with limited positive selection. Predicted RNA structures at the 5′ and 3′ ends of ZIKV strains reveal substantial changes, such as an additional stem loop that makes them similar to those of Yellow Fever Virus. In summary, targeted changes at both the amino acid and RNA levels contribute to the better adaptation of ZIKV strains to the human host, with enhanced neurotropism.
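Selection-pressure analyses of coding sequences generally build on the ratio of nonsynonymous to synonymous substitution rates (dN/dS, or ω). Values below 1 indicate negative (purifying) selection, and values above 1 indicate positive selection.

The abstract does not say which method or software was used. The sketch below therefore only illustrates the ratio itself. It is computed gene-wide in the Nei–Gojobori style with a Jukes–Cantor correction, assuming the synonymous/nonsynonymous site and difference counts are already available. Per-codon tests, like those the abstract refers to, use more elaborate counting or likelihood-based approaches. The function names and counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Jukes–Cantor corrected distance d = -3/4 * ln(1 - 4p/3)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Gene-wide omega from pre-counted sites and observed differences."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("no synonymous differences; omega is undefined")
    return dn / ds


# Hypothetical counts for two aligned coding sequences.
omega = dn_ds(nonsyn_diffs=2, nonsyn_sites=200.0, syn_diffs=10, syn_sites=100.0)
print(round(omega, 3))  # well below 1, i.e. a purifying-selection signal
```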

Conserved long-range base pairings are associated with pre-mRNA processing of human genes

Nature Communications, Vol 12 (1), 2021. DOI: 10.1038/s41467-021-22549-7

Author(s): Svetlana Kalmykova, Marina Kalinina, Stepan Denisov, Alexey Mironov, Dmitry Skvortsov, ...

Keyword(s): Long Range, RNA Folding, Current Knowledge, RNA Structures, Base Pairs, Protein Coding, Proximity Ligation, Transcriptional Suppression, Human Genes, Cleavage And Polyadenylation

The ability of nucleic acids to form double-stranded structures is essential for all living systems on Earth. Current knowledge of functional RNA structures is focused on locally occurring base pairs. However, crosslinking and proximity ligation experiments have demonstrated that long-range RNA structures are highly abundant. Here, we present the most complete catalog to date of pairs of conserved complementary regions (PCCRs) in human protein-coding genes. PCCRs tend to occur within introns, suppress intervening exons, and obstruct cryptic and inactive splice sites. The double-stranded structure of PCCRs is supported by decreased icSHAPE nucleotide accessibility, a high abundance of RNA editing sites, and the frequent occurrence of forked eCLIP peaks. Introns with PCCRs show a distinct splicing pattern in response to RNAPII slowdown, suggesting that splicing is widely affected by co-transcriptional RNA folding. The enrichment of 3′-ends within PCCRs raises the intriguing hypothesis that coupling between RNA folding and splicing could mediate co-transcriptional suppression of premature pre-mRNA cleavage and polyadenylation.
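The abstract does not describe how PCCRs are detected, and the actual pipeline also relies on conservation across species. As a much simpler illustration of what "complementary regions" means computationally, the sketch below finds exact seeds. A seed is a length-k segment of one RNA whose reverse complement occurs in another, so the two segments could form a perfect antiparallel Watson–Crick duplex.

G–U wobble pairs, mismatches, bulges and conservation filtering are all omitted. The function names and sequences are hypothetical.

```python
from collections import defaultdict

# Watson–Crick complements only; G–U wobble pairs are ignored in this sketch.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(_COMPLEMENT)[::-1]


def complementary_seeds(a, b, k):
    """Return (i, j) pairs such that a[i:i+k] can form a perfect
    antiparallel duplex with b[j:j+k]."""
    index = defaultdict(list)  # k-mer of b -> start positions in b
    for j in range(len(b) - k + 1):
        index[b[j:j + k]].append(j)
    hits = []
    for i in range(len(a) - k + 1):
        for j in index.get(reverse_complement(a[i:i + k]), []):
            hits.append((i, j))
    return hits


upstream = "CCAAAGGCUCC"    # hypothetical intronic segment
downstream = "GGAGCCUUUGG"  # hypothetical segment further along the transcript
print(complementary_seeds(upstream, downstream, 7))
```

Overlapping seeds that lie on one diagonal (i + j constant), as in this toy example, would be merged into a single longer complementary region in a fuller treatment.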

Characterization of a Novel Thermobifida fusca Bacteriophage P318

Viruses, Vol 11 (11), pp. 1042, 2019. DOI: 10.3390/v11111042

Author(s): Cheepudom, Lin, Lee, Meng

Keyword(s): Plant Cell Wall, Hydrolytic Enzymes, Thermobifida fusca, Genome Replication, Putative ORFs, Base Pairs, Double-Stranded DNA, Virion Morphogenesis, Genome Information

Thermobifida fusca is of biotechnological interest due to its ability to produce an array of plant cell wall hydrolytic enzymes. However, only one T. fusca bacteriophage with genome information has been reported to date. This study aimed to discover more bacteriophages infecting this host species and so expand the existing knowledge of its phage diversity. To this end, a thermostable T. fusca bacteriophage, P318, belonging to the Siphoviridae family, was isolated and characterized. P318 has a double-stranded DNA genome of 48,045 base pairs with 3′-extended COS ends, on which 52 putative ORFs are organized into clusters responsible for the order of genome replication, virion morphogenesis, and the regulation of the lytic/lysogenic cycle. Compared with T. fusca and the previously discovered bacteriophage P1312, P318 has a much lower G+C content across its genome, except in the region encompassing ORF42, which encodes a protein of unknown function. P1312 and P318 share very little genomic similarity, except that the region encompassing ORF42 of P318 is homologous to that encompassing ORF51 of P1312. Thus, acquisition of ORF42 by lateral gene transfer might have been an important step in the evolution of P318.
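A localized G+C difference like the one around ORF42 is typically spotted with a sliding-window G+C profile along the genome. The abstract does not give window or step sizes. The values below, the toy sequence and the function names are illustrative assumptions.

```python
def gc_fraction(seq):
    """Fraction of G or C bases (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(seq, window, step):
    """List of (start, GC fraction) for each full-length window along seq."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [
        (start, gc_fraction(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]


genome = "GCGCGCAATTATATGCGC"  # hypothetical toy sequence
for start, gc in sliding_gc(genome, window=6, step=6):
    print(start, round(gc, 2))
```

Windows whose G+C fraction departs sharply from the genome-wide average are the candidates for laterally acquired regions.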

Modeling cell-free DNA fragment size densities for non-invasive detection of cancer.

Journal of Clinical Oncology, Vol 39 (15_suppl), p. 3058, 2021. DOI: 10.1200/jco.2021.39.15_suppl.3058

Author(s): Jacob Carey, Bryan Chesnick, Denise Butler, Michael Rongione, Giovanni Parmigiani, ...

Keyword(s): Fragment Size, Length Distribution, Mixture Component, Base Pairs, Machine Model, Cell-Free DNA, Non-Invasive, Free DNA, Genome Wide, Low Coverage

Background: Circulating cell-free DNA (cfDNA) is largely nucleosomal in origin, with a typical fragment length of 167 base pairs reflecting the length of DNA wrapped around the histone and H1 linker. Given the nucleosomal origin of cfDNA, we have previously used low-coverage whole-genome sequencing to evaluate DNA fragmentation profiles. These profiles sensitively and specifically detect tumor-derived DNA with altered fragment lengths or coverage. Methods: Here we evaluate the use of Bayesian finite mixtures to model the fragment length distribution and demonstrate how the parameters of these models can help distinguish between individuals with and without cancer. We examined the number of cfDNA fragments by size, ranging from 100 to 220 bp, and approximated the location, scale, and weight of the mixture components using Markov chain Monte Carlo. Performance was determined using ten-fold cross-validation, repeated ten times, of a gradient boosted machine model trained on three feature sets: 1) our previously described genome-wide fragmentation profile approach, 2) the parameters from the mixture model, and 3) a combination of 1) and 2). Results: In this study of 215 cancer patients and 208 cancer-free individuals, we observed cross-validated AUCs of 0.94, 0.95, and 0.97 for approaches 1), 2), and 3), respectively. Conclusions: Our findings indicate that parsimonious mixture models may improve detection of cancer in conjunction with fragmentation profile analyses across the genome.
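The abstract fits the location, scale and weight parameters of a finite mixture to the fragment-size distribution with MCMC, but does not name the component distribution. The sketch below assumes Gaussian components purely for illustration. It shows the mixture density and the log-likelihood that a sampler (not shown) would evaluate for each proposed parameter set.

Fragment lengths are treated as continuous, and the 100–220 bp truncation is ignored. All names and numbers are hypothetical.

```python
import math


def normal_pdf(x, loc, scale):
    """Gaussian density with mean `loc` and standard deviation `scale`."""
    z = (x - loc) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2 * math.pi))


def mixture_pdf(x, components):
    """Finite-mixture density; components = [(weight, loc, scale), ...]."""
    if not math.isclose(sum(w for w, _, _ in components), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * normal_pdf(x, loc, scale) for w, loc, scale in components)


def log_likelihood(lengths, components):
    """Log-likelihood of observed fragment lengths under the mixture."""
    return sum(math.log(mixture_pdf(x, components)) for x in lengths)


# Hypothetical parameters: a dominant ~167 bp mode plus a shorter component.
components = [(0.8, 167.0, 10.0), (0.2, 145.0, 12.0)]
fragments = [150, 160, 166, 167, 170, 181]
print(log_likelihood(fragments, components))
```

In a Bayesian fit, this log-likelihood plus log-priors on the weights, locations and scales is the target that the MCMC sampler explores. The posterior summaries of those parameters would then serve as classifier features.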

The structure of an RNA dodecamer shows how tandem U–U base pairs increase the range of stable RNA structures and the diversity of recognition sites

Structure, Vol 4 (8), pp. 917-930, 1996. DOI: 10.1016/s0969-2126(96)00099-8. Cited By ~ 54

Author(s): Susan E Lietzke, Cindy L Barnes, J Andrew Berglund, Craig E Kundrot

Keyword(s): RNA Structures, Base Pairs, Recognition Sites

An NMR-based approach reveals the core structure of the functional domain of SINEUP lncRNAs

Nucleic Acids Research, Vol 48 (16), pp. 9346-9360, 2020. DOI: 10.1093/nar/gkaa598

Author(s): Takako Ohyama, Hazuki Takahashi, Harshita Sharma, Toshio Yamazaki, Stefano Gustincich, ...

Keyword(s): Nuclear Magnetic Resonance, Computational Prediction, Functional Domain, RNA Structures, Tertiary Structures, Functional Roles, The Core, Non-Coding RNAs, Dynamic Domain

Long non-coding RNAs (lncRNAs) are attracting widespread attention for their emerging regulatory, transcriptional, epigenetic, structural and various other functions. Comprehensive transcriptome analysis has revealed that retrotransposon elements (REs) are transcribed and enriched in lncRNA sequences. However, the functions of lncRNAs and the molecular roles of the embedded REs are largely unknown. The secondary and tertiary structures of lncRNAs and their embedded REs are likely to have essential functional roles, but experimental determination and reliable computational prediction of large RNA structures have been extremely challenging. We report here the nuclear magnetic resonance (NMR)-based secondary structure determination of the 167-nt inverted short interspersed nuclear element (SINE) B2, which is embedded in antisense Uchl1 lncRNA and upregulates the translation of sense Uchl1 mRNAs. By using NMR ‘fingerprints’ as a sensitive probe in the domain survey, we successfully divided the full-length inverted SINE B2 into minimal units made of two discrete structured domains and one dynamic domain without altering their original structures after careful boundary adjustments. This approach allowed us to identify a structured domain in nucleotides 31–119 of the inverted SINE B2. This approach will be applicable to determining the structures of other regulatory lncRNAs.

Qtlizer: comprehensive QTL annotation of GWAS results

Scientific Reports, Vol 10 (1), 2020. DOI: 10.1038/s41598-020-75770-7

Author(s): Matthias Munz, Inken Wohlers, Eric Simon, Tobias Reinberger, Hauke Busch, ...

Keyword(s): Association Studies, Housekeeping Genes, R Package, Genome Wide Association Studies, Protein Abundance, Base Pairs, Link Type, Genome Wide, Wide Range, Distance Limit

Exploration of genetic variant-to-gene relationships by quantitative trait loci, such as expression QTLs, is a frequently used tool in genome-wide association studies. However, the wide range of public QTL databases and the lack of batch annotation features complicate a comprehensive annotation of GWAS results. In this work, we introduce the tool “Qtlizer” for annotating lists of variants in humans with associated changes in gene expression and protein abundance, using an integrated database of published QTLs. Features include incorporation of variants in linkage disequilibrium and reverse search by gene names. We analyzed the database for base pair distances between the most significant eQTLs and their affected genes. This analysis suggests that the commonly used cis-distance limit of 1,000,000 base pairs might be too restrictive, implying a substantial number of wrongly assigned and as yet undetected eQTLs. We also ranked genes with respect to the maximum number of tissue-specific eQTL studies in which a most significant eQTL signal was consistent. For the top 100 genes, we observed the strongest enrichment with housekeeping genes (P = 2 × 10⁻⁶) and with the 10% most highly expressed genes (P = 0.005) after grouping eQTLs by r² > 0.95, underlining the relevance of LD information in eQTL analyses. Qtlizer can be accessed via https://genehopper.de/qtlizer or by using the respective Bioconductor R package (https://doi.org/10.18129/B9.bioc.Qtlizer).
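The cis-distance limit discussed in the abstract amounts to a simple distance filter between a variant and a gene. Studies differ in whether distance is measured to the transcription start site or to the gene body. The sketch below assumes the gene body. It neither uses nor reproduces the Qtlizer R package API, and all names are hypothetical.

```python
CIS_LIMIT_BP = 1_000_000  # commonly used cis-distance limit named in the abstract


def distance_to_gene(variant_pos, gene_start, gene_end):
    """Base-pair distance from a variant to a gene body (0 if inside it)."""
    if gene_start <= variant_pos <= gene_end:
        return 0
    return min(abs(variant_pos - gene_start), abs(variant_pos - gene_end))


def is_cis(variant_chrom, variant_pos, gene_chrom, gene_start, gene_end,
           limit=CIS_LIMIT_BP):
    """True if the variant lies on the gene's chromosome within `limit` bp."""
    return (variant_chrom == gene_chrom
            and distance_to_gene(variant_pos, gene_start, gene_end) <= limit)


# Hypothetical variant 1.1 Mb downstream of a gene on chromosome 1.
print(is_cis("1", 2_500_000, "1", 1_000_000, 1_400_000))                   # False
print(is_cis("1", 2_500_000, "1", 1_000_000, 1_400_000, limit=2_000_000))  # True
```

Under the default 1 Mb limit, this variant–gene link is dropped. The abstract's observation is that such just-out-of-window links may still be real eQTLs.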

Real-time audio and visual display of the Coronavirus genome

BMC Bioinformatics, Vol 21 (1), 2020. DOI: 10.1186/s12859-020-03760-7

Author(s): Mark D. Temple

Keyword(s): Real Time, Large Body, Visual Display, Viral RNA, Regulatory Sequences, Auditory Display, RNA Motifs, RNA Sequence, RNA Genome, Functional RNA

Background: This paper describes a web-based tool that uses a combination of sonification and an animated display to inquire into the SARS-CoV-2 genome. The audio data are generated in real time from a variety of RNA motifs that are known to be important in the functioning of RNA. Additionally, metadata relating to RNA translation and transcription have been used to shape the auditory and visual displays. Together, these tools provide a unique approach to further understanding the metabolism of the viral RNA genome. This audio provides a further means to represent the function of the RNA, in addition to traditional written and visual approaches. Results: Sonification of the SARS-CoV-2 genomic RNA sequence results in a complex auditory stream composed of up to 12 individual audio tracks. Each auditory motive is derived from the actual RNA sequence or from metadata. This approach has been used to represent transcription or translation of the viral RNA genome. The display highlights the real-time interaction of functional RNA elements. The framework for this display is the sonification of codons derived from all three reading frames of the viral RNA sequence, combined with sonified metadata. Functional RNA motifs such as transcription regulatory sequences and stem loop regions have also been sonified. Using the tool, audio can be generated in real time from either genomic or sub-genomic representations of the RNA. Given the large size of the viral genome, a collection of interactive buttons has been provided to navigate to regions of interest, such as cleavage regions in the polyprotein, untranslated regions or each gene. These tools are available through an internet browser, and the user can interact with the data display in real time. Conclusion: The auditory display, in combination with real-time animation of the process of translation and transcription, provides unique insight into the large body of evidence describing the metabolism of the RNA genome. Furthermore, the tool has been used as an algorithm-based audio generator. These audio tracks can be listened to by the general community without reference to the visual display, to encourage further inquiry into the science.
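The abstract describes sonifying codons from all three reading frames but does not specify how codons become sounds. The sketch below shows only two things: the frame-splitting step, and one hypothetical codon-to-pitch mapping (the codon's index in the UCAG-ordered codon table, offset into the MIDI note range). It is not the tool's actual mapping, and no audio is produced.

```python
BASES = "UCAG"  # ordering used in the standard genetic-code table


def codons(rna, frame):
    """Codons of reading frame 0, 1 or 2 (any incomplete trailing codon is dropped)."""
    rna = rna.upper().replace("T", "U")
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


def codon_to_midi(codon, lowest_note=36):
    """Hypothetical mapping: codon index 0-63 in UCAG order -> MIDI note 36-99."""
    index = (16 * BASES.index(codon[0])
             + 4 * BASES.index(codon[1])
             + BASES.index(codon[2]))
    return lowest_note + index


seq = "AUGGCCUAA"
for frame in range(3):
    print(frame, [codon_to_midi(c) for c in codons(seq, frame)])
```

Emitting the three frames as separate tracks gives one simple way to hear all reading frames at once, as the abstract describes.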
