Genome-wide discovery of local RNA structural elements in Zika virus
In addition to encoding RNA primary structures, genomes also encode RNA secondary and tertiary structures that play roles in gene regulation and, in the case of RNA viruses, genome replication. Methods for identifying functional RNA structures in genomes typically rely on scanning analysis windows: multiple partially overlapping windows are used to predict RNA structures and folding metrics, from which regions likely to form functional structure are deduced. A separate structural model is produced for each window, and the step size can greatly affect the returned model. This makes deducing unique local structures challenging, because the same nucleotide can be base paired differently in different windows. In the presented approach, all base pairs from all analysis windows are considered and weighted by favorable folding metrics across all windows. This results in unique base pairing throughout the genome and in local regions/structures that can be ranked by their propensity to form unusually thermodynamically stable folds. This approach was applied to the Zika virus (ZIKV) genome. ZIKV is linked to a variety of neurological ailments, including microcephaly and Guillain-Barré syndrome, and its (+)-sense RNA genome encodes two previously described, functionally essential structured RNA regions. Our approach successfully identifies and models the structures of these regions, while also finding additional regions likely to form functional RNA structures throughout the viral polyprotein coding region. All data for the ZIKV genome have been archived at the RNAStructuromeDB, a repository of RNA folding data for humans and their pathogens.
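
The idea of pooling base pairs from every window and keeping one partner per nucleotide can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function name `consensus_pairs`, the input layout, and the use of a summed per-window z-score as the weight are all assumptions made for illustration; the abstract states only that pairs are weighted by favorable folding metrics.

```python
from collections import defaultdict


def consensus_pairs(windows):
    """Reduce overlapping-window predictions to one partner per nucleotide.

    windows: iterable of (pairs, zscore), where pairs is a list of (i, j)
    genome coordinates with i < j, and zscore is that window's stability
    z-score (assumed metric; more negative = more unusually stable).
    """
    # Sum the z-scores of every window in which each base pair was predicted.
    totals = defaultdict(float)
    for pairs, z in windows:
        for i, j in pairs:
            totals[(i, j)] += z

    # For every nucleotide, remember its most favorable (lowest-sum) pair.
    best = {}
    for pair, score in totals.items():
        for nt in pair:
            if nt not in best or score < totals[best[nt]]:
                best[nt] = pair

    # Keep a pair only if both of its nucleotides selected it, so that
    # no nucleotide ends up in more than one pair.
    return sorted(
        p for p in set(best.values()) if best[p[0]] == p and best[p[1]] == p
    )


# Hypothetical toy input: three overlapping windows with made-up z-scores.
windows = [
    ([(1, 10), (2, 9)], -2.0),
    ([(2, 9), (3, 8)], -1.5),
    ([(2, 12), (3, 8)], -0.5),
]
print(consensus_pairs(windows))  # [(1, 10), (2, 9), (3, 8)]
```

In this toy input, nucleotide 2 is paired with 9 in two windows and with 12 in one; the 2-9 pair has the lower summed z-score, so the competing 2-12 pair is discarded and every nucleotide retains at most one partner.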