Ancient gene duplications in RNA viruses revealed by protein tertiary structure comparisons

Alejandro Miguel Cisneros-Martínez; Arturo Becerra; Antonio Lazcano

doi:10.1093/ve/veab019

Ancient gene duplications in RNA viruses revealed by protein tertiary structure comparisons

Virus Evolution ◽

10.1093/ve/veab019 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Alejandro Miguel Cisneros-Martínez ◽

Arturo Becerra ◽

Antonio Lazcano

Keyword(s):

Tertiary Structure ◽

RNA Viruses ◽

RNA Virus ◽

Protein Structures ◽

Data Bank ◽

Cysteine Proteases ◽

Gene Duplications ◽

Virus Family ◽

Protein Tertiary Structure ◽

Duplication Events

Abstract To date only a handful of duplicated genes have been described in RNA viruses. This shortage can be attributed to different factors, including the high mutation rate of RNA viruses, which would make a large genome more prone to acquiring deleterious mutations. This may explain why sequence-based approaches have only found duplications in their most recent evolutionary history. To detect earlier duplications, we performed protein tertiary structure comparisons for every RNA virus family represented in the Protein Data Bank. We present a list of thirty pairs of possible paralogs with <30 per cent sequence identity. It is argued that these pairs are the outcome of six duplication events. These include the α and β subunits of the fungal toxin KP6 present in the dsRNA Ustilago maydis virus (family Totiviridae), the SARS-CoV (Coronaviridae) nsp3 domains SUD-N, SUD-M and X-domain, the Picornavirales (families Picornaviridae, Dicistroviridae, Iflaviridae and Secoviridae) capsid proteins VP1, VP2 and VP3, and the Enterovirus (family Picornaviridae) 3C and 2A cysteine-proteases. Protein tertiary structure comparisons may reveal more duplication events as more three-dimensional protein structures are determined and suggest that, although still rare, gene duplications may be more frequent in RNA viruses than previously thought. Keywords: gene duplications; RNA viruses.
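The <30 per cent sequence identity criterion mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the two sequences have already been aligned (for example from a structural superposition) and defines identity as identical residues divided by aligned, non-gap columns. That definition, the function names and the toy sequences are assumptions made for illustration, not the authors' pipeline.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap.

    Assumption: identity = identical residues / aligned (non-gap) columns.
    Other denominators (e.g. shorter sequence length) are also in common use.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def low_identity_pairs(alignments, threshold=30.0):
    """Return names of pairs whose identity falls below the threshold."""
    return [
        name
        for name, (seq_a, seq_b) in alignments.items()
        if percent_identity(seq_a, seq_b) < threshold
    ]


# Hypothetical toy alignments, not real viral proteins.
toy = {
    "pair_1": ("AAAA", "ABCD"),    # 1 of 4 columns identical
    "pair_2": ("ACDE-", "ACFEG"),  # 3 of 4 non-gap columns identical
}
print(low_identity_pairs(toy))
```

Pairs that pass such a filter are only candidates: low sequence identity says nothing about shared ancestry on its own, which is why the abstract relies on tertiary structure comparison.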

Meta-transcriptomic detection of diverse and divergent RNA viruses in green and chlorarachniophyte algae

10.1101/2020.06.08.141184 ◽

2020 ◽

Author(s):

Justine Charon ◽

Vanessa Rossetto Marcelino ◽

Richard Wetherbee ◽

Heroen Verbruggen ◽

Edward C. Holmes

Keyword(s):

RNA Viruses ◽

RNA Virus ◽

Past History ◽

Protein Structures ◽

Cultivated Plants ◽

Secondary Endosymbiosis ◽

Homology Detection ◽

Virus Family ◽

Virus Diversity ◽

Eukaryotic Microorganisms

Abstract Our knowledge of the diversity and evolution of the virosphere will likely increase dramatically with the study of microbial eukaryotes, including the microalgae, in which few RNA viruses have been documented to date. By combining meta-transcriptomic approaches with sequence and structural-based homology detection, followed by PCR confirmation, we identified 18 novel RNA viruses in two major groups of microbial algae – the chlorophytes and the chlorarachniophytes. Most of the RNA viruses identified in the green algae class Ulvophyceae were related to those from the families Tombusviridae and Amalgaviridae that have previously been associated with plants, suggesting that these viruses have an evolutionary history that extends to when their host groups shared a common ancestor. In contrast, seven ulvophyte associated viruses exhibited clear similarity with the mitoviruses that are most commonly found in fungi. This is compatible with horizontal virus transfer between algae and fungi, although mitoviruses have recently been documented in plants. We also document, for the first time, RNA viruses in the chlorarachniophytes, including the first observation of a negative-sense (bunya-like) RNA virus in microalgae. The other virus-like sequence detected in chlorarachniophytes is distantly related to those from the plant virus family Virgaviridae, suggesting that they may have been inherited from the secondary chloroplast endosymbiosis event that marked the origin of the chlorarachniophytes. More broadly, this work suggests that the scarcity of RNA viruses in algae most likely results from limited investigation rather than their absence. Greater effort is needed to characterize the RNA viromes of unicellular eukaryotes, including through structure-based methods that are able to detect distant homologies, and with the inclusion of a wider range of eukaryotic microorganisms.

Author summary RNA viruses are expected to infect all living organisms on Earth. Despite recent developments in and the deployment of large-scale sequencing technologies, our understanding of the RNA virosphere remains anthropocentric and largely restricted to human, livestock, cultivated plants and vectors for viral disease. However, a broader investigation of the diversity of RNA viruses, especially in protists, is expected to answer fundamental questions about their origin and long-term evolution. This study is the first to investigate RNA virus diversity in unicellular algae from the phylogenetically distinct ulvophyte and chlorarachniophyte taxa. Despite very high levels of sequence divergence, we were able to identify 18 new RNA viruses, largely related to plant and fungal viruses, and likely illustrating a past history of horizontal transfer events that have occurred during RNA virus evolution. We also hypothesise that the sequence similarity between a chlorarachniophyte-associated virga-like virus and members of Virgaviridae associated with plants may represent inheritance from a secondary endosymbiosis event. A promising approach to detect the signals of distant virus homologies through the analysis of protein structures was also utilised, enabling us to identify potential highly divergent algal RNA viruses.

Structure Unveils Relationships between RNA Virus Polymerases

Viruses ◽

10.3390/v13020313 ◽

2021 ◽

Vol 13 (2) ◽

pp. 313

Author(s):

Heli A. M. Mönttinen ◽

Janne J. Ravantti ◽

Minna M. Poranen

Keyword(s):

Phylogenetic Tree ◽

RNA Viruses ◽

RNA Virus ◽

Sequence Similarity ◽

Protein Structures ◽

Structural Similarity ◽

Functional Differentiation ◽

Comparison Method ◽

Homologous Structure ◽

Biological Entities

RNA viruses are the fastest evolving known biological entities. Consequently, the sequence similarity between homologous viral proteins disappears quickly, limiting the usability of traditional sequence-based phylogenetic methods in the reconstruction of relationships and evolutionary history among RNA viruses. Protein structures, however, typically evolve more slowly than sequences, and structural similarity can still be evident when no sequence similarity can be detected. Here, we used an automated structural comparison method, homologous structure finder, for comprehensive comparisons of viral RNA-dependent RNA polymerases (RdRps). We identified a common structural core of 231 residues for all the structurally characterized viral RdRps, covering segmented and non-segmented negative-sense, positive-sense, and double-stranded RNA viruses infecting both prokaryotic and eukaryotic hosts. The grouping and branching of the viral RdRps in the structure-based phylogenetic tree follow their functional differentiation. The RdRps using protein primer, RNA primer, or self-priming mechanisms have evolved independently of each other, and the RdRps cluster into two large branches based on the transcription mechanism used. The structure-based distance tree presented here follows the recently established RdRp-based RNA virus classification at genus, subfamily, family, order, class and subphylum ranks. However, the topology of our phylogenetic tree suggests an alternative phylum level organization.

Doubling of the known set of RNA viruses by metagenomic analysis of an aquatic virome

Nature Microbiology ◽

10.1038/s41564-020-0755-4 ◽

2020 ◽

Vol 5 (10) ◽

pp. 1262-1270 ◽

Cited By ~ 6

Author(s):

Yuri I. Wolf ◽

Sukrit Silas ◽

Yongjie Wang ◽

Shuang Wu ◽

Michael Bocek ◽

...

Keyword(s):

RNA Viruses ◽

RNA Virus ◽

River Estuary ◽

Yangtze River Estuary ◽

Phylogenomic Analysis ◽

Virus Family ◽

Genetic Codes ◽

Sister Clade ◽

The Yangtze River Estuary ◽

Capping Enzyme

Abstract RNA viruses in aquatic environments remain poorly studied. Here, we analysed the RNA virome from approximately 10 l water from Yangshan Deep-Water Harbour near the Yangtze River estuary in China and identified more than 4,500 distinct RNA viruses, doubling the previously known set of viruses. Phylogenomic analysis identified several major lineages, roughly, at the taxonomic ranks of class, order and family. The 719-member-strong Yangshan virus assemblage is the sister clade to the expansive class Alsuviricetes and consists of viruses with simple genomes that typically encode only RNA-dependent RNA polymerase (RdRP), capping enzyme and capsid protein. Several clades within the Yangshan assemblage independently evolved domain permutation in the RdRP. Another previously unknown clade shares ancestry with Potyviridae, the largest known plant virus family. The ‘Aquatic picorna-like viruses/Marnaviridae’ clade was greatly expanded, with more than 800 added viruses. Several RdRP-linked protein domains not previously detected in any RNA viruses were identified, such as the small ubiquitin-like modifier (SUMO) domain, phospholipase A2 and PrsW-family protease domain. Multiple viruses utilize alternative genetic codes implying protist (especially ciliate) hosts. The results reveal a vast RNA virome that includes many previously unknown groups. However, phylogenetic analysis of the RdRPs supports the previously established five-branch structure of the RNA virus evolutionary tree, with no additional phyla.

Antiviral Drug Targets of Single-Stranded RNA Viruses Causing Chronic Human Diseases

Current Drug Targets ◽

10.2174/1389450119666190920153247 ◽

2020 ◽

Vol 21 (2) ◽

pp. 105-124 ◽

Cited By ~ 6

Author(s):

Dhurvas Chandrasekaran Dinesh ◽

Selvaraj Tamilarasan ◽

Kaushik Rajaram ◽

Evžen Bouřa

Keyword(s):

Drug Targets ◽

Viral Infections ◽

RNA Viruses ◽

Antiviral Agents ◽

Protein Structures ◽

Antiviral Drug ◽

Three Dimensional ◽

Data Bank ◽

Chronic Infections ◽

Immunodeficiency Syndrome

Ribonucleic acid (RNA) viruses associated with chronic diseases in humans are major threats to public health, causing high mortality globally. The high mutation rate of RNA viruses helps them to escape the immune response and is also responsible for the development of drug resistance. Chronic infections caused by human immunodeficiency virus (HIV) and hepatitis viruses (HBV and HCV) lead to acquired immunodeficiency syndrome (AIDS) and hepatocellular carcinoma respectively, which are among the major causes of human deaths. Effective preventative measures to limit chronic and re-emerging viral infections are absolutely necessary. Each class of antiviral agents targets a specific stage in the viral life cycle and inhibits viral development and proliferation. Most often, antiviral drugs target a specific viral protein; therefore only a few broad-spectrum drugs are available. This review focuses on selected viral target proteins of pathogenic viruses with single-stranded (ss) RNA genomes that cause chronic infections in humans (e.g. HIV, HCV, Flaviviruses). In the recent past, an exponential increase in the number of available three-dimensional protein structures (>150000 in Protein Data Bank) has allowed us to better understand the molecular mechanism of action of protein targets and antivirals. Advancements in in silico approaches have paved the way to design and develop several novel, highly specific small-molecule inhibitors targeting the viral proteins.

PRIGSA2: Improved version of Protein Repeat Identification by Graph Spectral Analysis

10.1101/803304 ◽

2019 ◽

Author(s):

Broto Chakrabarty ◽

Nita Parekh

Keyword(s):

Tertiary Structure ◽

De Novo ◽

Protein Complexes ◽

Repeat Unit ◽

Protein Structures ◽

Fold Increase ◽

Data Bank ◽

Topological Features ◽

Repeat Proteins ◽

Complete Protein

Abstract Tandemly repeated structural motifs in proteins form highly stable structural folds and provide multiple binding sites associated with diverse functional roles. The tertiary structure and function of these proteins are determined by the type and copy number of the repeating units. Each repeat type exhibits a unique pattern of intra- and inter-repeat unit interactions that is well-captured by the topological features in the network representation of protein structures. Here we present an improved version of our graph based algorithm, PRIGSA, with structure-based validation and filtering steps incorporated for accurate detection of tandem structural repeats. The algorithm integrates available knowledge on repeat families with de novo prediction to detect repeats in single monomer chains as well as in multimeric protein complexes. Three levels of performance evaluation are presented: comparison with state-of-the-art algorithms on a benchmark dataset of repeat and non-repeat proteins, accuracy in the detection of members of 13 known repeat families reported in UniProt, and execution on the complete Protein Data Bank to show its ability to identify previously uncharacterized proteins. Executing it on the PDB yields a ∼3-fold increase in the coverage of the members of 13 known families and identifies 3,408 novel uncharacterized structural repeat proteins. URL: http://bioinf.iiit.ac.in/PRIGSA2/.

Assessment of Globularity of Protein Structures via Minimum Volume Ellipsoids and Voxel-Based Atom Representation

Crystals ◽

10.3390/cryst11121539 ◽

2021 ◽

Vol 11 (12) ◽

pp. 1539

Author(s):

Mateusz Banach

Keyword(s):

Tertiary Structure ◽

Protein Structures ◽

Kernel Density ◽

Data Bank ◽

Molecular Surface ◽

Uniform Grid ◽

Minimum Volume ◽

Minimum Volume Ellipsoid ◽

Preparation Step ◽

Structure Of Proteins

A computer algorithm for assessment of globularity of protein structures is presented. By enclosing the input protein in a minimum volume ellipsoid (MVEE) and calculating a profile measuring how voxelized space within this shape (cubes on a uniform grid) is occupied by atoms, it is possible to estimate how well the molecule resembles a globule. For any protein to satisfy the proposed globularity criterion, its ellipsoid profile (EP) should first confirm that atoms adequately fill the ellipsoid’s center. This property should then propagate towards the surface of the ellipsoid, although with diminishing importance. It is not required to compute the molecular surface. Globular status (full or partial) is assigned to proteins with values of their ellipsoid profiles, called here the ellipsoid indexes (EI), above certain levels. Due to structural outliers which may considerably distort the measurements, a companion method for their detection and reduction of their influence is also introduced. It is based on kernel density estimation and is shown to work well as an optional input preparation step for MVEE. Finally, the complete workflow is applied to over two thousand representatives of SCOP 2.08 domain superfamilies, surveying the landscape of tertiary structure of proteins from the Protein Data Bank.
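The occupancy profile described above can be sketched in Python under some loudly stated simplifications. The ellipsoid is supplied by the caller as a centre and three axis-aligned semi-axes rather than computed as a true minimum volume ellipsoid. Atoms are treated as points that mark the voxel containing them. The profile is reported as the occupied fraction of voxels in concentric ellipsoidal shells. The function name `ellipsoid_profile` and all parameters are hypothetical and not taken from the paper.

```python
import math


def ellipsoid_profile(atoms, center, semi_axes, voxel=1.0, n_shells=4):
    """Fraction of occupied voxels per concentric ellipsoidal shell.

    Simplifications (assumptions, not the published method): the ellipsoid
    is axis-aligned and given by the caller, and an atom occupies only the
    single voxel that contains its coordinates.
    """
    cx, cy, cz = center
    a, b, c = semi_axes
    # Integer grid indices of voxels that contain at least one atom.
    occupied = {tuple(math.floor(v / voxel) for v in atom) for atom in atoms}
    total = [0] * n_shells
    filled = [0] * n_shells

    def index_range(mid, half):
        # Grid indices spanning the ellipsoid's bounding box on one axis.
        low = math.floor((mid - half) / voxel)
        high = math.floor((mid + half) / voxel)
        return range(low, high + 1)

    for i in index_range(cx, a):
        for j in index_range(cy, b):
            for k in index_range(cz, c):
                # Voxel centre in Cartesian coordinates.
                px, py, pz = ((n + 0.5) * voxel for n in (i, j, k))
                # Normalised radius: 0 at the centre, 1 on the surface.
                r = math.sqrt(
                    ((px - cx) / a) ** 2
                    + ((py - cy) / b) ** 2
                    + ((pz - cz) / c) ** 2
                )
                if r >= 1.0:
                    continue  # voxel centre lies outside the ellipsoid
                shell = min(int(r * n_shells), n_shells - 1)
                total[shell] += 1
                if (i, j, k) in occupied:
                    filled[shell] += 1
    return [f / t if t else 0.0 for f, t in zip(filled, total)]


# Hypothetical toy input: a single atom near the centre of a sphere.
print(ellipsoid_profile([(0.5, 0.5, 0.5)], (0, 0, 0), (2, 2, 2), n_shells=2))
```

A globular molecule would be expected to show high occupancy in the innermost shells with values tapering towards the surface. The thresholds that turn such a profile into a globularity verdict are specific to the paper and are not reproduced here.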

Predicting the stability of homologous gene duplications in a plant RNA virus

10.1101/060517 ◽

2016 ◽

Author(s):

Anouk Willemsen ◽

Mark P. Zwart ◽

Pablo Higueras ◽

Josep Sardanyés ◽

Santiago F. Elena

Keyword(s):

Gene Duplication ◽

RNA Viruses ◽

RNA Virus ◽

Genomic Stability ◽

Gene Copy ◽

Homologous Gene ◽

Gene Duplications ◽

Evolutionary Potential ◽

Rate Dependent ◽

The Stability

Abstract One of the striking features of many eukaryotes is the apparent amount of redundancy in coding and non-coding elements of their genomes. Despite the possible evolutionary advantages, there are fewer examples of redundant sequences in viral genomes, particularly those with RNA genomes. The low prevalence of gene duplication in RNA viruses most likely reflects the strong selective constraints against increasing genome size. Here we investigated the stability of genetically redundant sequences and how adaptive evolution proceeds to remove them. We generated plant RNA viruses with potentially beneficial gene duplications, measured their fitness and performed experimental evolution, thereby exploring their genomic stability and evolutionary potential. We found that all gene duplication events resulted in a loss of viability or significant reductions in fitness. Moreover, upon evolving the viable viruses and analyzing their genomes, we always observed the deletion of the duplicated gene copy and maintenance of the ancestral copy. Interestingly, there were clear differences in the deletion dynamics of the duplicated gene associated with the passage duration, the size of the gene and the position of the duplication. Based on the experimental data, we developed a mathematical model to characterize the stability of genetically redundant sequences, and showed that the fitness of viruses with duplications is not enough information to predict genomic stability, as a recombination rate dependent on the genetic context – the duplicated gene and its position – is also required. Our results therefore demonstrate experimentally the deleterious nature of gene duplications in RNA viruses, and we identify factors that constrain the maintenance of duplicated genes.

A COMPARATIVE STUDY OF PROTEIN TERTIARY STRUCTURE PREDICTION METHODS

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2014.1168 ◽

2014 ◽

pp. 15-18

Author(s):

CHANDRAYANI N. ROKDE ◽

DR.MANALI KSHIRSAGAR

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Sequence Data ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

X-Ray Crystallography ◽

Protein Tertiary Structure Prediction

Protein structure prediction (PSP) from amino acid sequence is one of the high-focus problems in bioinformatics today. This is because the biological function of a protein is determined by its three-dimensional structure. Understanding protein structures is vital to determining the function of a protein and its interactions with DNA, RNA and enzymes. Thus, protein structure is a fundamental area of computational biology. Its importance is heightened by the large amounts of sequence data coming from PDB (Protein Data Bank) and by the fact that experimental methods such as X-ray crystallography or Nuclear Magnetic Resonance (NMR), which are used to determine protein structures, remain very expensive and time-consuming. In this paper, different types of protein structures and methods for their prediction are described.

DeepPSC (protein structure camera): computer vision-based protein backbone structure reconstruction from alpha carbon trace as a case study

10.1101/2020.08.12.247312 ◽

2020 ◽

Author(s):

Xing Zhang ◽

Junwen Luo ◽

Yi Cai ◽

Wei Zhu ◽

Xiaofeng Yang ◽

...

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Tertiary Structure ◽

Protein Structures ◽

Protein Backbone ◽

Protein Tertiary Structure ◽

Structure Information ◽

Alpha Carbon

Abstract Deep learning has been increasingly used in protein tertiary structure prediction, a major goal in life science. However, all the algorithms developed so far mostly use protein sequences as input, whereas the vast amount of protein tertiary structure information available in the Protein Data Bank (PDB) database remains largely unused, because of the inherent complexity of 3D data computation. In this study, we propose Protein Structure Camera (PSC) as an approach to convert protein structures into images. As a case study, we developed a deep learning method incorporating PSC (DeepPSC) to reconstruct protein backbone structures from alpha carbon traces. DeepPSC outperformed all the methods currently available for this task. This PSC approach provides a useful tool for protein structure representation, and for the application of deep learning in protein structure prediction and protein engineering.

Structure and stability of the Human respiratory syncytial virus M2–1 RNA-binding core domain reveals a compact and cooperative folding unit

Acta Crystallographica Section F Structural Biology Communications ◽

10.1107/s2053230x17017381 ◽

2017 ◽

Vol 74 (1) ◽

pp. 23-30 ◽

Cited By ~ 4

Author(s):

Ivana G. Molina ◽

Inokentijs Josts ◽

Yasser Almeida Hernandez ◽

Sebastian Esperante ◽

Mariano Salgueiro ◽

...

Keyword(s):

RNA Binding ◽

Tertiary Structure ◽

RNA Virus ◽

Binding Activity ◽

Virus Family ◽

Core Domain ◽

New Family ◽

Binding Core ◽

And Function ◽

Secondary And Tertiary Structure

Human syncytial respiratory virus is a nonsegmented negative-strand RNA virus with serious implications for respiratory disease in infants, and has recently been reclassified into a new family, Pneumoviridae. One of the main reasons for this classification is the unique presence of a transcriptional antiterminator, called M2–1. The puzzling mechanism of action of M2–1, which is a rarity among antiterminators in viruses and is part of the RNA polymerase complex, relies on dissecting the structure and function of this multidomain tetramer. The RNA-binding activity is located in a monomeric globular 'core' domain, a high-resolution crystal structure of which is now presented. The structure reveals a compact domain which is superimposable on the full-length M2–1 tetramer, with additional electron density for the C-terminal tail that was not observed in the previous models. Moreover, its folding stability was determined through chemical denaturation, which shows that the secondary and tertiary structure unfold concomitantly, which is indicative of a two-state equilibrium. These results constitute a further step in the understanding of this unique RNA-binding domain, for which there is no sequence or structural counterpart outside this virus family, in addition to its implications in transcription regulation and its likeliness as an antiviral target.
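The two-state equilibrium mentioned above can be illustrated with a short Python sketch of a chemical denaturation curve. It assumes the widely used linear extrapolation model, in which the unfolding free energy falls linearly with denaturant concentration. The abstract does not say which model the authors fitted, and the parameter values below are hypothetical placeholders, not measurements for M2–1.

```python
import math

GAS_CONSTANT = 8.314462618e-3  # kJ / (mol K)


def fraction_unfolded(denaturant_molar, dg_water, m_value, temperature=298.15):
    """Two-state unfolded fraction at a given denaturant concentration.

    Assumed model (linear extrapolation): dG = dg_water - m_value * [D],
    with dg_water in kJ/mol and m_value in kJ/(mol M).
    """
    dg_unfold = dg_water - m_value * denaturant_molar
    # Equilibrium constant for native <-> unfolded.
    k_unfold = math.exp(-dg_unfold / (GAS_CONSTANT * temperature))
    return k_unfold / (1.0 + k_unfold)


# Hypothetical parameters: 20 kJ/mol stability in water, m = 5 kJ/(mol M),
# giving a transition midpoint at 20 / 5 = 4 M denaturant.
for conc in (0.0, 2.0, 4.0, 6.0):
    print(conc, fraction_unfolded(conc, dg_water=20.0, m_value=5.0))
```

In a two-state transition, probes of secondary structure (such as far-UV circular dichroism) and of tertiary structure (such as fluorescence) follow the same curve, which is the sense in which the two unfold concomitantly.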
