Hybrid sequence-structure based HMM models leverage the identification of homologous proteins: the example of class II fusion proteins

2018 ◽  
Author(s):  
R. Tetley ◽  
P. Guardado-Calvo ◽  
J. Fedry ◽  
F. Rey ◽  
F. Cazals

Abstract We present a sequence-structure based method characterizing a set of functionally related proteins exhibiting low sequence identity and loose structural conservation. Given a (small) set of structures, our method consists of three main steps. First, pairwise structural alignments are combined with multi-scale geometric analysis to produce structural motifs, i.e. regions structurally more conserved than the whole structures. Second, the sub-sequences of the motifs are used to build profile hidden Markov models (HMM) biased towards the structurally conserved regions. Third, these HMM are used to retrieve from UniProtKB proteins harboring signatures compatible with the function studied, in a bootstrap fashion. We apply these hybrid HMM to investigate two questions related to class II fusion proteins, an especially challenging class since known structures exhibit low sequence identity (less than 15%) and loose structural similarity (of the order of 15 Å in lRMSD). In a first step, we compare the performances of our hybrid HMM against those of sequence based HMM. Using various learning sets, we show that both classes of HMM retrieve unique species. The numbers of unique species reported by both classes of methods are comparable, stressing the novelty brought by our hybrid models. In a second step, we use our models to identify 17 plausible HAP2-GCS1 candidate sequences in 10 different Drosophila melanogaster species. These models are not identified by the Pfam family HAP2-GCS1 (PF10699), stressing the ability of our structural motifs to capture signals more subtle than whole Pfam domains. In a more general setting, our method should be of interest for all cases of functional families with low sequence identity and loose structural conservation. Our software tools are available from the FunChaT package of the Structural Bioinformatics Library (http://sbl.inria.fr).
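The abstract quotes a sequence identity below 15% without stating how it was measured. As a minimal sketch, assuming a pre-computed pairwise alignment that uses "-" as the gap character and counting identity only over columns where both sequences carry a residue (one of several conventions in use), the quantity could be computed as follows. The function name `percent_identity` is illustrative and is not part of FunChaT or the Structural Bioinformatics Library.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned sequences.

    Assumption: only columns where neither sequence has a gap are counted
    in the denominator. Other conventions (alignment length, shorter
    sequence length) give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # compared columns where the residues are identical
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a.upper() == b.upper():
            matches += 1

    # With no comparable column, report 0.0 rather than dividing by zero.
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not taken from the paper).
    print(percent_identity("ACDE-FG", "ACQEHF-"))
```

The choice of denominator matters most at low identity, which is the regime the abstract describes, so any threshold such as 15% should be read together with the convention used.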

2021 ◽  
Author(s):  
Margarita V. Rangel ◽  
Nicholas Catanzaro ◽  
Sara A. Thannickal ◽  
Kelly A. Crotty ◽  
Maria G. Noval ◽  
...  

Alphaviruses and flaviviruses have class II fusion glycoproteins that are essential for virion assembly and infectivity. Importantly, the tip of domain II is structurally conserved between the alphavirus and flavivirus fusion proteins, yet whether these structural similarities between virus families translate to functional similarities is unclear. Using in vivo evolution of Zika virus (ZIKV), we identified several novel emerging variants including an envelope glycoprotein variant in β-strand c (V114M) of domain II. We have previously shown that the analogous β-strand c and the ij loop, located in the tip of domain II of the alphavirus E1 glycoprotein, are important for infectivity. This led us to hypothesize that flavivirus E β-strand c also contributes to flavivirus infection. We generated this ZIKV glycoprotein variant and found that while it had little impact on infection in mosquitoes, it reduced replication in human cells and mice, and increased virus sensitivity to ammonium chloride, as seen for alphaviruses. In light of these results and given our alphavirus ij loop studies, we mutated a conserved alanine at the tip of the flavivirus ij loop to valine to test its effect on ZIKV infectivity. Interestingly, this mutation inhibited infectious virion production of ZIKV and yellow fever virus, but not West Nile virus. Together, these studies show that shared domains of the alphavirus and flavivirus class II fusion glycoproteins harbor structurally analogous residues that are functionally important and contribute to virus infection in vivo.
Importance: Arboviruses are a significant global public health threat, yet there are no antivirals targeting these viruses. This problem is in part due to our lack of knowledge on the molecular mechanisms involved in the arbovirus life cycle. In particular, virus entry and assembly are essential processes in the virus life cycle and steps that can be targeted for the development of antiviral therapies. Therefore, understanding common, fundamental mechanisms used by different arboviruses for entry and assembly is essential. In this study, we show that flavivirus and alphavirus residues located in structurally conserved and analogous regions of the class II fusion proteins contribute to common mechanisms of entry, dissemination, and infectious virion production. These studies highlight how class II fusion proteins function and provide novel targets for development of antivirals.


2008 ◽  
Vol 82 (18) ◽  
pp. 9245-9253 ◽  
Author(s):  
M. Umashankar ◽  
Claudia Sánchez-San Martín ◽  
Maofu Liao ◽  
Brigid Reilly ◽  
Alice Guo ◽  
...  

ABSTRACT The class II fusion proteins of the alphaviruses and flaviviruses mediate virus infection by driving the fusion of the virus membrane with that of the cell. These fusion proteins are triggered by low pH, and their structures are strikingly similar in both the prefusion dimer and the postfusion homotrimer conformations. Here we have compared cholesterol interactions during membrane fusion by these two groups of viruses. Using cholesterol-depleted insect cells, we showed that fusion and infection by the alphaviruses Semliki Forest virus (SFV) and Sindbis virus were strongly promoted by cholesterol, with similar sterol dependence in laboratory and field isolates and in viruses passaged in tissue culture. The E1 fusion protein from SFV bound cholesterol, as detected by labeling with photocholesterol and by cholesterol extraction studies. In contrast, fusion and infection by numerous strains of the flavivirus dengue virus (DV) and by yellow fever virus 17D were cholesterol independent, and the DV fusion protein did not show significant cholesterol binding. SFV E1 is the first virus fusion protein demonstrated to directly bind cholesterol. Taken together, our results reveal important functional differences conferred by the cholesterol-binding properties of class II fusion proteins.


Virology ◽  
2007 ◽  
Vol 368 (1) ◽  
pp. 102-113 ◽  
Author(s):  
Marija Backovic ◽  
George P. Leser ◽  
Robert A. Lamb ◽  
Richard Longnecker ◽  
Theodore S. Jardetzky

2010 ◽  
Vol 285 (22) ◽  
pp. 16403-16407 ◽  
Author(s):  
Susan Lowey ◽  
Kathleen M. Trybus
Keyword(s):  
Class II

2018 ◽  
Author(s):  
Mathilde Carpentier ◽  
Jacques Chomilier

ABSTRACT Facing the huge increase of information about proteins, classification has become a compulsory task, essential for assigning a function to a given sequence by comparison to existing data. Multiple sequence alignment programs have proven very useful and have already been evaluated. In this paper we evaluate the added value provided by taking structures into account. We compared the multiple alignments resulting from 24 programs, based on sequence, structure, or both, to reference alignments deposited in five databases. The reference databases can be split into two groups: those built more automatically and those built more manually. Scores have been attributed to each program. As a global rule of thumb, five groups of methods emerge, with two of the structure-based programs in the lead. This advantage increases at low levels of sequence identity among aligned proteins, and for residues that are buried or in regular secondary structures. Concerning gap management, sequence-based programs place fewer gaps than structure-based programs. Concerning the databases, the alignments from the manually built databases are the more challenging for the programs.


2019 ◽  
Vol 35 (20) ◽  
pp. 3970-3980 ◽  
Author(s):  
Mathilde Carpentier ◽  
Jacques Chomilier

Abstract Motivation: Multiple sequence alignment programs have proved to be very useful and have already been evaluated in the literature, but alignment programs based on structure, or on both sequence and structure, have not. In the present article we evaluate the added value provided by considering structures. Results: We compared the multiple alignments resulting from 25 programs, based on sequence, structure or both, to reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs, whatever the database. Two programs lead, MAMMOTH and MATRAS; nevertheless, the performances of MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP are better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, and for residues that are buried or in regular secondary structures. Concerning gap management, sequence-based programs set fewer gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. Availability and implementation: All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. Supplementary information: Supplementary data are available at Bioinformatics online.


2010 ◽  
Vol 7 (3) ◽  
pp. 275-289 ◽  
Author(s):  
Vesna Memišević ◽  
Tijana Milenković ◽  
Nataša Pržulj

Summary Traditional approaches for homology detection rely on finding sufficient similarities between protein sequences. Motivated by studies demonstrating that non-sequence-based sources of biological information, such as secondary or tertiary molecular structure, can yield certain types of biological knowledge when sequence-based approaches fail, we hypothesize that protein-protein interaction (PPI) network topology and protein sequence might give insights into different slices of biological information. Since proteins aggregate to perform a function instead of acting in isolation, analyzing complex wirings around a protein in a PPI network could give deeper insights into the protein’s role in the inner working of the cell than analyzing sequences of individual genes. Hence, we believe that one could lose much information by focusing on sequence information alone. We examine whether, and to what extent, the information about homologous proteins captured by PPI network topology differs from the information captured by their sequences. We measure how similar the topology around homologous proteins in a PPI network is and show that such proteins have statistically significantly higher network similarity than nonhomologous proteins. We compare these network similarity trends of homologous proteins with the trends in their sequence identity and find that network similarities uncover almost as much homology as sequence identities. Although neither of the two methods, network topology and sequence identity, seems to capture homology information in its entirety, we demonstrate that the two might give insights into somewhat different types of biological information, as the overlap of the homology information that they uncover is relatively low. Therefore, we conclude that similarities of proteins’ topological neighborhoods in a PPI network could be used as a complementary method to sequence-based approaches for identifying homologs, as well as for analyzing evolutionary distance and functional divergence of homologous proteins.


Plant Disease ◽  
2014 ◽  
Vol 98 (2) ◽  
pp. 286-286 ◽  
Author(s):  
S. Akhtar ◽  
A. J. Khan ◽  
R. W. Briddon

During a field survey in 2011, pepper (Capsicum annuum) plants showing symptoms suggestive of geminivirus infection were observed in three fields in the Al-Sharqiya region of Oman. Symptoms observed included upward leaf curling leading to cupping and stunting, with 15 to 25% disease incidence in surveyed fields. Total DNA was extracted from the leaves of seven symptomatic plants and subjected to rolling circle amplification (RCA). The RCA product was digested with several restriction endonucleases to obtain unit-length fragments of ~2.6 to 2.8 kb, typical of geminiviruses. Out of seven samples, only four yielded a product of ~2.6 kb in size by KpnI digestion. The fragments were cloned in pUC19 and sequenced. The partial sequences of four isolates were >95% identical to each other at the nucleotide (nt) level and thus only one isolate (P-25) was fully sequenced, determined to be 2,572 nt in length, and its sequence deposited in GenBank (KF111683). The P-25 sequence showed a genome organization typical of a mastrevirus, with four open reading frames (ORFs), two in virion-sense and two in complementary-sense. The virion and complementary-sense ORFs were separated by a long intergenic region, containing a predicted hairpin structure with the nonanucleotide sequence (TAATATTAC) in the loop, and a short intergenic region. An initial comparison to all sequences in the NCBI database using BlastN showed the isolate to have the highest level of sequence identity with isolates of the dicot-infecting mastrevirus Chickpea chlorotic dwarf virus (CpCDV). Subsequent alignments of all available CpCDV isolates using the species demarcation tool (2) showed the isolate P-25 to share between 83.6 and 90.3% identity with isolates of CpCDV available in databases, with the highest (90.3%) to CpCDV strain A originating from Syria (FR687959) (3). Amino acid sequence comparison showed that the predicted proteins encoded by the four ORFs of P-25 (coat protein [CP], movement protein [MP], replication associated protein A [RepA], and RepB) share 91.5, 88.2, 89.1, and 90.8% amino acid sequence identity, respectively, with the homologous proteins of the CpCDV isolate from Syria. Based on the recently revised mastrevirus species and strain demarcation criteria (78 and 94% whole genome nt identity, respectively) proposed by Muhire et al. (2), the results indicate that isolate P-25 represents a newly identified strain (strain F) of CpCDV. The presence of CpCDV in symptomatic pepper plants was further confirmed by Southern blot hybridization using a digoxigenin (DIG)-labeled probe prepared from CpCDV isolate P-25. The genus Mastrevirus consists of geminiviruses with single component genomes that are transmitted by leafhoppers. Mastreviruses have so far only been identified in the Old World and infect either monocotyledonous or dicotyledonous plants (1). To our knowledge, this is the first report of a mastrevirus on the Arabian Peninsula and the first record of pepper as a host of CpCDV. Recently, several begomoviruses of diverse geographic origins have been found infecting vegetable crops in Oman. The propensity of geminiviruses to evolve through recombination may lead to the evolution of recombinant CpCDV with new host adaptability. Due to the extensive agricultural and travel links of Oman with the rest of the world, there is a high probability of this virus spreading.
References: (1) M. I. Boulton. Physiol. Mol. Plant Pathol. 60:243, 2002. (2) B. Muhire et al. Arch. Virol. 158:1411, 2013. (3) H. Mumtaz et al. Virus Genes 42:422, 2011.
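The demarcation reasoning in this note can be written as a small decision function. This is a sketch only: the two thresholds (78% for species, 94% for strain, whole genome nt identity) come from the text, while the function name `classify_isolate`, the handling of values exactly at a threshold, and the use of the single highest identity to known isolates are assumptions made for illustration.

```python
# Thresholds quoted in the note (whole-genome nucleotide identity, percent).
SPECIES_THRESHOLD = 78.0
STRAIN_THRESHOLD = 94.0


def classify_isolate(max_identity_pct: float) -> str:
    """Classify an isolate from its highest identity to known isolates.

    Assumption: values exactly at a threshold fall in the higher category.
    """
    if not 0.0 <= max_identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if max_identity_pct < SPECIES_THRESHOLD:
        return "new species"
    if max_identity_pct < STRAIN_THRESHOLD:
        return "new strain"
    return "existing strain"


if __name__ == "__main__":
    # Highest identity reported for isolate P-25 against CpCDV isolates.
    print(classify_isolate(90.3))
```

With the 90.3% reported for P-25, the function returns "new strain", matching the conclusion that the isolate belongs to CpCDV but represents a strain not described before.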

