A sequence-based approach for identifying protein fold switchers

Mapping Intimacies ◽

10.1101/462606 ◽

2018 ◽

Author(s):

Soumya Mishra ◽

Loren L. Looger ◽

Lauren L. Porter

Keyword(s):

Secondary Structure ◽

Drug Targets ◽

Prediction Method ◽

Data Bank ◽

Amino Acid Sequences ◽

Protein Fold ◽

Secondary Structure Predictions ◽

Underlying Mechanisms ◽

Cellular Control ◽

Potential Drug Targets

Abstract: Although most proteins conform to the classical one-structure/one-function paradigm, an increasing number of proteins with dual structures and functions are emerging. These fold-switching proteins remodel their secondary structures in response to cellular stimuli, fostering multi-functionality and tight cellular control. Accurate predictions of fold-switching proteins could both suggest underlying mechanisms for uncharacterized biological processes and reveal potential drug targets. Previously, we developed a prediction method for fold-switching proteins based on secondary structure predictions and structure-based thermodynamic calculations. Given the large number of genomic sequences without homologous experimentally characterized structures, however, we sought to predict fold-switching proteins from their sequences alone. To do this, we leveraged state-of-the-art secondary structure predictions, which require only amino acid sequences but are not currently designed to identify structural duality in proteins. We therefore hypothesized that incorrect and inconsistent secondary structure predictions could be good initial predictors of fold-switching proteins. We found that secondary structure predictions of fold-switching proteins with solved structures are indeed less accurate than secondary structure predictions of non-fold-switching proteins with solved structures. These inaccuracies result largely from the conformations of fold-switching proteins that are underrepresented in the Protein Data Bank (PDB) and, consequently, in the training sets of secondary structure predictors. Given that secondary structure predictions are homology-based, we hypothesized that decontextualizing the inaccurately predicted regions of fold-switching proteins could weaken the homology relationships between these regions and their overpopulated structural representatives. We then reran secondary structure predictions on these regions in isolation and found that they were significantly more inconsistent than predictions for regions of non-fold-switching proteins. Inconsistent secondary structure predictions can therefore serve as a preliminary marker of fold switching. These findings have implications for genomics and the future development of secondary structure predictors.
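The idea of comparing a region's prediction in its full-length context with its prediction in isolation can be illustrated with a minimal sketch. The abstract does not state the exact inconsistency measure, so the per-residue mismatch fraction below, the function name `fraction_mismatch`, and the example strings are all assumptions made for illustration, not the authors' method.

```python
def fraction_mismatch(ss_a: str, ss_b: str) -> float:
    """Fraction of aligned positions whose 3-state labels (H, E, C) differ.

    Both inputs are assumed to be per-residue secondary structure strings
    of equal length, e.g. the output of any 3-state predictor.
    """
    if len(ss_a) != len(ss_b):
        raise ValueError("secondary structure strings must have equal length")
    if not ss_a:
        raise ValueError("secondary structure strings must not be empty")
    mismatches = sum(a != b for a, b in zip(ss_a, ss_b))
    return mismatches / len(ss_a)


# Hypothetical predictions for the same 10-residue region:
in_context = "CHHHHHHHHC"  # predicted as part of the full-length sequence
isolated = "CEEEECHHHC"    # predicted with the region submitted on its own

print(fraction_mismatch(in_context, isolated))
```

Under this assumed measure, a higher value means the two predictions disagree at more residues; any threshold for flagging a candidate fold switcher would have to be calibrated against proteins with known behavior.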


Inaccurate secondary structure predictions often indicate protein fold switching

Protein Science ◽

10.1002/pro.3664 ◽

2019 ◽

Vol 28 (8) ◽

pp. 1487-1493 ◽

Cited By ~ 10

Author(s):

Soumya Mishra ◽

Loren L. Looger ◽

Lauren L. Porter

Keyword(s):

Secondary Structure ◽

Protein Fold ◽

Secondary Structure Predictions


Extant fold-switching proteins are widespread

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800168115 ◽

2018 ◽

Vol 115 (23) ◽

pp. 5968-5973 ◽

Cited By ~ 36

Author(s):

Lauren L. Porter ◽

Loren L. Looger

Keyword(s):

Secondary Structure ◽

Cell Biology ◽

3D Structure ◽

Data Bank ◽

Globular Proteins ◽

Physiological Conditions ◽

Central Tenet ◽

Secondary Structure Predictions ◽

Characteristic Features ◽

Lines Of Evidence

A central tenet of biology is that globular proteins have a unique 3D structure under physiological conditions. Recent work has challenged this notion by demonstrating that some proteins switch folds, a process that involves remodeling of secondary structure in response to a few mutations (evolved fold switchers) or cellular stimuli (extant fold switchers). To date, extant fold switchers have been viewed as rare byproducts of evolution, but their frequency has been neither quantified nor estimated. By systematically and exhaustively searching the Protein Data Bank (PDB), we found ∼100 extant fold-switching proteins. Furthermore, we gathered multiple lines of evidence suggesting that these proteins are widespread in nature. Based on these lines of evidence, we hypothesized that the frequency of extant fold-switching proteins may be underrepresented by the structures in the PDB. Thus, we sought to identify other putative extant fold switchers with only one solved conformation. To do this, we identified two characteristic features of our ∼100 extant fold-switching proteins, incorrect secondary structure predictions and likely independent folding cooperativity, and searched the PDB for other proteins with similar features. Reassuringly, this method identified dozens of other proteins in the literature with indication of a structural change but only one solved conformation in the PDB. Thus, we used it to estimate that 0.5–4% of PDB proteins switch folds. These results demonstrate that extant fold-switching proteins are likely more common than the PDB reflects, which has implications for cell biology, genomics, and human health.


A method for predicting evolved fold switchers exclusively from their sequences

10.1101/2020.02.19.956805 ◽

2020 ◽

Author(s):

Allen K. Kim ◽

Loren L. Looger ◽

Lauren L. Porter

Keyword(s):

Secondary Structure ◽

Structural Biology ◽

Matthews Correlation Coefficient ◽

Statistical Significance ◽

Protein Structures ◽

Amino Acid Sequences ◽

Biological Functions ◽

Predictive Methods ◽

Secondary Structure Predictions ◽

Α Helix

Abstract: Although most proteins with known structures conform to the longstanding rule of thumb that high levels of aligned sequence identity tend to indicate similar folds and functions, an increasing number of exceptions is emerging. In spite of having highly similar sequences, these “evolved fold switchers” (1) can adopt radically different folds with disparate biological functions. Predictive methods for identifying evolved fold switchers are desirable because some of them are associated with disease and/or can perform different functions in cells. Previously, we showed that inconsistencies between predicted and experimentally determined secondary structures can be used to predict fold-switching proteins (2). The usefulness of this approach is limited, however, because it requires experimentally determined protein structures, whose number is dwarfed by the number of genomic proteins. Here, we use secondary structure predictions to identify evolved fold switchers from their amino acid sequences alone. To do this, we looked for inconsistencies between the secondary structure predictions of the alternative conformations of evolved fold switchers. We used three different predictors in this study: JPred4, PSIPRED, and SPIDER3. We find that overall inconsistencies are not a significant predictor of evolved fold switchers for any of the three predictors. Inconsistencies between α-helix and β-strand predictions made by JPred4, however, can discriminate between the different conformations of evolved fold switchers with statistical significance (p < 1.7 × 10⁻¹³). In light of this observation, we used these inconsistencies as a classifier and found that it could robustly discriminate between evolved fold switchers and evolved non-fold-switchers, as evidenced by a Matthews correlation coefficient of 0.90. These results indicate that inconsistencies between secondary structure predictions can indeed be used to identify evolved fold switchers from their genomic sequences alone. Our findings have implications for genomics, structural biology, and human health.
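As a rough illustration of the two quantities named in this abstract, the sketch below counts only α-helix/β-strand disagreements between two aligned predictions and computes a Matthews correlation coefficient from confusion-matrix counts. The function names, the normalization by sequence length, and the example inputs are assumptions; the abstract does not specify how the authors scored inconsistencies or chose a classification threshold.

```python
import math


def helix_strand_mismatch(ss_a: str, ss_b: str) -> float:
    """Fraction of positions where one prediction is helix (H) and the other strand (E).

    Disagreements involving coil (C) are ignored, since the abstract reports
    that overall inconsistencies were not a significant predictor.
    """
    if len(ss_a) != len(ss_b) or not ss_a:
        raise ValueError("inputs must be non-empty and of equal length")
    swapped = {("H", "E"), ("E", "H")}
    return sum((a, b) in swapped for a, b in zip(ss_a, ss_b)) / len(ss_a)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical predictions for two highly similar sequences:
print(helix_strand_mismatch("HHHHCC", "EEHHCC"))

# Hypothetical confusion counts for a perfect classifier:
print(matthews_corrcoef(tp=10, tn=10, fp=0, fn=0))
```

The coefficient ranges from -1 to 1, with 1 indicating perfect agreement between predicted and true classes, which is why a value of 0.90 indicates strong discrimination.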


The ‘30K’ superfamily of viral movement proteins

Microbiology ◽

10.1099/0022-1317-81-1-257 ◽

2000 ◽

Vol 81 (1) ◽

pp. 257-266 ◽

Cited By ~ 206

Author(s):

Ulrich Melcher

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Consensus Sequence ◽

Amino Acid Sequences ◽

Movement Proteins ◽

Consensus Sequences ◽

Viral Movement ◽

Secondary Structure Predictions ◽

Α Helix

Relationships among the amino acid sequences of viral movement proteins related to the 30 kDa (‘30K’) movement protein of tobacco mosaic virus – the 30K superfamily – were explored. Sequences were grouped into 18 families. A comparison of secondary structure predictions for each family revealed a common predicted core structure flanked by variable N- and C-terminal domains. The core consisted of a series of β-elements flanked by an α-helix on each end. Consensus sequences for each of the families were generated and aligned with one another. From this alignment an overall secondary structure prediction was generated and a consensus sequence that can recognize each family in database searches was obtained. The analysis led to criteria that were used to evaluate other virus-encoded proteins for possible membership of the 30K superfamily. A rhabdoviral and a tenuiviral protein were identified as 30K superfamily members, as were plant-encoded phloem proteins. Parsimony analysis grouped tubule-forming movement proteins separate from others. Establishment of the alignment of residues of diverse families facilitates comparison of mutagenesis experiments done on different movement proteins and should serve as a guide for further such experiments.


Atypical Structural Tendencies Among Low-Complexity Domains in the Protein Data Bank Proteome

10.1101/807438 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sean M. Cascarina ◽

Mikaela R. Elder ◽

Eric D. Ross

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Secondary Structure ◽

Physical Properties ◽

Protein Data Bank ◽

Data Bank ◽

Low Complexity ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

Intrinsically Disordered

Abstract: A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment.

Author Summary: The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain. For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
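The notion of sorting LCDs by which amino acid is enriched, and by how much, can be made concrete with the three example sequences quoted in the summary. The sketch below simply reports the most frequent residue and its fraction of the sequence; the function name `dominant_residue` is hypothetical, and this is a simplification for illustration, not the categorization method the authors developed.

```python
from collections import Counter


def dominant_residue(seq: str) -> tuple[str, float]:
    """Return the most frequent residue and the fraction of the sequence it occupies."""
    if not seq:
        raise ValueError("sequence must not be empty")
    residue, count = Counter(seq).most_common(1)[0]
    return residue, count / len(seq)


# The three example sequences from the author summary:
for seq in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    print(seq, dominant_residue(seq))
```

All three sequences are dominated by one residue, yet they differ both in the identity of that residue (alanine vs. glutamate) and in its degree of enrichment (100% vs. 60%), which are the two axes the abstract describes.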


Identification of Mitochondrial Proteins of Malaria Parasite Adding the New Parameter

Letters in Organic Chemistry ◽

10.2174/1570178615666180608100348 ◽

2019 ◽

Vol 16 (4) ◽

pp. 258-262 ◽

Cited By ~ 1

Author(s):

Feng Yonge ◽

Xie Weixia

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Drug Targets ◽

Malaria Parasite ◽

Protein Secondary Structure ◽

Amino Acid Sequences ◽

Mitochondrial Proteins ◽

Support Vector ◽

Protein Secondary Structures ◽

Similar Work

Malaria is a serious infectious disease caused by Plasmodium falciparum (P. falciparum). Mitochondrial proteins of P. falciparum are regarded as effective drug targets against malaria, so it is necessary to identify the mitochondrial proteins of the malaria parasite accurately. Many algorithms have been proposed for this prediction task and have yielded good results. However, the parameters used by these methods were primarily based on amino acid sequences. In this study, we added a novel parameter, based on protein secondary structure, for predicting mitochondrial proteins of the malaria parasite. First, we extracted three feature parameters, namely the compositions of three kinds of protein secondary structure (3PSS), 20 amino acid compositions (20AAC), and 400 dipeptide compositions (400DC), and used analysis of variance (ANOVA) to screen the 400 dipeptides. Second, we used these features to predict mitochondrial proteins of the malaria parasite with a support vector machine (SVM). Finally, we found that 1) adding the protein secondary structure feature (3PSS) does improve prediction accuracy, demonstrating that protein secondary structure is a valid feature for predicting mitochondrial proteins of the malaria parasite; and 2) feature combination can improve the prediction results, while feature selection can reduce the dimension and simplify the calculation. Using 3PSS + 20AAC + 34DC as the feature set in 15-fold cross-validation, we achieved a sensitivity (Sn) of 98.16%, a specificity (Sp) of 97.64%, and an overall accuracy (Acc) of 97.88%, with a Matthews correlation coefficient (MCC) of 0.957. Compared with similar work on the same dataset, our method performs better.
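The three feature families named in this abstract (3PSS, 20AAC, 400DC) can be sketched as simple composition vectors. The code below is an illustration under stated assumptions: the function names are hypothetical, the secondary structure is taken to be a 3-state string over H, E, and C supplied by some external predictor, and the ANOVA screening and SVM training steps are not reproduced.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(seq: str) -> list[float]:
    """20AAC: fraction of the sequence made up by each of the 20 amino acids."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def dipeptide_composition(seq: str) -> list[float]:
    """400DC: fraction of overlapping residue pairs matching each dipeptide."""
    n_pairs = len(seq) - 1
    counts = Counter(seq[i:i + 2] for i in range(n_pairs))
    return [counts[dp] / n_pairs for dp in DIPEPTIDES]


def secondary_structure_composition(ss: str) -> list[float]:
    """3PSS (assumed form): fractions of helix (H), strand (E), and coil (C)."""
    return [ss.count(state) / len(ss) for state in "HEC"]


def feature_vector(seq: str, ss: str) -> list[float]:
    """Concatenate 3PSS + 20AAC + 400DC into one 423-dimensional vector."""
    if len(seq) < 2 or len(seq) != len(ss):
        raise ValueError("need len(seq) >= 2 and one structure label per residue")
    return (secondary_structure_composition(ss)
            + amino_acid_composition(seq)
            + dipeptide_composition(seq))


# Hypothetical 7-residue sequence with a hypothetical predicted structure:
features = feature_vector("ACDKLMA", "CHHHEEC")
print(len(features))
```

A vector like this could be passed to any SVM implementation. Reducing the 400 dipeptide entries to a smaller subset, as the abstract's 34DC indicates, would require the feature screening step that is omitted here.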
