Coupling backbone flexibility and amino acid sequence selection in protein design

Alyce Su; Stephen L. Mayo

doi:10.1002/pro.5560060810

Amino-acid site variability among natural and designed proteins

10.7287/peerj.preprints.74 ◽

2013 ◽

Author(s):

Eleisha L. Jackson ◽

Noah Ollikainen ◽

Arthur W. Covert III ◽

Tanja Kortemme ◽

Claus O. Wilke

Keyword(s):

Amino Acid ◽

Protein Design ◽

Protein Sequences ◽

Structural Constraints ◽

Scoring Functions ◽

Solvent Exposure ◽

Backbone Flexibility ◽

Hydrophobic Residues ◽

Designed Proteins ◽

Site Variability

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
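
The comparison described above (per-site variability in an alignment, and its correlation with solvent exposure) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which variability measure was used, so Shannon entropy per alignment column is an assumption here, as are the toy alignment, the hypothetical relative solvent accessibility values, and all function names.

```python
import math
from collections import Counter


def site_entropy(column):
    """Shannon entropy (in nats) of one alignment column, ignoring gaps."""
    residues = [aa for aa in column if aa != "-"]
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def alignment_entropies(alignment):
    """Per-site entropies for a list of equal-length aligned sequences."""
    return [site_entropy(col) for col in zip(*alignment)]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy alignment (hypothetical): sites 1 and 4 conserved, site 3 fully variable.
alignment = ["ACDA", "ACEA", "ACFA", "AGGA"]
# Hypothetical relative solvent accessibility per site (0 = buried, 1 = exposed).
exposure = [0.05, 0.30, 0.80, 0.10]

entropies = alignment_entropies(alignment)
rho = spearman(entropies, exposure)
print(entropies, rho)
```

Running the same calculation on a natural alignment and on a designed alignment of the same structure would give two correlation values to compare; a lower value for the designed set would correspond to the "too uniform" variability the abstract reports.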

Amino-acid site variability among natural and designed proteins

10.7287/peerj.preprints.74v1 ◽

2013 ◽

Author(s):

Eleisha L. Jackson ◽

Noah Ollikainen ◽

Arthur W. Covert III ◽

Tanja Kortemme ◽

Claus O. Wilke

Keyword(s):

Amino Acid ◽

Protein Design ◽

Protein Sequences ◽

Structural Constraints ◽

Scoring Functions ◽

Solvent Exposure ◽

Backbone Flexibility ◽

Hydrophobic Residues ◽

Designed Proteins ◽

Site Variability

Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.

AMINO ACID SEQUENCE SELECTION IN PROTEIN SYNTHESIS

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.45.9.1360 ◽

1959 ◽

Vol 45 (9) ◽

pp. 1360-1371 ◽

Cited By ~ 2

Author(s):

H. Jehle

Keyword(s):

Protein Synthesis ◽

Amino Acid ◽

Amino Acid Sequence ◽

Sequence Selection

Computationally-driven identification of antibody epitopes

eLife ◽

10.7554/elife.29023 ◽

2017 ◽

Vol 6 ◽

Cited By ~ 12

Author(s):

Casey K Hua ◽

Albert T Gacerez ◽

Charles L Sentman ◽

Margaret E Ackerman ◽

Yoonjoo Choi ◽

...

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Immune Responses ◽

Success Rate ◽

Protein Design ◽

Mechanisms Of Action ◽

Antigen Binding ◽

Binding Modes ◽

Prospective Application ◽

Antibody Epitopes

Understanding where antibodies recognize antigens can help define mechanisms of action and provide insights into progression of immune responses. We investigate the extent to which information about binding specificity implicitly encoded in amino acid sequence can be leveraged to identify antibody epitopes. In computationally-driven epitope localization, possible antibody–antigen binding modes are modeled, and targeted panels of antigen variants are designed to experimentally test these hypotheses. Prospective application of this approach to two antibodies enabled epitope localization using five or fewer variants per antibody, or alternatively, a six-variant panel for both simultaneously. Retrospective analysis of a variety of antibodies and antigens demonstrated an almost 90% success rate with an average of three antigen variants, further supporting the observation that the combination of computational modeling and protein design can reveal key determinants of antibody–antigen binding and enable efficient studies of collections of antibodies identified from polyclonal samples or engineered libraries.

Protein sequence design by explicit energy landscape optimization

10.1101/2020.07.23.218917 ◽

2020 ◽

Cited By ~ 1

Author(s):

Christoffer Norn ◽

Basile I. M. Wicky ◽

David Juergens ◽

Sirui Liu ◽

David Kim ◽

...

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Protein Design ◽

Structure Prediction ◽

De Novo ◽

Large Fraction ◽

Amino Acid Sequences ◽

Alternative States ◽

Point Energy ◽

Sequence Design

Abstract: The protein design problem is to identify an amino acid sequence that folds to a desired structure. Given Anfinsen’s thermodynamic hypothesis of folding, this can be recast as finding an amino acid sequence for which the lowest energy conformation is that structure. As this calculation involves not only all possible amino acid sequences but also all possible structures, most current approaches focus instead on the more tractable problem of finding the lowest energy amino acid sequence for the desired structure, often checking by protein structure prediction in a second step that the desired structure is indeed the lowest energy conformation for the designed sequence, and discarding the often large fraction of designed sequences for which this is not the case. Here we show that by backpropagating gradients through the trRosetta structure prediction network from the desired structure to the input amino acid sequence, we can directly optimize over all possible amino acid sequences and all possible structures, and in one calculation explicitly design amino acid sequences predicted to fold into the desired structure and not any other. We find that trRosetta calculations, which consider the full conformational landscape, can be more effective than Rosetta single point energy estimations in predicting folding and stability of de novo designed proteins. We compare sequence design by landscape optimization to the standard fixed backbone sequence design methodology in Rosetta, and show that the results of the former, but not the latter, are sensitive to the presence of competing low-lying states. We show further that more funneled energy landscapes can be designed by combining the strengths of the two approaches: the low resolution trRosetta model serves to disfavor alternative states, and the high resolution Rosetta model, to create a deep energy minimum at the design target structure.

Significance: Computational protein design has primarily focused on finding sequences which have very low energy in the target designed structure. However, what is most relevant during folding is not the absolute energy of the folded state, but the energy difference between the folded state and the lowest lying alternative states. We describe a deep learning approach which captures the entire folding landscape, and show that it can enhance current protein design methods.
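
The point that the energy gap, rather than the absolute energy, governs folding can be illustrated with a small Boltzmann calculation. This is a sketch under stated assumptions and not the trRosetta method described above: it treats the landscape as a handful of discrete states, the energies are invented for illustration, and `kT = 0.6` kcal/mol is an approximate room-temperature value.

```python
import math


def target_population(e_target, e_alternatives, kT=0.6):
    """Boltzmann probability of the target state among discrete states.

    Energies are in kcal/mol; kT of about 0.6 kcal/mol is an assumed
    room-temperature value.
    """
    energies = [e_target] + list(e_alternatives)
    e_min = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    return weights[0] / sum(weights)


# Hypothetical sequence A: very low target energy, but close competitors.
pop_a = target_population(-100.0, [-99.5, -99.0])
# Hypothetical sequence B: higher target energy, but a large gap to competitors.
pop_b = target_population(-90.0, [-85.0, -84.0])

print(f"A: {pop_a:.3f}  B: {pop_b:.3f}")
```

In this toy case sequence B populates its target state far more than sequence A despite having the higher absolute energy, which is the distinction the significance statement draws between single-point energies and the full landscape.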

Protein Design with Deep Learning

International Journal of Molecular Sciences ◽

10.3390/ijms222111741 ◽

2021 ◽

Vol 22 (21) ◽

pp. 11741

Author(s):

Marianne Defresne ◽

Sophie Barbe ◽

Thomas Schiex

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Amino Acid ◽

Amino Acid Sequence ◽

Protein Design ◽

Computational Protein Design ◽

Learning Technology ◽

Raw Data ◽

The Past ◽

3D Information

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.

The staining pattern of the tropomyosin sequence is displayed in a new paracrystal

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100142372 ◽

1986 ◽

Vol 44 ◽

pp. 150-151

Author(s):

M.K. Lamvik ◽

L.L. Klatt

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Staining Pattern ◽

Banding Patterns ◽

Mass Thickness ◽

New Form

Tropomyosin paracrystals have been used extensively as test specimens and magnification standards due to their clear periodic banding patterns. The paracrystal type discovered by Ohtsuki [1] has been of particular interest as a test of unstained specimens because of alternating bands that differ by 50% in mass thickness. While producing specimens of this type, we came across a new paracrystal form. Since this new form displays aligned tropomyosin molecules without the overlaps that are characteristic of the Ohtsuki-type paracrystal, it presents a staining pattern that corresponds to the amino acid sequence of the molecule.

Isolation and Structural Characterization of a Potent Inhibitor of Coagulation Factor Xa from the Leech Haementeria ghilianii

Thrombosis and Haemostasis ◽

10.1055/s-0038-1646610 ◽

1989 ◽

Vol 61 (03) ◽

pp. 437-441 ◽

Cited By ~ 28

Author(s):

Cindra Condra ◽

Elka Nutt ◽

Christopher J Petroski ◽

Ellen Simpson ◽

P A Friedman ◽

...

Keyword(s):

Molecular Weight ◽

Salivary Gland ◽

Amino Acid ◽

Amino Acid Sequence ◽

Western Blot ◽

Potent Inhibitor ◽

Factor Xa ◽

Coagulation Factor ◽

Sds Page ◽

Bovine Factor

Summary: The present work reports the discovery and characterization of an anticoagulant protein in the salivary gland of the giant bloodsucking leech, H. ghilianii, which is a specific and potent inhibitor of coagulation factor Xa. The inhibitor, purified to homogeneity, displayed subnanomolar inhibition of bovine factor Xa and had a molecular weight of approximately 15,000 as deduced by denaturing SDS-PAGE. The amino acid sequence of the first 43 residues of the H. ghilianii derived inhibitor displayed a striking homology to antistasin, the recently described subnanomolar inhibitor of factor Xa isolated from the Mexican leech, H. officinalis. Antisera prepared to antistasin cross-reacted with the H. ghilianii protein in Western blot analysis. These data indicate that the giant Amazonian leech, H. ghilianii, and the smaller Mexican leech, H. officinalis, have similar proteins which disrupt the normal hemostatic clotting mechanisms in their mammalian host’s blood.

Paris I Dysfibrinogenemia: A Point Mutation in Intron 8 Results in Insertion of a 15 Amino Acid Sequence in the Fibrinogen γ-Chain

Thrombosis and Haemostasis ◽

10.1055/s-0038-1651583 ◽

1993 ◽

Vol 69 (03) ◽

pp. 217-220 ◽

Cited By ~ 12

Author(s):

Jonathan B Rosenberg ◽

Peter J Newman ◽

Michael W Mosesson ◽

Marie-Claude Guillin ◽

David L Amrani

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Point Mutation ◽

Genomic Dna ◽

Comparative Sequence Analysis ◽

Chain Gene ◽

Molecular Defect ◽

Nucleotide Position ◽

Normal Individuals ◽

Polymerase Chain

Summary: Paris I dysfibrinogenemia results in the production of a fibrinogen molecule containing a functionally abnormal γ-chain. We determined the basis of the molecular defect using polymerase chain reaction (PCR) to amplify the γ-chain region of the Paris I subject’s genomic DNA. Comparative sequence analysis of cloned PCR segments of normal and Paris I genomic DNA revealed only an A→G point mutation occurring at nucleotide position 6588 within intron 8 of the Paris I γ-chain gene. We examined six normal individuals and found only normal sequence in this region, indicating that this change is not likely to represent a normal polymorphism. This nucleotide change leads to a 45 bp fragment being inserted between exons 8 and 9 in the mature γParis I chain mRNA, and encodes a 15 amino acid insert after γ350 [M-C-G-E-A-L-P-M-L-K-D-P-C-Y]. Alternative splicing of this region from intron 8 into the mature Paris I γ-chain mRNA also results, after translation, in a substitution of S for G at position γ351. Biochemical studies of 14C-iodoacetamide incorporation into disulfide-reduced Paris I and normal fibrinogen corroborated the molecular biologic predictions that two additional cysteine residues exist within the γParis I chain. We conclude that the insertion of this amino acid sequence leads to a conformationally altered and dysfunctional γ-chain in Paris I fibrinogen.

Complete Covalent Structure of Human Platelet Factor 4

Thrombosis and Haemostasis ◽

10.1055/s-0038-1657071 ◽

1979 ◽

Vol 42 (05) ◽

pp. 1652-1660 ◽

Cited By ~ 4

Author(s):

Francis J Morgan ◽

Geoffrey S Begg ◽

Colin N Chesterman

Keyword(s):

Amino Acids ◽

Molecular Weight ◽

Amino Acid ◽

Amino Acid Sequence ◽

Human Platelet ◽

Platelet Factor 4 ◽

Platelet Factor ◽

Disulphide Bonds ◽

Covalent Structure

Summary: The amino acid sequence of the subunit of human platelet factor 4 has been determined. Human platelet factor 4 consists of identical subunits containing 70 amino acids, each with a molecular weight of 7,756. The molecule contains no methionine, phenylalanine or tryptophan. The proposed amino acid sequence of PF4 is: Glu-Ala-Glu-Glu-Asp-Gly-Asp-Leu-Gln-Cys-Leu-Cys-Val-Lys-Thr-Thr-Ser-Gln-Val-Arg-Pro-Arg-His-Ile-Thr-Ser-Leu-Glu-Val-Ile-Lys-Ala-Gly-Pro-His-Cys-Pro-Thr-Ala-Gln-Leu-Ile-Ala-Thr-Leu-Lys-Asn-Gly-Arg-Lys-Ile-Cys-Leu-Asp-Leu-Gln-Ala-Pro-Leu-Tyr-Lys-Lys-Ile-Ile-Lys-Lys-Leu-Leu-Glu-Ser. From consideration of the homology with β-thromboglobulin, disulphide bonds between residues 10 and 36 and between residues 12 and 52 can be inferred.
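
The statements in this summary can be checked directly against the printed sequence. The short sketch below parses the three-letter codes, counts the residues, confirms the absence of Met, Phe and Trp, and lists the cysteine positions that the inferred disulphide bonds rely on. One assumption is made and marked in the code: the token at position 40, which appears as "Gin" in the extracted text, is read as Gln. Variable names are illustrative only.

```python
# Proposed PF4 subunit sequence as given in the summary (three-letter codes).
# Assumption: the token at position 40, extracted as "Gin", is read as "Gln".
PF4 = (
    "Glu-Ala-Glu-Glu-Asp-Gly-Asp-Leu-Gln-Cys-Leu-Cys-Val-Lys-Thr-Thr-Ser-"
    "Gln-Val-Arg-Pro-Arg-His-Ile-Thr-Ser-Leu-Glu-Val-Ile-Lys-Ala-Gly-Pro-"
    "His-Cys-Pro-Thr-Ala-Gln-Leu-Ile-Ala-Thr-Leu-Lys-Asn-Gly-Arg-Lys-Ile-"
    "Cys-Leu-Asp-Leu-Gln-Ala-Pro-Leu-Tyr-Lys-Lys-Ile-Ile-Lys-Lys-Leu-Leu-"
    "Glu-Ser"
)

residues = PF4.split("-")

# Total chain length (the summary states 70 amino acids per subunit).
length = len(residues)

# The summary states that Met, Phe and Trp are absent.
lacks_met_phe_trp = {"Met", "Phe", "Trp"}.isdisjoint(residues)

# 1-based positions of cysteine, the residues available for disulphide bonds.
cys_positions = [i for i, r in enumerate(residues, start=1) if r == "Cys"]

print(length, lacks_met_phe_trp, cys_positions)
```

The four cysteine positions found this way are exactly the residues named in the two inferred disulphide bonds (10 with 36, and 12 with 52).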
