Computational protein design with backbone plasticity

James T. MacDonald; Paul S. Freemont

doi:10.1042/bst20160155

Extreme stability in de novo-designed repeat arrays is determined by unusually stable short-range interactions

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800283115 ◽

2018 ◽

Vol 115 (29) ◽

pp. 7539-7544 ◽

Cited By ~ 12

Author(s):

Kathryn Geiger-Schuller ◽

Kevin Sforza ◽

Max Yuhas ◽

Fabio Parmeggiani ◽

David Baker ◽

...

Keyword(s):

Protein Design ◽

Nearest Neighbor ◽

De Novo ◽

Protein Structures ◽

Free Energies ◽

Intrinsic Stability ◽

Repeat Proteins ◽

Naturally Occurring ◽

Wide Range ◽

The Individual

Designed helical repeats (DHRs) are modular helix–loop–helix–loop protein structures that are tandemly repeated to form a superhelical array. Structures combining tandem DHRs demonstrate a wide range of molecular geometries, many of which are not observed in nature. Understanding cooperativity of DHR proteins provides insight into the molecular origins of Rosetta-based protein design hyperstability and facilitates comparison of energy distributions in artificial and naturally occurring protein folds. Here, we use a nearest-neighbor Ising model to quantify the intrinsic and interfacial free energies of four different DHRs. We measure the folding free energies of constructs with varying numbers of internal and terminal capping repeats for four different DHR folds, using guanidine-HCl and glycerol as destabilizing and solubilizing cosolvents. One-dimensional Ising analysis of these series reveals that, although interrepeat coupling energies are within the range seen for naturally occurring repeat proteins, the individual repeats of DHR proteins are intrinsically stable. This favorable intrinsic stability, which has not been observed for naturally occurring repeat proteins, adds to stabilizing interfaces, resulting in extraordinarily high stability. Stable repeats also impart a downhill shape to the energy landscape for DHR folding. These intrinsic stability differences suggest that part of the success of Rosetta-based design results from capturing favorable local interactions.
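
As a rough illustration of the one-dimensional nearest-neighbor Ising analysis described in this abstract, the Python sketch below computes the partition function for an array of n identical repeats. It assumes every repeat has the same intrinsic folding free energy and every adjacent folded pair the same interfacial free energy. Capping repeats, cosolvent dependence, and the fitted values from the paper are not modeled, and the function names and the numbers in the usage lines are illustrative assumptions, not data from the study.

```python
import math
from itertools import product

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def boltzmann_factors(dg_intrinsic, dg_interface, temp=298.15):
    """Return (kappa, tau): statistical weights for folding one repeat
    and for forming one interface between two adjacent folded repeats."""
    kappa = math.exp(-dg_intrinsic / (R_KCAL * temp))
    tau = math.exp(-dg_interface / (R_KCAL * temp))
    return kappa, tau


def partition_function(n, dg_intrinsic, dg_interface, temp=298.15):
    """Sum the weights of all 2**n folded/unfolded patterns by tracking
    whether the most recently added repeat is unfolded or folded."""
    kappa, tau = boltzmann_factors(dg_intrinsic, dg_interface, temp)
    z_unfolded, z_folded = 1.0, kappa  # one-repeat array
    for _ in range(n - 1):
        z_unfolded, z_folded = (
            z_unfolded + z_folded,                   # new repeat unfolded
            kappa * (z_unfolded + tau * z_folded),   # new repeat folded
        )
    return z_unfolded + z_folded


def partition_function_brute(n, dg_intrinsic, dg_interface, temp=298.15):
    """Explicit enumeration, used only to cross-check the recursion."""
    kappa, tau = boltzmann_factors(dg_intrinsic, dg_interface, temp)
    total = 0.0
    for states in product((0, 1), repeat=n):
        folded = sum(states)
        interfaces = sum(a and b for a, b in zip(states, states[1:]))
        total += kappa ** folded * tau ** interfaces
    return total


def fully_folded_free_energy(n, dg_intrinsic, dg_interface):
    """Free energy of the fully folded array relative to fully unfolded."""
    return n * dg_intrinsic + (n - 1) * dg_interface


if __name__ == "__main__":
    # Hypothetical energies in kcal/mol, chosen only to contrast the two cases.
    for label, dg_i in (("unfavorable intrinsic", 3.0), ("favorable intrinsic", -1.0)):
        dg = fully_folded_free_energy(4, dg_i, -5.0)
        print(label, "total folding free energy:", dg)
```

With the interfacial term held fixed, changing the sign of the intrinsic term shifts the total folding free energy by n times the difference, which is the sense in which intrinsically stable repeats add to stabilizing interfaces.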


DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

10.26434/chemrxiv.11626098 ◽

2020 ◽

Author(s):

Yifei Qi ◽

John Z.H. Zhang

Keyword(s):

Neural Network ◽

Protein Design ◽

Protein Sequence ◽

Protein Structures ◽

Three Dimensional ◽

Search Space ◽

Computational Protein Design ◽

Data Sets ◽

Protein Backbone ◽

Natural Amino Acids

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, using deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 51.56±0.20% in a 5-fold cross validation on the training set and 54.45% and 50.06% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared to that of the approach that simply uses the top-k predictions and therefore enables higher sequence identity in redesigning three proteins with Rosetta. The network and the data sets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
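
The two ways of turning per-residue probabilities into allowed amino acids can be sketched as follows. This is one plausible reading of the abstract, not the authors' code: the cumulative-cutoff rule here keeps the highest-ranked amino acids until their summed probability reaches the cutoff, and all function names and the toy probabilities are assumptions made for illustration.

```python
import math


def top_k_candidates(probs, k):
    """Keep the k most probable amino acids at one position."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


def cumulative_cutoff_candidates(probs, cutoff):
    """Keep the most probable amino acids until their summed
    probability reaches the cutoff (assumed interpretation)."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen, total = [], 0.0
    for amino_acid in ranked:
        chosen.append(amino_acid)
        total += probs[amino_acid]
        if total >= cutoff:
            break
    return chosen


def search_space_size(candidates_per_position):
    """Number of sequences allowed by the per-position candidate lists."""
    return math.prod(len(c) for c in candidates_per_position)


if __name__ == "__main__":
    # Toy predictions for two positions (hypothetical values).
    confident = {"L": 0.90, "I": 0.06, "V": 0.04}
    uncertain = {"K": 0.40, "R": 0.35, "E": 0.25}
    positions = [confident, uncertain]

    by_top_k = [top_k_candidates(p, 2) for p in positions]
    by_cutoff = [cumulative_cutoff_candidates(p, 0.7) for p in positions]
    print("top-k:", by_top_k, search_space_size(by_top_k))
    print("cutoff:", by_cutoff, search_space_size(by_cutoff))
```

Under this reading, a confidently predicted position contributes a single candidate under the cutoff rule but still k candidates under top-k, which is how the cutoff rule can yield the smaller search space the abstract reports.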


Protein designer David Baker: I like doing things that seem like magic

National Science Review ◽

10.1093/nsr/nwaa071 ◽

2020 ◽

Vol 7 (8) ◽

pp. 1410-1412

Author(s):

Weijie Zhao ◽

Chu Wang

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

Computational Prediction ◽

Biological Functions ◽

Personal Experiences ◽

De Novo Protein Design ◽

And Function ◽

The University ◽

Opening Up

Search ‘de novo protein design’ on Google and you will find the name David Baker in all results of the first page. Professor David Baker at the University of Washington and other scientists are opening up a new world of fantastic proteins. Protein is the direct executor of most biological functions and its structure and function are fully determined by its primary sequence. Baker's group developed the Rosetta software suite that enabled the computational prediction and design of protein structures. Being able to design proteins from scratch means being able to design executors for diverse purposes and benefit society in multiple ways. Recently, NSR interviewed Prof. Baker on this fast-developing field and his personal experiences.


New computational protein design methods for de novo small molecule binding sites

PLoS Computational Biology ◽

10.1371/journal.pcbi.1008178 ◽

2020 ◽

Vol 16 (10) ◽

pp. e1008178

Author(s):

James E. Lucas ◽

Tanja Kortemme

Keyword(s):

Small Molecule ◽

Protein Design ◽

Binding Sites ◽

De Novo ◽

Design Methods ◽

Computational Protein Design


Computational Protein Design Under a Given Backbone Structure with the ABACUS Statistical Energy Function

Methods in Molecular Biology - Computational Protein Design ◽

10.1007/978-1-4939-6637-0_10 ◽

2016 ◽

pp. 217-226 ◽

Cited By ~ 8

Author(s):

Peng Xiong ◽

Quan Chen ◽

Haiyan Liu

Keyword(s):

Protein Design ◽

Energy Function ◽

Computational Protein Design ◽

Backbone Structure


De novo protein design: how do we expand into the universe of possible protein structures?

Current Opinion in Structural Biology ◽

10.1016/j.sbi.2015.05.009 ◽

2015 ◽

Vol 33 ◽

pp. 16-26 ◽

Cited By ~ 110

Author(s):

Derek N Woolfson ◽

Gail J Bartlett ◽

Antony J Burton ◽

Jack W Heal ◽

Ai Niitsu ◽

...

Keyword(s):

Protein Design ◽

De Novo ◽

Protein Structures ◽

De Novo Protein Design ◽

The Universe


Backbone flexibility in computational protein design

Current Opinion in Biotechnology ◽

10.1016/j.copbio.2009.07.006 ◽

2009 ◽

Vol 20 (4) ◽

pp. 420-428 ◽

Cited By ~ 79

Author(s):

Daniel J Mandell ◽

Tanja Kortemme

Keyword(s):

Protein Design ◽

Computational Protein Design ◽

Backbone Flexibility


Multi-Scale Structural Analysis of Proteins by Deep Semantic Segmentation

10.1101/474627 ◽

2018 ◽

Author(s):

Raphael R. Eguchi ◽

Po-Ssu Huang

Keyword(s):

Image Classification ◽

Protein Design ◽

Large Scale ◽

De Novo ◽

Protein Structures ◽

Semantic Segmentation ◽

Amino Acid Sequences ◽

Structural Quality ◽

Small Subset ◽

Structural Prediction

Abstract: Recent advancements in computational methods have facilitated large-scale sampling of protein structures, leading to breakthroughs in protein structural prediction and enabling de novo protein design. Establishing methods to identify candidate structures that can lead to native folds or designable structures remains a challenge, since few existing metrics capture high-level structural features such as architectures, folds, and conformity to conserved structural motifs. Convolutional Neural Networks (CNNs) have been successfully used in semantic segmentation, a subfield of image classification in which a class label is predicted for every pixel. Here, we apply semantic segmentation to protein structures as a novel strategy for fold identification and structural quality assessment. We represent protein structures as 2D α-carbon distance matrices (“contact maps”), and train a CNN that assigns each residue in a multi-domain protein to one of 38 architecture classes designated by the CATH database. Our model performs exceptionally well, achieving a per-residue accuracy of 90.8% on the test set (95.0% average accuracy over all classes; 87.8% average within-structure accuracy). The unique aspect of our classifier is that it encodes sequence-agnostic residue environments from the PDB and can assess structural quality as quantitative probabilities. We demonstrate that individual class probabilities can be used as a metric that indicates the degree to which a randomly generated structure assumes a specific fold, as well as a metric that highlights non-conformative regions of a protein belonging to a known class. These capabilities yield a powerful tool for guiding structural sampling for both structural prediction and design.

Significance: Recent computational advances have allowed researchers to predict the structure of many proteins from their amino acid sequences, as well as to design new sequences that fold into predefined structures. However, these tasks are often challenging because they require selection of a small subset of promising structural models from a large pool of stochastically generated ones. Here, we describe a novel approach to protein model selection that uses 2D image classification techniques to evaluate 3D protein models. Our method can be used to select structures based on the fold that they adopt, and can also be used to identify regions of low structural quality. These capabilities yield a powerful tool for both protein design and structure prediction.
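
The input representation named in the abstract, a 2D α-carbon distance matrix, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the 8 Å threshold in the optional binarization step, and the toy coordinates are all hypothetical, and reading coordinates from a structure file is left out.

```python
import math


def ca_distance_matrix(ca_coords):
    """Pairwise Euclidean distances between alpha-carbon coordinates.

    ca_coords is a list of (x, y, z) tuples, one per residue, in
    sequence order. Entry [i][j] is the distance between residues
    i and j, so the matrix is symmetric with a zero diagonal.
    """
    n = len(ca_coords)
    return [
        [math.dist(ca_coords[i], ca_coords[j]) for j in range(n)]
        for i in range(n)
    ]


def binarize(matrix, threshold=8.0):
    """Optional thresholding into a 0/1 contact map.

    The 8.0 cutoff is an illustrative assumption; the abstract
    describes distance matrices, not a specific threshold.
    """
    return [[1 if d <= threshold else 0 for d in row] for row in matrix]


if __name__ == "__main__":
    # Toy coordinates in arbitrary units, not taken from any structure.
    coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    for row in ca_distance_matrix(coords):
        print(["%.1f" % d for d in row])
```

Because the matrix depends only on coordinates, it carries no sequence information, which is consistent with the abstract's description of the classifier as sequence agnostic.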


Design of complicated all-α protein structures

10.1101/2021.07.14.449347 ◽

2021 ◽

Author(s):

Koya Sakuma ◽

Naohiro Kobayashi ◽

Toshihiko Sugiki ◽

Toshio Nagashima ◽

Toshimichi Fujiwara ◽

...

Keyword(s):

De Novo ◽

Protein Structures ◽

Building Blocks ◽

Helical Structures ◽

Helix Loop Helix ◽

Naturally Occurring ◽

Wide Range ◽

Helical Protein ◽

Helical Proteins ◽

The Universe

A wide range of de novo protein structure designs have been achieved, but the complexity of naturally occurring protein structures is still far beyond these designs. To expand the diversity and complexity of de novo designed protein structures, we sought to develop a method for designing 'difficult-to-describe' α-helical protein structures composed of irregularly aligned α-helices, such as globins. Backbone structure libraries consisting of a myriad of α-helical structures with 5 or 6 helices were generated by combining 18 helix-loop-helix motifs and canonical α-helices, and five distinct topologies were selected for de novo design. The designs were found to be monomeric with high thermal stability in solution and to fold into the target topologies with atomic accuracy. This study demonstrated that complicated α-helical proteins can be created using typical building blocks. The method we developed would enable us to explore the universe of protein structures for designing novel functional proteins.
