Sibe: a computation tool to apply protein sequence statistics to folding and design

2018 ◽  
Author(s):  
Ngaam J. Cheung ◽  
Wookyung Yu

ABSTRACT: Statistical analysis plays a significant role in the study of both protein sequences and structures, expanding in recent years from co-evolution-guided single-site mutations to protein folding in silico. Here we describe a computational tool, termed Sibe, with a particular focus on protein sequence analysis, folding, and design. With its easy-to-use modules, expressive architecture, and extensible code, Sibe is powerful for statistically analyzing sequence data and for building energetic potentials that boost both protein folding and design. In this study, Sibe is used to capture positionally conserved couplings between pairwise amino acids and to aid rational protein design; the pairwise couplings are filtered according to the relative entropy computed from the positional conservation and grouped into several ‘blocks’. The human β2-adrenergic receptor (β2AR) was used to demonstrate that these ‘blocks’ can contribute to rational design at functional residues. In addition, Sibe provides protein folding modules based on both the positionally conserved couplings and well-established statistical potentials. Sibe offers easy-to-use command-line interfaces in C++ and/or Python. Developed for compatibility with the ‘big data’ era, Sibe focuses primarily on protein sequence analysis, in silico folding, and design, but it can also be extended to other modeling tasks and to the prediction of experimental measurements.
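The abstract does not give Sibe's exact formulas, but the filtering step it describes rests on a standard quantity: the per-column relative entropy (KL divergence) of observed amino acid frequencies against a background distribution, which scores how conserved each position is. A minimal sketch of that calculation, assuming a uniform background and a pseudocount choice of our own (both illustrative, not Sibe's actual defaults):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positional_relative_entropy(msa, background=None, pseudocount=1e-3):
    """Per-column relative entropy of observed amino acid frequencies
    against a background distribution. High-scoring columns are strongly
    conserved relative to background; couplings among such columns are
    the kind of signal the abstract describes filtering into 'blocks'."""
    msa = [s.upper() for s in msa]
    n_cols = len(msa[0])
    if background is None:
        # Illustrative assumption: uniform background over 20 amino acids.
        background = np.full(len(AMINO_ACIDS), 1.0 / len(AMINO_ACIDS))
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    scores = np.zeros(n_cols)
    for c in range(n_cols):
        counts = np.full(len(AMINO_ACIDS), pseudocount)
        for seq in msa:
            aa = seq[c]
            if aa in idx:  # skip gaps and non-standard residues
                counts[idx[aa]] += 1.0
        freqs = counts / counts.sum()
        # KL divergence: sum_a f(a) * log(f(a) / q(a))
        scores[c] = np.sum(freqs * np.log(freqs / background))
    return scores
```

On a toy alignment such as `["ACD", "ACD", "ACE", "AGD"]`, the fully conserved first column scores highest, so a threshold on these values keeps only strongly conserved positions.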


2021 ◽  
Vol 17 (2) ◽  
pp. e1008736
Author(s):  
Alex Hawkins-Hooker ◽  
Florence Depardieu ◽  
Sebastien Baur ◽  
Guillaume Couairon ◽  
Arthur Chen ◽  
...  

The vast expansion of protein sequence databases provides an opportunity for new protein design approaches that seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70,000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that, while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE better captures long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether, 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training-set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
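The abstract distinguishes aligned input (MSA VAE) from raw input (AR-VAE) but does not spell out the encoding; a common choice, and the one sketched here as an assumption rather than the paper's actual pipeline, is to one-hot encode each alignment column over the 20 amino acids plus a gap symbol before flattening into the VAE's input vector:

```python
import numpy as np

# 20 standard amino acids plus an alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot_encode(seqs):
    """Encode aligned, equal-length sequences as a (n_seqs, length, 21)
    one-hot array; flattening the last two axes gives a typical input
    vector for an MSA-style VAE."""
    idx = {aa: i for i, aa in enumerate(ALPHABET)}
    length = len(seqs[0])
    out = np.zeros((len(seqs), length, len(ALPHABET)), dtype=np.float32)
    for s, seq in enumerate(seqs):
        for c, aa in enumerate(seq.upper()):
            # Illustrative choice: map unknown characters to the gap slot.
            out[s, c, idx.get(aa, idx["-"])] = 1.0
    return out

def decode(probs):
    """Map per-position probability (or one-hot) arrays back to strings by
    taking the most likely symbol at each column, as one would when
    turning decoder outputs into candidate variant sequences."""
    return ["".join(ALPHABET[i] for i in np.argmax(p, axis=-1)) for p in probs]
```

Round-tripping a small alignment through `one_hot_encode` and `decode` recovers the input, which is the property that lets a decoder's per-position softmax outputs be read off as concrete variant sequences for experimental testing.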

