Sibe: a computation tool to apply protein sequence statistics to folding and design

2018 ◽  
Author(s):  
Ngaam J. Cheung ◽  
Wookyung Yu

ABSTRACT: Statistical analysis plays a significant role in the study of both protein sequences and structures, expanding in recent years from co-evolution-guided single-site mutations to protein folding in silico. Here we describe a computational tool, termed Sibe, with a particular focus on protein sequence analysis, folding, and design. With its easy-to-use modules, expressive architecture, and extensible code, Sibe is powerful for statistically analyzing sequence data and for building energetic potentials that boost both protein folding and design. In this study, Sibe is used to capture positionally conserved couplings between pairwise amino acids and to aid rational protein design; the pairwise couplings are filtered according to the relative entropy computed from the positional conservation and grouped into several ‘blocks’. The human β2-adrenergic receptor (β2AR) was used to demonstrate that these ‘blocks’ can contribute to rational design at functional residues. In addition, Sibe provides protein folding modules based on both the positionally conserved couplings and well-established statistical potentials. Sibe offers easy-to-use command-line interfaces in C++ and/or Python. Developed for compatibility with the ‘big data’ era, Sibe focuses primarily on protein sequence analysis, in silico folding, and design, but it can also be extended to other modeling tasks and to the prediction of experimental measurements.
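The abstract does not give Sibe's exact formulas, but the filtering step it describes rests on a standard quantity: the per-column relative entropy (KL divergence) of observed amino acid frequencies against a background distribution, which scores how conserved each position is. A minimal sketch of that calculation, assuming a uniform background and a pseudocount choice of our own (both illustrative, not Sibe's actual defaults):

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def positional_relative_entropy(msa, background=None, pseudocount=1e-3):
    """Per-column relative entropy of observed amino acid frequencies
    against a background distribution. High-scoring columns are strongly
    conserved relative to background; couplings among such columns are
    the kind of signal the abstract describes filtering into 'blocks'."""
    msa = [s.upper() for s in msa]
    n_cols = len(msa[0])
    if background is None:
        # Illustrative assumption: uniform background over 20 amino acids.
        background = np.full(len(AMINO_ACIDS), 1.0 / len(AMINO_ACIDS))
    idx = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    scores = np.zeros(n_cols)
    for c in range(n_cols):
        counts = np.full(len(AMINO_ACIDS), pseudocount)
        for seq in msa:
            aa = seq[c]
            if aa in idx:  # skip gaps and non-standard residues
                counts[idx[aa]] += 1.0
        freqs = counts / counts.sum()
        # KL divergence: sum_a f(a) * log(f(a) / q(a))
        scores[c] = np.sum(freqs * np.log(freqs / background))
    return scores
```

On a toy alignment such as `["ACD", "ACD", "ACE", "AGD"]`, the fully conserved first column scores highest, so a threshold on these values keeps only strongly conserved positions.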


2021 ◽  
Vol 17 (2) ◽  
pp. e1008736
Author(s):  
Alex Hawkins-Hooker ◽  
Florence Depardieu ◽  
Sebastien Baur ◽  
Guillaume Couairon ◽  
Arthur Chen ◽  
...  

The vast expansion of protein sequence databases provides an opportunity for new protein design approaches that seek to learn the sequence-function relationship directly from natural sequence variation. Deep generative models trained on protein sequence data have been shown to learn biologically meaningful representations helpful for a variety of downstream tasks, but their potential for direct use in the design of novel proteins remains largely unexplored. Here we show that variational autoencoders trained on a dataset of almost 70,000 luciferase-like oxidoreductases can be used to generate novel, functional variants of the luxA bacterial luciferase. We propose separate VAE models to work with aligned sequence input (MSA VAE) and raw sequence input (AR-VAE), and offer evidence that, while both are able to reproduce patterns of amino acid usage characteristic of the family, the MSA VAE better captures long-distance dependencies reflecting the influence of 3D structure. To confirm the practical utility of the models, we used them to generate variants of luxA whose luminescence activity was validated experimentally. We further showed that conditional variants of both models could be used to increase the solubility of luxA without disrupting function. Altogether, 6/12 of the variants generated using the unconditional AR-VAE and 9/11 generated using the unconditional MSA VAE retained measurable luminescence, together with all 23 of the less distant variants generated by conditional versions of the models; the most distant functional variant contained 35 differences relative to the nearest training-set sequence. These results demonstrate the feasibility of using deep generative models to explore the space of possible protein sequences and generate useful variants, providing a method complementary to rational design and directed evolution approaches.
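The abstract distinguishes aligned input (MSA VAE) from raw input (AR-VAE) but does not spell out the encoding; a common choice, and the one sketched here as an assumption rather than the paper's actual pipeline, is to one-hot encode each alignment column over the 20 amino acids plus a gap symbol before flattening into the VAE's input vector:

```python
import numpy as np

# 20 standard amino acids plus an alignment gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"

def one_hot_encode(seqs):
    """Encode aligned, equal-length sequences as a (n_seqs, length, 21)
    one-hot array; flattening the last two axes gives a typical input
    vector for an MSA-style VAE."""
    idx = {aa: i for i, aa in enumerate(ALPHABET)}
    length = len(seqs[0])
    out = np.zeros((len(seqs), length, len(ALPHABET)), dtype=np.float32)
    for s, seq in enumerate(seqs):
        for c, aa in enumerate(seq.upper()):
            # Illustrative choice: map unknown characters to the gap slot.
            out[s, c, idx.get(aa, idx["-"])] = 1.0
    return out

def decode(probs):
    """Map per-position probability (or one-hot) arrays back to strings by
    taking the most likely symbol at each column, as one would when
    turning decoder outputs into candidate variant sequences."""
    return ["".join(ALPHABET[i] for i in np.argmax(p, axis=-1)) for p in probs]
```

Round-tripping a small alignment through `one_hot_encode` and `decode` recovers the input, which is the property that lets a decoder's per-position softmax outputs be read off as concrete variant sequences for experimental testing.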

