The BackMAP Python module: how a simpler Ramachandran number can simplify the life of a protein simulator

Enhancing protein backbone angle prediction by using simpler models of deep neural networks

Scientific Reports ◽

10.1038/s41598-020-76317-6 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Fereshteh Mataeimoghadam ◽

M. A. Hakim Newton ◽

Abdollah Dehzangi ◽

Abdul Karim ◽

B. Jayaram ◽

...

Keyword(s):

Neural Network ◽

Neural Networks ◽

Structure Prediction ◽

Protein Structures ◽

Absolute Error ◽

Grand Challenge ◽

Protein Backbone ◽

The Neural Network ◽

Benchmark Datasets ◽

The Neural Networks

Protein structure prediction is a grand challenge. Prediction of protein structures from representations based on backbone dihedral angles has recently made significant progress, alongside the ongoing surge of deep neural network (DNN) research in general. However, we observe an overall trend in protein backbone angle prediction research to employ ever more complex neural networks and to feed them ever more features. While more features might add predictive power to a neural network, we argue that redundant features could instead clutter the problem, so that the more complex networks merely counterbalance the noise. From artificial intelligence and machine learning perspectives, problem representations and solution approaches interact and thus affect performance. We also argue that comparatively simpler predictors can be reconstructed more easily than more complex ones. With these arguments in mind, we present a deep learning method named Simpler Angle Predictor (SAP) to train simpler DNN models that enhance protein backbone angle prediction. We then show empirically that SAP can significantly outperform existing state-of-the-art methods on well-known benchmark datasets: for some types of angles, the differences are 6–8 in terms of mean absolute error (MAE). The SAP program along with its data is available from the website https://gitlab.com/mahnewton/sap.
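
To make the MAE figure above concrete, the following minimal sketch computes a mean absolute error over dihedral angles. It is not taken from the SAP code base: the function names are hypothetical, and treating the error as periodic (so that 175° and −175° differ by 10°, not 350°) is an assumption that suits dihedral angles such as phi and psi.

```python
def angular_abs_error(pred_deg: float, true_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0..180).

    Assumption: errors wrap around at 360 degrees.
    """
    diff = abs(pred_deg - true_deg) % 360.0
    return min(diff, 360.0 - diff)


def angular_mae(pred, true) -> float:
    """Mean absolute error over paired angle sequences, respecting periodicity."""
    if len(pred) != len(true) or len(pred) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(angular_abs_error(p, t) for p, t in zip(pred, true))
    return total / len(pred)


if __name__ == "__main__":
    # Illustrative values only, not data from the paper.
    phi_pred = [-60.0, 175.0, -120.0]
    phi_true = [-65.0, -175.0, -100.0]
    print(angular_mae(phi_pred, phi_true))
```

Without the wrap-around step, the second pair would contribute an error of 350° and dominate the average, which is why a periodic difference is the usual choice for angles defined on a circle.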

P.R.E.S.S. — AN R-PACKAGE FOR EXPLORING RESIDUAL-LEVEL PROTEIN STRUCTURAL STATISTICS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720012420073 ◽

2012 ◽

Vol 10 (03) ◽

pp. 1242007 ◽

Cited By ~ 3

Author(s):

YUANYUAN HUANG ◽

STEPHEN BONETT ◽

ANDRZEJ KLOCZKOWSKI ◽

ROBERT JERNIGAN ◽

ZHIJUN WU

Keyword(s):

Structural Properties ◽

Open Source Software ◽

Protein Structures ◽

R Package ◽

Residue Level ◽

Large Set ◽

Torsion Angles ◽

Residual Level ◽

Open Source Software Package ◽

User Friendly

P.R.E.S.S. is an R-package developed to allow researchers to get access to and manipulate a large set of statistical data on protein residue-level structural properties such as residue-level virtual bond lengths, virtual bond angles, and virtual torsion angles. A large set of high-resolution protein structures is downloaded and surveyed. Their residue-level structural properties are calculated and documented. The statistical distributions and correlations of these properties can be queried and displayed. Tools are also provided for modeling and analyzing a given structure in terms of its residue-level structural properties. In particular, new tools for computing residue-level statistical potentials and displaying residue-level Ramachandran-like plots are developed for structural analysis and refinement. P.R.E.S.S. has been released in R as an open source software package, with a user-friendly GUI, accessible and executable by a public user in any R environment. P.R.E.S.S. can also be downloaded directly at http://www.math.iastate.edu/press/ .
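
As an illustration of one of the residue-level properties mentioned above, the sketch below computes a virtual torsion angle from four consecutive points (for example, four successive Cα positions). It is written in Python rather than R and does not use the P.R.E.S.S. API; the function name and the example coordinates are hypothetical, and the sign follows the common convention in which a clockwise rotation viewed along the central bond is positive.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def virtual_torsion_deg(p0, p1, p2, p3) -> float:
    """Torsion angle (degrees, -180..180) defined by four 3-D points."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Four made-up points forming a quarter turn about the z axis.
    print(virtual_torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `atan2` rather than `acos` keeps the sign of the angle and avoids numerical trouble near 0° and 180°.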

Variation Analysis of Position, Velocity, and Acceleration of Two-Dimensional Mechanisms by the Direct Linearization Method

Volume 5: 35th Design Automation Conference, Parts A and B ◽

10.1115/detc2009-86236 ◽

2009 ◽

Cited By ~ 1

Author(s):

Robert C. Leishman ◽

Kenneth W. Chase

Keyword(s):

Closed Form ◽

Process Variations ◽

Linearization Method ◽

Analysis Tool ◽

Large Set ◽

Computationally Efficient ◽

Closed Form Solutions ◽

Direct Linearization ◽

The Mean ◽

Kinematic Performance

Velocity and acceleration analysis is an important tool for predicting the motion of mechanisms. The results, however, may be inaccurate when applied to manufactured products because of the process variations that occur in production. Small changes in dimensions can accumulate and propagate in an assembly, which may cause significant variation in critical kinematic performance parameters. A new statistical analysis tool is presented for predicting the effects of variation on mechanism kinematic performance. It is based on the Direct Linearization Method developed for static assemblies. The solution is closed form and may be applied to 2-D, open or closed, multi-loop mechanisms employing common kinematic joints. It is also shown how form, orientation, and position variations may be included to analyze variations that occur in kinematic joints. Closed form solutions eliminate the need to generate a large set of random assemblies and analyze them one by one to determine the expected range of critical variables. Only two assemblies are analyzed to characterize the entire population. The first determines the performance of the mean, or average, assembly, and the second estimates the range of variation about the mean. The system is computationally efficient and well suited for design iteration.

New Tools and Resources for Analysing Protein Structures and Their Interactions

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444998007318 ◽

1998 ◽

Vol 54 (6) ◽

pp. 1132-1138 ◽

Cited By ~ 13

Author(s):

Nicholas M. Luscombe ◽

Roman A. Laskowski ◽

David R. Westhead ◽

Duncan Milburn ◽

Susan Jones ◽

...

Keyword(s):

Sequence Analysis ◽

Dna Binding ◽

Binding Proteins ◽

Protein Structures ◽

Structural Data ◽

Analysis Tool ◽

Protein Topology ◽

Protein Interfaces ◽

Sequence Analysis Tool

The determination of protein structures has furthered our understanding of how various proteins perform their functions. With the large number of structures currently available in the PDB, it is necessary to be able to easily study these proteins in detail. Here new software tools are presented which aim to facilitate this analysis; these include the PDBsum WWW site, which provides a summary description of all PDB entries, the programs TOPS and NUCPLOT to plot schematic diagrams representing protein topology and DNA-binding interactions, SAS, a WWW-based sequence-analysis tool incorporating structural data, and WWW servers for the analysis of protein–protein interfaces and analyses of over 300 haem-binding proteins.

DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet

10.26434/chemrxiv.11626098 ◽

2020 ◽

Author(s):

Yifei Qi ◽

John Z.H. Zhang

Keyword(s):

Neural Network ◽

Protein Design ◽

Protein Sequence ◽

Protein Structures ◽

Three Dimensional ◽

Search Space ◽

Computational Protein Design ◽

Data Sets ◽

Protein Backbone ◽

Natural Amino Acids

Computational protein design remains a challenging task despite its remarkable success in the past few decades. With the rapid progress of deep-learning techniques and the accumulation of three-dimensional protein structures, using deep neural networks to learn the relationship between protein sequences and structures and then automatically design a protein sequence for a given protein backbone structure is becoming increasingly feasible. In this study, we developed a deep neural network named DenseCPD that considers the three-dimensional density distribution of protein backbone atoms and predicts the probability of 20 natural amino acids for each residue in a protein. The accuracy of DenseCPD was 51.56±0.20% in a 5-fold cross validation on the training set and 54.45% and 50.06% on two independent test sets, which is more than 10% higher than those of previous state-of-the-art methods. Two approaches for using DenseCPD predictions in computational protein design were analyzed. The approach using the cutoff of accumulative probability had a smaller sequence search space compared to that of the approach that simply uses the top-k predictions and therefore enables higher sequence identity in redesigning three proteins with Rosetta. The network and the data sets are available on a web server at http://protein.org.cn/densecpd.html. The results of this study may benefit the further development of computational protein design methods.
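
The difference between the two selection strategies named in the abstract can be sketched as follows. This is an illustration under stated assumptions, not DenseCPD's implementation: the function names, the example probabilities, and the exact stopping rule (keep the most probable residues until their summed probability reaches the cutoff) are all hypothetical.

```python
def top_k(probs: dict, k: int) -> list:
    """Keep the k most probable amino acids, regardless of confidence."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:k]


def cumulative_cutoff(probs: dict, cutoff: float) -> list:
    """Keep the most probable amino acids until their summed probability
    reaches the cutoff (assumed rule, for illustration only)."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    chosen, total = [], 0.0
    for aa in ranked:
        chosen.append(aa)
        total += probs[aa]
        if total >= cutoff:
            break
    return chosen


if __name__ == "__main__":
    # Made-up per-residue predictions, truncated to four amino acids.
    confident = {"L": 0.85, "I": 0.08, "V": 0.04, "M": 0.03}
    ambiguous = {"A": 0.30, "S": 0.28, "G": 0.22, "T": 0.20}
    print(top_k(confident, 3), cumulative_cutoff(confident, 0.7))
    print(top_k(ambiguous, 3), cumulative_cutoff(ambiguous, 0.7))
```

With a cumulative cutoff, a position where the network is confident contributes a single candidate, while an ambiguous position keeps several. A fixed top-k keeps k candidates everywhere, which is one way to see why the cutoff approach can yield a smaller overall search space.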

Link Your Sites (LYS.py): Coupling your PAML codeml results and homologous protein structures in PyMOL

10.1101/380394 ◽

2018 ◽

Author(s):

Lys Sanz Moreta ◽

Rute Andreia Rodrigues da Fonseca

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Protein Structures ◽

Visualization Tool ◽

Amino Acid Mutation ◽

Homologous Protein ◽

Codon Substitution ◽

Large Numbers ◽

Molecular Context ◽

Positively Selected Sites

The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for the assessment of its functional impact and to understand its evolutionary implications. Currently, searches for fast evolving amino acid positions using codon substitution models like those implemented in PAML (Z. Yang, 2000) are done in almost complete proteomes, generating large numbers of candidate proteins that require individual structural analyses. Here I present a python wrapper script that integrates the output of PAML with the PyMOL visualization tool to automate the generation of protein structure models where positively selected sites are mapped along with the location of putative functional domains.

A Fast 3 x N Matrix Multiply Routine for Calculation of Protein RMSD

10.1101/008631 ◽

2014 ◽

Cited By ~ 2

Author(s):

Imran S Haque ◽

Kyle A Beauchamp ◽

Vijay S Pande

Keyword(s):

Root Mean Square ◽

Linear Algebra ◽

High Performance ◽

Protein Structures ◽

Matrix Product ◽

Mean Square ◽

Mean Square Deviation ◽

Large Numbers ◽

Rapid Calculation ◽

Atomic Coordinates

The bottleneck for the rapid calculation of the root-mean-square deviation in atomic coordinates (RMSD) between pairs of protein structures for large numbers of conformations is the evaluation of a (3xN) x (Nx3) matrix product over conformation pairs. Here we describe two matrix multiply routines specialized for the 3xN case that are able to significantly outperform (by up to 3X) off-the-shelf high-performance linear algebra libraries for this computation, reaching machine limits on performance. The routines are implemented in C and Python libraries, and are available at https://github.com/simtk/IRMSD.
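
The role of the (3xN) x (Nx3) product can be seen in a plain, unoptimized sketch. The code below is not the IRMSD library and makes no attempt at speed; the function names are hypothetical. It forms the 3x3 matrix M from two centered conformations and uses its trace to obtain the RMSD without rotational superposition. An optimal-rotation RMSD would replace the trace with a quantity derived from the same matrix M (for example via its singular values), which is why this product sits on the critical path.

```python
import math


def center(coords):
    """Translate a conformation so that its centroid is at the origin."""
    n = len(coords)
    c = [sum(p[i] for p in coords) / n for i in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def inner_product_3x3(a, b):
    """(3xN) x (Nx3) product: M[i][j] = sum over atoms k of a[k][i] * b[k][j]."""
    m = [[0.0] * 3 for _ in range(3)]
    for pa, pb in zip(a, b):
        for i in range(3):
            for j in range(3):
                m[i][j] += pa[i] * pb[j]
    return m


def rmsd_no_rotation(a, b) -> float:
    """RMSD after centering only, computed from the trace of M."""
    a, b = center(a), center(b)
    g_a = sum(x * x + y * y + z * z for x, y, z in a)
    g_b = sum(x * x + y * y + z * z for x, y, z in b)
    m = inner_product_3x3(a, b)
    trace = m[0][0] + m[1][1] + m[2][2]
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, (g_a + g_b - 2.0 * trace) / len(a)))


if __name__ == "__main__":
    conf_a = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    conf_b = [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
    print(rmsd_no_rotation(conf_a, conf_b))
```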

A distance geometry-based description and validation of protein main-chain conformation

IUCrJ ◽

10.1107/s2052252517008466 ◽

2017 ◽

Vol 4 (5) ◽

pp. 657-670 ◽

Cited By ~ 7

Author(s):

Joana Pereira ◽

Victor S. Lamzin

Keyword(s):

Main Chain ◽

Dimensional Space ◽

Distance Geometry ◽

Protein Structures ◽

Three Dimensional ◽

Principal Component ◽

Data Bank ◽

Conformational Space ◽

Protein Backbone ◽

Space Forms

Understanding the protein main-chain conformational space forms the basis for the modelling of protein structures and for the validation of models derived from structural biology techniques. Presented here is a novel idea for a three-dimensional distance geometry-based metric to account for the fine details of protein backbone conformations. The metrics are computed for dipeptide units, defined as blocks of Cα(i−1)–O(i−1)–Cα(i)–O(i)–Cα(i+1) atoms, by obtaining the eigenvalues of their Euclidean distance matrices. These were computed for ∼1.3 million dipeptide units collected from nonredundant good-quality structures in the Protein Data Bank and subjected to principal component analysis. The resulting new Euclidean orthogonal three-dimensional space (DipSpace) allows a probabilistic description of protein backbone geometry. The three axes of the DipSpace describe the local extension of the dipeptide unit structure, its twist and its bend. By using a higher-dimensional metric, the method is efficient for the identification of Cα atoms in an unlikely or unusual geometrical environment, and its use for both local and overall validation of protein models is demonstrated. It is also shown, for the example of trypsin proteases, that the detection of unusual conformations that are conserved among the structures of this protein family may indicate geometrically strained residues of potentially functional importance.
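
The first step described above, taking the eigenvalues of the distance matrix of a five-atom block, can be sketched with NumPy. This is only an illustration: the function name and coordinates are hypothetical, the use of plain (not squared) interatomic distances is an assumption, and the subsequent principal component analysis that defines DipSpace is not reproduced here.

```python
import numpy as np


def distance_matrix_eigenvalues(coords) -> np.ndarray:
    """Eigenvalues (ascending) of the 5x5 Euclidean distance matrix of a
    five-atom block. Assumes plain distances, not squared distances."""
    pts = np.asarray(coords, dtype=float)
    if pts.shape != (5, 3):
        raise ValueError("expected five atoms with x, y, z coordinates")
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # The matrix is symmetric, so eigvalsh is appropriate.
    return np.linalg.eigvalsh(dist)


if __name__ == "__main__":
    # Made-up coordinates standing in for one dipeptide unit.
    block = [
        [0.0, 0.0, 0.0],
        [1.2, 0.5, 0.0],
        [2.4, 0.0, 0.3],
        [3.1, 1.0, 0.2],
        [4.0, 0.4, 1.1],
    ]
    print(distance_matrix_eigenvalues(block))
```

Because the matrix depends only on interatomic distances, the eigenvalues are unchanged by rotating or translating the block, which makes them a convenient orientation-independent descriptor of local geometry.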

Miscellaneous Tools

Constructing an Ethical Hacking Knowledge Base for Threat Awareness and Prevention ◽

10.4018/978-1-5225-7628-0.ch010 ◽

2019 ◽

pp. 258-277

Keyword(s):

Network Analysis ◽

Computer Security ◽

Web Application ◽

Social Engineering ◽

Analysis Tool ◽

Dos Attack ◽

Large Numbers ◽

Cain And Abel ◽

Session Hijacking ◽

Security Network

This chapter discusses in detail several essential ethical hacking tools developed by various researchers. The tools covered include the Netcat network analysis tool, Macof from the Dsniff suite for DoS attacks, Yersinia for DHCP starvation attacks, the Dnsspoof tool for MITM attacks, Ettercap for network-based attacks, Cain and Abel, the Sslstrip tool, and SEToolkit. These tools are used to carry out DoS, DHCP starvation, DNS spoofing, session hijacking, social engineering, and many other network-based attacks. The chapter also gives detailed steps for configuring a WAMP server as part of an ethical hacking lab setup in order to simulate web application-based attacks. Researchers in this domain have developed a large number of ethical hacking tools for computer, network, and web server security; this chapter covers some of the essential ones.