PRIGSA2: Improved version of Protein Repeat Identification by Graph Spectral Analysis

Mapping Intimacies ◽

10.1101/803304 ◽

2019 ◽

Author(s):

Broto Chakrabarty ◽

Nita Parekh

Keyword(s):

Tertiary Structure ◽

De Novo ◽

Protein Complexes ◽

Repeat Unit ◽

Protein Structures ◽

Fold Increase ◽

Data Bank ◽

Topological Features ◽

Repeat Proteins ◽

Complete Protein

Abstract Tandemly repeated structural motifs in proteins form highly stable structural folds and provide multiple binding sites associated with diverse functional roles. The tertiary structure and function of these proteins are determined by the type and copy number of the repeating units. Each repeat type exhibits a unique pattern of intra- and inter-repeat unit interactions that is well captured by the topological features in the network representation of protein structures. Here we present an improved version of our graph-based algorithm, PRIGSA, with structure-based validation and filtering steps incorporated for accurate detection of tandem structural repeats. The algorithm integrates available knowledge on repeat families with de novo prediction to detect repeats in single monomer chains as well as in multimeric protein complexes. Three levels of performance evaluation are presented: comparison with state-of-the-art algorithms on a benchmark dataset of repeat and non-repeat proteins, accuracy in the detection of members of 13 known repeat families reported in UniProt, and execution on the complete Protein Data Bank to show its ability to identify previously uncharacterized proteins. Executing it on the PDB yields a ∼3-fold increase in the coverage of the members of the 13 known families and identifies 3,408 novel uncharacterized structural repeat proteins. URL: http://bioinf.iiit.ac.in/PRIGSA2/.
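
As an illustration of the "network representation of protein structures" mentioned in the abstract, the sketch below builds a residue contact graph from Cα coordinates and reports each node's degree. It is not the PRIGSA2 implementation: the function names, the 7.0 Å cutoff, and the choice of degree as the topological feature are assumptions made only for this example.

```python
import math

def contact_graph(ca_coords, cutoff=7.0):
    """Return a symmetric 0/1 adjacency matrix for a list of (x, y, z) points.

    Two residues are linked when their C-alpha atoms lie within `cutoff`
    angstroms. The 7.0 default is an illustrative assumption.
    """
    n = len(ca_coords)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency

def degrees(adjacency):
    """Number of contacts per residue (one simple topological feature)."""
    return [sum(row) for row in adjacency]

# Hypothetical toy chain: four points spaced 3.8 angstroms apart on a line.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_degrees = degrees(contact_graph(toy_chain))
```

In a repeat protein, one would expect such per-residue features to show a periodic profile along the chain, which is the kind of signal a graph-based repeat detector can look for.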


LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.724947 ◽

2021 ◽

Vol 8 ◽

Author(s):

Charles Christoffer ◽

Vijay Bharadwaj ◽

Ryan Luu ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Protein Complexes ◽

Protein Sequences ◽

Data Bank ◽

Protein Docking ◽

Functional Mechanisms ◽

Established Technique

Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.


Homology-based loop modeling yields more complete crystallographic protein structures

IUCrJ ◽

10.1107/s2052252518010552 ◽

2018 ◽

Vol 5 (5) ◽

pp. 585-594 ◽

Cited By ~ 14

Author(s):

Bart van Beusekom ◽

Krista Joosten ◽

Maarten L. Hekkelman ◽

Robbie P. Joosten ◽

Anastassis Perrakis

Keyword(s):

Protein Function ◽

Model Building ◽

Protein Structures ◽

Structural Models ◽

Data Bank ◽

Loop Modeling ◽

X Ray ◽

Density Maps ◽

Complete Protein ◽

Automated Procedures

Inherent protein flexibility, poor or low-resolution diffraction data or poorly defined electron-density maps often inhibit the building of complete structural models during X-ray structure determination. However, recent advances in crystallographic refinement and model building often allow completion of previously missing parts. This paper presents algorithms that identify regions missing in a certain model but present in homologous structures in the Protein Data Bank (PDB), and 'graft' these regions of interest. These new regions are refined and validated in a fully automated procedure. Including these developments in the PDB-REDO pipeline has enabled the building of 24 962 missing loops in the PDB. The models and the automated procedures are publicly available through the PDB-REDO databank and webserver. More complete protein structure models enable a higher quality public archive but also a better understanding of protein function, better comparison between homologous structures and more complete data mining in structural bioinformatics projects.


Protein shape sampled by ion mobility mass spectrometry consistently improves protein structure prediction

10.1101/2021.05.27.445812 ◽

2021 ◽

Author(s):

SM Bargeen Alam Turzo ◽

Justin Thomas Seffernick ◽

Amber D Rolland ◽

Micah T Donor ◽

Sten Heinze ◽

...

Keyword(s):

Mass Spectrometry ◽

Protein Structure ◽

Ion Mobility ◽

Structure Determination ◽

Tertiary Structure ◽

De Novo ◽

Structural Information ◽

Collision Cross Section ◽

Protein Structures ◽

Protein Shape

Among the wide variety of mass spectrometry (MS) methodologies available for structural characterization of proteins, ion mobility (IM) provides structural information about protein shape and size in the form of an orientationally averaged collision cross-section (CCS). While IM data have been predominantly employed for the structural assessment of protein complexes, CCS data from IM experiments have not yet been used to predict tertiary structure from sequence. Here, we show that IM data can significantly improve protein structure determination using the modeling suite Rosetta. The Rosetta Projection Approximation using Rough Circular Shapes (PARCS) algorithm was developed to allow fast and accurate prediction of CCS from structure. Following successful rigorous testing of PARCS for accuracy, speed, and convergence, an integrative modelling approach was developed in Rosetta to use CCS data from IM experiments. Using this method, we predicted protein structures from sequence for a benchmark set of 23 proteins. When using IM data, the predicted structure improved or remained unchanged for all 23 proteins, compared to the predicted models in the absence of CCS data. For 15/23 proteins, the RMSD (root-mean-square deviation) of the predicted model was less than 5.50 Å, compared to only 10/23 without IM data. We also developed a confidence metric that successfully identified near-native models in the absence of a native structure. These results demonstrate the utility of IM data in de novo structure determination.
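
The RMSD figure quoted in the abstract can be made concrete with a short sketch. The code below computes the root-mean-square deviation between two equally sized coordinate sets, assuming they have already been superimposed; the abstract does not say how models were aligned or which atoms were compared, so the function name and the pre-aligned inputs are assumptions for this example.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both structures are already superimposed and list the same
    atoms in the same order; no alignment is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example: the model is shifted 3 angstroms along z.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
deviation = rmsd(native, model)
within_cutoff = deviation < 5.50  # the threshold quoted in the abstract
```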


FilterDCA: interpretable supervised contact prediction using inter-domain coevolution

10.1101/2019.12.24.887877 ◽

2019 ◽

Cited By ~ 1

Author(s):

Maureen Muscat ◽

Giancarlo Croce ◽

Edoardo Sarti ◽

Martin Weigt

Keyword(s):

Deep Learning ◽

De Novo ◽

Protein Complexes ◽

Protein Structures ◽

Direct Coupling ◽

Sequence Information ◽

Coupling Analysis ◽

Contact Patterns ◽

Direct Coupling Analysis ◽

Training Sets

Abstract Predicting three-dimensional protein structure and assembling protein complexes using sequence information are among the most prominent tasks in computational biology. Recently, substantial progress has been made for single proteins using a combination of unsupervised coevolutionary sequence analysis with structurally supervised deep learning. While reaching impressive accuracies in predicting residue-residue contacts, deep learning has a number of disadvantages. The need for large structural training sets limits the applicability to multi-protein complexes, and the deep architecture of convolutional neural networks makes them intrinsically hard to interpret. Here we introduce FilterDCA, a simpler supervised predictor for inter-domain and inter-protein contacts. It is based on the fact that contact maps of proteins show typical contact patterns, which result from secondary structure and are reflected by patterns in coevolutionary analysis. We explicitly integrate averaged contact patterns with coevolutionary scores derived by Direct Coupling Analysis, reaching results comparable to more complex deep-learning approaches, while remaining fully transparent and interpretable. The FilterDCA code is available at http://gitlab.lcqb.upmc.fr/muscat/FilterDCA.

Author summary The de novo prediction of tertiary and quaternary protein structures has recently seen important advances, by combining unsupervised, purely sequence-based coevolutionary analyses with structure-based supervision using deep learning for contact-map prediction. While showing impressive performance, deep-learning methods require large training sets and pose severe obstacles to interpretability. Here we construct a simple, transparent and therefore fully interpretable inter-domain contact predictor, which uses the results of coevolutionary Direct Coupling Analysis in combination with explicitly constructed filters reflecting typical contact patterns in a training set of known protein structures, and which improves the accuracy of predicted contacts significantly. Our approach thereby sheds light on the question of how contact information is encoded in coevolutionary signals.


State-of-the-art web services for de novo protein structure prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbaa139 ◽

2020 ◽

Cited By ~ 1

Author(s):

Luciano A Abriata ◽

Matteo Dal Peraro

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Tertiary Structure ◽

De Novo ◽

State Of The Art ◽

Data Bank ◽

End Users ◽

Model Quality ◽

Uncharacterized Protein

Abstract Residue coevolution estimations coupled to machine learning methods are revolutionizing the ability of protein structure prediction approaches to model proteins that lack clear homologous templates in the Protein Data Bank (PDB). This was evident in the last round of the Critical Assessment of Structure Prediction (CASP), which presented several very good models for the hardest targets. Unfortunately, literature reporting on these advances often lacks digests tailored to lay end users; moreover, some of the top-ranking predictors do not provide webservers that can be used by nonexperts. How, then, can end users benefit from these advances and correctly interpret the predicted models? Here we review the web resources that biologists can use today to take advantage of these state-of-the-art methods in their research, including not only the best de novo modeling servers but also datasets of models precomputed by experts for structurally uncharacterized protein families. We highlight their features, advantages and pitfalls for predicting structures of proteins without clear templates. We present a broad range of applications that span from driving forward biochemical investigations that lack experimental structures to actually assisting experimental structure determination in X-ray diffraction, cryo-EM and other forms of integrative modeling. We also discuss issues that must be considered by users yet still require further development, such as global and residue-wise model quality estimates and sources of residue coevolution other than monomeric tertiary structure.


Extreme stability in de novo-designed repeat arrays is determined by unusually stable short-range interactions

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1800283115 ◽

2018 ◽

Vol 115 (29) ◽

pp. 7539-7544 ◽

Cited By ~ 12

Author(s):

Kathryn Geiger-Schuller ◽

Kevin Sforza ◽

Max Yuhas ◽

Fabio Parmeggiani ◽

David Baker ◽

...

Keyword(s):

Protein Design ◽

Nearest Neighbor ◽

De Novo ◽

Protein Structures ◽

Free Energies ◽

Intrinsic Stability ◽

Repeat Proteins ◽

Naturally Occurring ◽

Wide Range ◽

The Individual

Designed helical repeats (DHRs) are modular helix–loop–helix–loop protein structures that are tandemly repeated to form a superhelical array. Structures combining tandem DHRs demonstrate a wide range of molecular geometries, many of which are not observed in nature. Understanding cooperativity of DHR proteins provides insight into the molecular origins of Rosetta-based protein design hyperstability and facilitates comparison of energy distributions in artificial and naturally occurring protein folds. Here, we use a nearest-neighbor Ising model to quantify the intrinsic and interfacial free energies of four different DHRs. We measure the folding free energies of constructs with varying numbers of internal and terminal capping repeats for four different DHR folds, using guanidine-HCl and glycerol as destabilizing and solubilizing cosolvents. One-dimensional Ising analysis of these series reveals that, although interrepeat coupling energies are within the range seen for naturally occurring repeat proteins, the individual repeats of DHR proteins are intrinsically stable. This favorable intrinsic stability, which has not been observed for naturally occurring repeat proteins, adds to stabilizing interfaces, resulting in extraordinarily high stability. Stable repeats also impart a downhill shape to the energy landscape for DHR folding. These intrinsic stability differences suggest that part of the success of Rosetta-based design results from capturing favorable local interactions.
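
The nearest-neighbor Ising analysis described in the abstract can be sketched as follows. This is a minimal homopolymer version under stated assumptions, not the authors' fitting code: every repeat shares one intrinsic folding free energy and every folded-folded interface shares one coupling free energy, capping repeats and denaturant dependence are ignored, and all names and numeric values are hypothetical. The partition function is accumulated with a two-state transfer recursion and checked against brute-force enumeration.

```python
import math
from itertools import product

R_KCAL = 0.0019872  # gas constant in kcal/(mol K)

def boltzmann_factors(dg_intrinsic, dg_interface, temp=298.15):
    """Statistical weights for folding one repeat and forming one interface."""
    rt = R_KCAL * temp
    return math.exp(-dg_intrinsic / rt), math.exp(-dg_interface / rt)

def partition_function(n_repeats, dg_intrinsic, dg_interface, temp=298.15):
    """1D Ising partition function via a transfer recursion over repeats."""
    kappa, tau = boltzmann_factors(dg_intrinsic, dg_interface, temp)
    folded, unfolded = 0.0, 1.0  # weights of chains ending folded / unfolded
    for _ in range(n_repeats):
        # A newly folded repeat gains tau only if its predecessor is folded.
        folded, unfolded = kappa * (tau * folded + unfolded), folded + unfolded
    return folded + unfolded

def partition_function_bruteforce(n_repeats, dg_intrinsic, dg_interface, temp=298.15):
    """Same quantity by summing over all 2**n folded/unfolded configurations."""
    kappa, tau = boltzmann_factors(dg_intrinsic, dg_interface, temp)
    total = 0.0
    for state in product((0, 1), repeat=n_repeats):
        n_folded = sum(state)
        n_interfaces = sum(a and b for a, b in zip(state, state[1:]))
        total += kappa ** n_folded * tau ** n_interfaces
    return total

# Hypothetical energies in kcal/mol, chosen only to exercise the code.
q_fast = partition_function(4, 1.0, -3.0)
q_slow = partition_function_bruteforce(4, 1.0, -3.0)
```

In this formulation, an intrinsically stable repeat of the kind reported for DHRs corresponds to a negative intrinsic free energy, so the per-repeat weight exceeds 1 even before any interface forms.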


Ancient gene duplications in RNA viruses revealed by protein tertiary structure comparisons

Virus Evolution ◽

10.1093/ve/veab019 ◽

2021 ◽

Vol 7 (1) ◽

Author(s):

Alejandro Miguel Cisneros-Martínez ◽

Arturo Becerra ◽

Antonio Lazcano

Keyword(s):

Tertiary Structure ◽

Rna Viruses ◽

Rna Virus ◽

Protein Structures ◽

Data Bank ◽

Cysteine Proteases ◽

Gene Duplications ◽

Virus Family ◽

Protein Tertiary Structure ◽

Duplication Events

Abstract To date, only a handful of duplicated genes have been described in RNA viruses. This shortage can be attributed to different factors, including the high mutation rate of RNA viruses, which would make a large genome more prone to acquiring deleterious mutations. This may explain why sequence-based approaches have only found duplications in their most recent evolutionary history. To detect earlier duplications, we performed protein tertiary structure comparisons for every RNA virus family represented in the Protein Data Bank. We present a list of thirty pairs of possible paralogs with <30 per cent sequence identity. It is argued that these pairs are the outcome of six duplication events. These include the α and β subunits of the fungal toxin KP6 present in the dsRNA Ustilago maydis virus (family Totiviridae), the SARS-CoV (Coronaviridae) nsp3 domains SUD-N, SUD-M and X-domain, the Picornavirales (families Picornaviridae, Dicistroviridae, Iflaviridae and Secoviridae) capsid proteins VP1, VP2 and VP3, and the Enterovirus (family Picornaviridae) 3C and 2A cysteine-proteases. Protein tertiary structure comparisons may reveal more duplication events as more three-dimensional protein structures are determined, and they suggest that, although still rare, gene duplications may be more frequent in RNA viruses than previously thought. Keywords: gene duplications; RNA viruses.


Base-intercalated and base-wedged stacking elements in 3D-structure of RNA and RNA–protein complexes

Nucleic Acids Research ◽

10.1093/nar/gkaa610 ◽

2020 ◽

Vol 48 (15) ◽

pp. 8675-8685

Author(s):

Eugene Baulin ◽

Valeriy Metelev ◽

Alexey Bogdanov

Keyword(s):

Tertiary Structure ◽

Protein Complexes ◽

3D Structure ◽

Data Bank ◽

Rna Structures ◽

Tertiary Structures ◽

Long Range Interactions ◽

Heterocyclic Bases ◽

Rna Protein Complexes

Abstract Along with nucleobase pairing, base-base stacking interactions are one of the two main types of strong non-covalent interactions that define the unique secondary and tertiary structure of RNA. In this paper we studied two subfamilies of nucleobase-inserted stacking structures: (i) with any base intercalated between neighboring nucleotide residues (base-intercalated element, BIE, i + 1); (ii) with any base wedged into a hydrophobic cavity formed by heterocyclic bases of two nucleotides which are one nucleotide apart in sequence (base-wedged element, BWE, i + 2). We have exploited the growing database of natively folded RNA structures in Protein Data Bank to analyze the distribution and structural role of these motifs in RNA. We found that these structural elements initially found in yeast tRNAPhe are quite widespread among the tertiary structures of various RNAs. These motifs perform diverse roles in RNA 3D structure formation and its maintenance. They contribute to the folding of RNA bulges and loops and participate in long-range interactions of single-stranded stretches within RNA macromolecules. Furthermore, both base-intercalated and base-wedged motifs participate directly or indirectly in the formation of RNA functional centers, which interact with various ligands, antibiotics and proteins.


PROCARB: A Database of Known and Modelled Carbohydrate-Binding Protein Structures with Sequence-Based Prediction Tools

Advances in Bioinformatics ◽

10.1155/2010/436036 ◽

2010 ◽

Vol 2010 ◽

pp. 1-9 ◽

Cited By ~ 14

Author(s):

Adeel Malik ◽

Ahmad Firoz ◽

Vivekanand Jha ◽

Shandar Ahmad

Keyword(s):

Binding Sites ◽

De Novo ◽

Solvent Accessibility ◽

Protein Structures ◽

Three Dimensional ◽

Data Bank ◽

Dimensional Structure ◽

Carbohydrate Binding ◽

Structural And Functional Properties ◽

Three Dimensional Models

Understanding the three-dimensional structures of proteins that interact with carbohydrates covalently (glycoproteins) as well as noncovalently (protein-carbohydrate complexes) is essential to the study of many biological processes and plays a significant role in explaining normal and disease-associated functions. It is important to have a central repository of knowledge available about these protein-carbohydrate complexes as well as preprocessed data of predicted structures. This can be significantly enhanced by de novo tools that predict carbohydrate-binding sites for proteins when no structure with an experimentally known binding site is available. PROCARB is an open-access database comprising three independently working components, namely, (i) Core PROCARB module, consisting of three-dimensional structures of protein-carbohydrate complexes taken from Protein Data Bank (PDB), (ii) Homology Models module, consisting of manually developed three-dimensional models of N-linked and O-linked glycoproteins of unknown three-dimensional structure, and (iii) CBS-Pred prediction module, consisting of web servers to predict carbohydrate-binding sites using single sequence or server-generated PSSM. Several precomputed structural and functional properties of complexes are also included in the database for quick analysis. In particular, information about function, secondary structure, solvent accessibility, hydrogen bonds, literature references, and so forth is included. In addition, each protein in the database is mapped to UniProt, Pfam, PDB, and so forth.


Synthetic beta-solenoid proteins with the fragment-free computational design of a beta-hairpin extension

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1525308113 ◽

2016 ◽

Vol 113 (37) ◽

pp. 10346-10351 ◽

Cited By ~ 16

Author(s):

James T. MacDonald ◽

Burak V. Kabasakal ◽

David Godding ◽

Sebastian Kraatz ◽

Louie Henderson ◽

...

Keyword(s):

Crystal Structure ◽

De Novo ◽

Repeat Unit ◽

Computational Design ◽

Atomic Level ◽

Repeat Family ◽

Repeat Proteins ◽

Synthetic Protein ◽

Beta Hairpin ◽

Repetitive Protein

The ability to design and construct structures with atomic level precision is one of the key goals of nanotechnology. Proteins offer an attractive target for atomic design because they can be synthesized chemically or biologically and can self-assemble. However, the generalized protein folding and design problem is unsolved. One approach to simplifying the problem is to use a repetitive protein as a scaffold. Repeat proteins are intrinsically modular, and their folding and structures are better understood than large globular domains. Here, we have developed a class of synthetic repeat proteins based on the pentapeptide repeat family of beta-solenoid proteins. We have constructed length variants of the basic scaffold and computationally designed de novo loops projecting from the scaffold core. The experimentally solved 3.56-Å resolution crystal structure of one designed loop matches closely the designed hairpin structure, showing the computational design of a backbone extension onto a synthetic protein core without the use of backbone fragments from known structures. Two other loop designs were not clearly resolved in the crystal structures, and one loop appeared to be in an incorrect conformation. We have also shown that the repeat unit can accommodate whole-domain insertions by inserting a domain into one of the designed loops.
