Structural Phylogenetics with Confidence

Ashar J Malik; Anthony M Poole; Jane R Allison

doi:10.1093/molbev/msaa100

Molecular Biology and Evolution ◽

10.1093/molbev/msaa100 ◽

2020 ◽

Vol 37 (9) ◽

pp. 2711-2726

Author(s):

Ashar J Malik ◽

Anthony M Poole ◽

Jane R Allison

Keyword(s):

Molecular Dynamics ◽

Sequence Similarity ◽

Protein Structures ◽

Structural Data ◽

Data Bank ◽

Homology Search ◽

Hierarchical Level ◽

Evolutionary Relationships ◽

Major Protein ◽

Alternative Means

Abstract

For evaluating the deepest evolutionary relationships among proteins, sequence similarity is too low for application of sequence-based homology search or phylogenetic methods. In such cases, comparison of protein structures, which are often better conserved than sequences, may provide an alternative means of uncovering deep evolutionary signal. Although major protein structure databases such as SCOP and CATH hierarchically group protein structures, they do not describe the specific evolutionary relationships within a hierarchical level. Structural phylogenies have the potential to fill this gap. However, it is difficult to assess evolutionary relationships derived from structural phylogenies without some means of assessing confidence in such trees. We therefore address two shortcomings in the application of structural data to deep phylogeny. First, we examine whether phylogenies derived from pairwise structural comparisons are sensitive to differences in protein length and shape. We find that structural phylogenetics is best employed where structures have very similar lengths, and that shape fluctuations generated during molecular dynamics simulations impact pairwise comparisons, but not so drastically as to eliminate evolutionary signal. Second, we address the absence of statistical support for structural phylogeny. We present a method for assessing confidence in a structural phylogeny using shape fluctuations generated via molecular dynamics or Monte Carlo simulations of proteins. Our approach will aid the evolutionary reconstruction of relationships across structurally defined protein superfamilies. With the Protein Data Bank now containing in excess of 158,000 entries (December 2019), we predict that structural phylogenetics will become a useful tool for ordering the protein universe.
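The confidence idea in this abstract (repeating pairwise structural comparisons on shape fluctuations from simulations) can be illustrated, loosely by analogy with bootstrapping, as follows: build a reference tree from a pairwise structural distance matrix, rebuild the tree from replicate matrices (each hypothetically computed from a different set of simulation snapshots), and report how often each reference clade recurs. This is a minimal sketch under stated assumptions, not the authors' method: UPGMA stands in for whatever tree-building method they used, and the helper names and distance values are hypothetical.

```python
from itertools import combinations

def upgma_clades(names, dist):
    """Cluster taxa with UPGMA; return every clade as a frozenset of taxon names.

    dist maps frozenset({a, b}) -> distance for every pair of taxa.
    """
    clusters = {frozenset([n]): 1 for n in names}  # cluster -> number of leaves
    d = {frozenset([frozenset([a]), frozenset([b])]): dist[frozenset((a, b))]
         for a, b in combinations(names, 2)}
    clades = set()
    while len(clusters) > 1:
        pair = min(d, key=d.get)  # closest pair of clusters
        a, b = tuple(pair)
        merged = a | b
        clades.add(merged)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        for c in clusters:  # size-weighted average linkage to every other cluster
            d_ac = d.pop(frozenset([a, c]))
            d_bc = d.pop(frozenset([b, c]))
            d[frozenset([merged, c])] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        del d[pair]
        clusters[merged] = size_a + size_b
    return clades

def clade_support(names, reference, replicates):
    """Fraction of replicate trees that contain each clade of the reference tree."""
    ref_clades = upgma_clades(names, reference)
    hits = {c: 0 for c in ref_clades}
    for rep in replicates:
        rep_clades = upgma_clades(names, rep)
        for c in ref_clades:
            hits[c] += int(c in rep_clades)
    return {c: hits[c] / len(replicates) for c in ref_clades}

def pairwise(ab, ac, ad, bc, bd, cd):
    """Hypothetical 4-taxon distance matrix (e.g. 1 - normalized structural similarity)."""
    keys = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
    return {frozenset(k): v for k, v in zip(keys, (ab, ac, ad, bc, bd, cd))}

names = ["A", "B", "C", "D"]
reference = pairwise(1.0, 4.0, 4.0, 4.0, 4.0, 1.0)
replicates = [pairwise(1.2, 4.0, 4.1, 3.9, 4.0, 0.9),  # hypothetical snapshot set 1
              pairwise(3.0, 1.0, 4.0, 4.0, 1.5, 4.0)]  # hypothetical snapshot set 2
support = clade_support(names, reference, replicates)
for clade, value in sorted(support.items(), key=lambda kv: len(kv[0])):
    print(sorted(clade), value)
```

Replicate 1 recovers both reference clades and replicate 2 recovers neither, so {A, B} and {C, D} each receive 0.5 support, while the root is trivially 1.0. A stable estimate would need many more replicates than this toy example.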

Dimer Interface Organization is a Main Determinant of Intermonomeric Interactions and Correlates with Evolutionary Relationships of Retroviral and Retroviral-Like Ddi1 and Ddi2 Proteases

International Journal of Molecular Sciences ◽

10.3390/ijms21041352 ◽

2020 ◽

Vol 21 (4) ◽

pp. 1352 ◽

Cited By ~ 3

Author(s):

János András Mótyán ◽

Márió Miczi ◽

József Tőzsér

Keyword(s):

Structural Characteristics ◽

Limited Proteolysis ◽

Structural Data ◽

Data Bank ◽

Life Cycles ◽

Evolutionary Relationships ◽

Multiple Sequence ◽

Dimer Interface ◽

Viral Proteases ◽

Eukaryotic Organisms

The life cycles of retroviruses rely on limited proteolysis catalyzed by the viral protease. Numerous eukaryotic organisms also endogenously express such proteases, which originate from retrotransposons or retroviruses; these include the DNA damage-inducible 1 and 2 (Ddi1 and Ddi2, respectively) proteins. In this study, we performed a comparative analysis based on the structural data currently available in the Protein Data Bank (PDB) and Structural summaries of PDB entries (PDBsum) databases, with special emphasis on the regions involved in dimerization of retroviral and retroviral-like Ddi proteases. In addition to Ddi1 and Ddi2, at least one member of each of the seven genera of the Retroviridae family was included in this comparison. We found that the studied retroviral and non-viral proteases differ in their mode of dimerization and in the density of intermonomeric contacts, and that the distribution of these structural characteristics agrees with their evolutionary relationships. Multiple sequence and structure alignments revealed that the interactions between the subunits depend mainly on the overall organization of the dimer interface. We think that a better understanding of the general and specific features of these proteases may support the characterization of retroviral-like proteases.
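The comparison of dimerization modes rests on counting intermonomeric contacts. A minimal sketch of one way to count them: collect residue pairs, one from each monomer, that have at least one atom pair within a distance cutoff. The 4 Å cutoff, the per-interface-residue normalization and the toy coordinates are assumptions for illustration; the contact criteria used by PDBsum or by the authors may differ.

```python
import math
from itertools import product

def residue_contacts(chain_a, chain_b, cutoff=4.0):
    """Unique (residue_a, residue_b) pairs with any atom pair within cutoff (Å).

    Each chain is a list of (residue_id, (x, y, z)) atom records.
    """
    pairs = set()
    for (res_a, xyz_a), (res_b, xyz_b) in product(chain_a, chain_b):
        if math.dist(xyz_a, xyz_b) <= cutoff:
            pairs.add((res_a, res_b))
    return pairs

def contact_density(pairs):
    """Contacts per interface residue (a hypothetical normalization)."""
    interface = {("A", a) for a, _ in pairs} | {("B", b) for _, b in pairs}
    return len(pairs) / len(interface) if interface else 0.0

# Toy coordinates (hypothetical): two atoms per monomer.
monomer_a = [(25, (0.0, 0.0, 0.0)), (26, (10.0, 0.0, 0.0))]
monomer_b = [(25, (3.0, 0.0, 0.0)), (27, (10.0, 3.5, 0.0))]
pairs = residue_contacts(monomer_a, monomer_b)
print(sorted(pairs), contact_density(pairs))
```

Normalizing by the number of interface residues is one way to compare contact density between dimers whose interfaces differ in size.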

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research ◽

10.1093/nar/gkaa1097 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D452-D457

Author(s):

Lisanna Paladin ◽

Martina Bevilacqua ◽

Sara Errigo ◽

Damiano Piovesan ◽

Ivan Mičetić ◽

...

Keyword(s):

Protein Data Bank ◽

Tandem Repeat ◽

Tandem Repeats ◽

Classification Scheme ◽

Sequence Similarity ◽

Protein Structures ◽

Hierarchical Classification ◽

Structural Similarity ◽

Data Bank ◽

Similarity Class

Abstract

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotation and classification of protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and introduces an extended classification scheme. The major conceptual change compared to the previous version is a hierarchical classification that combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB reflects our commitment to developing a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
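The five-level hierarchy separates structure-only levels (Class > Topology > Fold) from levels that also require sequence similarity (Clan > Family). One hypothetical way to represent a classification path and find the deepest level at which two repeat regions agree is sketched here; the labels are placeholders, not real RepeatsDB identifiers, and `None` marks a level that has not been assigned (for example, a region without a sequence-based Clan or Family).

```python
from typing import Optional, Sequence

LEVELS = ("Class", "Topology", "Fold", "Clan", "Family")
STRUCTURE_ONLY = set(LEVELS[:3])  # levels based solely on structural similarity

def deepest_shared_level(a: Sequence[Optional[str]],
                         b: Sequence[Optional[str]]) -> Optional[str]:
    """Deepest hierarchy level at which two classification paths still agree."""
    shared = None
    for level, x, y in zip(LEVELS, a, b):
        if x is None or y is None or x != y:
            break
        shared = level
    return shared

# Placeholder labels (hypothetical), ordered Class, Topology, Fold, Clan, Family.
r1 = ("class-3", "topo-3.1", "fold-3.1.1", "clan-x", "fam-x1")
r2 = ("class-3", "topo-3.1", "fold-3.1.1", "clan-y", "fam-y1")
r3 = ("class-3", "topo-3.1", "fold-3.1.1", None, None)
level = deepest_shared_level(r1, r2)
print(level, level in STRUCTURE_ONLY)
```

Here r1 and r2 share a fold but not a clan, so their relationship is established by structure alone.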

xMDFF: molecular dynamics flexible fitting of low-resolution X-ray structures

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s1399004714013856 ◽

2014 ◽

Vol 70 (9) ◽

pp. 2344-2355 ◽

Cited By ~ 33

Author(s):

Ryan McGreevy ◽

Abhishek Singharoy ◽

Qufei Li ◽

Jingfen Zhang ◽

Dong Xu ◽

...

Keyword(s):

Molecular Dynamics ◽

Large Scale ◽

Protein Structures ◽

Data Bank ◽

Real Space ◽

Low Resolution ◽

X Ray ◽

X Ray Crystallography ◽

Atomic Structures ◽

Electron Density Map

X-ray crystallography remains the dominant method for solving atomic structures. However, for relatively large systems, the availability of only medium-to-low-resolution diffraction data often limits the determination of all-atom details. A new molecular dynamics flexible fitting (MDFF)-based approach, xMDFF, for determining structures from such low-resolution crystallographic data is reported. xMDFF employs a real-space refinement scheme that flexibly fits atomic models into an iteratively updated electron-density map. It handles significant large-scale deformations of the initial model to fit the low-resolution density, as tested with synthetic low-resolution maps of D-ribose-binding protein. xMDFF has been successfully applied to re-refine six low-resolution protein structures of varying sizes that had already been submitted to the Protein Data Bank. Finally, via systematic refinement of a series of data from 3.6 to 7 Å resolution, xMDFF refinements together with electrophysiology experiments were used to validate the first all-atom structure of the voltage-sensing protein Ci-VSP.
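MDFF-type refinement adds a map-derived potential to the molecular dynamics force field so that atoms are driven into high-density regions. This sketch evaluates the per-atom potential in the form published for the original MDFF method (Trabuco et al., 2008): V(Φ) = ξ[1 − (Φ − Φ_thr)/(Φ_max − Φ_thr)] for Φ ≥ Φ_thr and V = ξ otherwise, summed over atoms with weights w_j. The abstract does not say whether xMDFF uses exactly this form or these parameters, the iterative map update is not shown, and the density values are hypothetical stand-ins for the map interpolated at atom positions.

```python
def mdff_potential(phi, phi_thr, phi_max, xi=1.0):
    """Per-atom MDFF map potential: 0 at phi_max, flat at xi below phi_thr."""
    if phi < phi_thr:
        return xi
    return xi * (1.0 - (phi - phi_thr) / (phi_max - phi_thr))

def mdff_energy(densities, weights, phi_thr, phi_max, xi=1.0):
    """Total map energy: weighted sum of the per-atom potential at each atom's density."""
    return sum(w * mdff_potential(phi, phi_thr, phi_max, xi)
               for phi, w in zip(densities, weights))

# Hypothetical map values sampled at three atom positions, with per-atom weights.
energy = mdff_energy(densities=[2.0, 1.0, -1.0], weights=[1.0, 1.0, 2.0],
                     phi_thr=0.0, phi_max=2.0, xi=0.3)
print(energy)
```

Because the potential is flat below the threshold, atoms in low-density regions feel no map force there; only density above the threshold pulls atoms toward the map maximum.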

SCOP, Structural Classification of Proteins Database: Applications to Evaluation of the Effectiveness of Sequence Alignment Methods and Statistics of Protein Structural Data

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444998009172 ◽

1998 ◽

Vol 54 (6) ◽

pp. 1147-1154 ◽

Cited By ~ 22

Author(s):

Tim J. P. Hubbard ◽

Bart Ailey ◽

Steven E. Brenner ◽

Alexey G. Murzin ◽

Cyrus Chothia

Keyword(s):

Protein Structures ◽

Structural Data ◽

Evolutionary Relationships ◽

Structural Classification ◽

Comprehensive Description ◽

Database Applications ◽

The Third ◽

Sequence Search ◽

Structural Classification Of Proteins

The Structural Classification of Proteins (SCOP) database provides a detailed and comprehensive description of the relationships among all known protein structures. The classification is organized into hierarchical levels: the first two, family and superfamily, describe near and far evolutionary relationships; the third, fold, describes geometrical relationships. The distinction between evolutionary relationships and those that arise from the physics and chemistry of proteins is, so far, unique to this database. The database can be used as a source of data to calibrate sequence search algorithms and to generate population statistics on protein structures. The database and its associated files are freely accessible from a number of WWW sites mirrored from URL http://scop.mrc-lmb.cam.ac.uk/scop/.
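Using SCOP to calibrate sequence search rests on treating the structural classification as a gold standard: pairs in the same superfamily count as true homologs, pairs in different folds as non-homologs, and same-fold, different-superfamily pairs as uncertain. A minimal sketch of that labelling and of one common summary, the fraction of true homologs found before a given number of false positives, follows. The tuple layout, identifiers, scores and helper names are assumptions, not code from the paper.

```python
def scop_label(a, b):
    """Label a pair of domains from (class, fold, superfamily, family) tuples."""
    if a[:3] == b[:3]:
        return "homolog"     # same superfamily: evolutionary relationship
    if a[:2] != b[:2]:
        return "nonhomolog"  # different fold
    return "unknown"         # same fold, different superfamily: ambiguous

def coverage_at_errors(hits, max_errors=0):
    """Fraction of true homolog pairs ranked above the (max_errors + 1)-th false positive.

    hits: (score, label) pairs; higher scores mean more significant matches.
    """
    total_true = sum(1 for _, label in hits if label == "homolog")
    found = errors = 0
    for _, label in sorted(hits, key=lambda h: h[0], reverse=True):
        if label == "nonhomolog":
            errors += 1
            if errors > max_errors:
                break
        elif label == "homolog":
            found += 1
    return found / total_true if total_true else 0.0

# Hypothetical domain identifiers and alignment scores.
query = ("a", "a.1", "a.1.1", "a.1.1.1")
hits = [(50, scop_label(query, ("a", "a.1", "a.1.1", "a.1.1.2"))),
        (40, scop_label(query, ("a", "a.1", "a.1.1", "a.1.1.1"))),
        (35, scop_label(query, ("a", "a.1", "a.1.2", "a.1.2.1"))),
        (30, scop_label(query, ("b", "b.1", "b.1.1", "b.1.1.1"))),
        (20, scop_label(query, ("a", "a.1", "a.1.1", "a.1.1.3")))]
print(coverage_at_errors(hits, 0), coverage_at_errors(hits, 1))
```

Excluding same-fold, different-superfamily pairs from both counts avoids penalizing a method for hits whose true relationship is unknown.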

Learning structural motif representations for efficient protein structure search

10.1101/137828 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yang Liu ◽

Qing Ye ◽

Liwei Wang ◽

Jian Peng

Keyword(s):

Protein Structure ◽

Fundamental Problem ◽

Sequence Similarity ◽

Structural Alignment ◽

Protein Structures ◽

Computational Cost ◽

Data Bank ◽

Hierarchical Organization ◽

Structural Motif ◽

And Function

Abstract

Motivation: Understanding the relationship between protein structure and function is a fundamental problem in protein science. Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against all structures in the PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and computing similarity only between vectors. As a notable example, FragBag represents each protein by a “bag of fragments”, a vector of frequencies of contiguous short backbone fragments from a predetermined library.

Results: Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model that extracts structural motif features of a protein structure. Like FragBag, DeepFold represents each protein structure or fold using a vector of learned structural motif features. We demonstrate that DeepFold substantially outperforms FragBag in protein structural search on a non-redundant protein structure database and on a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

Availability: https://github.com/largelymfs/[email protected]
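FragBag-style alignment-free search reduces each structure to a histogram over a fragment library and compares histograms instead of aligning structures. This sketch shows the idea in simplified form: each contiguous k-residue Cα window is described by its internal pairwise distances (so no superposition is needed) and assigned to the nearest library entry, and bags are compared by cosine similarity. FragBag's actual matching criterion and library differ, and DeepFold replaces the fixed library with learned convolutional features; neither is reproduced here. The two-entry library and the coordinates are hypothetical.

```python
import math
from itertools import combinations

def fragment_descriptor(coords):
    """Rotation/translation-invariant fragment descriptor: all pairwise Cα distances."""
    return [math.dist(p, q) for p, q in combinations(coords, 2)]

def bag_of_fragments(ca_coords, library, k):
    """Counts of nearest-library-fragment assignments over all contiguous k-residue windows."""
    counts = [0] * len(library)
    for start in range(len(ca_coords) - k + 1):
        desc = fragment_descriptor(ca_coords[start:start + k])
        nearest = min(range(len(library)), key=lambda i: math.dist(desc, library[i]))
        counts[nearest] += 1
    return counts

def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical 3-residue library: distances (d01, d02, d12) in Å.
LIBRARY = [[3.8, 7.2, 3.8],  # roughly extended
           [3.8, 5.4, 3.8]]  # roughly compact / helix-like

straight = [(3.8 * i, 0.0, 0.0) for i in range(4)]
bent = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
bag_s = bag_of_fragments(straight, LIBRARY, 3)
bag_b = bag_of_fragments(bent, LIBRARY, 3)
print(bag_s, bag_b, cosine(bag_s, bag_b))
```

Searching a database then amounts to ranking precomputed bags by cosine similarity to the query bag, which avoids a structural alignment per database entry.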

Parameterization of Divalent Cations for Coarse-Grained Simulations

10.26434/chemrxiv.11881716 ◽

2020 ◽

Author(s):

Florencia Klein ◽

Daniela Cáceres-Rojas ◽

Monica Carrasco ◽

Juan Carlos Tapia ◽

Julio Caballero ◽

...

Keyword(s):

Molecular Dynamics ◽

Metal Ions ◽

Molecular Dynamics Simulations ◽

Divalent Cations ◽

Computational Cost ◽

Data Bank ◽

Coarse Grained ◽

Interaction Parameters ◽

Dynamics Simulations ◽

Dynamical Description

Although molecular dynamics simulations allow the study of interactions among virtually all biomolecular entities, metal ions still pose significant challenges to achieving an accurate structural and dynamical description of many biological assemblies. This is particularly the case for coarse-grained (CG) models. Although the reduced computational cost of CG methods often makes them the technique of choice for the study of large biomolecular systems, the parameterization of metal ions is still very crude or simply unavailable for the vast majority of CG force fields. Here, we show that incorporating statistical data retrieved from the Protein Data Bank (PDB) to set specific Lennard-Jones interactions can produce structurally accurate CG molecular dynamics simulations. Using this simple approach, we provide a set of interaction parameters for calcium, magnesium, and zinc ions, which cover more than 80% of the metal-bound structures reported in the PDB. Simulations performed using the SIRAH force field on several protein and DNA systems show that, with the present approach, it is possible to obtain non-bonded interaction parameters that obviate the use of topological constraints.
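The abstract does not spell out how PDB statistics are turned into Lennard-Jones parameters. One simple possibility, shown here purely as an assumption, is to place the minimum of the 12-6 potential, which lies at r_min = 2^(1/6)σ where the energy equals −ε, at the most populated ion-ligand distance observed in the PDB. The distances, bin width and ε are hypothetical, and SIRAH's own parameterization is not modelled.

```python
from collections import Counter

def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones energy; its minimum is -epsilon at r = 2**(1/6) * sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def sigma_from_distances(distances, bin_width=0.05):
    """Choose sigma so that the LJ minimum sits at the most populated distance bin."""
    bins = Counter(round(d / bin_width) for d in distances)
    mode_bin, _ = bins.most_common(1)[0]
    r_peak = mode_bin * bin_width
    return r_peak / 2 ** (1 / 6)

# Hypothetical ion-ligand distances (Å), standing in for statistics harvested from the PDB.
observed = [2.09, 2.10, 2.11, 2.40]
sigma = sigma_from_distances(observed)
epsilon = 0.5  # hypothetical well depth
print(round(sigma, 3), lennard_jones(2 ** (1 / 6) * sigma, sigma, epsilon))
```

Setting σ this way fixes only the position of the energy minimum; the well depth ε would still need to be chosen separately.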

Improved Sampling Strategies for Protein Model Refinement based on Molecular Dynamics Simulation

10.26434/chemrxiv.13299197.v1 ◽

2020 ◽

Author(s):

Lim Heo ◽

Collin Arbour ◽

Michael Feig

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Structure Prediction ◽

Protein Structures ◽

Conformational Space ◽

Dynamics Simulation ◽

Model Refinement ◽

Protein Model ◽

Lower Accuracy ◽

Simulation Based

Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. These methods use protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve the accuracy of predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in model quality become possible. Molecular dynamics simulations have been especially successful at improving model quality, but although consistent refinement can be achieved, the improvements are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on an optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore conformational space more broadly. Based on the insights from this analysis, we propose a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.
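The abstract refers to an optimized use of biasing functions without defining them. One biasing term used in MD-based refinement is a flat-bottom harmonic restraint, which leaves atoms free within a tolerance of their starting positions and penalizes larger displacements quadratically. It is sketched here as an illustrative assumption, with a hypothetical force constant, tolerance and coordinates, not as the protocol proposed in the paper.

```python
import math

def flat_bottom_restraint(coords, reference, k=1.0, tolerance=4.0):
    """Sum over atoms of k * (d - tolerance)**2 for displacements d beyond tolerance.

    coords and reference are equal-length lists of (x, y, z) positions, e.g. Cα atoms
    in the current frame and in the initial model.
    """
    energy = 0.0
    for x, x0 in zip(coords, reference):
        d = math.dist(x, x0)
        if d > tolerance:
            energy += k * (d - tolerance) ** 2
    return energy

start = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
current = [(1.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(flat_bottom_restraint(current, start, k=2.0, tolerance=4.0))  # prints 8.0
```

Only the second atom exceeds the 4 Å tolerance, by 2 Å, giving 2.0 × 2² = 8.0; the flat bottom leaves sampling unbiased near the starting model.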

Structure Unveils Relationships between RNA Virus Polymerases

Viruses ◽

10.3390/v13020313 ◽

2021 ◽

Vol 13 (2) ◽

pp. 313

Author(s):

Heli A. M. Mönttinen ◽

Janne J. Ravantti ◽

Minna M. Poranen

Keyword(s):

Phylogenetic Tree ◽

Rna Viruses ◽

Rna Virus ◽

Sequence Similarity ◽

Protein Structures ◽

Structural Similarity ◽

Functional Differentiation ◽

Comparison Method ◽

Homologous Structure ◽

Biological Entities

RNA viruses are the fastest evolving known biological entities. Consequently, the sequence similarity between homologous viral proteins disappears quickly, limiting the usability of traditional sequence-based phylogenetic methods in the reconstruction of relationships and evolutionary history among RNA viruses. Protein structures, however, typically evolve more slowly than sequences, and structural similarity can still be evident when no sequence similarity can be detected. Here, we used an automated structural comparison method, homologous structure finder, for comprehensive comparisons of viral RNA-dependent RNA polymerases (RdRps). We identified a common structural core of 231 residues for all the structurally characterized viral RdRps, covering segmented and non-segmented negative-sense, positive-sense, and double-stranded RNA viruses infecting both prokaryotic and eukaryotic hosts. The grouping and branching of the viral RdRps in the structure-based phylogenetic tree follow their functional differentiation. The RdRps using protein primer, RNA primer, or self-priming mechanisms have evolved independently of each other, and the RdRps cluster into two large branches based on the transcription mechanism used. The structure-based distance tree presented here follows the recently established RdRp-based RNA virus classification at genus, subfamily, family, order, class and subphylum ranks. However, the topology of our phylogenetic tree suggests an alternative phylum-level organization.
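The 231-residue common structural core is the set of positions shared by all compared RdRps. Assuming each structure has already been aligned to a single reference structure (the homologous structure finder's own procedure is not described in the abstract and is not reproduced here), a core can be sketched as the reference positions matched in every pairwise alignment. The alignments are hypothetical.

```python
def common_core(alignments):
    """Reference residue indices that are aligned in every pairwise alignment.

    Each alignment is a dict mapping a reference residue index to the index of the
    equivalent residue in another structure; unaligned reference residues are absent.
    """
    if not alignments:
        return set()
    core = set(alignments[0])
    for aln in alignments[1:]:
        core &= set(aln)
    return core

aln_1 = {1: 1, 2: 2, 3: 4, 5: 7}
aln_2 = {2: 3, 3: 3, 5: 9, 6: 10}
aln_3 = {2: 5, 3: 6, 4: 7, 5: 8}
core = common_core([aln_1, aln_2, aln_3])
print(sorted(core), len(core))
```

The core shrinks as more divergent structures are added, so its size depends on which structures are included in the comparison.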

Molecular dynamics simulation combined with small‐angle X‐ray/neutron scattering defining solution‐state protein structures

Journal of the Chinese Chemical Society ◽

10.1002/jccs.202000498 ◽

2020 ◽

Author(s):

Shang‐Wei Lin ◽

Kuan‐Hsuan Su ◽

Yi‐Qi Yeh ◽

U‐Ser Jeng ◽

Chun‐Ming Wu ◽

...

Keyword(s):

Molecular Dynamics ◽

Molecular Dynamics Simulation ◽

Neutron Scattering ◽

Small Angle ◽

Protein Structures ◽

Dynamics Simulation ◽

X Ray ◽

Solution State

Studies of Conformational Changes of Tubulin Induced by Interaction with Kinesin Using Atomistic Molecular Dynamics Simulations

International Journal of Molecular Sciences ◽

10.3390/ijms22136709 ◽

2021 ◽

Vol 22 (13) ◽

pp. 6709

Author(s):

Xiao-Xuan Shi ◽

Peng-Ye Wang ◽

Hong Chen ◽

Ping Xie

Keyword(s):

Molecular Dynamics ◽

High Resolution ◽

Binding Energy ◽

Molecular Dynamics Simulations ◽

Conformational Changes ◽

Weak Interactions ◽

Molecular Motor ◽

Structural Data ◽

Calculated Binding Energy ◽

Dynamics Simulations

The transition between strong and weak interactions of the kinesin head with the microtubule, which is regulated by the change of the nucleotide state of the head, is indispensable for the processive motion of the kinesin molecular motor on the microtubule. Here, using all-atom molecular dynamics simulations, the interactions between the kinesin head and tubulin are studied on the basis of the available high-resolution structural data. We found that the strong interaction can induce rapid large conformational changes of the tubulin, whereas the weak interaction cannot. Furthermore, we found that the large conformational changes of the tubulin have a significant effect on the interaction of the tubulin with the head in the weak-microtubule-binding ADP state. The calculated binding energy of the ADP-bound head to the tubulin with the large conformational changes is only about half that of the tubulin without the conformational changes.
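The abstract compares the binding energy of the ADP-bound head to tubulin with and without the large conformational changes but does not state how that energy was computed. As an illustrative assumption only, the simplest related quantity is a pairwise non-bonded interaction energy between two atom groups, Coulomb plus Lennard-Jones, summed over all cross-group atom pairs in vacuum; it ignores solvation and entropy. The atom parameters are hypothetical.

```python
import math

COULOMB_KCAL = 332.0636  # approx. Coulomb constant in kcal·Å/(mol·e^2)

def interaction_energy(group_a, group_b):
    """Vacuum Coulomb + 12-6 Lennard-Jones energy between two atom groups (kcal/mol).

    Each atom is (xyz, charge, sigma, epsilon); Lorentz-Berthelot combining rules.
    """
    total = 0.0
    for xyz_a, q_a, s_a, e_a in group_a:
        for xyz_b, q_b, s_b, e_b in group_b:
            r = math.dist(xyz_a, xyz_b)
            sigma = 0.5 * (s_a + s_b)
            epsilon = math.sqrt(e_a * e_b)
            sr6 = (sigma / r) ** 6
            total += COULOMB_KCAL * q_a * q_b / r + 4.0 * epsilon * (sr6 * sr6 - sr6)
    return total

# Hypothetical single-atom "groups": opposite unit charges 4 Å apart, no LJ term.
head = [((0.0, 0.0, 0.0), 1.0, 3.0, 0.0)]
tubulin = [((4.0, 0.0, 0.0), -1.0, 3.0, 0.0)]
print(interaction_energy(head, tubulin))
```

Evaluating such a sum for the two tubulin conformations would give a ratio of the kind the abstract reports, although the authors' actual procedure may differ.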
