Comparison of proteins based on segments structural similarity.

Dariusz Plewczynski; Jakub Pas; Marcin Von Grotthuss; Leszek Rychlewski

doi:10.18388/abp.2004_3608

Acta Biochimica Polonica ◽

10.18388/abp.2004_3608 ◽

2004 ◽

Vol 51 (1) ◽

pp. 161-172 ◽

Cited By ~ 8

Author(s):

Dariusz Plewczynski ◽

Jakub Pas ◽

Marcin Von Grotthuss ◽

Leszek Rychlewski

Keyword(s):

Structure Prediction ◽

Structural Alignment ◽

Structural Similarity ◽

Data Bank ◽

Simple Method ◽

Accurate Comparison ◽

SCOP Classification ◽

Prediction Evaluation ◽

DALI Server ◽

Speed And Accuracy

We present a simple method for fast and accurate comparison of proteins using their structures. The algorithm is based on structural alignment of segments of Cα chains (with a size of 99 or 199 residues). The method is optimized in terms of speed and accuracy. We test it on 97 representative proteins with the similarity measure based on the SCOP classification. We compare our algorithm with the LGscore2 automatic method. Our method has the same accuracy as the LGscore2 algorithm while processing the whole test set much faster. A second test is done using the ToolShop structure prediction evaluation program and shows that our tool is on average slightly less sensitive than the DALI server. Both algorithms give a similar number of correct models; however, the final alignment quality is better in the case of DALI. Our method was implemented under the name 3D-Hit as a web server at http://3dhit.bioinfo.pl/, free for academic use, with a weekly updated database containing a set of 5000 structures from the Protein Data Bank with non-homologous sequences.
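
The abstract does not spell out how segments are scored, so the following is only a minimal sketch of one generic way to compare fixed-size Cα segments: a distance-matrix RMSD (dRMSD), which needs no superposition because internal distances do not change under rotation or translation. It is not the 3D-Hit algorithm; the function names `drmsd` and `best_segment_pair` and the brute-force window search are assumptions made for illustration.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(seg_a: Sequence[Point], seg_b: Sequence[Point]) -> float:
    """Root-mean-square difference of internal C-alpha distances.

    Invariant to rotation and translation of either segment.
    Note: it cannot tell a segment from its mirror image.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have the same length")
    n = len(seg_a)
    if n < 2:
        raise ValueError("segments need at least two residues")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = math.dist(seg_a[i], seg_a[j]) - math.dist(seg_b[i], seg_b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)


def best_segment_pair(
    chain_a: Sequence[Point], chain_b: Sequence[Point], size: int
) -> Optional[Tuple[float, int, int]]:
    """Slide a window of `size` residues over both chains (brute force).

    Returns (dRMSD, start in chain_a, start in chain_b) for the most
    similar pair of segments, or None if a chain is shorter than `size`.
    """
    best: Optional[Tuple[float, int, int]] = None
    for i in range(len(chain_a) - size + 1):
        for j in range(len(chain_b) - size + 1):
            score = drmsd(chain_a[i:i + size], chain_b[j:j + size])
            if best is None or score < best[0]:
                best = (score, i, j)
    return best
```

The exhaustive search costs O(L_a · L_b · size²) distance evaluations, so a practical tool would need some form of pre-filtering or indexing; the sketch only shows what "similar segments" can mean in coordinate terms.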

SARS-CoV-2 ribosomal frameshifting pseudoknot: Improved secondary structure prediction and detection of inter-viral structural similarity

10.1101/2020.09.15.298604 ◽

2020 ◽

Author(s):

Luke Trinity ◽

Lance Lansing ◽

Hosna Jabbari ◽

Ulrike Stege

Keyword(s):

RNA Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Structural Alignment ◽

Antiviral Agents ◽

Herd Immunity ◽

Structural Similarity ◽

Ribosomal Frameshifting ◽

Novel Coronavirus ◽

The Impact

Abstract: Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has led to the COVID-19 pandemic, a pandemic of a scale that has not been seen in the modern era. Despite over 29 million reported cases and over 900,000 deaths worldwide as of September 2020, many experts expect herd immunity and widespread vaccination efforts to be insufficient in addressing this crisis for the foreseeable future. Thus, there is an urgent need for treatments that can lessen the effects of SARS-CoV-2 in patients who become seriously affected. Many viruses including HIV, the common cold, SARS-CoV and SARS-CoV-2 use a unique mechanism known as −1 programmed ribosomal frameshifting (−1 PRF) to successfully replicate and infect cells in the human host. SARS-CoV (the coronavirus responsible for SARS) and SARS-CoV-2 possess a unique RNA structure, a three-stemmed pseudoknot, that stimulates −1 PRF. Recent experiments identified that small molecules can be introduced as antiviral agents to bind with the pseudoknot and disrupt its stimulation of −1 PRF. If successfully developed, small molecule therapy that targets −1 PRF in SARS-CoV-2 is an excellent strategy to improve patients’ prognoses. Crucial to developing these successful therapies is modeling the structure of the SARS-CoV-2 −1 PRF pseudoknot. Following a structural alignment approach, we identify similarities in the −1 PRF pseudoknots of the novel coronavirus SARS-CoV-2, the original SARS-CoV, as well as a third coronavirus: MERS-CoV, the coronavirus responsible for Middle East Respiratory Syndrome (MERS). In addition, we provide a better understanding of the SARS-CoV-2 −1 PRF pseudoknot by comprehensively investigating the structural landscape using a hierarchical folding approach.
Since understanding the impact of mutations is vital to long-term success of treatments that are based on predicted RNA functional structures, we provide insight on SARS-CoV-2 −1 PRF pseudoknot sequence mutations and their effect on the resulting structure and its function.

In silico Discovery of Resveratrol Analogues as Potential Agents in Treatment of Metabolic Disorders

Current Pharmaceutical Design ◽

10.2174/1381612825666191029095252 ◽

2019 ◽

Vol 25 (35) ◽

pp. 3776-3783

Author(s):

Nebojša Pavlović ◽

Maja Đanić ◽

Bojan Stanimirov ◽

Svetlana Goločorbin-Kon ◽

Karmen Stankov ◽

...

Keyword(s):

In Silico ◽

Metabolic Disorders ◽

Partial Agonist ◽

Aqueous Solubility ◽

Structural Similarity ◽

Data Bank ◽

Docking Studies ◽

Binding Energies ◽

Molegro Virtual Docker ◽

PPAR-γ

Background: Resveratrol was demonstrated to act as a partial agonist of the PPAR-γ receptor, which opens up the possibility for its use in the treatment of metabolic disorders. Considering the poor bioavailability of resveratrol, particularly due to its low aqueous solubility, we aimed to identify analogues of resveratrol with improved pharmacokinetic properties and higher binding affinities towards PPAR-γ. Methods: 3D structures of resveratrol and its analogues were retrieved from the ZINC database, while the PPAR-γ structure was obtained from the Protein Data Bank. Docking studies were performed using Molegro Virtual Docker software. Molecular descriptors relevant to pharmacokinetics were calculated from ligand structures using VolSurf+ software. Results: Using a structural similarity search method, 56 analogues of resveratrol were identified and subjected to docking analyses. Binding energies ranged from -136.69 to -90.89 kcal/mol, with 16 analogues having higher affinities towards PPAR-γ in comparison to resveratrol. From the calculated values of the SOLY descriptor, 23 studied compounds were shown to be more soluble in water than resveratrol. However, only two tetrahydroxy stilbene derivatives, piceatannol and oxyresveratrol, had both better solubility and affinity towards PPAR-γ. These compounds also had a more favorable ADME profile, since they were shown to be more metabolically stable and more widely distributed in the body than resveratrol. Conclusion: Piceatannol and oxyresveratrol should be considered as potential lead compounds for further drug development. Although experimental validation of the obtained in silico results is required, this work can be considered as a step toward the discovery of new natural and safe drugs for the treatment of metabolic disorders.

Expanding our knowledge of the protein universe: Modelling of protein structures

Acta Crystallographica Section A Foundations and Advances ◽

10.1107/s2053273314095084 ◽

2014 ◽

Vol 70 (a1) ◽

pp. C491-C491

Author(s):

Jürgen Haas ◽

Alessandro Barbato ◽

Tobias Schmidt ◽

Steven Roth ◽

Andrew Waterhouse ◽

...

Keyword(s):

Computational Modeling ◽

Structure Prediction ◽

Structural Information ◽

Protein Structures ◽

Model Organism ◽

Data Bank ◽

Continuous Model ◽

Structure Modeling ◽

Structure Comparison ◽

Modeling And Prediction

Computational modeling and prediction of three-dimensional macromolecular structures and complexes from their sequence has been a long-standing goal in structural biology. Over the last two decades, a paradigm shift has occurred: starting from a large "knowledge gap" between the huge number of protein sequences and the small number of experimentally known structures, today, some form of structural information – either experimental or computational – is available for the majority of amino acids encoded by common model organism genomes. Methods for structure modeling and prediction have made substantial progress over the last decades, and template-based homology modeling techniques have matured to a point where they are now routinely used to complement experimental techniques. However, computational modeling and prediction techniques often fall short in accuracy compared to high-resolution experimental structures, and it is often difficult to convey the expected accuracy and structural variability of a specific model. Retrospectively assessing the quality of blind structure prediction in comparison to experimental reference structures allows benchmarking the state of the art in structure prediction and identifying areas which need further development. The Critical Assessment of Structure Prediction (CASP) experiment has for the last 20 years assessed the progress in the field of protein structure modeling based on predictions for ca. 100 blind prediction targets per experiment, which are carefully evaluated by human experts. The "Continuous Model EvaluatiOn" (CAMEO) project aims to provide a fully automated blind assessment for prediction servers based on weekly pre-released sequences of the Protein Data Bank (PDB). CAMEO has been made possible by the development of novel scoring methods such as lDDT, which are robust against domain movements to allow for automated continuous structure comparison without human intervention.
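
To make the remark about robustness to domain movements concrete, here is a simplified, Cα-only sketch in the spirit of lDDT. It assumes the commonly described parameters (a 15 Å inclusion radius and tolerance thresholds of 0.5, 1, 2 and 4 Å); the published score works on all atoms and handles details such as stereochemistry checks and multiple references, none of which is modelled here. The function name `ca_lddt` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def ca_lddt(
    reference: Sequence[Point],
    model: Sequence[Point],
    inclusion_radius: float = 15.0,
    thresholds: Tuple[float, ...] = (0.5, 1.0, 2.0, 4.0),
) -> float:
    """Simplified, superposition-free local distance difference score.

    Only residue pairs closer than `inclusion_radius` in the reference
    are scored, so a rigid shift of one distant domain relative to
    another does not lower the score.
    """
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same length")
    preserved = [0] * len(thresholds)
    considered = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= inclusion_radius:
                continue  # long-range pair: ignored by construction
            considered += 1
            deviation = abs(d_ref - math.dist(model[i], model[j]))
            for k, tolerance in enumerate(thresholds):
                if deviation < tolerance:
                    preserved[k] += 1
    if considered == 0:
        raise ValueError("no residue pairs within the inclusion radius")
    # Average the preserved fraction over all tolerance thresholds.
    return sum(p / considered for p in preserved) / len(thresholds)
```

Because only local distances from the reference are checked, two well-modelled domains in a different relative orientation still score highly, which is the property that makes this kind of score usable without manual domain splitting.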

AlphaFold at CASP13

Bioinformatics ◽

10.1093/bioinformatics/btz422 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4862-4865 ◽

Cited By ~ 48

Author(s):

Mohammed AlQuraishi

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Computational Prediction ◽

Data Bank ◽

Academic Community ◽

Physical Contact ◽

Evolutionary Analysis ◽

History Of ◽

First Time

Summary: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
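
Idea (i) can be illustrated with the simplest classical co-variation measure, mutual information between two columns of a multiple sequence alignment. This is a sketch for intuition only: it is not what AlphaFold computes, and practical co-evolution methods add corrections (sequence weighting, background subtraction, global statistical models) that are omitted here. The function names and the toy alignment in the usage note are assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def column_mutual_information(msa: Sequence[str], i: int, j: int) -> float:
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    i_counts = Counter(seq[i] for seq in msa)
    j_counts = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = i_counts[a] / n
        p_b = j_counts[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_covarying_pairs(msa: Sequence[str]) -> List[Tuple[float, int, int]]:
    """Return (MI, i, j) for every column pair, strongest co-variation first."""
    if not msa:
        raise ValueError("alignment is empty")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    scored = [
        (column_mutual_information(msa, i, j), i, j)
        for i in range(length)
        for j in range(i + 1, length)
    ]
    return sorted(scored, reverse=True)
```

For a toy alignment such as `["AKLE", "AKVD", "GRLE", "GRVD"]`, columns 0 and 1 always change together, as do columns 2 and 3, so those two pairs rank first; in a real alignment, such high-scoring pairs are candidates for residues in physical contact.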

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research ◽

10.1093/nar/gkaa1097 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D452-D457

Author(s):

Lisanna Paladin ◽

Martina Bevilacqua ◽

Sara Errigo ◽

Damiano Piovesan ◽

Ivan Mičetić ◽

...

Keyword(s):

Protein Data Bank ◽

Tandem Repeat ◽

Tandem Repeats ◽

Classification Scheme ◽

Sequence Similarity ◽

Protein Structures ◽

Hierarchical Classification ◽

Structural Similarity ◽

Data Bank ◽

Similarity Class

Abstract: The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

Sequence alignment using machine learning for accurate template-based protein structure prediction

Bioinformatics ◽

10.1093/bioinformatics/btz483 ◽

2019 ◽

Vol 36 (1) ◽

pp. 104-111

Author(s):

Shuichiro Makigaki ◽

Takashi Ishida

Keyword(s):

Machine Learning ◽

Structure Prediction ◽

Tertiary Structure ◽

Structural Alignment ◽

Protein Structures ◽

Substitution Matrix ◽

Detection Methods ◽

Supplementary Information ◽

Homology Detection ◽

Sequence Alignments

Motivation: Template-based modeling, the process of predicting the tertiary structure of a protein by using homologous protein structures, is useful if good templates can be found. Although modern homology detection methods can find remote homologs with high sensitivity, the accuracy of template-based models generated from homology-detection-based alignments is often lower than that from ideal alignments. Results: In this study, we propose a new method that generates pairwise sequence alignments for more accurate template-based modeling. The proposed method trains a machine learning model using the structural alignment of known homologs. It is difficult to directly predict sequence alignments using machine learning. Thus, when calculating sequence alignments, instead of a fixed substitution matrix, this method dynamically predicts a substitution score from the trained model. We evaluate our method by carefully splitting the training and test datasets and comparing the predicted structure’s accuracy with that of state-of-the-art methods. Our method generates more accurate tertiary structure models than those produced from alignments obtained by other methods. Availability and implementation: https://github.com/shuichiro-makigaki/exmachina. Supplementary information: Supplementary data are available at Bioinformatics online.
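
The idea of replacing a fixed substitution matrix with a predicted score can be sketched as an ordinary dynamic-programming alignment whose match score comes from a callable. The sketch below uses global (Needleman-Wunsch) alignment with a linear gap penalty purely as an assumption; the abstract does not say which recurrence or gap scheme the authors use. `identity_score` is a toy stand-in for a trained model, not part of the exmachina code.

```python
from typing import Callable, List, Tuple


def align(
    seq_a: str,
    seq_b: str,
    score: Callable[[int, int], float],
    gap: float = -2.0,
) -> Tuple[float, str, str]:
    """Global alignment where score(i, j) rates pairing seq_a[i] with seq_b[j].

    `score` is position-based, so it can be backed by any predictor
    rather than a fixed residue-by-residue matrix.
    """
    n, m = len(seq_a), len(seq_b)
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # pair residues
                dp[i - 1][j] + gap,                      # gap in seq_b
                dp[i][j - 1] + gap,                      # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(i - 1, j - 1):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def make_identity_score(seq_a: str, seq_b: str) -> Callable[[int, int], float]:
    """Toy scorer: +2 for identical residues, -1 otherwise (placeholder for a model)."""
    def identity_score(i: int, j: int) -> float:
        return 2.0 if seq_a[i] == seq_b[j] else -1.0
    return identity_score
```

Calling `align("GATTACA", "GATACA", make_identity_score("GATTACA", "GATACA"))` aligns the two strings with a single gap; swapping in a scorer that wraps a trained model changes the alignment without touching the dynamic-programming code.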

A murrel cysteine protease, cathepsin L: bioinformatics characterization, gene expression and proteolytic activity

Biologia ◽

10.2478/s11756-013-0326-8 ◽

2014 ◽

Vol 69 (3) ◽

Cited By ~ 11

Author(s):

Venkatesh Kumaresan ◽

Prasanth Bhatt ◽

Rajesh Palanisamy ◽

Annie Gnanam ◽

Mukesh Pasupuleti ◽

...

Keyword(s):

Gene Expression ◽

Bacterial Infections ◽

Cathepsin L ◽

Three Dimensional ◽

Structural Similarity ◽

Data Bank ◽

Minimum Free Energy ◽

Dimensional Structure ◽

Peptide Bonds ◽

Gene Expression Studies

Abstract: Cathepsin L, a lysosomal endopeptidase, is a member of the peptidase C1 family (papain-like family) of cysteine proteinases that cleave peptide bonds of lysosomal proteins. In this study, we report a cathepsin L sequence identified from the constructed cDNA library of striped murrel Channa striatus (designated as CsCath L) using genome sequencing FLX™ technology. The full-length CsCath L contains three eukaryotic thiol protease domains at positions 134-145, 278-288 and 299-318. Phylogenetic analysis revealed that CsCath L clustered together with other cathepsin L sequences from teleosts. The three-dimensional structure of CsCath L modelled by the I-Tasser program was compared with structures deposited in the Protein Data Bank to determine the structural similarity of CsCath L with experimentally identified structures. The results showed that CsCath L exhibits maximum structural identity with pro-cathepsin L from human. The RNA fold structure of CsCath L was predicted along with its minimum free energy (−471.93 kcal/mol). The highest CsCath L gene expression was observed in liver, which was also significantly higher (P < 0.05) than that detected in other tissues taken for analysis. In order to investigate the mRNA transcription profile of CsCath L during infection, C. striatus were injected with fungus (Aphanomyces invadans) and bacteria (Aeromonas hydrophila); CsCath L expression was up-regulated in liver at various time points. Similar to gene expression studies, the highest CsCath L enzyme activity was also observed in liver and its activity was up-regulated by fungal and bacterial infections.

Secondary structure prediction of 52 membrane-bound cytochromes P450 shows a strong structural similarity to P450cam

Biochemistry ◽

10.1021/bi00428a036 ◽

1989 ◽

Vol 28 (2) ◽

pp. 656-660 ◽

Cited By ~ 119

Author(s):

David R. Nelson ◽

Henry W. Strobel

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Cytochromes P450 ◽

Structural Similarity ◽

Membrane Bound

LZerD Protein-Protein Docking Webserver Enhanced With de novo Structure Prediction

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.724947 ◽

2021 ◽

Vol 8 ◽

Author(s):

Charles Christoffer ◽

Vijay Bharadwaj ◽

Ryan Luu ◽

Daisuke Kihara

Keyword(s):

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

De Novo ◽

Protein Complexes ◽

Protein Sequences ◽

Data Bank ◽

Protein Docking ◽

Functional Mechanisms ◽

Established Technique

Protein-protein docking is a useful tool for modeling the structures of protein complexes that have yet to be experimentally determined. Understanding the structures of protein complexes is a key component for formulating hypotheses in biophysics regarding the functional mechanisms of complexes. Protein-protein docking is an established technique for cases where the structures of the subunits have been determined. While the number of known structures deposited in the Protein Data Bank is increasing, there are still many cases where the structures of individual proteins that users want to dock are not determined yet. Here, we have integrated the AttentiveDist method for protein structure prediction into our LZerD webserver for protein-protein docking, which enables users to simply submit protein sequences and obtain full-complex atomic models, without having to supply any structure themselves. We have further extended the LZerD docking interface with a symmetrical homodimer mode. The LZerD server is available at https://lzerd.kiharalab.org/.

Fast and adaptive protein structure representations for machine learning

10.1101/2021.04.07.438777 ◽

2021 ◽

Author(s):

Janani Durairaj ◽

Mehmet Akdel ◽

Dick de Ridder ◽

Aalt D.J. van Dijk

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Structural Alignment ◽

Structural Similarity ◽

Structural Features ◽

Structure Alignment ◽

Learning Tasks ◽

Alignment Free ◽

Functional Hierarchy ◽

Invariant Shape

The growing prevalence and popularity of protein structure data, both experimental and computationally modelled, necessitate fast tools and algorithms to enable exploratory and interpretable structure-based machine learning. Alignment-free approaches have been developed for divergent proteins, but proteins sharing functional and structural similarity are often better understood via structural alignment, which has typically been too computationally expensive for larger datasets. Here, we introduce the concept of rotation-invariant shape-mers to multiple structure alignment, creating a structure aligner that scales well with the number of proteins and allows for aligning over a thousand structures in 20 minutes. We demonstrate how alignment-free shape-mer counts and aligned structural features, when used in machine learning tasks, can adapt to different levels of functional hierarchy in protein kinases, pinpointing residues and structural fragments that play a role in catalytic activity.
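
The notion of alignment-free shape-mer counts can be sketched as follows: cut a Cα trace into overlapping fragments, describe each fragment with rotation-invariant numbers, discretize those numbers, and count the resulting discrete "shapes". The abstract does not give the descriptors the authors use, so this sketch substitutes two simple invariants (radius of gyration and end-to-end distance) as a labelled assumption; the names `fragment_descriptor` and `shape_mer_counts` and the `resolution` parameter are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def fragment_descriptor(fragment: Sequence[Point]) -> Tuple[float, float]:
    """Two rotation- and translation-invariant numbers for one fragment."""
    n = len(fragment)
    centroid = tuple(sum(p[d] for p in fragment) / n for d in range(3))
    radius_of_gyration = math.sqrt(
        sum(math.dist(p, centroid) ** 2 for p in fragment) / n
    )
    end_to_end = math.dist(fragment[0], fragment[-1])
    return radius_of_gyration, end_to_end


def shape_mer_counts(
    chain: Sequence[Point], k: int = 8, resolution: float = 1.0
) -> Counter:
    """Count discretized fragment shapes along a C-alpha trace.

    A larger `resolution` gives finer bins (more distinct shape-mers).
    The result is a fixed-vocabulary count vector that needs no alignment.
    """
    counts: Counter = Counter()
    for start in range(len(chain) - k + 1):
        rg, e2e = fragment_descriptor(chain[start:start + k])
        counts[(round(rg * resolution), round(e2e * resolution))] += 1
    return counts
```

Because the counts depend only on internal geometry, the same structure in any orientation yields the same vector, which is what lets such counts be fed directly into standard machine learning models.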
