Evaluations of protein sequence alignments using structural information

Pre-training of Deep Bidirectional Protein Sequence Representations with Structural Information

IEEE Access ◽

10.1109/access.2021.3110269 ◽

2021 ◽

pp. 1-1

Author(s):

Seonwoo Min ◽

Seunghyun Park ◽

Siwon Kim ◽

Hyun-Soo Choi ◽

Byunghan Lee ◽

...

Keyword(s):

Protein Sequence ◽

Structural Information

The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001840005x ◽

2018 ◽

Vol 16 (02) ◽

pp. 1840005 ◽

Cited By ~ 8

Author(s):

Dmitry Suplatov ◽

Yana Sharapova ◽

Daria Timonina ◽

Kirill Kopylov ◽

Vytas Švedas

Keyword(s):

Rational Design ◽

Visual Analysis ◽

Structural Information ◽

Protein Structures ◽

Web Server ◽

Physical Contact ◽

Protein Families ◽

Sequence Alignments ◽

Homologous Proteins ◽

Correlated Mutations

The visualCMAT web-server was designed to assist experimental research in protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive, easy-to-use interface for the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions with the mutual information-based CMAT approach; to classify them into spatially close co-evolving pairs, which either form a direct physical contact or interact with the same ligand (e.g. a substrate or a crystallographic water molecule), and long-range correlations; and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of visualCMAT are organized for convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or studied further online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5, so neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server, which can construct large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments that aim to produce novel enzymes with improved properties, and to study the mechanisms of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
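
To make the idea of mutual information-based correlated mutations concrete, the following is a minimal sketch of the plain mutual information between two columns of a multiple sequence alignment. It is not the CMAT algorithm itself, which the abstract describes as also using structural data and statistical significance; the function name `mutual_information` and the toy alignment are assumptions made for illustration only.

```python
from collections import Counter
from math import log2


def mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `msa` is a list of equal-length aligned sequences (hypothetical input
    format chosen for this sketch).
    """
    n = len(msa)
    col_i = [seq[i] for seq in msa]
    col_j = [seq[j] for seq in msa]
    freq_i = Counter(col_i)               # single-column residue counts
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts

    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        p_a = freq_i[a] / n
        p_b = freq_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 always change together (A with K, G with R),
# while column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]
coupled = mutual_information(toy_msa, 0, 1)
independent = mutual_information(toy_msa, 0, 3)
```

In this toy case the coupled pair scores higher than the independent pair, which is the signal a correlated-mutation analysis looks for before any structural filtering is applied.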

Epistatic contributions promote the unification of incompatible models of neutral molecular evolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1913071117 ◽

2020 ◽

Vol 117 (11) ◽

pp. 5873-5882 ◽

Cited By ~ 1

Author(s):

Jose Alberto de la Paz ◽

Charisse M. Nartey ◽

Monisha Yuvaraj ◽

Faruck Morcos

Keyword(s):

Structural Information ◽

Stokes Shift ◽

Neutral Evolution ◽

Emergent Properties ◽

Sequence Evolution ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments ◽

Analysis Methodology ◽

The Relationship

We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
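
The abstract does not define "effective alphabet" for a site. As a hedged illustration, one common way to quantify how many residues a site effectively uses is the exponential of the Shannon entropy of its residue frequencies; whether the paper uses exactly this definition is an assumption, and the function name `effective_alphabet_size` is hypothetical.

```python
from collections import Counter
from math import exp, log


def effective_alphabet_size(column):
    """Exponential of the Shannon entropy (natural log) of a site's residues.

    Returns 1.0 for a fully conserved site and k for a site that uses
    k residues with equal frequency. This definition is an assumption,
    not taken from the paper.
    """
    n = len(column)
    entropy = -sum((c / n) * log(c / n) for c in Counter(column).values())
    return exp(entropy)


conserved = effective_alphabet_size("AAAA")  # one residue only
two_state = effective_alphabet_size("AGAG")  # two residues, equal frequency
```

Tracking this quantity for a site over simulated time would give one possible view of how restricted that site is at each step.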

VarMap: a web tool for mapping genomic coordinates to protein sequence and structure and retrieving protein structural annotations

Bioinformatics ◽

10.1093/bioinformatics/btz482 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4854-4856 ◽

Cited By ~ 8

Author(s):

James D Stephenson ◽

Roman A Laskowski ◽

Andrew Nightingale ◽

Matthew E Hurles ◽

Janet M Thornton

Keyword(s):

Protein Sequence ◽

Structural Information ◽

Protein Structures ◽

Supplementary Information ◽

Supplementary Data ◽

Web Tool ◽

Genomic Variants ◽

Structural Context ◽

Pathogenic Variants ◽

Transcript Evidence

Motivation: Understanding the protein structural context and patterning on proteins of genomic variants can help to separate benign from pathogenic variants and reveal molecular consequences. However, mapping genomic coordinates to protein structures is non-trivial, complicated by alternative splicing and transcript evidence.
Results: Here we present VarMap, a web tool for mapping a list of chromosome coordinates to canonical UniProt sequences and associated protein 3D structures, including validation checks, and annotating them with structural information.
Availability and implementation: https://www.ebi.ac.uk/thornton-srv/databases/VarMap.
Supplementary information: Supplementary data are available at Bioinformatics online.

Crystallization of an atypical short-chain dehydrogenase from Vibrio vulnificus lacking the conserved catalytic tetrad

Acta Crystallographica Section F Structural Biology and Crystallization Communications ◽

10.1107/s1744309112018672 ◽

2012 ◽

Vol 68 (7) ◽

pp. 771-774 ◽

Cited By ~ 3

Author(s):

Geraldine Buysschaert ◽

Kenneth Verstraete ◽

Savvas N. Savvides ◽

Bjorn Vergauwen

Keyword(s):

Protein Sequence ◽

Scientific Literature ◽

Vibrio Vulnificus ◽

Short Chain ◽

Reduced Form ◽

Molecular Replacement ◽

Structural Studies ◽

Sequence Alignments ◽

Crystal Forms ◽

Short Chain Dehydrogenase

Short-chain dehydrogenases/reductases (SDRs) are a rapidly expanding superfamily of enzymes that are found in all kingdoms of life. Hallmarked by a highly conserved Asn-Ser-Tyr-Lys catalytic tetrad, SDRs have a broad substrate spectrum and play diverse roles in key metabolic processes. Locus tag VVA1599 in Vibrio vulnificus encodes a short-chain dehydrogenase (hereafter referred to as SDRvv) which lacks the signature catalytic tetrad of SDR members. Structure-based protein sequence alignments have suggested that SDRvv may harbour a unique binding site for its nicotinamide cofactor. To date, structural studies of SDRs with altered catalytic centres are underrepresented in the scientific literature, thus limiting understanding of their spectrum of substrate and cofactor preferences. Here, the expression, purification and crystallization of recombinant SDRvv are presented. Two well diffracting crystal forms could be obtained by cocrystallization in the presence of the reduced form of the phosphorylated nicotinamide cofactor NADPH. The collected data were of sufficient quality for successful structure determination by molecular replacement and subsequent refinement. This work sets the stage for deriving the identity of the natural substrate of SDRvv and the structure–function landscape of typical and atypical SDRs.

A Basic Molecular Analysis of the Diabetic Antigen GAD by Homology Modelling. Principles of the Method and Understanding of Antigenicity and Binding Sites

Pteridines ◽

10.1515/pteridines.2007.18.1.79 ◽

2007 ◽

Vol 18 (1) ◽

pp. 79-94

Author(s):

Marco Wiltgen ◽

Gernot P. Tilz

Keyword(s):

Active Site ◽

Protein Sequence ◽

Structure Prediction ◽

Structural Information ◽

Homology Modelling ◽

Protein Structures ◽

Dopa Decarboxylase ◽

Gad 65 ◽

Unknown Structure ◽

Introductory Paper

Functional specificity of a protein is linked to its structure. A growing section of bioinformatics deals with the prediction and visualization of protein 3D structures. In homology modelling, a protein sequence with an unknown structure is aligned with sequences of known protein structures. By exploiting structural information from the known configurations, the new structure can be predicted. In this introductory paper, we present the principles of homology modelling and demonstrate the method by determining the structure of the enzyme glutamic decarboxylase (GAD 65). This protein is an autoantigen involved in several human autoimmune diseases. We illustrate the different steps in the structure prediction of GAD 65 using two experimentally determined structures of pig kidney DOPA decarboxylase (one structure in complex with the inhibitor carbidopa) as templates. The resulting model of GAD 65 provides detailed information about the active site of the protein and selected epitopes. By analysing the interactions between DOPA decarboxylase and the inhibitor carbidopa, the residues of the GAD 65 active site can be identified via the sequence alignment between DOPA decarboxylase and GAD 65. The locations of known epitopes in the molecule are visualized in special representations, giving insights into mechanisms of antigenicity. Hydrophobicity analysis gives first hints about the ability of GAD 65 to adhere to the cell membrane. Homology modelling is at present one of the most efficient techniques for providing accurate structural models of proteins. It is expected that in a few years, for every newly determined protein sequence, at least one member of the same protein family with a known structure will be available, which will steadily increase the importance and applicability of homology modelling.
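
The step of identifying active-site residues "via the sequence alignment" can be sketched as a position transfer through a gapped pairwise alignment. The sequences below are invented toy strings, not GAD 65 or DOPA decarboxylase, and the function name `map_positions` is hypothetical; this is an illustration of the general idea, not the authors' procedure.

```python
def map_positions(template_aln, target_aln, template_positions):
    """Map 1-based template residue numbers to 1-based target residue numbers.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    A template residue aligned to a gap has no counterpart and maps to None.
    """
    if len(template_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")

    mapping = {}
    t_idx = q_idx = 0  # running residue numbers, gaps excluded
    for t_char, q_char in zip(template_aln, target_aln):
        if t_char != "-":
            t_idx += 1
        if q_char != "-":
            q_idx += 1
        if t_char != "-" and q_char != "-":
            mapping[t_idx] = q_idx
    return {pos: mapping.get(pos) for pos in template_positions}


# Toy alignment: template residue 2 falls opposite a gap in the target.
template_row = "MKT-HLA"
target_row = "M-TGHIA"
transferred = map_positions(template_row, target_row, [2, 3, 4])
```

Residues that contact the ligand in the template structure would be passed as `template_positions`; the returned numbers point to the candidate active-site residues in the modelled sequence.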

PROMALS web server for accurate multiple protein sequence alignments

Nucleic Acids Research ◽

10.1093/nar/gkm227 ◽

2007 ◽

Vol 35 (Web Server) ◽

pp. W649-W652 ◽

Cited By ~ 46

Author(s):

J. Pei ◽

B.-H. Kim ◽

M. Tang ◽

N. V. Grishin

Keyword(s):

Protein Sequence ◽

Web Server ◽

Sequence Alignments ◽

Multiple Protein

Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms

Bioinformatics ◽

10.1093/bioinformatics/btn097 ◽

2008 ◽

Vol 24 (9) ◽

pp. 1145-1153 ◽

Cited By ~ 3

Author(s):

A. Poleksic ◽

M. Fienup

Keyword(s):

Protein Sequence ◽

Sequence Alignments ◽

Sequence Profiles

Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation

Bioinformatics ◽

10.1093/bioinformatics/9.6.745 ◽

1993 ◽

Vol 9 (6) ◽

pp. 745-756 ◽

Cited By ~ 94

Author(s):

Craig D. Livingstone ◽

Geoffrey J. Barton

Keyword(s):

Protein Sequence ◽

Hierarchical Analysis ◽

Sequence Alignments ◽

Residue Conservation

Determination of reliable regions in protein sequence alignments

Protein Engineering Design and Selection ◽

10.1093/protein/3.7.565 ◽

1990 ◽

Vol 3 (7) ◽

pp. 565-569 ◽

Cited By ~ 60

Author(s):

Martin Vingron ◽

Patrick Argos

Keyword(s):

Protein Sequence ◽

Sequence Alignments
