Link Your Sites (LYS.py): Coupling your PAML codeml results and homologous protein structures in PyMOL

2018 ◽  
Author(s):  
Lys Sanz Moreta ◽  
Rute Andreia Rodrigues da Fonseca

ABSTRACT The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for the assessment of its functional impact and to understand its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models like those implemented in PAML (Z. Yang, 2000) are done in almost complete proteomes, generating large numbers of candidate proteins that require individual structural analyses. Here I present a Python wrapper script that integrates the output of PAML with the PyMOL visualization tool to automate the generation of protein structure models where positively selected sites are mapped along with the location of putative functional domains.

Author(s):  
Sarah K. Hilton ◽  
John Huddleston ◽  
Allison Black ◽  
Khrystyna North ◽  
Adam S. Dingens ◽  
...  

Summary and Purpose: The high-throughput technique of deep mutational scanning (DMS) has recently made it possible to experimentally measure the effects of all amino-acid mutations to a protein (Fowler and Fields 2014). Over the past five years, this technique has been used to study dozens of different proteins (Esposito et al. 2019) and answer a variety of research questions. For example, DMS has been used for protein engineering (Wrenbeck, Faber, and Whitehead 2017), understanding the human immune response to viruses (Lee et al. 2019), and interpreting human variation in a clinical setting (Starita et al. 2017; Gelman et al. 2019). Accompanying this proliferation of DMS studies has been the development of software tools (Bloom 2015; Rubin et al. 2017) and databases (Esposito et al. 2019) for data analysis and sharing. However, for many purposes it is important to also integrate and visualize the DMS data in the context of other information, such as the 3-D protein structure or natural sequence-variation data. Here we describe dms-view (https://dms-view.github.io/), a flexible, web-based, interactive visualization tool for DMS data. dms-view is written in JavaScript and D3, and links site-level and mutation-level DMS data to a 3-D protein structure. The user can interactively select sites of interest to examine the DMS measurements in the context of the protein structure. dms-view tracks the input data and user selections in the URL, making it possible to save specific views of interactively generated visualizations to share with collaborators or to support a published study. Importantly, dms-view takes a flexible input data file so users can easily visualize their own DMS data in the context of protein structures of their choosing, and also incorporate additional information such as amino-acid frequencies in natural alignments.


2019 ◽  
Author(s):  
Lys Sanz Moreta ◽  
Rute R. da Fonseca

ABSTRACT The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for the assessment of its functional impact and to understand its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models like those implemented in PAML [1] are done in almost complete proteomes, generating large numbers of candidate proteins that require individual structural analyses. Here we present two Python wrapper scripts as the package Link Your Sites (LYS). The first one i) mines the RCSB database [10] using the BLAST alignment tool to find the best matching homologous sequences, ii) fetches their domain positions by using PROSITE [3,8,9], iii) parses the output of PAML, extracting the positional information of fast-evolving sites and transforming it into the coordinate system of the protein structure, and iv) outputs one file per gene with the correspondence between its positions and those of its homologous sequence. The second script uses the output of the first one to generate the protein's graphical assessment. LYS can therefore generate figures to be used in publications highlighting the positively selected sites mapped on regions that are known to have functional relevance, and/or be used to reduce the number of targets that will be further analyzed by providing a list of those for which structural information can be retrieved. Motivation: Automating the search for protein structures to assess the functional impact of sites found to be under positive selection by codeml, implemented in PAML [1], and building publication-quality figures highlighting the sites on a protein structure model that are within and outside functional domains, reduces the workload associated with selecting proteins for which a functional assessment of the impact of mutations can be done using a protein structure. This is especially relevant when analyzing almost complete proteomes, which is the case in large comparative genomic studies. Software: LYS scripts are executed from the command line. They automatically search for homologous proteins in the RCSB database [10], determine the functional domain locations, correlate them with the positions identified by the M8 model [1], and output a data frame that can be used as input by PyMOL [7] to generate a visualization of the results. Availability: The LYS scripts are easy to install and use, and are available at https://github.com/LysSanzMoreta/LYSAutomaticSearch
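The coordinate transformation described in step iii can be illustrated with a minimal Python sketch. This is not the LYS code: the function name `map_sites_to_structure`, the toy aligned sequences, and the assumption that the structure's residues are numbered contiguously from `struct_start` are all hypothetical and chosen only for illustration. The idea is to walk a pairwise alignment of the query protein and the structure sequence, and to record which structure residue number each ungapped query position lands on.

```python
def map_sites_to_structure(aligned_query, aligned_struct, sites, struct_start=1):
    """Map 1-based query positions to structure residue numbers.

    Both inputs are rows of the same pairwise alignment ('-' marks a gap).
    Assumes (for illustration) contiguous residue numbering in the structure.
    """
    if len(aligned_query) != len(aligned_struct):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_pos = 0                 # last ungapped query position seen (1-based)
    s_num = struct_start - 1  # last structure residue number seen
    for q_char, s_char in zip(aligned_query, aligned_struct):
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_num += 1
        if q_char != "-" and s_char != "-":
            mapping[q_pos] = s_num  # both rows hold a residue in this column
    # Sites aligned to a gap in the structure have no counterpart: None
    return {site: mapping.get(site) for site in sites}


# Toy alignment (hypothetical data): query row and structure row
aligned_query = "MKT-AYLLG"
aligned_struct = "MK-SAYL-G"
selected_sites = [2, 3, 4, 8]  # e.g. positions reported as fast-evolving
site_map = map_sites_to_structure(aligned_query, aligned_struct,
                                  selected_sites, struct_start=10)
print(site_map)
```

Sites that map to `None` fall in regions absent from the structure, so they cannot be displayed on the model.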


2012 ◽  
Vol 9 (3) ◽  
pp. 18-32 ◽  
Author(s):  
David Reboiro-Jato ◽  
Miguel Reboiro-Jato ◽  
Florentino Fdez-Riverola ◽  
Cristina P. Vieira ◽  
Nuno A. Fonseca ◽  
...  

Summary Maximum-likelihood methods based on models of codon substitution have been widely used to infer positively selected amino acid sites that are responsible for adaptive changes. Nevertheless, in order to use such an approach, software applications are required to align protein and DNA sequences, infer a phylogenetic tree and run the maximum-likelihood models. Therefore, a significant effort is made in order to prepare input files for the different software applications and in the analysis of the output of every analysis. In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software. It was developed with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data. An example of the usefulness of such a pipeline is given by showing, under different conditions, positively selected amino acid sites in a set of 54 Coffea putative S-RNase sequences. ADOPS software is freely available and can be downloaded from http://sing.ei.uvigo.es/ADOPS.


F1000Research ◽  
2014 ◽  
Vol 3 ◽  
pp. 217 ◽  
Author(s):  
Sandeep Chakraborty ◽  
Basuthkar J. Rao ◽  
Bjarni Asgeirsson ◽  
Ravindra Venkatramani ◽  
Abhaya M. Dandekar

The remarkable diversity in biological systems is rooted in the ability of the twenty naturally occurring amino acids to perform multifarious catalytic functions by creating unique structural scaffolds known as the active site. Finding such structural motifs within the protein structure is a key aspect of many computational methods. The algorithm for obtaining combinations of motifs of a certain length, although polynomial in complexity, runs in non-trivial computer time. Also, the search space expands considerably if stereochemically equivalent residues are allowed to replace an amino acid in the motif. In the present work, we propose a method to precompile all possible motifs comprising a set (n=4 in this case) of predefined amino acid residues from a protein structure that occur within a specified distance (R) of each other (PREMONITION). PREMONITION rolls a sphere of radius R along the protein fold centered at the C atom of each residue, and all possible motifs are extracted within this sphere. The number of residues that can occur within a sphere centered around a residue is bounded by physical constraints, thus setting an upper limit on the processing times. After such a pre-compilation step, the computational time required for querying a protein structure with multiple motifs is considerably reduced. Previously, we had proposed a computational method to estimate the promiscuity of proteins with known active site residues and 3D structure using a database of known active sites in proteins (CSA) by querying each protein with the active site motif of every other residue. The runtime for such a comparison is reduced from days to hours using the PREMONITION methodology.
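The sphere-based pre-compilation described above can be sketched in a few lines of Python. This is an illustrative reading of the abstract, not the PREMONITION implementation: the function name `precompile_motifs`, the input tuple layout, and the toy coordinates are assumptions. For each residue taken as the sphere center, the sketch collects the residues within distance `radius` of it and enumerates every combination of `n` residues that includes the center. Note that members of a motif are therefore each within `radius` of the center, not necessarily of one another.

```python
from itertools import combinations
from math import dist


def precompile_motifs(residues, radius, n=4, allowed=None):
    """Enumerate residue-id motifs of size n found inside a sphere around each residue.

    residues: list of (res_id, res_name, (x, y, z)) tuples (hypothetical layout).
    allowed:  optional set of residue names to keep; None keeps all residues.
    """
    if allowed is not None:
        residues = [r for r in residues if r[1] in allowed]
    motifs = set()
    for center in residues:
        # Residues other than the center that fall inside the sphere
        in_sphere = [r for r in residues
                     if r is not center and dist(center[2], r[2]) <= radius]
        for others in combinations(in_sphere, n - 1):
            ids = tuple(sorted([center[0]] + [r[0] for r in others]))
            motifs.add(ids)  # the set removes motifs found from several centers
    return motifs


# Toy coordinates (hypothetical data); residue 5 is far from all others
toy = [
    (1, "HIS", (0.0, 0.0, 0.0)),
    (2, "ASP", (1.0, 0.0, 0.0)),
    (3, "SER", (0.0, 1.0, 0.0)),
    (4, "GLY", (0.0, 0.0, 1.0)),
    (5, "HIS", (10.0, 0.0, 0.0)),
]
motifs = precompile_motifs(toy, radius=1.5, n=3)
print(sorted(motifs))
```

Because the number of residues that fit inside one sphere is bounded, the inner `combinations` call stays small regardless of protein size, which is the property the abstract relies on for its upper limit on processing time.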


2021 ◽  
Author(s):  
Chris Papadopoulos ◽  
Isabelle Callebaut ◽  
Jean-Christophe Gelly ◽  
Isabelle Hatin ◽  
Olivier Namy ◽  
...  

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how noncoding sequences' properties could promote the birth of novel genes and shape the evolution and the structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic ORFs (Open Reading Frames) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins and, strikingly, the majority are predicted to be foldable. Then, we investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.


2020 ◽  
Vol 117 (45) ◽  
pp. 28201-28211
Author(s):  
Sumaiya Iqbal ◽  
Eduardo Pérez-Palma ◽  
Jakob B. Jespersen ◽  
Patrick May ◽  
David Hoksza ◽  
...  

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.


2008 ◽  
Vol 41 (1) ◽  
pp. 219-221 ◽  
Author(s):  
K. Gopalakrishnan ◽  
S. Saravanan ◽  
R. Sarani ◽  
K. Sekar

An interactive internet computing server, RPMS (Ramachandran plot for multiple structures), has been developed to visualize the Ramachandran angles of several highly homologous protein structures in a single plot. Options are provided for users to locate the amino acid residues in various regions of the plot. To perform the above, users need to enter the Protein Data Bank (PDB) identification codes. In addition, users can upload the atomic coordinates from the local machine. A Java graphics interface has been deployed and the server has been interfaced with a locally maintained PDB anonymous FTP server, which is updated weekly. The server RPMS can be accessed through the Bioinformatics web server at http://cluster.physics.iisc.ernet.in/rpms/.


2015 ◽  
Vol 32 (6) ◽  
pp. 843-849 ◽  
Author(s):  
Rhys Heffernan ◽  
Abdollah Dehzangi ◽  
James Lyons ◽  
Kuldip Paliwal ◽  
Alok Sharma ◽  
...  

Abstract Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ (HSEβ) vector or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to better describe the solvent exposure over ASA, CN and RD in many applications. Thus, a sequence-based prediction is desirable, as most proteins do not have experimentally determined structures. To the best of our knowledge, there is no method to predict HSEα and only one method to predict HSEβ. Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved a consistent performance for 10-fold cross validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than those of existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) to the stability change by residue mutants than predicted HSEβ (0.37) and ASA (0.43). The results, together with its easy Cα-atom-based calculation, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction. Availability and implementation: The method is available at http://sparks-lab.org. Supplementary information: Supplementary data are available at Bioinformatics online.
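To make the HSEβ definition concrete, the following Python sketch counts, for each residue, the neighboring Cα atoms that lie in the half sphere on the side of the Cα-Cβ vector ("upper") and on the opposite side ("down"). It computes HSEβ from known coordinates and is not the SPIDER-HSE predictor, which works from sequence. The function name `hse_beta`, the default radius of 13 Å, and the toy coordinates are assumptions made for illustration.

```python
from math import dist


def hse_beta(ca_coords, cb_coords, radius=13.0):
    """Return a list of (upper, down) contact counts, one pair per residue.

    ca_coords, cb_coords: per-residue (x, y, z) tuples for the C-alpha and
    C-beta atoms. The radius default is an illustrative assumption.
    """
    result = []
    for i, (ca_i, cb_i) in enumerate(zip(ca_coords, cb_coords)):
        # Vector from C-alpha to C-beta defines the "upper" half sphere
        axis = tuple(b - a for a, b in zip(ca_i, cb_i))
        upper = down = 0
        for j, ca_j in enumerate(ca_coords):
            if j == i or dist(ca_i, ca_j) > radius:
                continue
            offset = tuple(c - a for a, c in zip(ca_i, ca_j))
            # Sign of the dot product tells which half sphere holds residue j
            if sum(u * v for u, v in zip(axis, offset)) >= 0:
                upper += 1
            else:
                down += 1
        result.append((upper, down))
    return result


# Toy chain along the z axis (hypothetical data); every C-beta points toward +z
ca = [(0.0, 0.0, 0.0), (0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
cb = [(0.0, 0.0, 1.5), (0.0, 0.0, 6.5), (0.0, 0.0, -3.5), (0.0, 0.0, 21.5)]
hse = hse_beta(ca, cb)
contact_numbers = [up + dn for up, dn in hse]
print(hse, contact_numbers)
```

The sum of the two counts is the ordinary contact number, which is why the abstract reports a correlation for contact numbers alongside the upper and down half spheres.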


2020 ◽  
Vol 48 (W1) ◽  
pp. W132-W139
Author(s):  
Sumaiya Iqbal ◽  
David Hoksza ◽  
Eduardo Pérez-Palma ◽  
Patrick May ◽  
Jakob B Jespersen ◽  
...  

Abstract Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires a particular investigation of amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’, or ‘Is a variant changing an amino acid located at the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants to provide users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by a wide biological community.


2020 ◽  
Author(s):  
Felipe V. da Fonseca ◽  
Romildo O. Souza Júnior ◽  
Marília V. A. de Almeida ◽  
Thiago D. Soares ◽  
Diego A. A. Morais ◽  
...  

ABSTRACT Motivation: A useful approach to evaluate protein structure and quickly visualize crucial physicochemical interactions related to protein function is to construct Residue Interaction Networks (RINs). In this application of graph theory, the amino acid residues constitute the nodes, and the edges represent their interactions with other structural elements. Although several tools that construct RINs are available, many of them do not compare RINs from distinct protein structures. This comparison can give valuable insights into the understanding of conformational changes and the effects of amino acid substitutions in protein structure and function. With that in mind, we present CoRINs (Comparator of Residue Interaction Networks), a software tool that extensively compares RINs. The program has an accessible and user-friendly web interface, which summarizes the differences in several network parameters using interactive plots and tables. As a usage example of CoRINs, we compared RINs from conformers of two cancer-associated proteins. Availability: The program is available at https://github.com/LasisUFRN/CoRINs.
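A comparison of two residue interaction networks can be sketched in Python by treating each network as a set of undirected edges between residue labels. This is only a minimal illustration of the idea and not how CoRINs works internally: the function name `compare_rins`, the residue labels, and the choice of node degree as the only network parameter are assumptions.

```python
def compare_rins(edges_a, edges_b):
    """Compare two RINs given as iterables of (residue, residue) pairs."""
    # frozenset makes each edge undirected: (u, v) equals (v, u)
    a = {frozenset(edge) for edge in edges_a}
    b = {frozenset(edge) for edge in edges_b}

    def degrees(edges):
        deg = {}
        for edge in edges:
            for node in edge:
                deg[node] = deg.get(node, 0) + 1
        return deg

    deg_a, deg_b = degrees(a), degrees(b)
    change = {}
    for node in set(deg_a) | set(deg_b):
        delta = deg_b.get(node, 0) - deg_a.get(node, 0)
        if delta != 0:
            change[node] = delta  # keep only residues whose degree differs
    return {
        "shared": a & b,          # interactions present in both networks
        "only_a": a - b,          # interactions lost going from A to B
        "only_b": b - a,          # interactions gained going from A to B
        "degree_change": change,
    }


# Two toy conformers (hypothetical residue labels)
rin_a = [("A10", "A25"), ("A25", "A40"), ("A40", "A52")]
rin_b = [("A25", "A10"), ("A40", "A52"), ("A10", "A52")]
diff = compare_rins(rin_a, rin_b)
print(diff["only_a"], diff["only_b"], diff["degree_change"])
```

Residues with a nonzero degree change are natural candidates for closer inspection when two conformers or a wild type and a mutant are compared.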

