Link Your Sites (LYS.py): Coupling your PAML codeml results and homologous protein structures in PyMOL

dms-view: Interactive visualization tool for deep mutational scanning data

10.1101/2020.05.14.096842

2020

Cited By ~ 2

Author(s):

Sarah K. Hilton

John Huddleston

Allison Black

Khrystyna North

Adam S. Dingens

...

Keyword(s):

Protein Structure

Amino Acid

Input Data

Protein Structures

Interactive Visualization

Data File

Visualization Tool

Additional Information

Other Information

Amino Acid Mutations

Summary and Purpose

The high-throughput technique of deep mutational scanning (DMS) has recently made it possible to experimentally measure the effects of all amino-acid mutations to a protein (Fowler and Fields 2014). Over the past five years, this technique has been used to study dozens of different proteins (Esposito et al. 2019) and to answer a variety of research questions. For example, DMS has been used for protein engineering (Wrenbeck, Faber, and Whitehead 2017), understanding the human immune response to viruses (Lee et al. 2019), and interpreting human variation in a clinical setting (Starita et al. 2017; Gelman et al. 2019). This proliferation of DMS studies has been accompanied by the development of software tools (Bloom 2015; Rubin et al. 2017) and databases (Esposito et al. 2019) for data analysis and sharing. However, for many purposes it is also important to integrate and visualize the DMS data in the context of other information, such as the 3-D protein structure or natural sequence-variation data.

Here we describe dms-view (https://dms-view.github.io/), a flexible, web-based, interactive visualization tool for DMS data. dms-view is written in JavaScript and D3, and links site-level and mutation-level DMS data to a 3-D protein structure. The user can interactively select sites of interest to examine the DMS measurements in the context of the protein structure. dms-view tracks the input data and user selections in the URL, making it possible to save specific views of interactively generated visualizations to share with collaborators or to support a published study. Importantly, dms-view takes a flexible input data file, so users can easily visualize their own DMS data in the context of protein structures of their choosing and also incorporate additional information, such as amino-acid frequencies in natural alignments.
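dms-view is a JavaScript application, so the following Python sketch does not reproduce its code. It only illustrates the general idea of keeping a view's state in the URL so that sharing the link reproduces the view. The query-parameter names (`data`, `pdb`, `condition`, `sites`) and both function names are hypothetical and are not dms-view's actual URL schema.

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "https://dms-view.github.io/"


def encode_view(data_url, pdb_url, condition, sites):
    """Serialize a view's state into a shareable URL (hypothetical parameter names)."""
    params = {
        "data": data_url,        # location of the DMS data file
        "pdb": pdb_url,          # location of the protein structure
        "condition": condition,  # which experimental condition is displayed
        # Selected sites are stored as a sorted, comma-separated list.
        "sites": ",".join(str(s) for s in sorted(sites)),
    }
    return BASE_URL + "?" + urlencode(params)


def decode_view(url):
    """Recover the view state from a URL produced by encode_view."""
    query = parse_qs(urlparse(url).query)
    # parse_qs drops empty values, so an empty selection has no 'sites' key.
    sites_field = query.get("sites", [""])[0]
    return {
        "data": query["data"][0],
        "pdb": query["pdb"][0],
        "condition": query["condition"][0],
        "sites": [int(s) for s in sites_field.split(",") if s],
    }
```

Because every piece of state lives in the query string, the URL is the saved view: decoding it restores the same data, structure, condition and site selection.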


Link Your Sites (LYS) Scripts: Automated search of protein structures and mapping of sites under positive selection detected by PAML

10.1101/540229

2019

Author(s):

Lys Sanz Moreta

Rute R. da Fonseca

Keyword(s):

Protein Structure

Amino Acid

Positive Selection

Protein Structures

Comparative Genomic

Functional Domain

Homologous Proteins

Functional Impact

Codon Substitution

The Impact

ABSTRACT

The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Currently, searches for fast-evolving amino acid positions using codon substitution models, such as those implemented in PAML [1], are run on almost complete proteomes, generating large numbers of candidate proteins that each require individual structural analysis. Here we present two Python wrapper scripts as the package Link Your Sites (LYS). The first script (i) mines the RCSB database [10] using the BLAST alignment tool to find the best-matching homologous sequences, (ii) fetches their domain positions using PROSITE [3,8,9], (iii) parses the PAML output, extracting the positional information of fast-evolving sites and transforming it into the coordinate system of the protein structure, and (iv) outputs one file per gene with the position correspondences to its homologous sequence. The second script uses the output of the first to generate a graphical assessment of the protein. LYS can therefore generate publication figures that highlight the positively selected sites mapped onto regions known to have functional relevance, and/or reduce the number of targets for further analysis by providing a list of those for which structural information can be retrieved.

Motivation

Automating the search for protein structures to assess the functional impact of sites found to be under positive selection by codeml, implemented in PAML [1]. Building publication-quality figures that highlight the sites on a protein structure model that fall within and outside functional domains. Reducing the workload associated with selecting proteins for which a functional assessment of the impact of mutations can be done using a protein structure. This is especially relevant when analyzing almost complete proteomes, as in large comparative genomic studies.

Software

LYS scripts are executed on the command line. They automatically search for homologous proteins in the RCSB database [10], determine the locations of functional domains, map the positions identified by the M8 model [1] onto the structure, and output a data frame that PyMOL [7] can use as input to generate a visualization of the results.

Availability

LYS is easy to install and use, and the scripts are available at https://github.com/LysSanzMoreta/LYSAutomaticSearch
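Step (iii), moving PAML site numbers into the numbering of a homologous structure, can be sketched with a gapped pairwise alignment. This is a minimal illustration under stated assumptions, not the LYS code: all function names are hypothetical, the codeml sites are assumed to be already expressed as positions in the query protein, and the structure is assumed to be numbered consecutively from `target_start` (real PDB numbering can have gaps and insertion codes). The last helper emits a PyMOL `select` command using the `resi` selector with `+`-joined residue numbers.

```python
def map_alignment_positions(query_aln, target_aln, target_start=1):
    """Map 1-based query residue numbers to target residue numbers via a gapped alignment.

    query_aln and target_aln are the two rows of one pairwise alignment ('-' = gap).
    target_start is the residue number of the first target residue (assumed contiguous).
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    q_pos, t_pos = 0, target_start - 1
    for q_char, t_char in zip(query_aln, target_aln):
        if q_char != "-":
            q_pos += 1
        if t_char != "-":
            t_pos += 1
        # Only columns where both sequences carry a residue give a correspondence.
        if q_char != "-" and t_char != "-":
            mapping[q_pos] = t_pos
    return mapping


def classify_sites(sites, mapping, domains):
    """Split selected sites into (inside-domain, outside-domain, unmapped).

    Inside/outside lists hold target residue numbers; unmapped holds query positions.
    domains is a list of inclusive (start, end) intervals in target numbering.
    """
    inside, outside, unmapped = [], [], []
    for site in sites:
        if site not in mapping:
            unmapped.append(site)  # site falls in a gap of the structure
            continue
        t = mapping[site]
        if any(start <= t <= end for start, end in domains):
            inside.append(t)
        else:
            outside.append(t)
    return inside, outside, unmapped


def pymol_selection(name, residues):
    """Build a PyMOL 'select' command for a set of residue numbers."""
    return f"select {name}, resi " + "+".join(str(r) for r in sorted(residues))
```

For example, with the alignment `MK-LVA` / `MKQL-A`, query residue 3 maps to structure residue 4, while query residue 4 falls opposite a gap and is reported as unmapped.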


ADOPS - Automatic Detection Of Positively Selected Sites

Journal of Integrative Bioinformatics

10.1515/jib-2012-200

2012

Vol 9 (3)

pp. 18-32

Cited By ~ 12

Author(s):

David Reboiro-Jato

Miguel Reboiro-Jato

Florentino Fdez-Riverola

Cristina P. Vieira

Nuno A. Fonseca

...

Keyword(s):

Amino Acid

Maximum Likelihood

DNA Sequences

Sequence Data

Automatic Detection

Acid Sites

Nucleotide Sequence Data

Software Applications

Codon Substitution

Positively Selected Sites

Summary

Maximum-likelihood methods based on models of codon substitution have been widely used to infer positively selected amino acid sites that are responsible for adaptive changes. However, using such an approach requires software applications to align protein and DNA sequences, infer a phylogenetic tree and run the maximum-likelihood models. As a result, significant effort goes into preparing input files for the different software applications and into analyzing the output of each one. In this paper we present the ADOPS (Automatic Detection Of Positively Selected Sites) software. It was developed with the goal of providing an automatic and flexible tool for detecting positively selected sites given a set of unaligned nucleotide sequence data. The usefulness of such a pipeline is illustrated by showing, under different conditions, positively selected amino acid sites in a set of 54 Coffea putative S-RNase sequences. ADOPS software is freely available and can be downloaded from http://sing.ei.uvigo.es/ADOPS.
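The abstract does not name the codon models that ADOPS runs. As a hedged illustration of the final maximum-likelihood step, a common codeml workflow compares site model M8 (which allows a class of sites with ω > 1) against the null model M7 with a likelihood-ratio test: the statistic 2ΔlnL is compared with a χ² distribution with 2 degrees of freedom, whose survival function is simply exp(-x/2). The function name and its inputs are hypothetical.

```python
import math


def lrt_m7_m8(lnl_m7, lnl_m8):
    """Likelihood-ratio test of M8 (alternative) against M7 (null).

    Returns (statistic, p_value). The statistic 2 * (lnL_M8 - lnL_M7) is referred to a
    chi-square distribution with 2 degrees of freedom, whose survival function is exp(-x / 2).
    """
    stat = 2.0 * (lnl_m8 - lnl_m7)
    # M7 is nested in M8, so a negative statistic only reflects optimization noise.
    stat = max(stat, 0.0)
    p_value = math.exp(-stat / 2.0)
    return stat, p_value
```

For instance, log-likelihoods of -1000.0 (M7) and -990.0 (M8) give a statistic of 20.0 and a p-value of exp(-10), about 4.5e-5, so M8 would be preferred and the sites in its ω > 1 class would be examined further.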


PREMONITION - Preprocessing motifs in protein structures for search acceleration

F1000Research

10.12688/f1000research.5166.1

2014

Vol 3

pp. 217

Cited By ~ 3

Author(s):

Sandeep Chakraborty

Basuthkar J. Rao

Bjarni Asgeirsson

Ravindra Venkatramani

Abhaya M. Dandekar

Keyword(s):

Protein Structure

Amino Acid

Active Site

Active Sites

Protein Structures

3D Structure

Search Space

Computational Method

Computational Time

Active Site Residues

The remarkable diversity in biological systems is rooted in the ability of the twenty naturally occurring amino acids to perform multifarious catalytic functions by creating unique structural scaffolds known as active sites. Finding such structural motifs within a protein structure is a key aspect of many computational methods. The algorithm for obtaining combinations of motifs of a certain length, although polynomial in complexity, still requires non-trivial computation time. Moreover, the search space expands considerably if stereochemically equivalent residues are allowed to replace an amino acid in the motif. In the present work, we propose a method (PREMONITION) to precompile all possible motifs comprising a set (n=4 in this case) of predefined amino acid residues from a protein structure that occur within a specified distance (R) of each other. PREMONITION rolls a sphere of radius R along the protein fold, centered at the C atom of each residue, and extracts all possible motifs within this sphere. The number of residues that can occur within a sphere centered on a residue is bounded by physical constraints, which sets an upper limit on processing time. After this precompilation step, the computational time required for querying a protein structure with multiple motifs is considerably reduced. Previously, we proposed a computational method to estimate the promiscuity of proteins with known active site residues and 3D structure using a database of known active sites in proteins (CSA) by querying each protein with the active site motif of every other residue. The runtime for such a comparison is reduced from days to hours using the PREMONITION methodology.
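A minimal sketch of the precompilation idea, assuming one representative atom per residue (Cα here), a Euclidean distance cutoff R, and motifs indexed by the multiset of residue types. Function names and the input layout are hypothetical, and the substitution of stereochemically equivalent residues mentioned above is not handled.

```python
from collections import defaultdict
from itertools import combinations
from math import dist


def precompile_motifs(residues, radius, n=4):
    """Index every n-residue combination that fits in a sphere of `radius` centred on some residue.

    residues: list of (residue_id, residue_type, (x, y, z)), one representative atom
    per residue (assumed to be Calpha).
    Returns {sorted tuple of residue types: set of sorted residue-id tuples}.
    """
    index = defaultdict(set)
    for _, _, centre in residues:
        # Residues within `radius` of this centre: one position of the rolling sphere.
        sphere = [r for r in residues if dist(r[2], centre) <= radius]
        for combo in combinations(sphere, n):
            types = tuple(sorted(r[1] for r in combo))
            ids = tuple(sorted(r[0] for r in combo))
            index[types].add(ids)  # the set removes motifs seen from several centres
    return index


def query_motif(index, residue_types):
    """Look up a motif by its residue types once precompiled (no geometry needed)."""
    return index.get(tuple(sorted(residue_types)), set())
```

The expensive geometric enumeration happens once per structure; each later motif query is a single dictionary lookup, which is where the speed-up for many-motif searches comes from.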


Intergenic ORFs as elementary structural modules of de novo gene birth and protein evolution

10.1101/2021.04.13.439703

2021

Author(s):

Chris Papadopoulos

Isabelle Callebaut

Jean-Christophe Gelly

Isabelle Hatin

Olivier Namy

...

Keyword(s):

Protein Structure

Amino Acid

De Novo

Protein Structures

Structural Diversity

Building Blocks

Amino Acid Sequences

Novel Genes

Noncoding Sequences

De Novo Gene

The noncoding genome plays an important role in de novo gene birth and in the emergence of genetic novelty. Nevertheless, how the properties of noncoding sequences could promote the birth of novel genes and shape the evolution and structural diversity of proteins remains unclear. Therefore, by combining different bioinformatic approaches, we characterized the fold potential diversity of the amino acid sequences encoded by all intergenic ORFs (Open Reading Frames) of S. cerevisiae with the aim of (i) exploring whether the large structural diversity observed in proteomes is already present in noncoding sequences, and (ii) estimating the potential of the noncoding genome to produce novel protein bricks that can either give rise to novel genes or be integrated into pre-existing proteins, thus participating in protein structure diversity and evolution. We showed that amino acid sequences encoded by most yeast intergenic ORFs contain the elementary building blocks of protein structures. Moreover, they encompass the large structural diversity of canonical proteins, and, strikingly, the majority are predicted to be foldable. We then investigated the early stages of de novo gene birth by identifying intergenic ORFs with a strong translation signal in ribosome profiling experiments and by reconstructing the ancestral sequences of 70 yeast de novo genes. This enabled us to highlight sequence and structural factors determining de novo gene emergence. Finally, we showed a strong correlation between the fold potential of de novo proteins and that of their ancestral amino acid sequences, reflecting the relationship between the noncoding genome and the protein structure universe.
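The abstract does not state how intergenic ORFs were delimited. The sketch below uses one common convention, an ATG start to the first in-frame stop codon scanned in the three forward frames, with a hypothetical minimum length; the study may use a different definition, for example stop-to-stop segments or scanning both strands. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10):
    """Return (frame, start, end) for ATG-initiated ORFs on the forward strand.

    start is 0-based and end is exclusive (the stop codon is included).
    min_codons counts codons from the ATG up to, but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens the ORF; internal ATGs are ignored
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # look for the next ATG in this frame
    return orfs
```

Applied to an intergenic region, each returned interval could then be translated and passed to a fold-potential analysis such as the one described in the abstract.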


Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants

Proceedings of the National Academy of Sciences

10.1073/pnas.2002660117

2020

Vol 117 (45)

pp. 28201-28211

Author(s):

Sumaiya Iqbal

Eduardo Pérez-Palma

Jakob B. Jespersen

Patrick May

David Hoksza

...

Keyword(s):

Protein Structure

Amino Acid

Molecular Mechanisms

Amino Acid Level

Protein Structures

Point Mutations

Independent Set

Clinical Genetics

Missense Variants

The Impact

Interpretation of the colossal number of genetic variants identified from sequencing applications is one of the major bottlenecks in clinical genetics, with the inference of the effect of amino acid-substituting missense variations on protein structure and function being especially challenging. Here we characterize the three-dimensional (3D) amino acid positions affected in pathogenic and population variants from 1,330 disease-associated genes using over 14,000 experimentally solved human protein structures. By measuring the statistical burden of variations (i.e., point mutations) from all genes on 40 3D protein features, accounting for the structural, chemical, and functional context of the variations’ positions, we identify features that are generally associated with pathogenic and population missense variants. We then perform the same amino acid-level analysis individually for 24 protein functional classes, which reveals unique characteristics of the positions of the altered amino acids: We observe up to 46% divergence of the class-specific features from the general characteristics obtained by the analysis on all genes, which is consistent with the structural diversity of essential regions across different protein classes. We demonstrate that the function-specific 3D features of the variants match the readouts of mutagenesis experiments for BRCA1 and PTEN, and positively correlate with an independent set of clinically interpreted pathogenic and benign missense variants. Finally, we make our results available through a web server to foster accessibility and downstream research. Our findings represent a crucial step toward translational genetics, from highlighting the impact of mutations on protein structure to rationalizing the variants’ pathogenicity in terms of the perturbed molecular mechanisms.
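The abstract does not define its burden statistic. One simple way to express whether pathogenic variants, relative to population variants, are enriched at positions carrying a given 3D feature is an odds ratio from a 2×2 table with a Wald 95% confidence interval. The study's actual test may differ, and the function name and arguments are hypothetical.

```python
import math


def feature_odds_ratio(path_in, path_out, pop_in, pop_out):
    """Odds ratio (with Wald 95% CI) that pathogenic variants, relative to population
    variants, fall at positions carrying a given 3D feature.

    path_in / path_out: pathogenic variants at positions with / without the feature.
    pop_in / pop_out:   population variants at positions with / without the feature.
    """
    cells = [path_in, path_out, pop_in, pop_out]
    if any(n == 0 for n in cells):
        # Haldane-Anscombe correction: add 0.5 to every cell when any cell is empty.
        cells = [n + 0.5 for n in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    low = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    high = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (low, high)
```

An odds ratio above 1 with a confidence interval excluding 1 would mark the feature as associated with pathogenic variants; repeating the computation within each protein class gives class-specific profiles of the kind the abstract compares.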


RPMS: Ramachandran plot for multiple structures

Journal of Applied Crystallography

10.1107/s0021889807053708

2008

Vol 41 (1)

pp. 219-221

Cited By ~ 5

Author(s):

K. Gopalakrishnan

S. Saravanan

R. Sarani

K. Sekar

Keyword(s):

Amino Acid

Protein Structures

Data Bank

Amino Acid Residues

Ramachandran Plot

Homologous Protein

Internet Computing

Multiple Structures

Atomic Coordinates

Local Machine

An interactive internet computing server, RPMS (Ramachandran plot for multiple structures), has been developed to visualize the Ramachandran angles of several highly homologous protein structures in a single plot. Options are provided for users to locate the amino acid residues in various regions of the plot. To perform the above, users need to enter the Protein Data Bank (PDB) identification codes. In addition, users can upload the atomic coordinates from the local machine. A Java graphics interface has been deployed, and the server has been interfaced with a locally maintained PDB anonymous FTP server, which is updated weekly. The server RPMS can be accessed through the Bioinformatics web server at http://cluster.physics.iisc.ernet.in/rpms/.
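The Ramachandran angles that RPMS plots are the backbone dihedrals φ (C of the previous residue, N, Cα, C) and ψ (N, Cα, C, N of the next residue). RPMS itself is a web server with a Java interface; this Python sketch only shows how the angles can be computed from coordinates using the standard atan2 form of the signed dihedral. The input layout (a list of dicts keyed by `N`, `CA`, `C`) and the function names are assumptions.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (range -180 to 180) defined by four points."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    b1_len = math.sqrt(_dot(b1, b1))
    n1, n2 = _cross(b0, b1), _cross(b1, b2)  # normals of the two planes
    y = b1_len * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """Return (phi, psi) per residue; backbone is a list of {'N', 'CA', 'C'} coordinate dicts.

    phi is undefined for the first residue and psi for the last (returned as None).
    """
    angles = []
    for i, res in enumerate(backbone):
        phi = dihedral(backbone[i - 1]["C"], res["N"], res["CA"], res["C"]) if i > 0 else None
        psi = (dihedral(res["N"], res["CA"], res["C"], backbone[i + 1]["N"])
               if i < len(backbone) - 1 else None)
        angles.append((phi, psi))
    return angles
```

Overlaying several homologous structures on one plot then amounts to computing `phi_psi` for each chain and plotting all (φ, ψ) pairs together, tagged by structure.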


Highly accurate sequence-based prediction of half-sphere exposures of amino acid residues in proteins

Bioinformatics

10.1093/bioinformatics/btv665

2015

Vol 32 (6)

pp. 843-849

Cited By ~ 51

Author(s):

Rhys Heffernan

Abdollah Dehzangi

James Lyons

Kuldip Paliwal

Alok Sharma

...

Keyword(s):

Protein Structure

Amino Acid

Structure Prediction

Protein Structures

Correlation Coefficients

Accessible Surface Area

Solvent Accessible Surface Area

Supplementary Information

Amino Acid Residues

Solvent Exposure

Abstract

Motivation: Solvent exposure of amino acid residues of proteins plays an important role in understanding and predicting protein structure, function and interactions. Solvent exposure can be characterized by several measures, including solvent accessible surface area (ASA), residue depth (RD) and contact numbers (CN). More recently, an orientation-dependent contact number called half-sphere exposure (HSE) was introduced by separating the contacts within upper and down half spheres defined according to the Cα-Cβ vector (HSEβ) or neighboring Cα-Cα vectors (HSEα). HSEα calculated from protein structures was found to describe solvent exposure better than ASA, CN and RD in many applications. A sequence-based prediction is therefore desirable, as most proteins do not have experimentally determined structures. To the best of our knowledge, there is no method to predict HSEα and only one method to predict HSEβ.

Results: This study developed a novel method for predicting both HSEα and HSEβ (SPIDER-HSE) that achieved consistent performance in 10-fold cross-validation and two independent tests. The correlation coefficients between predicted and measured HSEβ (0.73 for upper sphere, 0.69 for down sphere and 0.76 for contact numbers) for the independent test set of 1199 proteins are significantly higher than those of existing methods. Moreover, predicted HSEα has a higher correlation coefficient (0.46) with the stability change caused by residue mutations than predicted HSEβ (0.37) and ASA (0.43). These results, together with the simple Cα-atom-based calculation of HSEα, highlight the potential usefulness of predicted HSEα for protein structure prediction and refinement as well as function prediction.

Availability and implementation: The method is available at http://sparks-lab.org.

Supplementary information: Supplementary data are available at Bioinformatics online.
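To make the half-sphere definition concrete, the sketch below counts the Cα atoms within a sphere around residue i and splits them by the sign of their projection on a reference vector: the Cα→Cβ vector for HSEβ or, for HSEα, a pseudo-Cβ direction built from the neighboring Cα atoms. The pseudo-Cβ construction (the sum of unit vectors pointing from Cα(i-1) and Cα(i+1) toward Cα(i)), the 13 Å default radius and the function names are assumptions; contacts lying exactly on the dividing plane are counted as "down". This computes HSE from a known structure and is not the SPIDER-HSE sequence-based predictor.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit(v):
    length = math.sqrt(_dot(v, v))
    return tuple(a / length for a in v)


def hse_counts(ca_coords, i, direction, radius=13.0):
    """Return (up, down) half-sphere counts of Calpha atoms around residue i.

    direction: reference vector of residue i (Calpha->Cbeta for HSEbeta, pseudo-Cbeta for HSEalpha).
    Atoms exactly on the dividing plane are counted as 'down'.
    """
    up = down = 0
    for j, ca in enumerate(ca_coords):
        if j == i:
            continue
        d = _sub(ca, ca_coords[i])
        if _dot(d, d) > radius * radius:
            continue  # outside the sphere
        if _dot(d, direction) > 0:
            up += 1
        else:
            down += 1
    return up, down


def pseudo_cb_direction(ca_prev, ca, ca_next):
    """Assumed HSEalpha reference vector: sum of unit vectors from Calpha(i-1) and Calpha(i+1) toward Calpha(i)."""
    d1 = _unit(_sub(ca, ca_prev))
    d2 = _unit(_sub(ca, ca_next))
    return tuple(a + b for a, b in zip(d1, d2))
```

The plain contact number CN is `up + down`; HSE keeps the two halves separate, which is what makes it orientation-dependent.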


MISCAST: MIssense variant to protein StruCture Analysis web SuiTe

Nucleic Acids Research

10.1093/nar/gkaa361

2020

Vol 48 (W1)

pp. W132-W139

Author(s):

Sumaiya Iqbal

David Hoksza

Eduardo Pérez-Palma

Patrick May

Jakob B Jespersen

...

Keyword(s):

Protein Structure

Amino Acid

Structure Analysis

Protein Structures

Missense Variant

Protein Structure Analysis

Missense Variants

Functional Features

And Function

A Site

Abstract

Human genome sequencing efforts have greatly expanded, and a plethora of missense variants identified both in patients and in the general population is now publicly accessible. Interpretation of the molecular-level effect of missense variants, however, remains challenging and requires investigating amino acid substitutions in the context of protein structure and function. Answers to questions like ‘Is a variant perturbing a site involved in key macromolecular interactions and/or cellular signaling?’ or ‘Is a variant changing an amino acid located in the protein core or part of a cluster of known pathogenic mutations in 3D?’ are crucial. Motivated by these needs, we developed MISCAST (missense variant to protein structure analysis web suite; http://miscast.broadinstitute.org/). MISCAST is an interactive and user-friendly web server to visualize and analyze missense variants in protein sequence and structure space. Additionally, a comprehensive set of protein structural and functional features has been aggregated in MISCAST from multiple databases and displayed on structures alongside the variants, providing users with the biological context of the variant location in an integrated platform. We further made the annotated data and protein structures readily downloadable from MISCAST to foster advanced offline analysis of missense variants by the wider biological community.
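One of the questions above, whether a variant sits in a 3D cluster of known pathogenic mutations, can be approximated by counting pathogenic sites within a distance cutoff of the variant's residue. This is a sketch of the idea only: the 8 Å cutoff, the two-neighbor threshold, the one-coordinate-per-residue input and the function name are hypothetical choices, not MISCAST's method.

```python
from math import dist


def near_pathogenic_cluster(variant_pos, coords, pathogenic_positions,
                            cutoff=8.0, min_neighbours=2):
    """Check whether a variant's residue has >= min_neighbours known pathogenic sites within cutoff (Angstrom).

    coords: residue number -> (x, y, z) of one representative atom (e.g. Calpha).
    Returns (is_clustered, sorted list of nearby pathogenic residue numbers).
    """
    centre = coords[variant_pos]
    nearby = sorted(
        p for p in pathogenic_positions
        # Skip the variant's own residue and residues missing from the structure.
        if p != variant_pos and p in coords and dist(coords[p], centre) <= cutoff
    )
    return len(nearby) >= min_neighbours, nearby
```

Returning the neighbor list alongside the flag lets a viewer highlight the specific pathogenic residues that make up the cluster.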


CoRINs: A tool to compare residue interaction networks from homologous proteins and conformers

10.1101/2020.06.29.178541

2020

Author(s):

Felipe V. da Fonseca

Romildo O. Souza Júnior

Marília V. A. de Almeida

Thiago D. Soares

Diego A. A. Morais

...

Keyword(s):

Protein Structure

Amino Acid

Conformational Changes

Protein Function

Protein Structures

Software Tool

Interaction Networks

Homologous Proteins

Residue Interaction

And Function

ABSTRACT

Motivation

A useful approach to evaluating protein structure and quickly visualizing crucial physicochemical interactions related to protein function is to construct Residue Interaction Networks (RINs). In this application of graph theory, the amino acid residues are the nodes, and the edges represent their interactions with other structural elements. Although several tools that construct RINs are available, many of them do not compare RINs from distinct protein structures. Such a comparison can give valuable insights into conformational changes and the effects of amino acid substitutions on protein structure and function. With that in mind, we present CoRINs (Comparator of Residue Interaction Networks), a software tool that extensively compares RINs. The program has an accessible and user-friendly web interface that summarizes the differences in several network parameters using interactive plots and tables. As a usage example of CoRINs, we compared RINs from conformers of two cancer-associated proteins.

Availability

The program is available at https://github.com/LasisUFRN/CoRINs.
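A minimal sketch of building and comparing two RINs. It assumes that residues are linked whenever any pair of their atoms lies within a distance cutoff (real RIN tools usually classify typed interactions such as hydrogen bonds or salt bridges) and that both structures share the same residue numbering. Function names, the 5 Å cutoff, and the reported parameters (shared and unique edges, per-residue degree change) are illustrative, not CoRINs' implementation.

```python
from itertools import combinations
from math import dist


def build_rin(residues, cutoff=5.0):
    """Return the undirected edge set of a residue interaction network.

    residues: residue number -> list of atom coordinates. Two residues are linked
    when any pair of their atoms is within cutoff (Angstrom). Edges are (low, high) tuples.
    """
    edges = set()
    for (r1, atoms1), (r2, atoms2) in combinations(sorted(residues.items()), 2):
        if any(dist(a, b) <= cutoff for a in atoms1 for b in atoms2):
            edges.add((r1, r2))
    return edges


def _degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def compare_rins(edges_a, edges_b):
    """Summarize edge and degree differences between two RINs (e.g. two conformers)."""
    deg_a, deg_b = _degrees(edges_a), _degrees(edges_b)
    nodes = set(deg_a) | set(deg_b)
    return {
        "shared": edges_a & edges_b,
        "only_a": edges_a - edges_b,
        "only_b": edges_b - edges_a,
        # Residues whose number of contacts changes between the two networks.
        "degree_change": {n: deg_b.get(n, 0) - deg_a.get(n, 0)
                          for n in nodes if deg_b.get(n, 0) != deg_a.get(n, 0)},
    }
```

Residues with large degree changes are natural candidates for the conformational rearrangements or substitution effects that such a comparison is meant to reveal.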
