Identification of important amino acid replacements in the 2013-2016 Ebola virus outbreak

Mapping Intimacies ◽

10.1101/075168 ◽

2016 ◽

Author(s):

Abayomi S Olabode ◽

Derek Gatherer ◽

Xiaowei Jiang ◽

David Matthews ◽

Julian A Hiscox ◽

...

Keyword(s):

Amino Acid ◽

Ebola Virus ◽

Sequence Data ◽

Purifying Selection ◽

Dominant Mode ◽

Synonymous Mutations ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Important Amino Acid ◽

B Lineage

The phylogenetic relationships of Zaire ebolavirus have been intensively analysed over the course of the 2013-2016 outbreak. However, there has been limited consideration of the functional impact of this variation. Here we describe an analysis of the available sequence data in the context of protein structure and phylogenetic history. Amino acid replacements are rare and predicted to have minor effects on protein stability. Synonymous mutations greatly outnumber nonsynonymous mutations, and most of the latter fall into unstructured intrinsically disordered regions, indicating that purifying selection is the dominant mode of selective pressure. However, one replacement, occurring early in the outbreak in Gueckedou in Guinea on 31st March 2014 (alanine to valine at position 82 in the GP protein), is close to the site where the virus binds to the host receptor NPC1 and is located in the phylogenetic tree at the origin of the major B lineage of the outbreak. The functional and evolutionary evidence indicates this A82V change likely has consequences for EBOV's host specificity and hence adaptation to humans.
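The synonymous versus nonsynonymous distinction used in the abstract above can be illustrated with a minimal sketch. It assumes the standard genetic code; the function name `classify_substitution` and the example codons are hypothetical and chosen only to show an alanine-to-valine change of the same kind as A82V. They are not taken from the EBOV genome.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(ref_codon, alt_codon):
    """Return (kind, ref_aa, alt_aa) for a codon change (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


# Illustrative codons only (assumed, not read from any viral sequence).
print(classify_substitution("GCT", "GTT"))  # alanine -> valine
print(classify_substitution("GCT", "GCC"))  # alanine -> alanine
```

A change in the second codon position (GCT to GTT) alters the encoded residue, whereas a third-position change (GCT to GCC) leaves it unchanged, which is why third-position mutations are more often synonymous.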


Functional alterations caused by mutations reflect evolutionary trends of SARS-CoV-2

Briefings in Bioinformatics ◽

10.1093/bib/bbab042 ◽

2021 ◽

Author(s):

Liang Cheng ◽

Xudong Han ◽

Zijun Zhu ◽

Changlu Qi ◽

Ping Wang ◽

...

Keyword(s):

Reference Genome ◽

Sequence Data ◽

Purifying Selection ◽

Virus Genome ◽

Receptor Binding Domain ◽

Evolutionary Trends ◽

Synonymous Mutations ◽

Almost All ◽

Virus Strains ◽

New Mutations

Abstract Since the first report of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in December 2019, the COVID-19 pandemic has spread rapidly worldwide. Because few virus strains were available, early studies observed few of the key mutations that matter for the evolutionary trends of the virus genome. Here, we downloaded sequence data for 1809 SARS-CoV-2 strains deposited in GISAID before April 2020 to identify mutations and the functional alterations they cause. In total, we identified 1017 nonsynonymous and 512 synonymous mutations by alignment to the reference genome NC_045512, none of which were observed in the receptor-binding domain (RBD) of the spike protein. On average, each strain could acquire about 1.75 new mutations per month. The current mutations may have little impact on antibodies. Although the whole genome shows purifying selection, ORF3a, ORF8 and ORF10 were under positive selection. Only the 36 mutations that occurred in 1% or more of the virus strains were further analyzed to reveal linkage disequilibrium (LD) variants and dominant mutations. As a result, we observed five dominant mutations: three nonsynonymous mutations (C28144T, C14408T and A23403G) and two synonymous mutations (T8782C and C3037T). These five mutations occurred in almost all strains in April 2020. We also observed two potential dominant nonsynonymous mutations, C1059T and G25563T, which occurred in most of the strains in April 2020. Further functional analysis shows that these mutations largely decreased protein stability, which could lead to a significant reduction of virus virulence. In addition, the A23403G mutation increases the spike-ACE2 interaction and ultimately enhances infectivity. Taken together, these findings indicate that SARS-CoV-2 is evolving toward enhanced infectivity and reduced virulence.


Differential roles of two DDX17 isoforms in the formation of membraneless organelles

The Journal of Biochemistry ◽

10.1093/jb/mvaa023 ◽

2020 ◽

Vol 168 (1) ◽

pp. 33-40

Author(s):

Yuya Hirai ◽

Eisuke Domae ◽

Yoshihiro Yoshikawa ◽

Keizo Tomonaga

Keyword(s):

Amino Acid ◽

Enzymatic Activity ◽

Rna Helicase ◽

Intracellular Distribution ◽

Amino Acid Sequences ◽

Nucleolar Localization ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Dead Box ◽

Additional Amino Acid

Abstract The RNA helicase DDX17 is a member of the DEAD-box protein family. DDX17 has two isoforms: p72 and p82. The p82 isoform has additional amino acid sequences called intrinsically disordered regions (IDRs), which are related to the formation of membraneless organelles (MLOs). Here, we reveal that p72 is mostly localized to the nucleoplasm, while p82 is localized to the nucleoplasm and nucleoli. Additionally, p82 exhibited slower intranuclear mobility than p72. Furthermore, the enzymatic mutants of both p72 and p82 accumulate in stress granules. The enzymatic mutation of p82 also abolishes its nucleolar localization. Our findings suggest the importance of IDRs and enzymatic activity of DEAD-box proteins in the intracellular distribution and formation of MLOs.


Markov Models of Amino Acid Substitution to Study Proteins with Intrinsically Disordered Regions

PLoS ONE ◽

10.1371/journal.pone.0020488 ◽

2011 ◽

Vol 6 (5) ◽

pp. e20488 ◽

Cited By ~ 27

Author(s):

Adam M. Szalkowski ◽

Maria Anisimova

Keyword(s):

Amino Acid ◽

Amino Acid Substitution ◽

Markov Models ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Disordered Regions


Proteome-wide signatures of function in highly diverged intrinsically disordered regions

10.1101/578716 ◽

2019 ◽

Author(s):

Taraneh Zarin ◽

Bob Strome ◽

Alex N Nguyen Ba ◽

Simon Alberti ◽

Julie D Forman-Kay ◽

...

Keyword(s):

Amino Acid ◽

Amino Acid Sequences ◽

Functional Annotations ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Molecular Features ◽

Primary Amino ◽

Function Relationship ◽

Disordered Regions

Abstract Intrinsically disordered regions make up a large part of the proteome, but the sequence-to-function relationship in these regions is poorly understood, in part because the primary amino acid sequences of these regions are poorly conserved in alignments. Here we use an evolutionary approach to detect molecular features that are preserved in the amino acid sequences of orthologous intrinsically disordered regions. We find that most disordered regions contain multiple molecular features that are preserved, and we define these as “evolutionary signatures” of disordered regions. We demonstrate that intrinsically disordered regions with similar evolutionary signatures can rescue function in vivo, and that groups of intrinsically disordered regions with similar evolutionary signatures are strongly enriched for functional annotations and phenotypes. We propose that evolutionary signatures can be used to predict function for many disordered regions from their amino acid sequences.


Learning protein constitutive motifs from sequence data

eLife ◽

10.7554/elife.39397 ◽

2019 ◽

Vol 8 ◽

Cited By ~ 30

Author(s):

Jérôme Tubiana ◽

Simona Cocco ◽

Rémi Monasson

Keyword(s):

Sequence Data ◽

Protein Sequences ◽

Sequence Information ◽

Ligand Specificity ◽

Protein Families ◽

Restricted Boltzmann Machines ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Model Protein ◽

Lattice Proteins

Statistical analysis of evolutionary-related protein sequences provides information about their structure, function, and history. We show that Restricted Boltzmann Machines (RBM), designed to learn complex high-dimensional data and their statistical features, can efficiently model protein families from sequence information. We here apply RBM to 20 protein families, and present detailed results for two short protein domains (Kunitz and WW), one long chaperone protein (Hsp70), and synthetic lattice proteins for benchmarking. The features inferred by the RBM are biologically interpretable: they are related to structure (residue-residue tertiary contacts, extended secondary motifs (α-helixes and β-sheets) and intrinsically disordered regions), to function (activity and ligand specificity), or to phylogenetic identity. In addition, we use RBM to design new protein sequences with putative properties by composing and 'turning up' or 'turning down' the different modes at will. Our work therefore shows that RBM are versatile and practical tools that can be used to unveil and exploit the genotype–phenotype relationship for protein families.


POODLE: Tools Predicting Intrinsically Disordered Regions of Amino Acid Sequence

Methods in Molecular Biology - Protein Structure Prediction ◽

10.1007/978-1-4939-0366-5_10 ◽

2014 ◽

pp. 131-145 ◽

Cited By ~ 3

Author(s):

Kana Shimizu

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Disordered Regions


Proteome-scale amino-acid resolution footprinting of protein-binding sites in the intrinsically disordered regions of the human proteome

10.1101/2021.04.13.439572 ◽

2021 ◽

Author(s):

Caroline Benz ◽

Muhammad Ali ◽

Izabella Krystkowiak ◽

Leandro Simonetti ◽

Ahmed Sayadi ◽

...

Keyword(s):

Amino Acid ◽

Protein Interactions ◽

Human Proteome ◽

Cell Physiology ◽

Protein Protein Interactions ◽

Linear Motif ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Wide Range ◽

Disordered Regions

Specific protein-protein interactions are central to all processes that underlie cell physiology. Numerous studies using a wide range of experimental approaches have identified tens of thousands of human protein-protein interactions. However, many interactions remain to be discovered, and low affinity, conditional and cell type-specific interactions are likely to be disproportionately under-represented. Moreover, for most known protein-protein interactions the binding regions remain uncharacterized. We previously developed proteomic peptide phage display (ProP-PD), a method for simultaneous proteome-scale identification of short linear motif (SLiM)-mediated interactions and footprinting of the binding region with amino acid resolution. Here, we describe the second-generation human disorderome (HD2), an optimized ProP-PD library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay. We define guidelines for how to process, filter and rank the results and provide PepTools, a toolkit for annotation and analysis of identified hits. We uncovered 2,161 interaction pairs for 35 known SLiM-binding domains and confirmed a subset of 38 interactions by biophysical or cell-based assays. Finally, we show how the amino acid resolution binding site information can be used to pinpoint functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the human proteome. The HD2 ProP-PD library paired with PepTools represents a powerful pipeline for unbiased proteome-wide discovery of SLiM-based interactions.
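The idea of tiling a disordered region with overlapping peptides, as described in the abstract above, can be sketched as a sliding window. The peptide length, step size, function name `tile_peptides` and example sequence below are all assumptions made for illustration; the abstract does not state the parameters used for the HD2 library.

```python
def tile_peptides(region, length=16, step=4):
    """Split a region into overlapping peptides of fixed length.

    length and step are illustrative assumptions, not HD2 parameters.
    A final window is added when needed so the C-terminal end is covered.
    """
    if len(region) <= length:
        return [region]
    last_start = len(region) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)
    return [region[s:s + length] for s in starts]


# Hypothetical 30-residue disordered region (made-up sequence).
example_region = "MSSPEPTSGSQDEKPLRSTAPPQSEEGSDK"
for peptide in tile_peptides(example_region):
    print(peptide)
```

Because consecutive windows overlap, a binding motif that falls across the boundary of one peptide is still fully contained in a neighbouring one, which is what allows the binding region to be narrowed down from the set of peptides that score as hits.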


shiny-pred: a server for the prediction of protein disordered regions

F1000Research ◽

10.12688/f1000research.17669.1 ◽

2019 ◽

Vol 8 ◽

pp. 230

Author(s):

Mauricio Oberti ◽

Iosif Vaisman

Keyword(s):

Web Application ◽

Intrinsically Disordered Proteins ◽

Sequence Data ◽

Disordered Proteins ◽

Dimensional Structure ◽

Protein Chain ◽

Neural Network Models ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Disordered Regions

Intrinsically disordered proteins or intrinsically disordered regions (IDR) are segments within a protein chain lacking a stable three-dimensional structure under normal physiological conditions. Accurate prediction of IDRs is challenging due to their genome wide occurrence and low ratio of disordered residues, making them a difficult target for traditional classification techniques. Existing computational methods mostly rely on sequence profiles to improve accuracy, which is time consuming and computationally expensive. The shiny-pred application is an ab initio sequence-only disorder predictor implemented in R/Shiny language. In order to make predictions, it uses convolutional neural network models, trained using PDB sequence data. It can be installed on any operating system on which R can be installed and run locally. A public version of the web application can be accessed at https://gmu-binf.shinyapps.io/shiny-pred


Why do eukaryotic proteins contain more intrinsically disordered regions?

10.1101/270694 ◽

2018 ◽

Author(s):

Walter Basile ◽

Marco Salvatore ◽

Claudio Bassot ◽

Arne Elofsson

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Intrinsic Disorder ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Significant Difference ◽

Eukaryotic Proteins ◽

The Difference ◽

Almost All ◽

Disordered Regions

Abstract Intrinsic disorder is much more abundant in eukaryotic than in prokaryotic proteins. However, the reason behind this is unclear. It has been proposed that the disordered regions are functionally important for regulation in eukaryotes, but it has also been proposed that the difference is a result of lower selective pressure in eukaryotes. In almost all studies, intrinsic disorder is predicted from the amino acid sequence of a protein. Therefore, there should exist an underlying difference in the amino acid distributions between eukaryotic and prokaryotic proteins causing the predicted difference in intrinsic disorder. To obtain a better understanding of why eukaryotic proteins contain more intrinsically disordered regions, we compare proteins from complete eukaryotic and prokaryotic proteomes. Here, we show that the difference in intrinsic disorder originates from differences in the linker regions. Eukaryotic proteins have more extended linker regions and, in particular, the eukaryotic linker regions are more disordered. The average eukaryotic protein is about 500 residues long; it contains 250 residues in linker regions, of which 80 are disordered. In comparison, prokaryotic proteins are about 350 residues long and only have 100-110 residues in linker regions, and fewer than 10 of these are intrinsically disordered. Further, we show that there is no systematic increase in the frequency of disorder-promoting residues in eukaryotic linker regions. Instead, the difference in frequency of only three amino acids seems to lie behind the difference. The most significant difference is that eukaryotic linkers contain about 9% serine, while prokaryotic linkers have roughly 6.5%. Eukaryotic linkers also contain about 2% more proline and 2-3% fewer isoleucine residues. The reason why primarily these amino acids vary in frequency is not apparent, but it cannot be excluded that the difference in serine is related to the increased need for regulation through phosphorylation and that the proline difference is related to an increase in eukaryote-specific repeats.
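The averages quoted in the abstract above can be turned into fractions with a short worked calculation. This is a sketch only: the prokaryotic linker length of 105 is an assumed midpoint of the quoted 100-110 range, and 10 disordered residues is the quoted upper bound, so the prokaryotic disorder fraction is an upper estimate. The function name `linker_summary` is hypothetical.

```python
# Approximate per-protein averages quoted in the abstract.
eukaryote = {"length": 500, "linker": 250, "disordered_linker": 80}
# Assumptions: 105 is the midpoint of "100-110"; 10 is the stated upper bound.
prokaryote = {"length": 350, "linker": 105, "disordered_linker": 10}


def linker_summary(protein):
    """Fraction of the protein in linkers, and fraction of linker that is disordered."""
    return {
        "linker_fraction": protein["linker"] / protein["length"],
        "disordered_fraction_of_linker": protein["disordered_linker"] / protein["linker"],
    }


for name, protein in (("eukaryote", eukaryote), ("prokaryote", prokaryote)):
    summary = linker_summary(protein)
    print(name, {key: round(value, 3) for key, value in summary.items()})
```

Under these assumptions, about half of an average eukaryotic protein lies in linker regions and roughly a third of that linker is disordered, while the prokaryotic figures are about 30% linker with under a tenth of it disordered.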


Sequence determinants of in cell condensate assembly morphology, dynamics, and oligomerization as measured by number and brightness analysis

10.1101/2021.04.18.440340 ◽

2021 ◽

Author(s):

Ryan J Emenecker ◽

Alex S Holehouse ◽

Lucia Strader

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Material Properties ◽

Oligomeric State ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Brightness Analysis ◽

Oligomerization Domain

Background: Biomolecular condensates are non-stoichiometric assemblies that are characterized by their capacity to spatially concentrate biomolecules and play a key role in cellular organization. Proteins that drive the formation of biomolecular condensates frequently contain oligomerization domains and intrinsically disordered regions (IDRs), both of which can contribute multivalent interactions that drive higher-order assembly. Our understanding of the relative and temporal contribution of oligomerization domains and IDRs to the material properties of in vivo biomolecular condensates is limited. Similarly, the spatial and temporal dependence of protein oligomeric state inside condensates has been largely unexplored in vivo. Methods: In this study, we combined quantitative microscopy with number and brightness analysis to investigate the aging, material properties, and protein oligomeric state of biomolecular condensates in vivo. Our work is focused on condensates formed by AUXIN RESPONSE FACTOR 19 (ARF19), which is a transcription factor integral to the signaling pathway for the plant hormone auxin. ARF19 contains a large central glutamine-rich IDR and a C-terminal Phox Bem1 (PB1) oligomerization domain and forms cytoplasmic condensates. Results: Our results reveal that the IDR amino acid composition can influence the morphology and material properties of ARF19 condensates. In contrast the distribution of oligomeric species within condensates appears insensitive to the IDR composition. In addition, we identified a relationship between the abundance of higher- and lower-order oligomers within individual condensates and their apparent fluidity. Conclusions: IDR amino acid composition affects condensate morphology and material properties. In ARF condensates, altering the amino acid composition of the IDR did not greatly affect the oligomeric state of proteins within the condensate.
