Revealing evolutionary constraints on proteins through sequence analysis

10.1101/397521 ◽

2018 ◽

Author(s):

Shou-Wen Wang ◽

Anne-Florence Bitbol ◽

Ned S. Wingreen

Keyword(s):

Amino Acids ◽

Covariance Matrix ◽

Sequence Data ◽

Amino Acid Sequences ◽

Elastic Network Model ◽

Sequence Alignments ◽

Cellular Processes ◽

Large Numbers ◽

Protein Properties ◽

Selected Traits

Abstract: Statistical analysis of alignments of large numbers of protein sequences has revealed “sectors” of collectively coevolving amino acids in several protein families. Here, we show that selection acting on any functional property of a protein, represented by an additive trait, can give rise to such a sector. As an illustration of a selected trait, we consider the elastic energy of an important conformational change within an elastic network model, and we show that selection acting on this energy leads to correlations among residues. For this concrete example and more generally, we demonstrate that the main signature of functional sectors lies in the small-eigenvalue modes of the covariance matrix of the selected sequences. However, secondary signatures of these functional sectors also exist in the extensively studied large-eigenvalue modes. Our simple, general model leads us to propose a principled method to identify functional sectors, along with the magnitudes of mutational effects, from sequence data. We further demonstrate the robustness of these functional sectors to various forms of selection, and the robustness of our approach to the identification of multiple selected traits.

Author summary: Proteins play crucial parts in all cellular processes, and their functions are encoded in their amino-acid sequences. Recently, statistical analyses of protein sequence alignments have demonstrated the existence of “sectors” of collectively correlated amino acids. What is the origin of these sectors? Here, we propose a simple underlying origin of protein sectors: they can arise from selection acting on any collective protein property. We find that the main signature of these functional sectors lies in the low-eigenvalue modes of the covariance matrix of the selected sequences. A better understanding of protein sectors will make it possible to discern collective protein properties directly from sequences, as well as to design new functional sequences, with far-reaching applications in synthetic biology.
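
The central claim of this abstract, that selection on an additive trait leaves its signature in the small-eigenvalue modes of the covariance matrix, can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: sequences are binary (0 = reference, 1 = mutant), the per-site effects `delta` are random numbers rather than elastic-network energies, and selection is a Gaussian window of strength `kappa` around a `target` trait value. The function names are hypothetical.

```python
import numpy as np

def select_sequences(n_samples, delta, target, kappa, rng):
    """Draw random binary sequences; keep each with a Gaussian-window probability."""
    seqs = rng.integers(0, 2, size=(n_samples, delta.size))
    trait = seqs @ delta  # additive trait: sum_i delta_i * s_i
    accept = np.exp(-0.5 * kappa * (trait - target) ** 2)
    return seqs[rng.random(n_samples) < accept]

def smallest_mode(seqs):
    """Return the smallest eigenvalue of the site covariance matrix and its eigenvector."""
    cov = np.cov(seqs, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    return eigvals[0], eigvecs[:, 0]

def overlap(u, v):
    """Absolute cosine similarity between two vectors (1 means same direction)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy run: 20 sites with assumed random mutational effects.
rng = np.random.default_rng(0)
delta = rng.normal(size=20)
selected = select_sequences(100_000, delta, target=delta.sum() / 2, kappa=5.0, rng=rng)
lam_min, mode = smallest_mode(selected)
print(f"kept {len(selected)} sequences; smallest eigenvalue {lam_min:.3f}; "
      f"overlap with delta {overlap(mode, delta):.2f}")
```

In this toy setting, selection suppresses variance along the direction of `delta`, so the eigenvector with the smallest eigenvalue tends to align with the vector of mutational effects, while the remaining eigenvalues stay near the unselected value of 1/4.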

A minimum reporting standard for multiple sequence alignments

10.1101/2020.01.15.907733 ◽

2020 ◽

Author(s):

Thomas KF Wong ◽

Subha Kalyaanamoorthy ◽

Karen Meusemann ◽

David K Yeates ◽

Bernhard Misof ◽

...

Keyword(s):

Amino Acids ◽

Sequence Data ◽

Pivotal Role ◽

Sequence Alignments ◽

Reporting Standard ◽

Multiple Sequence ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Multiple Sequence Alignments

Abstract: Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.

A minimum reporting standard for multiple sequence alignments

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa024 ◽

2020 ◽

Vol 2 (2) ◽

Cited By ~ 8

Author(s):

Thomas K F Wong ◽

Subha Kalyaanamoorthy ◽

Karen Meusemann ◽

David K Yeates ◽

Bernhard Misof ◽

...

Keyword(s):

Amino Acids ◽

Sequence Data ◽

Pivotal Role ◽

Sequence Alignments ◽

Reporting Standard ◽

Multiple Sequence ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Multiple Sequence Alignments

Abstract Multiple sequence alignments (MSAs) play a pivotal role in studies of molecular sequence data, but nobody has developed a minimum reporting standard (MRS) to quantify the completeness of MSAs in terms of completely specified nucleotides or amino acids. We present an MRS that relies on four simple completeness metrics. The metrics are implemented in AliStat, a program developed to support the MRS. A survey of published MSAs illustrates the benefits and unprecedented transparency offered by the MRS.
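
The abstract does not define the four completeness metrics, so the following Python sketch only illustrates the general idea under stated assumptions: "completeness" is taken to be the fraction of completely specified characters, measured for the whole alignment, per sequence, per site, and per pair of sequences. The function names and the nucleotide alphabet are hypothetical choices for this example and are not AliStat's interface; the paper should be consulted for the actual definitions.

```python
from itertools import combinations

# Assumption: only these characters count as completely specified nucleotides;
# gaps, '?', and ambiguity codes such as N do not.
NUCLEOTIDES = frozenset("ACGTU")

def is_specified(ch, alphabet=NUCLEOTIDES):
    return ch.upper() in alphabet

def alignment_completeness(seqs, alphabet=NUCLEOTIDES):
    """Fraction of all characters in the alignment that are completely specified."""
    total = sum(len(s) for s in seqs)
    return sum(is_specified(c, alphabet) for s in seqs for c in s) / total

def sequence_completeness(seqs, alphabet=NUCLEOTIDES):
    """Completeness of each sequence (row)."""
    return [sum(is_specified(c, alphabet) for c in s) / len(s) for s in seqs]

def site_completeness(seqs, alphabet=NUCLEOTIDES):
    """Completeness of each site (column)."""
    return [sum(is_specified(c, alphabet) for c in col) / len(col) for col in zip(*seqs)]

def pair_completeness(seqs, alphabet=NUCLEOTIDES):
    """For each pair of sequences, fraction of sites specified in both."""
    return {
        (i, j): sum(is_specified(a, alphabet) and is_specified(b, alphabet)
                    for a, b in zip(seqs[i], seqs[j])) / len(seqs[i])
        for i, j in combinations(range(len(seqs)), 2)
    }

msa = ["ACGT-", "A-GTN", "ACG??"]
print(alignment_completeness(msa))
print(sequence_completeness(msa))
print(site_completeness(msa))
print(pair_completeness(msa))
```

Reporting per-sequence, per-site, and per-pair values alongside the overall figure shows where missing data is concentrated, which a single overall percentage hides.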

The RNA Binding Domain of the Hantaan Virus N Protein Maps to a Central, Conserved Region

Journal of Virology ◽

10.1128/jvi.76.7.3301-3308.2002 ◽

2002 ◽

Vol 76 (7) ◽

pp. 3301-3308 ◽

Cited By ~ 48

Author(s):

Xiaolin Xu ◽

William Severson ◽

Noah Villegas ◽

Connie S. Schmaljohn ◽

Colleen B. Jonsson

Keyword(s):

Amino Acids ◽

Rna Binding ◽

Amino Acid Sequences ◽

Hantaan Virus ◽

Mobility Shift ◽

Sequence Alignments ◽

Competition Analysis ◽

N Protein ◽

Protein Amino Acids ◽

Electrophoretic Mobility Shift Assays

ABSTRACT The nucleocapsid (N) protein of hantaviruses encapsidates both viral genomic and antigenomic RNAs, although only the genomic viral RNA (vRNA) is packaged into virions. To define the domain within the Hantaan virus (HTNV) N protein that mediates these interactions, 14 N- and C-terminal deletion constructs were cloned into a bacterial expression vector, expressed, and purified to homogeneity. Each protein was examined for its ability to bind the HTNV S segment vRNA with filter binding and gel electrophoretic mobility shift assays. These studies mapped a minimal region within the HTNV N protein (amino acids 175 to 217) that bound vRNA. Sequence alignments made from several hantavirus N protein sequences showed that the region identified has a 58% identity and an 86% similarity among these amino acid sequences. Two peptides corresponding to amino acids 175 to 196 (N1) and 197 to 218 (N2) were synthesized. The RNA binding of each peptide was measured by filter binding and competition analysis. Three oligoribonucleotides were used to measure binding affinity and assess specificity. The N2 peptide contained the major RNA binding determinants, while the N1 peptide, when mixed with N2, contributed to the specificity of vRNA recognition.

Anticoagulant serine fibrinogenases from Vipera lebetina venom: structure-function relationships

Thrombosis and Haemostasis ◽

10.1055/s-0037-1613468 ◽

2003 ◽

Vol 89 (05) ◽

pp. 826-831 ◽

Cited By ~ 15

Author(s):

Anu Aaspõllu ◽

Jüri Siigur ◽

Ene Siigur

Keyword(s):

Amino Acids ◽

Snake Venom ◽

Sequence Data ◽

Research Society ◽

Catalytic Triad ◽

Amino Acid Sequences ◽

Serine Proteinases ◽

Significant Similarity ◽

Cdna Sequences ◽

Vipera Lebetina

Summary: Amino acid sequences of two anticoagulant serine fibrinogenases – α- and β-fibrinogenase (VLAF and VLBF) from Vipera lebetina venom have been deduced from the cDNA sequences encoding the enzymes. The mature protein sequences of 234 amino acids (VLAF) and 233 amino acids (VLBF) exhibit significant similarity with other snake venom serine proteinases. Both enzymes contain the catalytic triad His57, Asp102, Ser195, and twelve conserved cysteines forming six disulfide bridges. Unlike typical trypsin-like serine proteinases, they lack the third aspartate, Asp189, which is replaced by Gly189. VLBF is a typical representative of arginine esterases – β-fibrinogenases. α-Fibrinogenase, VLAF, is unique among snake venom serine proteinases with homologous structure. Until now, there has been no evidence of anticoagulant serine enzymes that degrade only the fibrinogen α-chain and lack esterolytic activity.

Parts of this paper were presented at the 17th International Fibrinogen Workshop of the International Fibrinogen Research Society (IFRS) held in Munich, Germany, September 2002.

The sequence data of Vipera lebetina mRNA for α- and β-fibrinogenase have been deposited in the GenBank database under accession numbers AF528193 (VLAF) and AF536235 (VLBF).

Application of the Ramanujan Fourier Transform for the Analysis of Secondary Structure Content in Amino Acid Sequences

Methods of Information in Medicine ◽

10.1055/s-0038-1625380 ◽

2007 ◽

Vol 46 (02) ◽

pp. 126-129 ◽

Cited By ~ 14

Author(s):

L. Pattini ◽

S. Cerutti ◽

L. Mainardi

Keyword(s):

Fourier Transform ◽

Sequence Data ◽

Amino Acid Sequences ◽

Numerical Series ◽

Alpha Helices ◽

Protein Properties ◽

Protein Sequence Data ◽

Group A ◽

Group B ◽

Finite Integer

Summary. Objective: A novel method is presented for investigating protein properties from sequences using the Ramanujan Fourier Transform (RFT). Methods: The new methodology involves preprocessing protein sequence data by numerically encoding it and then applying the RFT. The RFT is based on projecting the obtained numerical series onto a set of basis functions constituted by Ramanujan sums (RS). In RS components, periodicities of finite integer length, rather than frequencies (as in classical harmonic analysis), are considered. Results: The potential of the new approach is documented by a few examples in the analysis of hydrophobic profiles of proteins in two classes, characterized by an abundance of alpha-helices (group A) or beta-strands (group B). Different patterns are provided as evidence. Conclusions: RFT can be used to characterize the structural properties of proteins and integrate complementary information provided by other signal processing transforms.
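
The projection onto Ramanujan sums described in the Methods can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the authors' implementation: the Ramanujan sum c_q(n) is computed from its standard cosine definition, the finite-length coefficient is assumed to be normalised by φ(q)·N, and the input is a toy alternating series standing in for a numerically encoded hydrophobic profile. The function names are hypothetical.

```python
from math import cos, gcd, pi

def ramanujan_sum(q, n):
    """c_q(n) = sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) = 1 (always an integer)."""
    total = sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)

def totient(q):
    """Euler's totient: how many k in 1..q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

def rft(series, max_q):
    """Project a numerical series x(1..N) onto Ramanujan sums for q = 1..max_q.

    Assumed normalisation: X(q) = (1 / (totient(q) * N)) * sum_n x(n) * c_q(n).
    """
    n_points = len(series)
    return {
        q: sum(x * ramanujan_sum(q, n) for n, x in enumerate(series, start=1))
           / (totient(q) * n_points)
        for q in range(1, max_q + 1)
    }

# Toy profile with period 2 (alternating high/low values).
profile = [1.0, -1.0] * 6
for q, coeff in rft(profile, 4).items():
    print(q, coeff)
```

For the alternating toy series, only the q = 2 component is non-zero, which shows how the transform indexes periodicities by integer length rather than by frequency.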

Evolutionary relationships among the serpins

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.1993.0141 ◽

1993 ◽

Vol 342 (1300) ◽

pp. 101-119 ◽

Cited By ~ 48

Keyword(s):

Amino Acids ◽

Proteinase Inhibitors ◽

Gene Tree ◽

Three Dimensional ◽

Serine Proteinase ◽

Amino Acid Sequences ◽

Evolutionary Relationships ◽

Evolutionary Divergence ◽

Structural Elements ◽

Sequence Alignments

The serpins are a widely distributed group of serine proteinase inhibitors found in plants, birds, mammals and viruses. Despite the great evolutionary divergence of these organisms, their serpins are highly conserved, both in sequence and structurally. Amino acid sequences were aligned by a combination of automatic algorithms and by consideration of conserved structural elements in those serpins for which crystal structures exist. The program HOMED was used, which allowed the alignment of amino acids to be simultaneously converted into the equivalently aligned nucleotide sequences. The aligned amino acids were used as the basis for superposition of the four known three-dimensional structures for which coordinates are available and compared with an optimal three-dimensional superposition in order to estimate the reliability of the sequence alignment. Phylogenetic relationships implied by these nucleotide sequence alignments were determined by the method of maximum parsimony. The proposed gene tree suggested that as much diversity existed between the plant serpin and mammalian serpins as was present among mammalian serpins and provided further evidence that the architecture of serpin molecules is highly constrained.
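
As a rough illustration of what a maximum-parsimony criterion measures, the Python sketch below scores a fixed tree with Fitch's algorithm, which counts the minimum number of substitutions needed to explain an alignment on that tree. It is a generic textbook technique shown under assumptions, not the procedure used in the paper: the tree is given as nested pairs of leaf names, and the sequences and function names are made up for the example.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one alignment site.

    `tree` is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # The children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # The children disagree: take the union and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, alignment):
    """Sum the Fitch cost over all sites of an alignment {name: sequence}."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(length)
    )

toy_alignment = {"a": "AAG", "b": "AAG", "c": "ACT", "d": "ACT"}
grouped = (("a", "b"), ("c", "d"))
mixed = (("a", "c"), ("b", "d"))
print(parsimony_score(grouped, toy_alignment), parsimony_score(mixed, toy_alignment))
```

A parsimony search would compare such scores across many candidate trees and prefer the tree with the lowest one; here the tree that groups identical sequences needs fewer changes than the one that mixes them.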

The V-region sequence of the H chain from a third rabbit anti-pneumococcal antibody

Biochemical Journal ◽

10.1042/bj1570449 ◽

1976 ◽

Vol 157 (2) ◽

pp. 449-459 ◽

Cited By ~ 5

Author(s):

J C Jaton

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Sequence Data ◽

Hypervariable Region ◽

Variable Region ◽

Amino Acid Sequences ◽

Rabbit Antibody ◽

Simple Correlation ◽

Type Iii ◽

V Region

The amino acid sequence of the V (variable) region of the heavy (H) chain of rabbit antibody BS-1, raised against type III pneumococcal vaccine, is reported. Together with the sequence data of the V region of the light (L) chain previously determined [Jaton (1974a) Biochem. J. 141, 1-13], the present work completes the analysis of the V domain of the homogeneous antibody BS-1. The V domains (VL + VH regions) of this antibody are compared with those of two other anti-(type III) pneumococcal antibodies BS-5 and K-25 [Jaton (1975) Biochem. J. 147, 235-247]. Except for the second hypervariable section of the L chains, these antibodies have very different sequences in the hypervariable segments of the V domains. Within the third hypervariable region of the H chain, each antibody has a different length: BS-1 is three amino acids shorter than K-25 and two amino acids shorter than BS-5. When the sequences in that section are aligned for maximal homology, only two residues, glycine-97 and leucine-101, are common to the three antibodies. On the basis of the amino acid sequences of these three anti-pneumococcal antibodies, the results do not support the concept of a simple correlation between primary structure in the hypervariable sections (known to determine the shape of the combining site) and antigen-binding specificity.

Location of the carbohydrate groups of ovomucoid

Biochemical Journal ◽

10.1042/bj1590335 ◽

1976 ◽

Vol 159 (2) ◽

pp. 335-345 ◽

Cited By ~ 38

Author(s):

J G Beeley

Keyword(s):

Amino Acids ◽

Sialic Acid ◽

High Frequency ◽

Sequence Data ◽

Peptide Conformation ◽

Amino Acid Sequences ◽

Primary Sequence ◽

Attachment Sites ◽

The Press ◽

Partial Gene Duplication

Tryptic glycopeptides were purified from the sialic acid-free variant of ovomucoid, O1, and its CNBr fragments. The amino acid sequences adjacent to the four major sites of carbohydrate (Carb.) attachment were: (1), Phe-Pro-Asn(Carb.)-Ala-Thr-Asp-Lys-Glu-Gly-Lys; (2), Ala-Try-Ser-Ile-Glu-Phe-Gly-Thr-Asn(Carb.)-Ile-Ser-Lys; (3), Glu, Thr-Val-Pro-Met-Asn(Carb.)-cys-Ser; (4), Ser-Ser-Tyr-Ala-Asn(Carb.)-Thr-Thr-Ser-Glu-Asp-Gly-Lys. Glycosylated Asn residues were located at position 10, between residues 49 and 60, and at positions 69 and 75, in the primary sequence. All of these carbohydrate groups contained GlcNAc, Man and Gal in the approximate molar proportions 5:3:0.5. A further glycopeptide containing His was isolated in low yield, suggesting that some carbohydrate is attached at a fifth site. Two of the carbohydrate-attachment sites (Asn-10 and Asn-75) occur in sequences that show internal homologies. These are presumed to have evolved as a consequence of partial gene duplication. Three of the carbohydrate-attachment sites occur in similar positions to the carbohydrate groups in quail ovomucoid [Laskowski (1976) Protides Biol. Fluids Proc. Colloq. 23, in the press]. Prediction of peptide conformation from the sequence data by the method of Chou & Fasman [(1974) Biochemistry 13, 222-225] indicated that four glycosylated Asn residues in hen ovomucoid are very close to groups of amino acids that occur with high frequency in β-turns. The possible significance of peptide-chain conformation in the attachment of carbohydrate to glycoproteins is briefly discussed.

AC2: An Efficient Protein Sequence Compression Tool Using Artificial Neural Networks and Cache-Hash Models

Entropy ◽

10.3390/e23050530 ◽

2021 ◽

Vol 23 (5) ◽

pp. 530

Author(s):

Milton Silva ◽

Diogo Pratas ◽

Armando J. Pinho

Keyword(s):

Protein Sequence ◽

Sequence Data ◽

Specific Protein ◽

General Purpose ◽

Amino Acid Sequences ◽

Input Size ◽

Protein Sequence Data ◽

Analysis Application ◽

Straightforward Solution ◽

Human Coronaviruses

Recently, the scientific community has witnessed a substantial increase in the generation of protein sequence data, triggering emergent challenges of increasing importance, namely efficient storage and improved data analysis. For both applications, data compression is a straightforward solution. However, in the literature, the number of specific protein sequence compressors is relatively low. Moreover, these specialized compressors only marginally improve the compression ratio over the best general-purpose compressors. In this paper, we present AC2, a new lossless data compressor for protein (or amino acid) sequences. AC2 uses a neural network to mix experts with a stacked generalization approach and individual cache-hash memory models to the highest-context orders. Compared to the previous compressor (AC), we show gains of 2–9% and 6–7% in reference-free and reference-based modes, respectively. These gains come at the cost of three times slower computations. AC2 also improves memory usage against AC, with requirements about seven times lower, without being affected by the sequences’ input size. As an analysis application, we use AC2 to measure the similarity between each SARS-CoV-2 protein sequence and each viral protein sequence from the whole UniProt database. The results consistently show higher similarity to the pangolin coronavirus, followed by the bat and human coronaviruses, contributing critical results to a currently controversial subject. AC2 is available for free download under GPLv3 license.
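
The idea of using a compressor to measure sequence similarity can be sketched with a general-purpose compressor. The Python example below is a stand-in under explicit assumptions: it uses `zlib` from the standard library rather than AC2, and it uses the normalized compression distance (NCD), which is not necessarily the measure the authors compute. The sequences are randomly generated toy data and the helper names are hypothetical.

```python
import random
import zlib

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def compressed_size(text):
    """Size in bytes of the zlib-compressed text (level 9)."""
    return len(zlib.compress(text.encode("ascii"), 9))

def ncd(x, y):
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)

def random_protein(length, rng):
    """Toy sequence drawn uniformly from the 20 standard amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))

def mutate(seq, rate, rng):
    """Redraw each residue at random with probability `rate`."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq)

rng = random.Random(0)
reference = random_protein(300, rng)
close_variant = mutate(reference, 0.05, rng)
unrelated = random_protein(300, rng)
print("variant:  ", round(ncd(reference, close_variant), 3))
print("unrelated:", round(ncd(reference, unrelated), 3))
```

When two sequences share content, compressing their concatenation costs little more than compressing one of them, so the distance is small. A specialized protein compressor can be expected to sharpen this contrast compared with a general-purpose one.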

Roles of the C-Terminal Amino Acids of Non-Hexameric Helicases: Insights from Escherichia coli UvrD

International Journal of Molecular Sciences ◽

10.3390/ijms22031018 ◽

2021 ◽

Vol 22 (3) ◽

pp. 1018

Author(s):

Hiroaki Yokota

Keyword(s):

Escherichia Coli ◽

Amino Acids ◽

Amino Acid ◽

Single Molecule ◽

Underlying Mechanism ◽

Amino Acid Sequences ◽

E Coli ◽

X Ray Crystallography ◽

Terminal Amino

Helicases are nucleic acid-unwinding enzymes that are involved in the maintenance of genome integrity. Several parts of the amino acid sequences of helicases are very similar, and these well-conserved amino acid sequences are termed “helicase motifs”. Previous studies by X-ray crystallography and single-molecule measurements have suggested a common underlying mechanism for their function. These studies indicate the role of the helicase motifs in unwinding nucleic acids. In contrast, the sequence and length of the C-terminal amino acids of helicases are highly variable. In this paper, I review past and recent studies that proposed helicase mechanisms and studies that investigated the roles of the C-terminal amino acids in helicase and dimerization activities, primarily in the non-hexameric Escherichia coli (E. coli) UvrD helicase. Then, I focus on my recent study of single-molecule direct visualization of a UvrD mutant lacking the C-terminal 40 amino acids (UvrDΔ40C) used in studies proposing the monomer helicase model. The study demonstrated that multiple UvrDΔ40C molecules jointly participated in DNA unwinding, presumably by forming an oligomer. Thus, the single-molecule observation addressed how the C-terminal amino acids affect the number of helicases bound to DNA, oligomerization, and unwinding activity, which can be applied to other helicases.
