SeqStruct: A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

Mapping Intimacies ◽

10.1101/268904 ◽

2018 ◽

Author(s):

Kejue Jia ◽

Robert L. Jernigan

Keyword(s):

Amino Acid ◽

Protein Sequence ◽

Sequence Similarity ◽

Protein Structures ◽

Substitution Matrix ◽

Similarity Matrix ◽

Sequence Matching ◽

Sequence Structure ◽

Amino Acid Similarity ◽

Simple Amino Acid

SUMMARY: Protein sequence matching does not properly account for some well-known features of protein structures, such as surface residues being more variable than core residues and the high packing densities in globular proteins, and it does not yield good matches between the sequences of many proteins known to be close structural relatives. There are now abundant protein sequences and structures to enable major improvements to sequence matching. Here, we utilize structural frameworks to mount the observed correlated sequences to identify the most important correlated parts. The rationale is that protein structures provide the important physical framework for improving sequence matching. Combining the sequence and structure data in this way leads to a simple amino acid substitution matrix that can be readily incorporated into any sequence matching. This enables the incorporation of allosteric information into sequence matching and transforms it effectively from a 1-D to a 3-D procedure. The results from testing in over 3,000 sequence matches demonstrate a 37% gain in sequence similarity and a loss of 26% of the gaps when compared with the use of BLOSUM62. Importantly, there are major gains in the specificity of sequence matching across diverse proteins. Specifically, all known cases where protein structures match but sequences do not match well are resolved.
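The summary's point that a substitution matrix "can be readily incorporated into any sequence matching" can be illustrated with a minimal sketch. It assumes the matrix is supplied as a lookup table keyed by residue pairs, and scores a pre-aligned pair of sequences. The names `make_symmetric`, `score_alignment`, and `TOY_MATRIX`, the gap penalty, and all matrix values are hypothetical placeholders; they are not SeqStruct or BLOSUM62 scores.

```python
from typing import Dict, Tuple

Matrix = Dict[Tuple[str, str], int]


def make_symmetric(half: Matrix) -> Matrix:
    """Mirror each (a, b) entry to (b, a) so lookups work in either order."""
    full = dict(half)
    for (a, b), score in half.items():
        full[(b, a)] = score
    return full


def score_alignment(seq_a: str, seq_b: str, matrix: Matrix,
                    gap_penalty: int = -4) -> int:
    """Sum substitution scores over aligned columns; '-' marks a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            total += gap_penalty      # linear gap cost (an assumption)
        else:
            total += matrix[(a, b)]   # pairwise substitution score
    return total


# Hypothetical toy scores for three residues only, for illustration.
TOY_MATRIX = make_symmetric({
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("A", "L"): -1, ("A", "I"): -1, ("L", "I"): 2,
})

print(score_alignment("ALI", "AIL", TOY_MATRIX))
print(score_alignment("A-L", "AIL", TOY_MATRIX))
```

Because the scoring function only reads from the lookup table, swapping in a different matrix changes the scores without changing the matching procedure itself.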

A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Features Yields Complete Sequence-Structure Congruence

Biophysical Journal ◽

10.1016/j.bpj.2018.11.289 ◽

2019 ◽

Vol 116 (3) ◽

pp. 46a

Author(s):

Kejue Jia

Keyword(s):

Amino Acid ◽

Structural Features ◽

Complete Sequence ◽

Similarity Matrix ◽

Sequence Structure ◽

Amino Acid Similarity

Alternative approach to protein structure prediction based on sequential similarity of physical properties

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1504806112 ◽

2015 ◽

Vol 112 (16) ◽

pp. 5029-5032 ◽

Cited By ~ 10

Author(s):

Yi He ◽

S. Rackovsky ◽

Yanping Yin ◽

Harold A. Scheraga

Keyword(s):

Amino Acid ◽

Physical Properties ◽

Protein Sequence ◽

Structure Prediction ◽

Sequence Similarity ◽

Physical Property ◽

Sequence Matching ◽

Alternative Approach ◽

The Relationship ◽

Better Than

The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed to identify homologs in which residue equivalence is based exclusively on the pairwise physical property similarities of sequences. This approach, the property factor method (PFM), is entirely different from those in current use. A comparison is made between our method and PSI BLAST. We demonstrate that traditionally defined sequence similarity can be very low for pairs of sequences (which therefore cannot be identified using PSI BLAST), but similarity of physical property distributions results in almost identical 3D structures. The performance of PFM is shown to be better than that of PSI BLAST when sequence matching is comparable, based on a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets). It is also shown that PFM outperforms PSI BLAST in informatically challenging targets.

Amino acid similarity matrix for homology modeling derived from structural alignment and optimized by the Monte Carlo method

Journal of Molecular Graphics and Modelling ◽

10.1016/s1093-3263(98)80002-8 ◽

1998 ◽

Vol 16 (4-6) ◽

pp. 178-189 ◽

Cited By ~ 10

Author(s):

Koji Ogata ◽

Masanori Ohya ◽

Hideaki Umeyama

Keyword(s):

Monte Carlo ◽

Amino Acid ◽

Monte Carlo Method ◽

Homology Modeling ◽

Structural Alignment ◽

Similarity Matrix ◽

Amino Acid Similarity ◽

The Monte Carlo Method

Derivation of an amino acid similarity matrix for peptide:MHC binding and its application as a Bayesian prior

BMC Bioinformatics ◽

10.1186/1471-2105-10-394 ◽

2009 ◽

Vol 10 (1) ◽

pp. 394 ◽

Cited By ~ 96

Author(s):

Yohan Kim ◽

John Sidney ◽

Clemencia Pinilla ◽

Alessandro Sette ◽

Bjoern Peters

Keyword(s):

Amino Acid ◽

Similarity Matrix ◽

Amino Acid Similarity ◽

Bayesian Prior

A structural homology approach for computational protein design with flexible backbone

Bioinformatics ◽

10.1093/bioinformatics/bty975 ◽

2018 ◽

Vol 35 (14) ◽

pp. 2418-2426 ◽

Cited By ~ 2

Author(s):

David Simoncini ◽

Kam Y J Zhang ◽

Thomas Schiex ◽

Sophie Barbe

Keyword(s):

Amino Acid ◽

Protein Design ◽

Protein Sequence ◽

Critical Role ◽

Protein Structures ◽

Amino Acid Sequences ◽

Computational Protein Design ◽

Supplementary Information ◽

Structural Homology ◽

Homologous Proteins

Abstract. Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs. Results: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When excluding homologous proteins, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%. Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software. Supplementary information: Supplementary data are available at Bioinformatics online.
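As a rough illustration of the recovery figures quoted above, the sketch below computes sequence recovery as the fraction of aligned positions where a designed sequence carries the same residue as a reference sequence. This definition is a common convention and an assumption here; the abstract does not spell out how Shades computes its reported values, and the function name `sequence_recovery` and the example sequences are hypothetical.

```python
def sequence_recovery(designed: str, reference: str) -> float:
    """Fraction of positions where the designed residue equals the reference."""
    if len(designed) != len(reference):
        raise ValueError("sequences must have equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    # Count position-wise identical residues.
    matches = sum(d == r for d, r in zip(designed, reference))
    return matches / len(reference)


# Hypothetical four-residue example: three of four positions agree.
print(sequence_recovery("ACDE", "ACDF"))
```

A similarity measure would follow the same pattern but count positions whose residue pair is judged similar (for example, by a substitution matrix) rather than strictly identical.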

The Amino Acid Alphabet and the Architecture of the Protein Sequence-Structure Map. I. Binary Alphabets

PLoS Computational Biology ◽

10.1371/journal.pcbi.1003946 ◽

2014 ◽

Vol 10 (12) ◽

pp. e1003946 ◽

Cited By ~ 3

Author(s):

Evandro Ferrada

Keyword(s):

Amino Acid ◽

Protein Sequence ◽

Sequence Structure ◽

Amino Acid Alphabet

A new substitution matrix for protein sequence searches based on contact frequencies in protein structures

Protein Engineering Design and Selection ◽

10.1093/protein/6.3.267 ◽

1993 ◽

Vol 6 (3) ◽

pp. 267-278 ◽

Cited By ~ 52

Author(s):

Sanzo Miyazawa ◽

Robert L. Jernigan

Keyword(s):

Protein Sequence ◽

Protein Structures ◽

Substitution Matrix

Protein sequence alignment with family-specific amino acid similarity matrices

BMC Research Notes ◽

10.1186/1756-0500-4-296 ◽

2011 ◽

Vol 4 (1) ◽

Cited By ~ 7

Author(s):

Igor B Kuznetsov

Keyword(s):

Amino Acid ◽

Sequence Alignment ◽

Protein Sequence ◽

Amino Acid Similarity ◽

Specific Amino Acid ◽

Protein Sequence Alignment ◽

Similarity Matrices

Novel Dendritic Kinesin Sorting Identified by Different Process Targeting of Two Related Kinesins: KIF21A and KIF21B

The Journal of Cell Biology ◽

10.1083/jcb.145.3.469 ◽

1999 ◽

Vol 145 (3) ◽

pp. 469-479 ◽

Cited By ~ 97

Author(s):

Joseph R. Marszalek ◽

Joshua A. Weiner ◽

Samuel J. Farlow ◽

Jerold Chun ◽

Lawrence S.B. Goldstein

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Motor Activity ◽

Cell Body ◽

Sequence Similarity ◽

Motor Domain ◽

Amino Acid Sequence Similarity ◽

Amino Acid Similarity ◽

Cellular Components ◽

Insight Into

Neurons use kinesin and dynein microtubule-dependent motor proteins to transport essential cellular components along axonal and dendritic microtubules. In a search for new kinesin-like proteins, we identified two neuronally enriched mouse kinesins that provide insight into a unique intracellular kinesin targeting mechanism in neurons. KIF21A and KIF21B share colinear amino acid similarity to each other, but not to any previously identified kinesins outside of the motor domain. Each protein also contains a domain of seven WD-40 repeats, which may be involved in binding to cargoes. Despite the amino acid sequence similarity between KIF21A and KIF21B, these proteins localize differently to dendrites and axons. KIF21A protein is localized throughout neurons, while KIF21B protein is highly enriched in dendrites. The plus end-directed motor activity of KIF21B and its enrichment in dendrites indicate that models suggesting that minus end-directed motor activity is sufficient for dendrite specific motor localization are inadequate. We suggest that a novel kinesin sorting mechanism is used by neurons to localize KIF21B protein to dendrites since its mRNA is restricted to the cell body.

Three classes of tetrahydrobiopterin-dependent enzymes

Pteridines ◽

10.1515/pterid-2013-0003 ◽

2013 ◽

Vol 24 (1) ◽

pp. 7-11

Author(s):

Ernst R. Werner

Keyword(s):

Nitric Oxide ◽

Amino Acid ◽

Protein Sequence ◽

Aromatic Amino Acid ◽

Phenylalanine Hydroxylase ◽

Current Knowledge ◽

Sequence Similarity ◽

Protein Sequences ◽

Nitric Oxide Synthases ◽

Aromatic Amino Acid Hydroxylases

Abstract: Current knowledge distinguishes three classes of tetrahydrobiopterin-dependent enzymes based on protein sequence similarity. These three protein sequence clusters hydroxylate three types of substrate atoms and use three different forms of iron for catalysis. The first class to be discovered was the aromatic amino acid hydroxylases, which, in mammals, include phenylalanine hydroxylase, tyrosine hydroxylase, and two isoforms of tryptophan hydroxylase. The protein sequences of these tetrahydrobiopterin-dependent aromatic amino acid hydroxylases are significantly similar, and all mammalian aromatic amino acid hydroxylases require a non-heme-bound iron atom in the active site of the enzyme for catalysis. The second class of tetrahydrobiopterin-dependent enzymes to be characterized was the nitric oxide synthases, which in mammals occur as three isoforms. Nitric oxide synthase protein sequences form a separate cluster of homologous sequences with no similarity to aromatic amino acid hydroxylase protein sequences. In contrast to aromatic amino acid hydroxylases, nitric oxide synthases require a heme-bound iron for catalysis. The alkylglycerol monooxygenase protein sequence was the most recent to be characterized. This sequence shares no similarity with aromatic amino acid hydroxylases and nitric oxide synthases. Motifs contained in the alkylglycerol monooxygenase protein sequence suggest that this enzyme may use a di-iron center for catalysis.
