SeqStruct: A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

2018
Author(s): Kejue Jia, Robert L. Jernigan

Protein sequence matching does not properly account for some well-known features of protein structures, such as the greater variability of surface residues compared with core residues and the high packing densities of globular proteins, and it fails to yield good matches for the sequences of many proteins known to be close structural relatives. Abundant protein sequences and structures are now available to enable major improvements in sequence matching. Here, we use structural frameworks onto which we mount the observed correlated sequences, in order to identify the most important correlated parts. The rationale is that protein structures provide the essential physical framework for improving sequence matching. Combining sequence and structure data in this way yields a simple amino acid substitution matrix that can be readily incorporated into any sequence matching procedure. This enables the incorporation of allosteric information into sequence matching and effectively transforms it from a 1-D into a 3-D procedure. Tests on over 3,000 sequence matches show a 37% gain in sequence similarity and a 26% reduction in gaps compared with BLOSUM62. Importantly, there are also major gains in the specificity of sequence matching across diverse proteins. In particular, all known cases in which protein structures match but sequences do not match well are resolved.
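The abstract states that the new substitution matrix can be plugged into any sequence-matching procedure and reports its effect as changes in sequence similarity and in the number of gaps. As a minimal sketch of what "plugging in" a matrix means, the Python below implements textbook Needleman-Wunsch global alignment with a pluggable scoring function and a linear gap penalty. The toy scores (`toy_score`: +2 for a match, -1 for a mismatch, gap -2) are hypothetical placeholders, not SeqStruct or BLOSUM62 values, and counting "similarity" as positive-scoring columns over alignment length is an assumed definition.

```python
from typing import Callable, List, Tuple

# A scoring function maps a residue pair to a substitution score. In practice
# this would be a lookup into a full 20x20 matrix (BLOSUM62 or a
# structure-derived one); the toy values below are hypothetical placeholders.
Score = Callable[[str, str], int]


def toy_score(a: str, b: str) -> int:
    return 2 if a == b else -1


def global_align(s: str, t: str, score: Score, gap: int = -2) -> Tuple[str, str, int]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(s), len(t)
    # dp[i][j] holds the best score for aligning s[:i] with t[:j].
    dp: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # pair two residues
                dp[i - 1][j] + gap,  # residue of s against a gap
                dp[i][j - 1] + gap,  # residue of t against a gap
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_s: List[str] = []
    out_t: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1])
            out_t.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_s.append(s[i - 1])
            out_t.append("-")
            i -= 1
        else:
            out_s.append("-")
            out_t.append(t[j - 1])
            j -= 1
    return "".join(reversed(out_s)), "".join(reversed(out_t)), dp[n][m]


def summarize(a: str, b: str, score: Score) -> Tuple[float, int]:
    """Return (fraction of columns with a positive score, number of gap columns)."""
    if not a:
        return 0.0, 0
    positives = sum(1 for x, y in zip(a, b) if x != "-" and y != "-" and score(x, y) > 0)
    gaps = sum(1 for x, y in zip(a, b) if x == "-" or y == "-")
    return positives / len(a), gaps


aligned_s, aligned_t, total = global_align("MKVLA", "MKLA", toy_score)
print(aligned_s, aligned_t, total)  # MKVLA MK-LA 6
print(summarize(aligned_s, aligned_t, toy_score))  # (0.8, 1)
```

Swapping in a different substitution matrix changes only the `score` function; tallying the similarity fraction and gap count for two matrices over many sequence pairs is, loosely, the kind of comparison the abstract reports.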

2015
Vol 112 (16)
pp. 5029-5032
Author(s): Yi He, S. Rackovsky, Yanping Yin, Harold A. Scheraga

The relationship between protein sequence and structure arises entirely from amino acid physical properties. An alternative method is therefore proposed for identifying homologs, in which residue equivalence is based exclusively on the pairwise similarity of physical properties between sequences. This approach, the property factor method (PFM), is entirely different from methods in current use. We compare PFM with PSI-BLAST and show that traditionally defined sequence similarity can be very low for some pairs of sequences (which therefore cannot be identified using PSI-BLAST), even though the similarity of their physical property distributions corresponds to almost identical 3D structures. In a comparison using targets from CASP10 (89 targets) and CASP11 (51 targets), PFM performs better than PSI-BLAST when sequence matching is comparable, and it also outperforms PSI-BLAST on informatically challenging targets.
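The central idea, judging residue equivalence by physical properties rather than by identity, can be illustrated (not reproduced) with a small sketch. The Python below compares two equal-length, ungapped sequences in two ways: percent identity, and the Pearson correlation of their per-residue property profiles. The Kyte-Doolittle hydropathy scale is swapped in here as a single stand-in property; PFM's own property factors and its matching procedure are not reproduced, and the helper names are hypothetical.

```python
import math
from typing import List

# Kyte-Doolittle hydropathy scale, used here as a single stand-in physical
# property. PFM's own property factors are not reproduced in this sketch.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _check(s: str, t: str) -> None:
    if len(s) != len(t) or not s:
        raise ValueError("expects two non-empty sequences of equal length")


def percent_identity(s: str, t: str) -> float:
    """Fraction of positions with identical residues (ungapped, equal length)."""
    _check(s, t)
    return sum(a == b for a, b in zip(s, t)) / len(s)


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either profile is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def property_profile_similarity(s: str, t: str) -> float:
    """Correlation of per-residue hydropathy along two aligned sequences."""
    _check(s, t)
    return pearson([KYTE_DOOLITTLE[c] for c in s], [KYTE_DOOLITTLE[c] for c in t])


# Every position is a conservative swap (I->V, L->I, K->R, E->D, V->L, D->E):
# no identical residues, but nearly the same hydropathy pattern.
print(percent_identity("ILKEVD", "VIRDLE"))  # 0.0
print(property_profile_similarity("ILKEVD", "VIRDLE"))
```

The toy pair has zero identity yet a property-profile correlation above 0.95, which mirrors, in miniature, the abstract's point that identity-based similarity can miss pairs whose physical properties are closely matched.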


2009
Vol 10 (1)
pp. 394
Author(s): Yohan Kim, John Sidney, Clemencia Pinilla, Alessandro Sette, Bjoern Peters

2018
Vol 35 (14)
pp. 2418-2426
Author(s): David Simoncini, Kam Y J Zhang, Thomas Schiex, Sophie Barbe

Motivation: Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. However, energy functions remain imperfect, and injecting relevant information from known structures into the design process should lead to improved designs.
Results: We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous, in-contact amino acid residue motifs. We tested Shades on a public benchmark of 40 proteins selected from different protein families. When homologous proteins were excluded, Shades achieved, on average, a protein sequence recovery of 30% and a protein sequence similarity of 46% compared with the Pfam protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.
Availability and implementation: Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8, with a curated protein structure database and ITEM library creation software.
Supplementary information: Supplementary data are available at Bioinformatics online.
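The two benchmark figures above, sequence recovery and sequence similarity, are easy to misread, so here is a minimal sketch of how such per-position scores are commonly computed. Recovery is taken as the fraction of positions where the designed residue equals the native one; "similarity" is approximated by membership in the same residue class. The class grouping (`RESIDUE_CLASSES`) and the function names are hypothetical, and this sketch compares against a single reference sequence only, whereas the abstract's similarity figure is computed against the target's Pfam family.

```python
from typing import List, Set

# Hypothetical physicochemical classes used to count "similar" residues.
# This grouping is an illustrative assumption, not the one used by Shades.
RESIDUE_CLASSES: List[Set[str]] = [
    set("AVLIMC"),  # hydrophobic (cysteine grouped here by assumption)
    set("FWY"),     # aromatic
    set("STNQ"),    # polar, uncharged
    set("KRH"),     # positively charged
    set("DE"),      # negatively charged
    set("GP"),      # conformationally special
]


def _check(designed: str, native: str) -> None:
    if len(designed) != len(native) or not native:
        raise ValueError("expects two non-empty sequences of equal length")


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    _check(designed, native)
    return sum(d == n for d, n in zip(designed, native)) / len(native)


def same_class(a: str, b: str) -> bool:
    """True if both residues fall in the same (hypothetical) class."""
    return any(a in group and b in group for group in RESIDUE_CLASSES)


def sequence_similarity(designed: str, native: str) -> float:
    """Fraction of positions where designed and native residues share a class."""
    _check(designed, native)
    return sum(same_class(d, n) for d, n in zip(designed, native)) / len(native)


native = "MKTLLVAG"
designed = "MRTILEAG"
print(sequence_recovery(designed, native))    # 0.625
print(sequence_similarity(designed, native))  # 0.875
```

Because identical residues always share a class, similarity is never lower than recovery under this definition, which is consistent with the abstract reporting 46% similarity alongside 30% recovery.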


1999
Vol 145 (3)
pp. 469-479
Author(s): Joseph R. Marszalek, Joshua A. Weiner, Samuel J. Farlow, Jerold Chun, Lawrence S.B. Goldstein

Neurons use kinesin and dynein microtubule-dependent motor proteins to transport essential cellular components along axonal and dendritic microtubules. In a search for new kinesin-like proteins, we identified two neuronally enriched mouse kinesins that provide insight into a unique intracellular kinesin targeting mechanism in neurons. KIF21A and KIF21B share colinear amino acid similarity with each other but, outside of the motor domain, not with any previously identified kinesins. Each protein also contains a domain of seven WD-40 repeats, which may be involved in binding to cargoes. Despite the amino acid sequence similarity between KIF21A and KIF21B, the two proteins localize differently to dendrites and axons: KIF21A protein is localized throughout neurons, whereas KIF21B protein is highly enriched in dendrites. The plus end-directed motor activity of KIF21B and its enrichment in dendrites indicate that models proposing that minus end-directed motor activity is sufficient for dendrite-specific motor localization are inadequate. Because KIF21B mRNA is restricted to the cell body, we suggest that neurons use a novel kinesin sorting mechanism to localize KIF21B protein to dendrites.


Pteridines
2013
Vol 24 (1)
pp. 7-11
Author(s): Ernst R. Werner

Current knowledge distinguishes three classes of tetrahydrobiopterin-dependent enzymes based on protein sequence similarity. These three protein sequence clusters hydroxylate three types of substrate atoms and use three different forms of iron for catalysis. The first class to be discovered was the aromatic amino acid hydroxylases, which in mammals include phenylalanine hydroxylase, tyrosine hydroxylase, and two isoforms of tryptophan hydroxylase. The protein sequences of these tetrahydrobiopterin-dependent aromatic amino acid hydroxylases are significantly similar, and all mammalian aromatic amino acid hydroxylases require a non-heme-bound iron atom in the active site of the enzyme for catalysis. The second class of tetrahydrobiopterin-dependent enzymes to be characterized was the nitric oxide synthases, which occur as three isoforms in mammals. Nitric oxide synthase protein sequences form a separate cluster of homologous sequences with no similarity to aromatic amino acid hydroxylase protein sequences. In contrast to aromatic amino acid hydroxylases, nitric oxide synthases require a heme-bound iron for catalysis. The alkylglycerol monooxygenase protein sequence was the most recent to be characterized; it shares no similarity with either the aromatic amino acid hydroxylases or the nitric oxide synthases. Motifs in the alkylglycerol monooxygenase protein sequence suggest that this enzyme may use a di-iron center for catalysis.

