MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information

2008, Vol 72 (2), pp. 547-556
Author(s): Sitao Wu, Yang Zhang

2021
Author(s): Fatemeh Zare-Mirakabad, Armin Behjati, Seyed Shahriar Arab, Abbas Nowzari-Dalini

Protein sequences can be viewed as a language; therefore, we can benefit from models originally developed for natural languages, such as transformers. ProtAlbert is one of the best transformers pre-trained on protein sequences, and its efficiency lets us run the model on longer sequences with less computational power while achieving performance similar to that of other pre-trained transformers. This paper has two main parts: transformer analysis and profile prediction. In the first part, we propose five algorithms to assess the attention heads in different layers of ProtAlbert for five protein characteristics: nearest-neighbor interactions, amino acid type, biochemical and biophysical properties of amino acids, protein secondary structure, and protein tertiary structure. These algorithms are applied to 55 proteins extracted from CASP13 and to three case-study proteins whose sequences, experimental tertiary structures, and HSSP profiles are available. This assessment shows that, although the model is pre-trained only on protein sequences, attention heads in the layers of ProtAlbert reflect some protein family characteristics. This conclusion motivates the second part of our work. We propose an algorithm called PA_SPP for protein sequence profile prediction by pre-trained ProtAlbert using masked-language modeling. The PA_SPP algorithm can help researchers predict an HSSP profile when the database contains no sequences similar to the query sequence from which to build one.
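Turning a masked language model into a sequence profile can be illustrated with a short sketch. It assumes that each position is masked in turn and that the model's softmax over the 20 standard amino acids at the masked position becomes one profile row; the abstract does not state PA_SPP's exact masking scheme. `predict_logits` and `toy_logits` are hypothetical stand-ins for ProtAlbert, not part of any real library.

```python
import math
from typing import Callable, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one profile column per standard residue
MASK = "<mask>"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one position's amino-acid logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_profile(
    sequence: str,
    predict_logits: Callable[[List[str], int], Sequence[float]],
) -> List[List[float]]:
    """Build an L x 20 profile by masking one position at a time.

    predict_logits(tokens, i) is a HYPOTHETICAL interface standing in for a
    masked language model such as ProtAlbert: it must return 20 logits,
    ordered as AMINO_ACIDS, for the masked token at position i.
    """
    profile = []
    for i in range(len(sequence)):
        tokens = list(sequence)
        tokens[i] = MASK  # hide the residue the model has to reconstruct
        profile.append(softmax(predict_logits(tokens, i)))
    return profile


def toy_logits(tokens: List[str], i: int) -> List[float]:
    """Toy HYPOTHETICAL predictor: favours residues seen at adjacent positions."""
    neighbours = [tokens[j] for j in (i - 1, i + 1) if 0 <= j < len(tokens)]
    return [2.0 * neighbours.count(aa) for aa in AMINO_ACIDS]


profile = predict_profile("MKTAY", toy_logits)
```

With a real model, masking one position per forward pass costs L model evaluations for a sequence of length L; the toy predictor only demonstrates the data flow from masked predictions to profile rows.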


2019, Vol 60 (1), pp. 391-399
Author(s): Sheng Chen, Zhe Sun, Lihua Lin, Zifeng Liu, Xun Liu, ...

2019
Author(s): Sheng Chen, Zhe Sun, Zifeng Liu, Xun Liu, Yutian Chong, ...

ABSTRACT
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be predicted computationally by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image-captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% sequence recovery on the independent test set, a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Such a network architecture using 2D distance maps is therefore expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
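The 2D distance-map input and the two quantities the abstract reports (sequence recovery and the number of spatial neighbours per residue) can be sketched in a few lines. This is a minimal illustration, assuming one 3D coordinate per residue (for example C-alpha atoms) and a hypothetical 10 Å neighbour cutoff; it does not reproduce SPROF's image-captioning network. A profile is taken to be an L × 20 list of probabilities ordered as `AMINO_ACIDS`.

```python
import math
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Coord = Tuple[float, float, float]


def distance_map(coords: Sequence[Coord]) -> List[List[float]]:
    """L x L matrix of pairwise Euclidean distances between residue coordinates."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def neighbour_counts(dmap: List[List[float]], cutoff: float = 10.0) -> List[int]:
    """For each residue, count the OTHER residues closer than `cutoff` angstroms.

    The 10 angstrom default is a HYPOTHETICAL choice for illustration.
    """
    return [
        sum(1 for j, d in enumerate(row) if j != i and d < cutoff)
        for i, row in enumerate(dmap)
    ]


def sequence_recovery(profile: List[List[float]], native: str) -> float:
    """Fraction of positions whose most probable profile residue is the native one."""
    hits = 0
    for row, native_aa in zip(profile, native):
        best = max(range(len(AMINO_ACIDS)), key=row.__getitem__)
        hits += AMINO_ACIDS[best] == native_aa
    return hits / len(native)


# Four residues on a straight line, 3.8 angstroms apart (a typical C-alpha spacing).
coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]
dmap = distance_map(coords)
counts = neighbour_counts(dmap)
```

Because the distance map is indexed by residue pairs rather than by sequence order, residues far apart in sequence but close in space appear as small entries, which is the long-range information the abstract credits the 2D representation with capturing.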


2020
Author(s): Maximilia F. de Souza Degenhardt, Phelipe A. M. Vitale, Layara A. Abiko, Martin Zacharias, Michael Sattler, ...

ABSTRACT
Na⁺/Ca²⁺ exchangers (NCX) are secondary active transporters that couple the translocation of Na⁺ with the transport of Ca²⁺ in the opposite direction. The exchanger is an essential Ca²⁺ extrusion mechanism in excitable cells. It consists of a transmembrane domain and a large intracellular loop that contains two Ca²⁺-binding domains, CBD1 and CBD2. The two CBDs are adjacent to each other and form a two-domain Ca²⁺ sensor called CBD12. Binding of intracellular Ca²⁺ to CBD12 activates NCX but inhibits CALX, the Na⁺/Ca²⁺ exchanger of Drosophila. NMR spectroscopy and SAXS studies showed that CALX and NCX CBD12 constructs display significant inter-domain flexibility in the apo state but adopt rigid inter-domain arrangements in the Ca²⁺-bound state. However, detailed structural information on CBD12 in the apo state is missing. Structural characterization of proteins formed by two or more domains connected by flexible linkers is notoriously challenging and requires combining orthogonal information from multiple sources. To characterize the conformational ensemble of CALX-CBD12 in the apo state, we combined molecular dynamics (MD) simulations, NMR (¹H-¹⁵N RDCs) and small-angle X-ray scattering (SAXS) data in a modelling strategy that generated atomistic information on the most representative conformations. This joint approach showed that CALX-CBD12 preferentially samples closed conformations, whereas the wide-open inter-domain arrangement characteristic of the Ca²⁺-bound state is sampled less frequently. These results are consistent with the view that Ca²⁺ binding shifts the CBD12 conformational ensemble towards extended conformers, which could be a key step in the allosteric regulation mechanism of Na⁺/Ca²⁺ exchangers.
The present strategy, combining MD with NMR and SAXS, provides a powerful approach to select representative structures from ensembles of conformations and could be applied to other flexible multi-domain systems.

SIGNIFICANCE
The conformational ensemble of CALX-CBD12, the main Ca²⁺ sensor of Drosophila’s Na⁺/Ca²⁺ exchanger, was characterized by combining MD simulations with SAXS and NMR data using the Ensemble Optimization Method (EOM). This analysis showed that the two-domain construct undergoes opening-closing motions, providing molecular information about CALX-CBD12 in the apo state. Ca²⁺ binding shifts the conformational ensemble towards extended conformers. These findings are consistent with a model in which Ca²⁺ modulation of CBD12 plasticity is a key step in the Ca²⁺-regulation mechanism of the full-length exchanger.
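Selecting a representative sub-ensemble from a pool of conformers can be illustrated with a simplified scoring loop. This sketch assumes a theoretical SAXS curve has already been computed for every conformer and scores each candidate sub-ensemble by the χ² per data point between its averaged curve and the experimental curve. EOM itself uses a genetic algorithm and fits scale factors; the exhaustive search below is a swapped-in simplification, and the curves are synthetic placeholders, not data from this study.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Curve = Sequence[float]


def chi2_per_point(model: Curve, exp: Curve, sigma: Curve) -> float:
    """Error-weighted squared deviation between curves, averaged over data points."""
    return sum(((m - e) / s) ** 2 for m, e, s in zip(model, exp, sigma)) / len(exp)


def ensemble_average(curves: List[Curve]) -> List[float]:
    """Point-by-point mean of the scattering curves of the selected conformers."""
    return [sum(values) / len(curves) for values in zip(*curves)]


def best_subensemble(
    pool: List[Curve], exp: Curve, sigma: Curve, size: int
) -> Tuple[Tuple[int, ...], float]:
    """Try every `size`-member subset of the pool and keep the lowest chi^2.

    Exhaustive search is a SIMPLIFICATION; it is only feasible for tiny pools.
    """
    best: Optional[Tuple[Tuple[int, ...], float]] = None
    for combo in itertools.combinations(range(len(pool)), size):
        score = chi2_per_point(ensemble_average([pool[k] for k in combo]), exp, sigma)
        if best is None or score < best[1]:
            best = (combo, score)
    if best is None:
        raise ValueError("pool must contain at least `size` conformers")
    return best


# SYNTHETIC intensity curves (three q-points) for three hypothetical conformers.
pool = [
    [1.0, 0.8, 0.5],  # "closed"
    [1.0, 0.6, 0.3],  # "wide open"
    [1.0, 0.7, 0.4],  # "intermediate"
]
experimental = [1.0, 0.75, 0.45]
sigma = [0.01, 0.01, 0.01]
selection, chi2 = best_subensemble(pool, experimental, sigma, size=2)
```

The number of candidate subsets grows combinatorially with pool size, which is why ensemble-fitting methods such as EOM rely on stochastic optimisation for realistic pools.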


1998, Vol 7 (12), pp. 2499-2510
Author(s): Lihua Yu, James V. White, Temple F. Smith

2021
Author(s): Emidio Capriotti, Piero Fariselli

Abstract
Evolutionary information is the primary tool for detecting functional conservation in nucleic acids and proteins. This information has been used extensively to predict the structure, interactions and functions of macromolecules. Pathogenicity prediction models rely on multiple sequence alignment information at different levels. However, the most accurate genome-wide algorithms for ranking variant deleteriousness consider many different features to assess the impact of variants. Here, we analyze three different ways of extracting evolutionary information from sequence alignments in the context of pathogenicity prediction at the DNA and protein levels. We show that protein sequence-based information is slightly more informative than DNA-level information for annotating ClinVar missense variants. Furthermore, to match the performance of state-of-the-art methods such as CADD, the conservation of the reference and variant, encoded as frequencies of reference/alternate alleles or wild-type/mutant residues, should be included. Our results on a large set of missense variants show that a basic method based on three input features derived from the protein sequence profile performs similarly to the CADD algorithm, which uses hundreds of genomic features. This observation indicates that, for missense variants, evolutionary information, when properly encoded, plays the primary role in ranking pathogenicity.
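A small profile-derived feature vector for one missense variant can be sketched from a single alignment column. The wild-type and mutant residue frequencies follow the abstract's description; the third feature (column Shannon entropy) and the pseudocount of 1 are hypothetical choices, since the abstract does not name the third feature.

```python
import math
from collections import Counter
from typing import Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_features(
    column: str, wild_type: str, mutant: str, pseudocount: float = 1.0
) -> Tuple[float, float, float]:
    """Return (wild-type frequency, mutant frequency, entropy in bits) for one column.

    Gaps and non-standard symbols are ignored. ASSUMPTION: entropy as the third
    feature and add-one pseudocounts are illustrative, not from the abstract.
    """
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
    freqs = {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    return freqs[wild_type], freqs[mutant], entropy


# Hypothetical alignment column at the mutated position: mostly conserved Ala.
wt_freq, mut_freq, entropy = profile_features("AAAAAAAAG-", "A", "G")
```

A high wild-type frequency combined with a low mutant frequency and low entropy marks a conserved position where the substitution is rarely observed, the situation in which a missense variant is more likely to be ranked as deleterious.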

