protein sequence profile
Recently Published Documents


TOTAL DOCUMENTS

6
(FIVE YEARS 1)

H-INDEX

3
(FIVE YEARS 0)

2021 ◽  
Author(s):  
Fatemeh Zare-Mirakabad ◽  
Armin Behjati ◽  
Seyed Shahriar Arab ◽  
Abbas Nowzari-Dalini

Protein sequences can be viewed as a language; therefore, we benefit from using the models initially developed for natural languages such as transformers. ProtAlbert is one of the best pre-trained transformers on protein sequences, and its efficiency enables us to run the model on longer sequences with less computation power while having similar performance with the other pre-trained transformers. This paper includes two main parts: transformer analysis and profile prediction. In the first part, we propose five algorithms to assess the attention heads in different layers of ProtAlbert for five protein characteristics, nearest-neighbor interactions, type of amino acids, biochemical and biophysical properties of amino acids, protein secondary structure, and protein tertiary structure. These algorithms are performed on 55 proteins extracted from CASP13 and three case study proteins whose sequences, experimental tertiary structures, and HSSP profiles are available. This assessment shows that although the model is only pre-trained on protein sequences, attention heads in the layers of ProtAlbert are representative of some protein family characteristics. This conclusion leads to the second part of our work. We propose an algorithm called PA_SPP for protein sequence profile prediction by pre-trained ProtAlbert using masked-language modeling. PA_SPP algorithm can help the researchers to predict an HSSP profile while there are no similar sequences to a query sequence in the database for making the HSSP profile.


2019 ◽  
Vol 60 (1) ◽  
pp. 391-399 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Lihua Lin ◽  
Zifeng Liu ◽  
Xun Liu ◽  
...  

2019 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Zifeng Liu ◽  
Xun Liu ◽  
Yutian Chong ◽  
...  

ABSTRACTProtein sequence profile prediction aims to generate multiple sequences from structural information to advance the protein design. Protein sequence profile can be computationally predicted by energy-based method or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2 has achieved a sequence recovery rate of 34%. However, SPIN2 employed only one dimensional (1D) structural properties that are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances. and developed a new method (SPROF) to predict protein sequence profile based on an image captioning learning frame. To our best knowledge, this is the first method to employ 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found the sequence recovery increased with the number of their neighbored residues in 3D structural space, indicating that our method can effectively learn long range information from the 2D distance map. Thus, such network architecture using 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.


Sign in / Sign up

Export Citation Format

Share Document