To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

2019 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Zifeng Liu ◽  
Xun Liu ◽  
Yutian Chong ◽  
...  

ABSTRACT: Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. A protein sequence profile can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ 2D distance maps for predicting protein properties. SPROF achieved 39.8% sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using 2D distance maps is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
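The two quantities this abstract relies on, a 2D map of pairwise residue distances and the sequence recovery rate, can be illustrated with a minimal sketch. It assumes each residue is represented by a single 3D point (for example a C-alpha atom; the abstract does not say which atoms are used) and that recovery is the fraction of positions where the predicted residue matches the native one. The function names `distance_map` and `sequence_recovery` and the toy coordinates are hypothetical, not taken from SPROF.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: list of (x, y, z) tuples, one representative point per residue
    (assumption: e.g. C-alpha atoms; the source does not specify).
    """
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have the same length")
    if not native:
        return 0.0
    matches = sum(p == q for p, q in zip(predicted, native))
    return matches / len(native)


# Toy example: three residues placed 3.8 angstroms apart along two axes.
ca_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
dmap = distance_map(ca_coords)

# Three of four positions match, so the recovery rate is 0.75.
recovery = sequence_recovery("ACDE", "ACDF")
```

The resulting matrix is symmetric with a zero diagonal, which is what allows it to be treated as a single-channel image by an image-captioning style network.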

2019 ◽  
Vol 60 (1) ◽  
pp. 391-399 ◽  
Author(s):  
Sheng Chen ◽  
Zhe Sun ◽  
Lihua Lin ◽  
Zifeng Liu ◽  
Xun Liu ◽  
...  

2021 ◽  
Author(s):  
Fatemeh Zare-Mirakabad ◽  
Armin Behjati ◽  
Seyed Shahriar Arab ◽  
Abbas Nowzari-Dalini

Protein sequences can be viewed as a language; therefore, we can benefit from models initially developed for natural languages, such as transformers. ProtAlbert is one of the best pre-trained transformers on protein sequences, and its efficiency enables us to run the model on longer sequences with less computation power while achieving performance similar to that of other pre-trained transformers. This paper includes two main parts: transformer analysis and profile prediction. In the first part, we propose five algorithms to assess the attention heads in different layers of ProtAlbert for five protein characteristics: nearest-neighbor interactions, type of amino acids, biochemical and biophysical properties of amino acids, protein secondary structure, and protein tertiary structure. These algorithms are applied to 55 proteins extracted from CASP13 and three case study proteins whose sequences, experimental tertiary structures, and HSSP profiles are available. This assessment shows that although the model is only pre-trained on protein sequences, attention heads in the layers of ProtAlbert are representative of some protein family characteristics. This conclusion leads to the second part of our work. We propose an algorithm called PA_SPP for protein sequence profile prediction by pre-trained ProtAlbert using masked-language modeling. The PA_SPP algorithm can help researchers predict an HSSP profile when the database contains no sequences similar to the query sequence from which to build the HSSP profile.
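The general idea of deriving a sequence profile from a masked language model can be sketched as follows: mask one position at a time, ask the model for a probability distribution over the 20 amino acids at that position, and stack the distributions into a profile. This is only an illustration of the masking loop, not the PA_SPP algorithm itself, whose details are not given in the abstract. The predictor below is a hypothetical stand-in (smoothed residue counts from the unmasked context) so that the example runs without any model download; in practice it would be replaced by a call to a pre-trained model.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"  # hypothetical mask token; real tokenizers define their own


def toy_masked_predictor(tokens, position):
    """Hypothetical stand-in for a masked language model.

    Returns a distribution over amino acids for the masked position,
    here just add-one smoothed counts of the unmasked residues.
    """
    counts = {aa: 1.0 for aa in AMINO_ACIDS}
    for i, tok in enumerate(tokens):
        if i != position and tok in counts:
            counts[tok] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def predict_profile(sequence, predictor):
    """Mask each position in turn and collect the predicted distributions."""
    profile = []
    for pos in range(len(sequence)):
        tokens = list(sequence)
        tokens[pos] = MASK  # hide the residue the model must predict
        profile.append(predictor(tokens, pos))
    return profile


# One row per residue; each row is a probability distribution over amino acids.
profile = predict_profile("ACCA", toy_masked_predictor)
```

Each row of `profile` plays the role of one column of a position-specific profile; any predictor with the same `(tokens, position)` interface could be substituted.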


Life ◽  
2021 ◽  
Vol 11 (11) ◽  
pp. 1135
Author(s):  
Shunya Kashiwagi ◽  
Kengo Sato ◽  
Yasubumi Sakakibara

Protein–RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequences and structures involved in PRIs is important for unraveling such processes. Because of the expensive and time-consuming techniques required for experimental determination of complex protein–RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue–base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require the identification of 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Here, we propose a method for predicting residue–base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences. The method can be applied to any protein–RNA pair, even when rich information, such as its 3D structure, is not available. In this method, residue–base contact prediction is formalized as an integer programming problem. We predict a residue–base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and the predicted secondary structure. The scoring function is trained using a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.
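To make the integer programming formulation concrete, here is a toy sketch in which each residue-base pair has a binary contact variable and the contact map with the highest total score is selected. Everything specific here is an assumption for illustration: the score matrix is made up rather than learned, the limits of at most one contact per residue and per base are hypothetical constraints, and exhaustive enumeration stands in for a real integer programming solver, which would be needed for anything beyond a handful of variables.

```python
from itertools import product


def best_contact_map(scores, max_per_residue=1, max_per_base=1):
    """Solve a tiny 0/1 program by enumeration.

    scores[i][j] is an (assumed, precomputed) score for a contact between
    residue i and base j. Returns (best total score, binary contact map).
    """
    n_res = len(scores)
    n_base = len(scores[0]) if scores else 0
    best_value = 0.0
    best_map = [[0] * n_base for _ in range(n_res)]
    for bits in product((0, 1), repeat=n_res * n_base):
        x = [list(bits[i * n_base:(i + 1) * n_base]) for i in range(n_res)]
        # Hypothetical constraints: limit contacts per residue and per base.
        if any(sum(row) > max_per_residue for row in x):
            continue
        if any(sum(x[i][j] for i in range(n_res)) > max_per_base
               for j in range(n_base)):
            continue
        value = sum(scores[i][j] * x[i][j]
                    for i in range(n_res) for j in range(n_base))
        if value > best_value:
            best_value, best_map = value, x
    return best_value, best_map


# Two residues, three bases, with made-up scores.
scores = [[2.0, -1.0, 0.5],
          [1.5, 3.0, -0.5]]
value, contact_map = best_contact_map(scores)
```

In this toy instance the optimum pairs residue 0 with base 0 and residue 1 with base 1; negative-scoring pairs are never selected because leaving a variable at zero costs nothing.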


2015 ◽  
Author(s):  
Kengo Sato ◽  
Shunya Kashiwagi ◽  
Yasubumi Sakakibara

Motivation: Protein-RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequence and structure in PRIs is important for understanding those processes. Due to the expensive and time-consuming processes required for experimental determination of complex protein-RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue-base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Results: We propose a prediction method for residue-base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences alone. The method can be applied to any protein-RNA pair, even when rich information such as 3D structure is not available. Residue-base contact prediction is formalized as an integer programming problem. We predict a residue-base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and predicted secondary structure. The scoring function is trained by a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.


2021 ◽  
Vol 13 (12) ◽  
pp. 2255
Author(s):  
Matteo Pardini ◽  
Victor Cazcarra-Bes ◽  
Konstantinos Papathanassiou

Synthetic Aperture Radar (SAR) measurements are unique for mapping forest 3D structure and its changes in time. Tomographic SAR (TomoSAR) configurations exploit this potential by reconstructing the 3D radar reflectivity. The frequency of the SAR measurements is one of the main parameters determining the information content of the reconstructed reflectivity in terms of penetration and sensitivity to the individual vegetation elements. This paper attempts to review and characterize the structural information content of L-band TomoSAR reflectivity reconstructions and their potential for forest structure mapping. First, the challenges in accurately reconstructing the TomoSAR reflectivity of volume scatterers (which are expected to dominate at L-band) and in extracting physical structure information from the reconstructed reflectivity are addressed. Then, the L-band penetration capability is directly evaluated by means of the estimation performance of the sub-canopy ground topography. The information content of the reconstructed reflectivity is then evaluated in terms of complementary structure indices. Finally, the dependency of the TomoSAR reconstruction and of its structural information on both the TomoSAR acquisition geometry and the temporal change of the reflectivity that may occur between the TomoSAR measurements in repeat-pass or bistatic configurations is evaluated. The analysis is supported by experimental results obtained by processing airborne acquisitions performed over temperate forest sites close to the city of Traunstein in the south of Germany.


Author(s):  
Hriday K. Basak ◽  
Soumen Saha ◽  
Joydeep Ghosh ◽  
Uttam Paswan ◽  
Sujoy Karmakar ◽  
...  

Background: The COVID-19 pandemic, caused by the highly contagious and pathogenic SARS-CoV-2, is a global menace. Day by day this pandemic is getting worse. Doctors, scientists and researchers across the world are urgently searching for a cure for the novel coronavirus and working at breakneck speed to develop a vaccine or drugs. But to date, no specific drugs or vaccines are available on the market to cope with the virus. Objective: The present study helps to elucidate the 3D structures of SARS-CoV-2 proteins and to identify the best natural compounds as potential inhibitors against COVID-19. Methods: The 3D structures of the proteins were constructed using the Modeller 9.16 modeling tool. Modelled proteins were validated with PROCHECK by Ramachandran plot analysis. In this study, a small library of natural compounds (fifty compounds) was docked to the ACE2 binding site of the modelled surface glycoprotein of SARS-CoV-2 using AutoDock Vina to repurpose these inhibitors for SARS-CoV-2. Conceptual density functional theory calculations for the best eight compounds were performed with Gaussian-09. Geometry optimizations for these molecules were done at the M06-2X/def2-TZVP level of theory. ADME parameters, pharmacokinetic properties and drug-likeness of the compounds were analyzed on the SwissADME website. Results: In this study, we analysed the sequences of the surface glycoprotein, nucleocapsid phosphoprotein and envelope protein obtained from different parts of the globe. We modelled all the different sequences of the surface glycoprotein and envelope protein in order to derive the 3D structure of a molecular target, which is essential for the development of therapeutics. Different electronic properties of the inhibitors were calculated using DFT with the M06-2X functional and the def2-TZVP basis set. Docking results at the hACE2 binding site of all modelled surface glycoproteins of SARS-CoV-2 showed that all eight inhibitors studied here (Actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamiotocin P-3 and isomitomycin A) performed many-fold better than hydroxychloroquine, which has been found to be effective in treating patients suffering from COVID-19. All the inhibitors meet most of the criteria of the drug-likeness assessment. Conclusion: We expect that the eight compounds (Actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamiotocin P-3 and isomitomycin A) can be used as potential inhibitors against SARS-CoV-2.


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Seonwoo Min ◽  
Seunghyun Park ◽  
Siwon Kim ◽  
Hyun-Soo Choi ◽  
Byunghan Lee ◽  
...  
