To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

Mapping Intimacies ◽

10.1101/628917 ◽

2019 ◽

Author(s):

Sheng Chen ◽

Zhe Sun ◽

Zifeng Liu ◽

Xun Liu ◽

Yutian Chong ◽

...

Keyword(s):

Protein Sequence ◽

Network Architecture ◽

Structural Information ◽

3D Structure ◽

Previous Method ◽

Image Captioning ◽

Sequence Profile ◽

Distance Map ◽

3D Structures ◽

Protein Sequence Profile

Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be predicted computationally by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent 3D structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using 2D distance maps is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction.
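
The two quantities this abstract relies on, the 2D pairwise residue distance map and the sequence recovery rate, can be illustrated with a minimal sketch. It is not the SPROF implementation: the abstract does not say which atoms define a residue's position, so the use of one C-alpha coordinate per residue is an assumption, and the function names `distance_map` and `sequence_recovery` and the example coordinates are hypothetical.

```python
import math


def distance_map(coords):
    """Return the symmetric L x L matrix of pairwise Euclidean distances.

    coords holds one (x, y, z) point per residue; using C-alpha atoms
    is an assumption made only for this illustration.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


def sequence_recovery(predicted, native):
    """Fraction of positions where the predicted residue equals the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


# Hypothetical coordinates (in angstroms) for a 3-residue fragment.
ca_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dmap = distance_map(ca_coords)

# Three of the four positions agree, so recovery is 0.75.
recovery = sequence_recovery("ACDE", "ACDF")
```

A map built this way is an L x L matrix with a zero diagonal, which is what allows it to be treated like a single-channel image by an image captioning style network.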


To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map

Journal of Chemical Information and Modeling ◽

10.1021/acs.jcim.9b00438 ◽

2019 ◽

Vol 60 (1) ◽

pp. 391-399 ◽

Cited By ~ 3

Author(s):

Sheng Chen ◽

Zhe Sun ◽

Lihua Lin ◽

Zifeng Liu ◽

Xun Liu ◽

...

Keyword(s):

Protein Sequence ◽

Image Captioning ◽

Sequence Profile ◽

Distance Map ◽

Protein Sequence Profile


Identification of ligand-binding residues using protein sequence profile alignment and query-specific support vector machine model

Analytical Biochemistry ◽

10.1016/j.ab.2020.113799 ◽

2020 ◽

Vol 604 ◽

pp. 113799

Author(s):

Jun Hu ◽

Liang Rao ◽

Xueqiang Fan ◽

Guijun Zhang

Keyword(s):

Support Vector Machine ◽

Protein Sequence ◽

Support Vector Machine Model ◽

Support Vector ◽

Sequence Profile ◽

Machine Model ◽

Specific Support ◽

Binding Residues ◽

Protein Sequence Profile ◽

Profile Alignment


Protein sequence profile prediction using ProtAlbert transformer

10.1101/2021.09.23.461475 ◽

2021 ◽

Author(s):

Fatemeh Zare-Mirakabad ◽

Armin Behjati ◽

Seyed Shahriar Arab ◽

Abbas Nowzari-Dalini

Keyword(s):

Amino Acids ◽

Protein Sequence ◽

Nearest Neighbor ◽

Tertiary Structure ◽

Query Sequence ◽

Protein Secondary Structure ◽

Protein Sequences ◽

Family Characteristics ◽

Sequence Profile ◽

Protein Sequence Profile

Protein sequences can be viewed as a language; therefore, we benefit from using models initially developed for natural languages, such as transformers. ProtAlbert is one of the best pre-trained transformers on protein sequences, and its efficiency enables us to run the model on longer sequences with less computational power while achieving performance similar to that of other pre-trained transformers. This paper includes two main parts: transformer analysis and profile prediction. In the first part, we propose five algorithms to assess the attention heads in different layers of ProtAlbert for five protein characteristics: nearest-neighbor interactions, type of amino acids, biochemical and biophysical properties of amino acids, protein secondary structure, and protein tertiary structure. These algorithms are performed on 55 proteins extracted from CASP13 and three case study proteins whose sequences, experimental tertiary structures, and HSSP profiles are available. This assessment shows that although the model is only pre-trained on protein sequences, attention heads in the layers of ProtAlbert are representative of some protein family characteristics. This conclusion leads to the second part of our work. We propose an algorithm called PA_SPP for protein sequence profile prediction by pre-trained ProtAlbert using masked-language modeling. The PA_SPP algorithm can help researchers predict an HSSP profile when the database contains no sequences similar to the query sequence from which to build the profile.
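
How masked-language modeling can yield a per-position profile is sketched below under stated assumptions. The abstract does not describe the steps of PA_SPP, so the loop that masks one position at a time is an assumption, and `toy_masked_model` is a hypothetical stand-in that only counts residues in the context; it is not ProtAlbert and does not call any real library.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "?"  # hypothetical mask symbol for this sketch


def toy_masked_model(masked_seq, position):
    """Hypothetical stand-in for a masked language model.

    Returns a probability distribution over the 20 amino acids for the
    masked position, built from add-one-smoothed residue counts in the
    rest of the sequence. A real model would use learned weights.
    """
    counts = {aa: 1.0 for aa in AMINO_ACIDS}
    for i, aa in enumerate(masked_seq):
        if i != position and aa in counts:
            counts[aa] += 1.0
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def predict_profile(sequence, model=toy_masked_model):
    """Mask each position in turn and collect the predicted distributions."""
    profile = []
    for i in range(len(sequence)):
        masked = sequence[:i] + MASK + sequence[i + 1:]
        profile.append(model(masked, i))
    return profile


# One row per residue; each row is a distribution over the 20 amino acids.
profile = predict_profile("ACCA")
```

The result has the same shape as a sequence profile (sequence length by 20), which is why a query sequence alone, with no homologous sequences in a database, can be enough to produce one.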


A Max-Margin Model for Predicting Residue—Base Contacts in Protein–RNA Interactions

Life ◽

10.3390/life11111135 ◽

2021 ◽

Vol 11 (11) ◽

pp. 1135

Author(s):

Shunya Kashiwagi ◽

Kengo Sato ◽

Yasubumi Sakakibara

Keyword(s):

Rna Binding ◽

Structural Information ◽

3D Structure ◽

Scoring Function ◽

Prediction Method ◽

Sequence Information ◽

Integer Programming Problem ◽

3D Structures ◽

Binding Residue ◽

Base Contact

Protein–RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequences and structures involved in PRIs is important for unraveling such processes. Because of the expensive and time-consuming techniques required for experimental determination of complex protein–RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue–base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require the identification of 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Here, we propose a method for predicting residue–base contacts between proteins and RNAs using only sequence information and structural information predicted from sequences. The method can be applied to any protein–RNA pair, even when rich information such as its 3D structure is not available. In this method, residue–base contact prediction is formalized as an integer programming problem. We predict a residue–base contact map that maximizes a scoring function based on sequence-based features such as k-mers of sequences and the predicted secondary structure. The scoring function is trained using a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.
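
The formulation of contact prediction as an integer program can be illustrated with a toy instance. This is only a sketch: the abstract does not state the constraints or the scoring features, so the limits of at most one contact per residue and per base are assumptions, the score matrix is made up, and exhaustive enumeration stands in for a real integer programming solver.

```python
from itertools import product


def best_contact_map(scores, max_per_residue=1, max_per_base=1):
    """Maximize sum(scores[i][j] * x[i][j]) over binary contact maps x.

    The per-residue and per-base contact limits are assumed constraints
    for illustration. Enumeration is only feasible for tiny instances.
    """
    n_res, n_base = len(scores), len(scores[0])
    best_value, best_map = float("-inf"), None
    for bits in product((0, 1), repeat=n_res * n_base):
        x = [bits[i * n_base:(i + 1) * n_base] for i in range(n_res)]
        # Skip maps that violate the assumed contact limits.
        if any(sum(row) > max_per_residue for row in x):
            continue
        if any(sum(x[i][j] for i in range(n_res)) > max_per_base
               for j in range(n_base)):
            continue
        value = sum(scores[i][j] * x[i][j]
                    for i in range(n_res) for j in range(n_base))
        if value > best_value:
            best_value, best_map = value, [list(row) for row in x]
    return best_value, best_map


# Hypothetical scores for 2 residues (rows) against 3 bases (columns).
scores = [[2.0, -1.0, 0.5],
          [1.5, 3.0, -2.0]]
value, contact_map = best_contact_map(scores)
```

In a max-margin setting, the scores would come from a learned weight vector applied to sequence-based features, trained so that known contact maps score higher than alternatives.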


A comparison of scoring functions for protein sequence profile alignment

Bioinformatics ◽

10.1093/bioinformatics/bth090 ◽

2004 ◽

Vol 20 (8) ◽

pp. 1301-1308 ◽

Cited By ~ 76

Author(s):

R. C. Edgar ◽

K. Sjolander

Keyword(s):

Protein Sequence ◽

Scoring Functions ◽

Sequence Profile ◽

Protein Sequence Profile ◽

Profile Alignment


MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.21945 ◽

2008 ◽

Vol 72 (2) ◽

pp. 547-556 ◽

Cited By ~ 269

Author(s):

Sitao Wu ◽

Yang Zhang

Keyword(s):

Protein Sequence ◽

Multiple Sources ◽

Sequence Profile ◽

Structure Information ◽

Protein Sequence Profile


A max-margin model for predicting residue-base contacts in protein-RNA interactions

10.1101/022459 ◽

2015 ◽

Author(s):

Kengo Sato ◽

Shunya Kashiwagi ◽

Yasubumi Sakakibara

Keyword(s):

Rna Binding ◽

Structural Information ◽

3D Structure ◽

Scoring Function ◽

Prediction Method ◽

Sequence Information ◽

Integer Programming Problem ◽

3D Structures ◽

Binding Residue ◽

Base Contact

Motivation: Protein-RNA interactions (PRIs) are essential for many biological processes, so understanding aspects of the sequence and structure in PRIs is important for understanding those processes. Due to the expensive and time-consuming processes required for experimental determination of complex protein-RNA structures, various computational methods have been developed to predict PRIs. However, most of these methods focus on predicting only RNA-binding regions in proteins or only protein-binding motifs in RNA. Methods for predicting entire residue-base contacts in PRIs have not yet achieved sufficient accuracy. Furthermore, some of these methods require 3D structures or homologous sequences, which are not available for all protein and RNA sequences. Results: We propose a prediction method for residue-base contacts between proteins and RNAs using only sequence information and structural information predicted from only sequences. The method can be applied to any protein-RNA pair, even when rich information such as 3D structure is not available. Residue-base contact prediction is formalized as an integer programming problem. We predict a residue-base contact map that maximizes a scoring function based on sequence-based features such as k-mer of sequences and predicted secondary structure. The scoring function is trained by a max-margin framework from known PRIs with 3D structures. To verify our method, we conducted several computational experiments. The results suggest that our method, which is based on only sequence information, is comparable with RNA-binding residue prediction methods based on known binding data.


TomoSAR Mapping of 3D Forest Structure: Contributions of L-Band Configurations

Remote Sensing ◽

10.3390/rs13122255 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2255

Author(s):

Matteo Pardini ◽

Victor Cazcarra-Bes ◽

Konstantinos Papathanassiou

Keyword(s):

Information Content ◽

Forest Structure ◽

Structural Information ◽

3D Structure ◽

Physical Structure ◽

Structure Mapping ◽

Structure Information ◽

Forest Sites ◽

L Band ◽

Structure Indices

Synthetic Aperture Radar (SAR) measurements are unique for mapping forest 3D structure and its changes in time. Tomographic SAR (TomoSAR) configurations exploit this potential by reconstructing the 3D radar reflectivity. The frequency of the SAR measurements is one of the main parameters determining the information content of the reconstructed reflectivity in terms of penetration and sensitivity to the individual vegetation elements. This paper attempts to review and characterize the structural information content of L-band TomoSAR reflectivity reconstructions and their potential for forest structure mapping. First, the challenges in accurately reconstructing the TomoSAR reflectivity of volume scatterers (which are expected to dominate at L-band) and in extracting physical structure information from the reconstructed reflectivity are addressed. Then, the L-band penetration capability is directly evaluated by means of the estimation performance of the sub-canopy ground topography. The information content of the reconstructed reflectivity is then evaluated in terms of complementary structure indices. Finally, the dependency of the TomoSAR reconstruction and of its structural information on both the TomoSAR acquisition geometry and the temporal change of the reflectivity that may occur between the TomoSAR measurements in repeat-pass or bistatic configurations is evaluated. The analysis is supported by experimental results obtained by processing airborne acquisitions performed over temperate forest sites close to the city of Traunstein in the south of Germany.


Sequence analysis, structure prediction of receptor proteins and In silico study of potential inhibitors for management of life threatening COVID-19

Letters in Drug Design & Discovery ◽

10.2174/1570180818666210804141613 ◽

2021 ◽

Vol 18 ◽

Author(s):

Hriday K. Basak ◽

Soumen Saha ◽

Joydeep Ghosh ◽

Uttam Paswan ◽

Sujoy Karmakar ◽

...

Keyword(s):

Binding Site ◽

Ursolic Acid ◽

Density Functional ◽

Actinomycin D ◽

3D Structure ◽

Natural Compounds ◽

Surface Glycoprotein ◽

Basis Set ◽

3D Structures ◽

Potential Inhibitors

Background: The COVID-19 pandemic, caused by the highly contagious and pathogenic SARS-CoV-2, is a global menace, and it is worsening day by day. Doctors, scientists, and researchers across the world are urgently searching for a cure for the novel coronavirus and working at breakneck speed to develop vaccines or drugs. To date, however, no specific drugs or vaccines are available on the market to cope with the virus. Objective: The present study aims to elucidate the 3D structures of SARS-CoV-2 proteins and to identify the best natural compounds as potential inhibitors against COVID-19. Methods: The 3D structures of the proteins were constructed using the Modeller 9.16 modeling tool. The modelled proteins were validated with PROCHECK by Ramachandran plot analysis. A small library of fifty natural compounds was docked to the ACE2 binding site of the modelled surface glycoprotein of SARS-CoV-2 using AutoDock Vina to repurpose these inhibitors for SARS-CoV-2. Conceptual density functional theory calculations for the best eight compounds were performed with Gaussian 09. Geometry optimizations for these molecules were done at the M06-2X/def2-TZVP level of theory. ADME parameters, pharmacokinetic properties, and drug-likeness of the compounds were analyzed on the SwissADME website. Results: We analysed the sequences of the surface glycoprotein, nucleocapsid phosphoprotein, and envelope protein obtained from different parts of the globe. We modelled all the different sequences of the surface glycoprotein and envelope protein in order to derive the 3D structure of a molecular target, which is essential for the development of therapeutics. Different electronic properties of the inhibitors were calculated using DFT with the M06-2X functional and the def2-TZVP basis set. Docking results at the hACE2 binding site of all modelled surface glycoproteins of SARS-CoV-2 showed that all eight inhibitors studied here (actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamitocin P-3, and isomitomycin A) performed many-fold better than hydroxychloroquine, which has been found to be effective in treating patients suffering from COVID-19. All the inhibitors meet most of the criteria of the drug-likeness assessment. Conclusion: We expect that the eight compounds (actinomycin D, avellanin C, ichangin, kanglemycin A, obacunone, ursolic acid, ansamitocin P-3, and isomitomycin A) can be used as potential inhibitors against SARS-CoV-2.


Pre-training of Deep Bidirectional Protein Sequence Representations with Structural Information

IEEE Access ◽

10.1109/access.2021.3110269 ◽

2021 ◽

pp. 1-1

Author(s):

Seonwoo Min ◽

Seunghyun Park ◽

Siwon Kim ◽

Hyun-Soo Choi ◽

Byunghan Lee ◽

...

Keyword(s):

Protein Sequence ◽

Structural Information
