Comprehensive review and empirical analysis of hallmarks of DNA-, RNA- and protein-binding residues in protein chains

Jian Zhang; Zhiqiang Ma; Lukasz Kurgan

doi:10.1093/bib/bbx168

The Identification of Metal Ion Ligand-Binding Residues by Adding the Reclassified Relative Solvent Accessibility

Frontiers in Genetics ◽

10.3389/fgene.2020.00214 ◽

2020 ◽

Vol 11 ◽

Author(s):

Xiuzhen Hu ◽

Zhenxing Feng ◽

Xiaojin Zhang ◽

Liu Liu ◽

Shan Wang

Keyword(s):

Ligand Binding ◽

Metal Ion ◽

Solvent Accessibility ◽

Relative Solvent Accessibility ◽

Binding Residues

Quantifying sequence and structural features of protein–RNA interactions

Nucleic Acids Research ◽

10.1093/nar/gku681 ◽

2014 ◽

Vol 42 (15) ◽

pp. 10086-10098 ◽

Cited By ~ 28

Author(s):

Songling Li ◽

Kazuo Yamashita ◽

Karlou Mar Amada ◽

Daron M. Standley

Keyword(s):

Binding Sites ◽

Rna Binding ◽

Structural Information ◽

Solvent Accessibility ◽

Structural Characteristics ◽

Structural Features ◽

Residue Level ◽

Lower Sensitivity ◽

Relative Solvent Accessibility ◽

Rna Binding Sites

Abstract Increasing awareness of the importance of protein–RNA interactions has motivated many approaches to predict residue-level RNA binding sites in proteins based on sequence or structural characteristics. Sequence-based predictors are usually high in sensitivity but low in specificity; conversely structure-based predictors tend to have high specificity, but lower sensitivity. Here we quantified the contribution of both sequence- and structure-based features as indicators of RNA-binding propensity using a machine-learning approach. In order to capture structural information for proteins without a known structure, we used homology modeling to extract the relevant structural features. Several novel and modified features enhanced the accuracy of residue-level RNA-binding propensity beyond what has been reported previously, including by meta-prediction servers. These features include: hidden Markov model-based evolutionary conservation, surface deformations based on the Laplacian norm formalism, and relative solvent accessibility partitioned into backbone and side chain contributions. We constructed a web server called aaRNA that implements the proposed method and demonstrate its use in identifying putative RNA binding sites.

RBscore&NBench: a high-level web server for nucleic acid binding residues prediction with a large-scale benchmarking database

Nucleic Acids Research ◽

10.1093/nar/gkw251 ◽

2016 ◽

Vol 44 (W1) ◽

pp. W562-W567 ◽

Cited By ~ 9

Author(s):

Zhichao Miao ◽

Eric Westhof

Keyword(s):

Nucleic Acid ◽

Large Scale ◽

Web Server ◽

Nucleic Acid Binding ◽

Binding Residues ◽

High Level

Analysis and Prediction of RNA-Binding Residues Using Sequence, Evolutionary Conservation, and Predicted Secondary Structure and Solvent Accessibility

Current Protein and Peptide Science ◽

10.2174/138920310794109193 ◽

2010 ◽

Vol 11 (7) ◽

pp. 609-628 ◽

Cited By ~ 38

Author(s):

Tuo Zhang ◽

Hua Zhang ◽

Ke Chen ◽

Jishou Ruan ◽

Shiyi Shen ◽

...

Keyword(s):

Secondary Structure ◽

Rna Binding ◽

Solvent Accessibility ◽

Evolutionary Conservation ◽

Binding Residues

FunFam protein families improve residue level molecular function prediction

10.21203/rs.2.9444/v2 ◽

2019 ◽

Author(s):

Linus Mathias Scheibenreif ◽

Maria Littmann ◽

Christine Orengo ◽

Burkhard Rost

Keyword(s):

Protein Binding ◽

Binding Site ◽

De Novo ◽

Prediction Method ◽

Fold Increase ◽

Residue Level ◽

Protein Domain ◽

Enzymatic Function ◽

Binding Residues

Abstract Background: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. Results: FunFam members agreed, on average, in 36.9±0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding site prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8±0.4% for a stringent threshold. Conclusions: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
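
The consensus step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `consensus_binding`, the `min_agree` parameter, the use of `None` for alignment gaps and for undecided columns, and the toy `family` data are all assumptions made for illustration. The sketch takes per-residue binding/non-binding predictions for family members that have already been aligned, and keeps a consensus call for a column only when a sufficient fraction of the aligned members agree.

```python
from typing import List, Optional


def consensus_binding(
    aligned_preds: List[List[Optional[bool]]],
    min_agree: float,
) -> List[Optional[bool]]:
    """Column-wise consensus over aligned per-residue predictions.

    aligned_preds[i][j] is True (binding), False (non-binding) or
    None (gap) for family member i at alignment column j.
    min_agree is the fraction of non-gap members that must agree
    (hypothetical parameter; use a value above 0.5).
    Returns True/False where the members agree, None otherwise.
    """
    if not 0.5 < min_agree <= 1.0:
        raise ValueError("min_agree must be in (0.5, 1.0]")
    n_cols = len(aligned_preds[0])
    consensus: List[Optional[bool]] = []
    for j in range(n_cols):
        # Collect the predictions of all members that have a residue here.
        votes = [row[j] for row in aligned_preds if row[j] is not None]
        if not votes:
            consensus.append(None)  # column is all gaps
            continue
        frac_binding = sum(votes) / len(votes)
        if frac_binding >= min_agree:
            consensus.append(True)
        elif 1.0 - frac_binding >= min_agree:
            consensus.append(False)
        else:
            consensus.append(None)  # not enough agreement
    return consensus


# Toy alignment of three hypothetical family members over four columns.
family = [
    [True, True, False, None],
    [True, False, False, None],
    [True, True, False, True],
]

print(consensus_binding(family, 0.6))
print(consensus_binding(family, 0.9))
```

Raising `min_agree` leaves more columns undecided, which mirrors the trade-off the abstract reports between precision and coverage when the agreement threshold is made more stringent.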

FunFam protein families improve residue level molecular function prediction

10.21203/rs.2.9444/v1 ◽

2019 ◽

Author(s):

Linus Mathias Scheibenreif ◽

Maria Littmann ◽

Christine Orengo ◽

Burkhard Rost

Keyword(s):

Protein Binding ◽

Binding Site ◽

De Novo ◽

Prediction Method ◽

Fold Increase ◽

Residue Level ◽

Protein Domain ◽

Enzymatic Function ◽

Binding Residues

Abstract Background: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. Results: FunFam members agreed, on average, in 36.9±0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding site prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8±0.4% for a stringent threshold. Conclusions: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.

SCRIBER: accurate and partner type-specific prediction of protein-binding residues from proteins sequences

Bioinformatics ◽

10.1093/bioinformatics/btz324 ◽

2019 ◽

Vol 35 (14) ◽

pp. i343-i353 ◽

Cited By ~ 10

Author(s):

Jian Zhang ◽

Lukasz Kurgan

Keyword(s):

Protein Binding ◽

Protein Interactions ◽

Rna Binding ◽

Protein Complexes ◽

Predictive Performance ◽

Protein Docking ◽

Supplementary Information ◽

Binding Residue ◽

Binding Residues ◽

The Cross

Abstract Motivation: Accurate predictions of protein-binding residues (PBRs) enhance understanding of molecular-level rules governing protein–protein interactions, help protein–protein docking and facilitate annotation of protein functions. Recent studies show that current sequence-based predictors of PBRs severely cross-predict residues that interact with other types of protein partners (e.g. RNA and DNA) as PBRs. Moreover, these methods are relatively slow, prohibiting genome-scale use. Results: We propose a novel, accurate and fast sequence-based predictor of PBRs that minimizes the cross-predictions. Our SCRIBER (SeleCtive pRoteIn-Binding rEsidue pRedictor) method takes advantage of three innovations: a comprehensive dataset that covers multiple types of binding residues, novel types of inputs that are relevant to the prediction of PBRs, and an architecture that is tailored to reduce the cross-predictions. The dataset includes complete protein chains and offers improved coverage of binding annotations that are transferred from multiple protein–protein complexes. We utilize an innovative two-layer architecture where the first layer generates a prediction of protein-binding, RNA-binding, DNA-binding and small ligand-binding residues. The second layer re-predicts PBRs by reducing overlap between PBRs and the other types of binding residues produced in the first layer. Empirical tests on an independent test dataset reveal that SCRIBER significantly outperforms current predictors and that all three innovations contribute to its high predictive performance. SCRIBER reduces cross-predictions by between 41% and 69% and our conservative estimates show that it is at least 3 times faster. We provide putative PBRs produced by SCRIBER for the entire human proteome and use these results to hypothesize that about 14% of currently known human protein domains bind proteins. Availability and implementation: SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/. Supplementary information: Supplementary data are available at Bioinformatics online.

FunFam protein families improve residue level molecular function prediction

10.21203/rs.2.9444/v3 ◽

2019 ◽

Author(s):

Linus Mathias Scheibenreif ◽

Maria Littmann ◽

Christine Orengo ◽

Burkhard Rost

Keyword(s):

Protein Binding ◽

Binding Site ◽

De Novo ◽

Prediction Method ◽

Fold Increase ◽

Residue Level ◽

Protein Domain ◽

Enzymatic Function ◽

Binding Residues

Abstract Background: The CATH database provides a hierarchical classification of protein domain structures including a sub-classification of superfamilies into functional families (FunFams). We analyzed the similarity of binding site annotations in these FunFams and incorporated FunFams into the prediction of protein binding residues. Results: FunFam members agreed, on average, in 36.9±0.6% of their binding residue annotations. This constituted a 6.7-fold increase over randomly grouped proteins and a 1.2-fold increase (1.1-fold on the same dataset) over proteins with the same enzymatic function (identical Enzyme Commission, EC, number). Mapping de novo binding site prediction methods (BindPredict-CCS, BindPredict-CC) onto FunFam resulted in consensus predictions for those residues that were aligned and predicted alike (binding/non-binding) within a FunFam. This simple consensus increased the F1-score (for binding) 1.5-fold over the original prediction method. Variation of the threshold for how many proteins in the consensus prediction had to agree provided a convenient control of accuracy/precision and coverage/recall, e.g. reaching a precision as high as 60.8±0.4% for a stringent threshold. Conclusions: The FunFams outperformed even the carefully curated EC numbers in terms of agreement of binding site residues. Additionally, we assume that our proof-of-principle through the prediction of protein binding residues will be relevant for many other solutions profiting from FunFams to infer functional information at the residue level.
