New opportunities for protease ligand-binding site comparisons using SitesBase

2007 ◽  
Vol 35 (3) ◽  
pp. 561-565 ◽  
Author(s):  
N.D. Gold ◽  
K. Deville ◽  
R.M. Jackson

The rapid expansion of structural information for protein ligand-binding sites is potentially an important source of information in structure-based drug design and in understanding ligand cross-reactivity and toxicity. We have developed SitesBase, a comprehensive database of ligand-binding sites extracted automatically from the Macromolecular Structure Database. SitesBase is an easily accessible database which is simple to use and holds pre-calculated information about structural similarities between known ligand-binding sites. These similarities are presented to the wider community, enabling binding-site comparisons both for therapeutically interesting protein families, such as the proteases, and for new proteins, with the aim of discovering interesting new structure–function relationships. The database is available from http://www.modelling.leeds.ac.uk/sb/.

2021 ◽  
Author(s):  
Rishal Aggarwal ◽  
Akash Gupta ◽  
Vineeth Chelur ◽  
C. V. Jawahar ◽  
U. Deva Priyakumar

A structure-based drug design pipeline involves the development of potential drug molecules or ligands that form stable complexes with a given receptor at its binding site. A prerequisite to this is finding druggable and functionally relevant binding sites on the 3D structure of the protein. Although several methods for detecting binding sites have already been developed, a majority of them surprisingly fail to identify and rank binding sites accurately. The rapid adoption and success of deep learning algorithms in various areas of structural biology invites the use of such algorithms for accurate binding site detection. As a combination of geometry-based software and deep learning, we report a novel framework, DeepPocket, that utilises 3D convolutional neural networks for the rescoring of pockets identified by Fpocket and further segments these identified cavities on the protein surface. Apart from this, we also propose a new dataset, SC6K, containing protein structures submitted to the Protein Data Bank (PDB) from January 2018 to February 2020, for ligand binding site (LBS) detection. DeepPocket's results on various binding site datasets and SC6K highlight its better performance over current state-of-the-art methods and its good generalization ability over novel structures.


2016 ◽  
Vol 82 (9) ◽  
pp. 2819-2832 ◽  
Author(s):  
Rongsui Gao ◽  
Jingxia Lin ◽  
Han Zhang ◽  
Youjun Feng

ABSTRACT Recently, our group along with others reported that the Vibrio FadR regulatory protein is unusual in that, unlike the prototypical fadR product of Escherichia coli, which has only one ligand-binding site, Vibrio FadR has two ligand-binding sites and represents a new mechanism for fatty acid sensing. The promoter region of the vc2105 gene, encoding a putative thioesterase, was mapped, and a putative FadR-binding site (AA CTG GTA AGA GCA CTT) was proposed. Different versions of the FadR regulatory proteins were prepared and purified to homogeneity. Both electrophoretic mobility shift assay (EMSA) and surface plasmon resonance (SPR) determined the direct interaction of the vc2105 gene with FadR proteins of various origins. Further, EMSAs illustrated that the addition of long-chain acyl-coenzyme A (CoA) species efficiently dissociates the vc2105 promoter from the FadR regulator. The expression level of the Vibrio cholerae vc2105 gene was elevated 2- to 3-fold in a fadR null mutant strain, validating that FadR is a repressor for the vc2105 gene. The β-galactosidase activity of a vc2105-lacZ transcriptional fusion was increased over 2-fold upon supplementation of growth medium with oleic acid. Unlike the fadD gene, a member of the Vibrio fad regulon, the VC2105 protein played no role in bacterial growth and virulence-associated gene expression of ctxAB (cholera toxin A/B) and tcpA (toxin coregulated pilus A). Given that the transcriptional regulation of vc2105 fits the criteria for fatty acid degradation (fad) genes, we suggested that it is a new member of the Vibrio fad regulon.

IMPORTANCE The Vibrio FadR regulator is unusual in that it has two ligand-binding sites. Different versions of the FadR regulatory proteins were prepared and characterized in vitro and in vivo. An auxiliary fad gene (vc2105) from Vibrio was proposed that encodes a putative thioesterase and has a predicted FadR-binding site (AAC TGG TA A GAG CAC TT). The function of this putative binding site was proved using both EMSA and SPR. Further in vitro and in vivo experiments revealed that the Vibrio FadR is a repressor for the vc2105 gene. Unlike fadD, a member of the Vibrio fad regulon, VC2105 played no role in bacterial growth and expression of the two virulence-associated genes (ctxAB and tcpA). Therefore, since transcriptional regulation of vc2105 fits the criteria for fad genes, it seems likely that vc2105 acts as a new auxiliary member of the Vibrio fad regulon.


2006 ◽  
Vol 396 (3) ◽  
pp. 469-477 ◽  
Author(s):  
Wei-I Chou ◽  
Tun-Wen Pai ◽  
Shi-Hwei Liu ◽  
Bor-Kai Hsiung ◽  
Margaret D.-T. Chang

The starch-hydrolysing enzyme GA (glucoamylase) from Rhizopus oryzae is a commonly used glycoside hydrolase in industry. It consists of a C-terminal catalytic domain and an N-terminal starch-binding domain, which belongs to CBM21 (carbohydrate-binding module, family 21). In the present study, a molecular model of CBM21 from R. oryzae GA (RoGACBM21) was constructed according to PSSC (progressive secondary structure correlation) and modified structure-based sequence alignment, and site-directed mutagenesis was used to identify and characterize potential ligand-binding sites. Our model suggests that RoGACBM21 contains two ligand-binding sites, with Tyr32 and Tyr67 grouped into site I, and Trp47, Tyr83 and Tyr93 grouped into site II. The involvement of these aromatic residues has been validated using chemical modification, UV difference spectroscopy studies, and both qualitative and quantitative binding assays on a series of RoGACBM21 mutants. Our results further reveal that binding sites I and II play distinct roles in ligand binding: the former is not only involved in binding insoluble starch but also facilitates the binding of RoGACBM21 to long-chain soluble polysaccharides, whereas the latter serves as the major binding site mediating the binding of both soluble polysaccharide and insoluble ligands. In the present study we have for the first time demonstrated that the key ligand-binding residues of RoGACBM21 can be identified and characterized by a combination of novel bioinformatics methodologies in the absence of resolved three-dimensional structural information.


2018 ◽  
Vol 14 (2) ◽  
Author(s):  
Daniele Toti ◽  
Gabriele Macari ◽  
Fabio Polticelli

Abstract After the onset of the genomic era, the detection of ligand binding sites in proteins has emerged over the last few years as a powerful tool for protein function prediction. Several approaches, both sequence and structure based, have been developed, but the full potential of the corresponding tools has not been exploited yet. Here, we describe the development and classification of a large, almost exhaustive, collection of protein-ligand binding sites to be used, in conjunction with the Ligand Binding Site Recognition Application Web Application developed in our laboratory, as an alternative to virtual screening through molecular docking simulations to identify novel lead compounds for known targets. Ligand binding sites derived from the Protein Data Bank have been clustered according to ligand similarity, so that, given a known ligand, the binding mode of related ligands to the same target can be predicted. The collection of ligand binding sites contains more than 200,000 sites corresponding to more than 20,000 different ligands. Furthermore, the ligand binding sites of all Food and Drug Administration-approved drugs have been classified as well, making it possible to investigate the possible binding of each of them (and related compounds) to a given target for drug repurposing and redesign initiatives. Sample usage cases are also described to demonstrate the effectiveness of this approach.


Genes ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 965 ◽  
Author(s):  
Ziqi Zhao ◽  
Yonghong Xu ◽  
Yong Zhao

The prediction of protein–ligand binding sites is important in drug discovery and drug design. Computational methods for protein–ligand binding site prediction are inexpensive and fast compared with experimental methods. This paper proposes a new computational method, SXGBsite, which combines the synthetic minority over-sampling technique (SMOTE) and Extreme Gradient Boosting (XGBoost). SXGBsite uses the position-specific scoring matrix discrete cosine transform (PSSM-DCT) and predicted solvent accessibility (PSA) to extract features containing sequence information. A new balanced dataset was generated by SMOTE to improve classifier performance, and a prediction model was constructed using XGBoost. The parallel computing and regularization techniques enabled high-quality and fast predictions and mitigated overfitting caused by SMOTE. An evaluation using independent test sets for 12 different types of ligand binding site showed that SXGBsite performs similarly to the existing methods on eight of the independent test sets with a faster computation time. SXGBsite may be applied as a complement to biological experiments.
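
The oversampling idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the SXGBsite implementation: it only shows the core SMOTE step of interpolating between a minority-class sample and one of its nearest minority-class neighbours. The function name `smote`, the parameter `k`, and the toy feature vectors are all assumptions made for illustration, and the XGBoost classifier stage is omitted.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy, made-up 2-D feature vectors: few binding residues, many non-binding ones.
binding = [(0.9, 0.8), (1.0, 1.1), (1.2, 0.9), (0.8, 1.0)]
non_binding = [(0.1 * i, 0.05 * i) for i in range(12)]

new_points = smote(binding, len(non_binding) - len(binding))
balanced_binding = binding + new_points
print(len(balanced_binding), len(non_binding))
```

Because each synthetic point lies on a segment between two real minority samples, the new points stay inside the region already occupied by the minority class. Regularization in the downstream classifier still matters, as the abstract notes, because such points are not independent observations.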


2013 ◽  
Vol 2013 ◽  
pp. 1-7 ◽  
Author(s):  
Che-Lun Hung ◽  
Guan-Jie Hua

The proteome-wide analysis of protein-ligand binding sites and their interactions with ligands is important in structure-based drug design and in understanding ligand cross reactivity and toxicity. The well-known and commonly used software, SMAP, has been designed for 3D ligand binding site comparison and similarity searching of a structural proteome. SMAP can also predict drug side effects and reassign existing drugs to new indications. However, the computing scale of SMAP is limited. We have developed a high availability, high performance system that expands the comparison scale of SMAP. This cloud computing service, called Cloud-PLBS, combines the SMAP and Hadoop frameworks and is deployed on a virtual cloud computing platform. To handle the vast amount of experimental data on protein-ligand binding site pairs, Cloud-PLBS exploits the MapReduce paradigm as a management and parallelizing tool. Cloud-PLBS provides a web portal and scalability through which biologists can address a wide range of computer-intensive questions in biology and drug discovery.
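
The MapReduce pattern that this abstract refers to can be sketched in a few lines of plain Python. This is an in-memory illustration only, not Cloud-PLBS, Hadoop, or SMAP code. The `similarity` function is a hypothetical stand-in (Jaccard overlap of residue-name sets), and the site identifiers and residue sets are made up.

```python
from collections import defaultdict
from itertools import combinations


def similarity(site_a, site_b):
    """Hypothetical stand-in score: Jaccard overlap of residue-name sets."""
    return len(site_a & site_b) / len(site_a | site_b)


def map_phase(sites):
    """Emit (site_id, (partner_id, score)) records for every unordered pair."""
    for (id_a, res_a), (id_b, res_b) in combinations(sorted(sites.items()), 2):
        score = similarity(res_a, res_b)
        yield id_a, (id_b, score)
        yield id_b, (id_a, score)


def shuffle(records):
    """Group emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Keep the best-scoring partner for each site."""
    return {key: max(values, key=lambda v: v[1]) for key, values in grouped.items()}


# Made-up binding sites described only by the residue types lining them.
sites = {
    "siteA": {"HIS", "ASP", "SER"},
    "siteB": {"HIS", "ASP", "SER", "GLY"},
    "siteC": {"TRP", "TYR"},
}
best = reduce_phase(shuffle(map_phase(sites)))
print(best["siteA"])
```

In a real deployment each pairwise comparison in the map phase is independent of the others, which is what lets the work be spread across many nodes.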


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i726-i734
Author(s):  
Charles A Santana ◽  
Sabrina de A Silveira ◽  
João P A Moraes ◽  
Sandro C Izidoro ◽  
Raquel C de Melo-Minardi ◽  
...  

Abstract Motivation: The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites have been proposed, as they can be scalable, fast and low-cost. Results: We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue-centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average. Availability and implementation: The source code and datasets are available at https://github.com/charles-abreu/GRaSP. Supplementary information: Supplementary data are available at Bioinformatics online.
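
As a rough illustration of what modelling a residue environment as an atom-level graph can mean, the sketch below links atoms that lie within a distance cutoff and then reads off which residues neighbour a given residue. It is not the GRaSP code or its feature scheme. The function names, the 4.0 cutoff, the residue labels, and the coordinates are all assumptions chosen for the example.

```python
import math
from collections import defaultdict


def atom_graph(atoms, cutoff):
    """Adjacency sets linking atoms no farther apart than `cutoff`."""
    graph = defaultdict(set)
    for i, (_, xyz_i) in enumerate(atoms):
        for j in range(i + 1, len(atoms)):
            if math.dist(xyz_i, atoms[j][1]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def residue_neighbourhood(atoms, graph, residue):
    """Residues with at least one atom adjacent to an atom of `residue`."""
    found = set()
    for i, (res, _) in enumerate(atoms):
        if res == residue:
            found.update(atoms[j][0] for j in graph[i])
    found.discard(residue)
    return found


# Made-up atoms as (residue label, coordinates); the values are illustrative only.
atoms = [
    ("SER195", (0.0, 0.0, 0.0)),
    ("SER195", (1.5, 0.0, 0.0)),
    ("HIS57", (3.0, 0.0, 0.0)),
    ("ASP102", (6.5, 0.0, 0.0)),
    ("GLY193", (20.0, 0.0, 0.0)),
]
graph = atom_graph(atoms, cutoff=4.0)
print(residue_neighbourhood(atoms, graph, "HIS57"))
```

A supervised model would then be trained on features derived from each residue's neighbourhood. That step is not shown, since the abstract does not specify the features.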


2021 ◽  
Vol 17 (11) ◽  
pp. e1009620
Author(s):  
Xingjie Pan ◽  
Tanja Kortemme

A major challenge in designing proteins de novo to bind user-defined ligands with high affinity is finding backbone structures into which a new binding site geometry can be engineered with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. Each matching step involves engineering new binding site residues into each protein “scaffold”, which is distinct from the problem of comparing already existing binding pockets. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The numbers of matched binding sites are power-law functions of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.
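
A power-law relationship of the kind described here, N = a · M^b, becomes a straight line after taking logarithms (log N = log a + b · log M), so its exponent can be estimated by ordinary least squares in log-log space. The sketch below shows that fit on synthetic numbers only. The values of `family_sizes` and `matched` are invented for the example and are not data from the paper, and `fit_power_law` is a hypothetical helper name.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b, done as a line fit in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx                          # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)      # intercept = log of the prefactor
    return a, b


# Synthetic example generated from a known law, so the fit can be checked.
family_sizes = [10, 100, 1000, 10000]
matched = [3.0 * m ** 0.5 for m in family_sizes]

a, b = fit_power_law(family_sizes, matched)
print(round(a, 6), round(b, 6))
```

An exponent below 1 would mean that each additional family member contributes fewer newly matched sites than the previous one. Whether that holds for the fold families in the paper cannot be read from the abstract.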


2021 ◽  
Author(s):  
Xingjie Pan ◽  
Tanja Kortemme

Abstract A major challenge in designing proteins de novo to bind user-defined ligands with high specificity and affinity is finding backbone structures that can accommodate a desired binding site geometry with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place (“match”) these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites, respectively, that cannot be matched to naturally existing structures with the same topologies. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The numbers of matched binding sites are power-law functions of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions.

Author summary De novo design of proteins that can bind to novel and highly diverse user-defined small molecule ligands could have broad biomedical and synthetic biology applications. Because ligand binding site geometries need to be accommodated by protein backbone scaffolds at high accuracy, the diversity of scaffolds is a major limitation for designing new ligand binding functions. Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be accommodated by the de novo scaffolds is important for designing novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be accommodated by de novo scaffolds and naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the ligand binding space of their respective fold topologies. We also identified factors that affect the difficulty of binding site accommodation, as well as the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.

