Transformer neural network for protein-specific de novo drug generation as a machine translation problem

2019
Author(s):  
Daria Grechishnikova

Abstract: Drug discovery for a protein target is a laborious, lengthy, and costly process. Machine learning approaches and, in particular, deep generative networks can substantially reduce development time and costs. However, most methods require prior knowledge of protein binders, their physicochemical characteristics, or the three-dimensional structure of the protein. The method proposed in this work generates novel molecules with predicted ability to bind a target protein by relying on its amino acid sequence only. We consider target-specific de novo drug design as a translation problem between the amino acid “language” and the SMILES (Simplified Molecular Input Line Entry System) representation of the molecule. To tackle this problem, we apply the Transformer neural network architecture, a state-of-the-art approach in sequence transduction tasks. The Transformer is based on a self-attention technique, which captures long-range dependencies between items in a sequence. The model generates realistic, diverse compounds with structural novelty. The computed physicochemical properties and common metrics used in drug discovery fall within the plausible drug-like range of values.
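
To make the "translation" framing concrete, the sketch below shows one plausible way to turn both sides of the problem into token sequences: one token per residue for the protein, and a regular-expression tokenizer for SMILES. The abstract does not describe the tokenization actually used, so the pattern and the function names `tokenize_protein` and `tokenize_smiles` are illustrative assumptions, not the author's method.

```python
import re

# Assumed SMILES token pattern (not taken from the paper): bracket atoms,
# two-letter halogens, two-digit ring closures, single-letter atoms,
# ring-closure digits, then bond and branch symbols.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+\\/.:@~*$]")


def tokenize_protein(sequence):
    """Source side: one token per amino acid letter."""
    return list(sequence.strip().upper())


def tokenize_smiles(smiles):
    """Target side: split a SMILES string into chemically meaningful tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Refuse to silently drop characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


if __name__ == "__main__":
    print(tokenize_protein("MKTAYIAK"))
    print(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

Matching `Cl` and `Br` before single letters keeps each halogen as one token, which is the main reason to prefer a pattern like this over naive character splitting.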

Blood
2010
Vol 116 (21)
pp. 3006-3006
Author(s):  
Joel G Turner ◽  
Thomas C Rowe ◽  
David Ostrov ◽  
Jana L Dawson ◽  
Elizabeth Ciaravino ◽  
...  

Abstract 3006: Human multiple myeloma (MM) remains an incurable disease despite improved treatment regimens that include bortezomib, lenalidomide and thalidomide. New therapeutic targets are needed to further improve treatment outcomes. We have shown that targeting intracellular trafficking of proteins may sensitize cells to antitumor agents (Turner et al. 2009, Cancer Res, 69, 6899-905). We have previously demonstrated that topoisomerase II alpha (topo IIα) trafficking from the nucleus to the cytoplasm in myeloma cells occurs by a CRM1-dependent mechanism, resulting in drug resistance to topo II inhibitors (Engel et al. 2004, Exp Cell Res, 295, 421-31). We have also identified the nuclear export signals (NES) for topo IIα at amino acids 1017-28 (site 1) and 1054-66 (site 2) (Turner et al. 2004, J Cell Sci, 117, 3061-71). Blocking nuclear export of topo IIα with a CRM1 inhibitor or by siRNA has been shown to sensitize MM cells to topo II poisons (Turner et al. 2009, Cancer Res, 69, 6899-905). The NES amino acid sequence of topo IIα at 1017–1028 is a unique site. Even though this site conforms to the hydrophobic amino acid motif for an NES, the amino acid sequence does not occur in any other human protein. In addition, this NES is in a pocket formed by the three-dimensional structure of the topo IIα protein. These factors open the possibility of developing drugs that exclusively block the NES of topo IIα and do not affect the export of other nuclear proteins, as occurs with other known CRM1 inhibitors. Drug resistance to topo II inhibitors occurs when topo IIα is trafficked from the nucleus to the cytoplasm, where it is no longer in contact with the DNA and is thus unable to induce cell death (Valkov et al. 2000, Br J Haematol, 108, 331-45; Engel et al. 2004, Exp Cell Res, 295, 421-31).
We therefore hypothesize that targeting a specific NES in topo IIα is an innovative treatment approach in MM and may allow a very focused and potent combination with topo II inhibitors, possibly overcoming de novo drug resistance in this malignancy. To date, we know of no agents targeting the NES of a specific protein that are being developed to treat cancer. A computer-generated hybrid molecule using the known three-dimensional structure of yeast topo II and the human NES sequences of topo IIα was produced. Molecules were docked in silico using the NCI small molecule database (140,000 compounds). The molecules with the highest docking scores were obtained from NCI and assayed for IC50 values and synergy with the topo II inhibitor doxorubicin. All NES site 1 molecules tested showed activity; however, none of the NES site 2 molecules exhibited any anti-neoplastic activity with or without a topo II inhibitor. CT blue (Promega) robotic cell viability assays determined that several of the site 1 inhibitors had anti-proliferative activity. The IC50 values obtained from single-drug cell viability assays in low-density cells revealed two site 1 inhibitor compounds with IC50 values of 4.7 μM (NCI-36400) and 11.1 μM (NCI-35847). None of the site 1 inhibitors affected the viability of high-density cells (IC50>100 μM). Data from apoptosis assays indicate that three of the site 1 inhibitors (NCI-36400, NCI-35847, NCI-35024) significantly (p<0.05) sensitize high-density MM cells to doxorubicin. Immunofluorescence microscopy revealed an increase in topo IIα in the nucleus of cells treated for 20 hours with the three lead site 1 inhibitors. Nuclear-cytoplasmic fractionation revealed that the NES site 1 docking molecules prevent nuclear export of topo IIα. These compounds may lead to new chemotherapeutic treatments of myeloma. Disclosures: No relevant conflicts of interest to declare.


2020
Vol 2020
pp. 1-12
Author(s):  
Zhong Li ◽  
Yuele Lin ◽  
Arne Elofsson ◽  
Yuhua Yao

Residue-residue contact prediction has become an increasingly important tool for modeling the three-dimensional structure of a protein when no homologous structure is available. The ultradeep residual neural network (ResNet) has become the most popular method for making contact predictions because it captures the contextual information between residues. In this paper, we propose a novel deep neural network framework for contact prediction which combines ResNet and DenseNet. This framework uses a 1D ResNet to process sequential features and, besides PSSM, SS3, and solvent accessibility, introduces a new input feature, the position-specific frequency matrix (PSFM). Using ResNet's residual module and identity mapping, the framework effectively processes the sequential features, after which the outer concatenation function is applied to the sequential and pairwise features. Prediction accuracy is improved by a final processing step that uses the dense connections of DenseNet. The prediction accuracy of the protein contact map shows that our method is more effective than other popular methods, owing to the new network architecture and the added feature input.
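
The outer concatenation step mentioned above can be sketched as follows. This is a minimal, dependency-free illustration that assumes the simplest variant, in which the pairwise vector for residues (i, j) is the per-residue vector of i followed by that of j, optionally followed by existing pairwise features. The abstract does not give the exact definition used in the paper, and the function name `outer_concatenate` is hypothetical.

```python
def outer_concatenate(seq_features, pair_features=None):
    """Lift per-residue features (L x d) to pairwise features (L x L x 2d).

    Assumed variant: out[i][j] = seq_features[i] + seq_features[j],
    with pair_features[i][j] appended when pairwise inputs are supplied.
    """
    length = len(seq_features)
    out = []
    for i in range(length):
        row = []
        for j in range(length):
            vec = list(seq_features[i]) + list(seq_features[j])
            if pair_features is not None:
                vec += list(pair_features[i][j])
            row.append(vec)
        out.append(row)
    return out


if __name__ == "__main__":
    # Three residues with two sequential features each (toy values).
    seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    # One pairwise feature per residue pair (toy values).
    pair = [[[float(i + j)] for j in range(3)] for i in range(3)]
    merged = outer_concatenate(seq, pair)
    print(merged[0][2])  # features of residue 0, residue 2, then the pair feature
```

The result has one feature vector per residue pair, which is the shape a 2D network needs in order to predict an L x L contact map.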


Author(s):  
Е.В. Бражников ◽  
E.V. Brazhnikov

Conformations of about 600 looped regions (loops) in β-α- and α-β-arches of a structural motif occurring in the abCd-unit of proteins were analyzed. In total, 258 abCd-units with a reverse turn of the polypeptide chain (236 PDB files) and 69 abCd-units with a direct turn (65 PDB files) were selected from non-homologous proteins. Four types of arches were studied: β-α- and α-β-arches at a direct turn of the chain, and β-α- and α-β-arches at a reverse turn of the chain. For each type of arch, the frequencies of occurrence of loops of different lengths were determined and the corresponding histograms were plotted. It was found that abCd-units with loops up to three amino acid residues long occur most frequently (57%). In β-α-arches with a direct turn of the chain, loops consisting of two amino acid residues occur most often (44%), and in 86% of cases they have the βmαβαn conformation. They contain no Gly or Pro residues, and in position β there is an Asn residue. In this type of arch, loops of one residue (βmεαn or βmαLαn conformation) most frequently contain a Gly residue. α-β-Arches with a direct turn of the chain most commonly (18%) have loops of four amino acid residues. In this case, there is no predominant loop conformation. In β-α-arches with a reverse turn of the chain, loops of seven amino acid residues are most common (17%), and most of them (88%) have the βmαLββααββαn conformation. α-β-Arches with a reverse turn of the chain most frequently (32%) contain loops of one amino acid residue (all Gly) with arch conformations αmεβn or αmαLβn. This structural analysis of the abCd-unit provides useful information for predicting the three-dimensional structure of proteins and for molecular simulation in the de novo design of protein structures.


Genetics
1995
Vol 139 (1)
pp. 267-286
Author(s):  
J D Fackenthal ◽  
J A Hutchens ◽  
F R Turner ◽  
E C Raff

Abstract: We have determined the lesions in a number of mutant alleles of beta Tub85D, the gene that encodes the testis-specific beta 2-tubulin isoform in Drosophila melanogaster. Mutations responsible for different classes of functional phenotypes are distributed throughout the beta 2-tubulin molecule. There is a telling correlation between the degree of phylogenetic conservation of the altered residues and the number of different microtubule categories disrupted by the lesions. The majority of lesions occur at positions that are evolutionarily highly conserved in all beta-tubulins; these lesions disrupt general functions common to multiple classes of microtubules. However, a single allele B2t6 contains an amino acid substitution within an internal cluster of variable amino acids that has been identified as an isotype-defining domain in vertebrate beta-tubulins. Correspondingly, B2t6 disrupts only a subset of microtubule functions, resulting in misspecification of the morphology of the doublet microtubules of the sperm tail axoneme. We previously demonstrated that beta 3, a developmentally regulated Drosophila beta-tubulin isoform, confers the same restricted morphological phenotype in a dominant way when it is coexpressed in the testis with wild-type beta 2-tubulin. We show here by complementation analysis that beta 3 and the B2t6 product disrupt a common aspect of microtubule assembly. We therefore conclude that the amino acid sequence of the beta 2-tubulin internal variable region is required for generation of correct axoneme morphology but not for general microtubule functions. As we have previously reported, the beta 2-tubulin carboxy terminal isotype-defining domain is required for suprastructural organization of the axoneme. We demonstrate here that the beta 2 variant lacking the carboxy terminus and the B2t6 variant complement each other for mild-to-moderate meiotic defects but do not complement for proper axonemal morphology.
Our results are consistent with the hypothesis drawn from comparisons of vertebrate beta-tubulins that the two isotype-defining domains interact in a three-dimensional structure in wild-type beta-tubulins. We propose that the integrity of this structure in the Drosophila testis beta 2-tubulin isoform is required for proper axoneme assembly but not necessarily for general microtubule functions. On the basis of our observations we present a model for regulation of axoneme microtubule morphology as a function of tubulin assembly kinetics.


2021
Vol 2021
pp. 1-7
Author(s):  
Lulu Yan ◽  
Ru Shen ◽  
Zongfu Cao ◽  
Chunxiao Han ◽  
Yuxin Zhang ◽  
...  

PPP2R5D-related neurodevelopmental disorder, which is mainly caused by de novo missense variants in the PPP2R5D gene, is a rare autosomal dominant genetic disorder, with about 100 patients and a total of thirteen pathogenic variants known worldwide so far. Here, we present a 24-month-old Chinese boy with developmental delay and other common clinical characteristics of PPP2R5D-related neurodevelopmental disorder, including hypotonia, macrocephaly, intellectual disability, speech impairment, and behavioral abnormality. Trio-whole exome sequencing (WES) and Sanger sequencing were performed to identify the causal gene variant. The pathogenicity of the variant was evaluated using bioinformatics tools. We identified a novel pathogenic variant in the PPP2R5D gene (c.620G>T, p.Trp207Leu). The variant is located in the variant hotspot region of this gene and is predicted to cause PPP2R5D protein dysfunction due to an increase in local hydrophobicity and an unstable three-dimensional structure. We report a novel pathogenic variant of PPP2R5D associated with PPP2R5D-related neurodevelopmental disorder from a Chinese family. Our findings expand the phenotypic and mutational spectrum of PPP2R5D-related neurodevelopmental disorder.


2019
Author(s):  
Kai Shimagaki ◽  
Martin Weigt

Statistical models for families of evolutionarily related proteins have recently gained interest: in particular, pairwise Potts models, such as those inferred by Direct-Coupling Analysis, have been able to extract information about the three-dimensional structure of folded proteins and about the effect of amino-acid substitutions in proteins. These models are typically required to reproduce the one- and two-point statistics of the amino-acid usage in a protein family, i.e. to capture the so-called residue conservation and covariation statistics of proteins of common evolutionary origin. Pairwise Potts models are the maximum-entropy models achieving this. Although successful, these models depend on huge numbers of ad hoc parameters, which have to be estimated from a finite amount of data and whose biophysical interpretation remains unclear. Here we propose an approach to parameter reduction based on selecting collective sequence motifs. It naturally leads to the formulation of statistical sequence models in terms of Hopfield-Potts models. These models can be accurately inferred using a mapping to restricted Boltzmann machines and persistent contrastive divergence. We show that, when applied to protein data, even 20-40 patterns are sufficient to obtain statistically close-to-generative models. The Hopfield patterns form interpretable sequence motifs and may be used to cluster amino-acid sequences into functional sub-families. However, the distributed collective nature of these motifs intrinsically limits the ability of Hopfield-Potts models to predict contact maps, showing the necessity of developing models that go beyond the Hopfield-Potts models discussed here.
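
For orientation, the parameter reduction described above can be written out in the standard textbook form of these models. The notation below is an assumption for illustration (sequence length L, alphabet size q, number of patterns p). The abstract itself gives no formulas, and normalization and gauge conventions differ between papers.

```latex
% Pairwise Potts model over aligned sequences (a_1, ..., a_L),
% with fields h_i and couplings J_ij:
P(a_1,\dots,a_L) = \frac{1}{Z}\,
  \exp\Big( \sum_{i=1}^{L} h_i(a_i) + \sum_{1 \le i < j \le L} J_{ij}(a_i,a_j) \Big)

% Hopfield-Potts form: the couplings are built from p patterns \xi^{\mu}
% (low-rank couplings; any prefactor is convention dependent):
J_{ij}(a,b) = \sum_{\mu=1}^{p} \xi_i^{\mu}(a)\,\xi_j^{\mu}(b)

% Number of coupling parameters in each case:
\underbrace{\tfrac{L(L-1)}{2}\,q^{2}}_{\text{full pairwise Potts}}
\quad\text{versus}\quad
\underbrace{p\,L\,q}_{\text{Hopfield-Potts}}
```

With a few tens of patterns, the second count grows linearly rather than quadratically in L, which is the sense in which selecting collective motifs reduces the number of parameters.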


1987
Vol 42 (6)
pp. 742-750
Author(s):  
Achim Trebst

The folding through the membrane of the plastoquinone and herbicide binding protein subunits of photosystem II and the topology of the binding niche for plastoquinone and herbicides are described. The model is based on the homology in amino acid sequence and folding prediction from the hydropathy analysis of the D-1 and D-2 subunits of photosystem II to the reaction center polypeptides L and M of the bacterial reaction center. It incorporates the amino acid changes in the D-1 polypeptide in herbicide-tolerant plants and those indicated by chemical tagging to be involved in QB binding. It proposes amino acids in the D-1/D-2 polypeptides homologous to those indicated by the X-ray structure of the bacterial reaction center to be involved in Fe-, quinone- and reaction center chlorophyll-binding. The different chemical compounds known to interfere with QB function are grouped into two families depending on their orientation in the QB binding niche.
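
A hydropathy analysis of the kind referred to above is commonly done as a sliding-window average over a per-residue hydrophobicity scale. The sketch below assumes the Kyte-Doolittle scale with its conventional 19-residue window and 1.6 cutoff for candidate membrane-spanning segments. The abstract does not say which scale or window was used, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(scores) - window + 1)
    ]


def candidate_membrane_windows(profile, threshold=1.6):
    """Start indices of windows whose mean hydropathy reaches the threshold."""
    return [start for start, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Toy sequence: a polar stretch, a hydrophobic stretch, a polar stretch.
    toy = "DEKRDEKRDE" + "LIVLIVLIVLIVLIVLIVL" + "DEKRDEKRDE"
    profile = hydropathy_profile(toy)
    print(candidate_membrane_windows(profile))
```

Windows that clear the threshold only suggest possible membrane-spanning helices. The folding model described in the abstract also relies on sequence homology with the bacterial reaction center polypeptides.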

