Machine learning-based prediction of activity and substrate specificity for OleA enzymes in the thiolase superfamily

Serina L Robinson; Megan D Smith; Jack E Richman; Kelly G Aukema; Lawrence P Wackett

doi:10.1093/synbio/ysaa004

Synthetic Biology ◽

10.1093/synbio/ysaa004 ◽

2020 ◽

Vol 5 (1) ◽

Author(s):

Serina L Robinson ◽

Megan D Smith ◽

Jack E Richman ◽

Kelly G Aukema ◽

Lawrence P Wackett

Keyword(s):

Machine Learning ◽

Enzyme Activity ◽

Substrate Specificity ◽

Xanthomonas Campestris ◽

Membrane Lipids ◽

Characteristic Curve ◽

Structural Features ◽

Activity Levels ◽

Enzyme Substrate ◽

Carbon Carbon Bond

Abstract: Enzymes in the thiolase superfamily catalyze carbon–carbon bond formation for the biosynthesis of polyhydroxyalkanoate storage molecules, membrane lipids and bioactive secondary metabolites. Natural and engineered thiolases have applications in synthetic biology for the production of high-value compounds, including personal care products and therapeutics. A fundamental understanding of thiolase substrate specificity is lacking, particularly within the OleA protein family. The ability to predict substrates from sequence would advance (meta)genome mining efforts to identify active thiolases for the production of desired metabolites. To gain a deeper understanding of substrate scope within the OleA family, we measured the activity of 73 diverse bacterial thiolases with a library of 15 p-nitrophenyl ester substrates to build a training set of 1095 unique enzyme–substrate pairs. We then used machine learning to predict thiolase substrate specificity from physicochemical and structural features. The area under the receiver operating characteristic curve was 0.89 for random forest classification of enzyme activity, and our regression model had a test set root mean square error of 0.22 (R² = 0.75) to quantitatively predict enzyme activity levels. Substrate aromaticity, oxygen content and molecular connectivity were the strongest predictors of enzyme–substrate pairing. Key amino acid residues A173, I284, V287, T292 and I316 in the Xanthomonas campestris OleA crystal structure lining the substrate binding pockets were important for thiolase substrate specificity and are attractive targets for future protein engineering studies. The predictive framework described here is generalizable and demonstrates how machine learning can be used to quantitatively understand and predict enzyme substrate specificity.
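
The regression metrics quoted in this abstract (root mean square error and R²) can be illustrated with a minimal sketch. The function names and the activity values below are hypothetical and are not taken from the study; the sketch only shows how the two quantities are conventionally computed.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical activity values for illustration only (not data from the paper).
observed = [0.0, 0.5, 1.0, 1.5]
predicted = [0.1, 0.4, 1.1, 1.4]

print(round(rmse(observed, predicted), 3))
print(round(r_squared(observed, predicted), 3))
```

A lower RMSE means predictions sit closer to the measured values, while R² expresses the share of variance in the measurements that the predictions account for.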

Recurrent Clostridioides difficile infection can be predicted using inflammatory mediator and toxin activity levels

Infection Control and Hospital Epidemiology ◽

10.1017/ice.2020.568 ◽

2020 ◽

Vol 41 (S1) ◽

pp. s77-s78

Author(s):

Jonathan Motyka ◽

Aline Penkevich ◽

Vincent Young ◽

Krishna Rao

Keyword(s):

Machine Learning ◽

Inflammatory Mediators ◽

Inflammatory Mediator ◽

Characteristic Curve ◽

Receiver Operator Characteristic Curve ◽

Sensitivity Analyses ◽

Activity Levels ◽

Clinical Criteria ◽

Clostridioides Difficile ◽

Potential Biomarker

Background: Clostridioides difficile infection (CDI) frequently recurs after initial treatment. Predicting recurrent CDI (rCDI) early in the disease course can assist clinicians in their decision making and improve outcomes. However, predictions based on clinical criteria alone are not accurate and/or do not validate other results. Here, we tested the hypothesis that circulating and stool-derived inflammatory mediators predict rCDI. Methods: Consecutive subjects with available specimens at diagnosis were included if they tested positive for toxigenic C. difficile (+enzyme immunoassay [EIA] for glutamate dehydrogenase and toxins A/B, with reflex to PCR for the tcdB gene for discordants). Stool was thawed on ice, diluted 1:1 in PBS with protease inhibitor, centrifuged, and used immediately. A 17-plex panel of inflammatory mediators was run on a Luminex 200 machine using a custom antibody-linked bead array. Prior to analysis, all measurements were normalized and log-transformed. Stool toxin activity levels were quantified using a custom cell-culture assay. Recurrence was defined as a second episode of CDI within 100 days. Ordination characterized variation in the panel between outcomes, tested with a permutational, multivariate ANOVA. Machine learning via elastic net regression with 100 iterations of 5-fold cross validation selected the optimal model and the area under the receiver operator characteristic curve (AuROC) was computed. Sensitivity analyses excluding those that died and/or lived >100 km away were performed. Results: We included 186 subjects, with 95 women (51.1%) and average age of 55.9 years (±20). More patients were diagnosed by PCR than toxin EIA (170 vs 55, respectively). Death, rCDI, and no rCDI occurred in 32 (17.2%), 36 (19.4%), and 118 (63.4%) subjects, respectively. Ordination revealed that the serum panel was associated with rCDI (P = .007) but the stool panel was not. Serum procalcitonin, IL-8, IL-6, CCL5, and EGF were associated with recurrence. The machine-learning models using the serum panel predicted rCDI with AuROCs between 0.74 and 0.8 (Fig. 1). No stool inflammatory mediators independently predicted rCDI. However, stool IL-8 interacted with toxin activity to predict rCDI (Fig. 2). These results did not change significantly upon sensitivity analysis. Conclusions: A panel of serum inflammatory mediators predicted rCDI with up to 80% accuracy, but the stool panel alone was less successful. Incorporating toxin activity levels alongside inflammatory mediator measurements is a novel, promising approach to studying stool-derived biomarkers of rCDI. This approach revealed that stool IL-8 is a potential biomarker for rCDI. These results need to be confirmed both with a larger dataset and after adjustment for clinical covariates. Funding: None. Disclosure: Vincent Young is a consultant for Bio-K+ International, Pantheryx, and Vedanta Biosciences.
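
The resampling scheme named in the Methods (100 iterations of 5-fold cross validation) can be sketched as an index generator. This is an assumption-laden illustration, not the authors' pipeline: the function name `repeated_kfold`, the seed, and the plain (unstratified) shuffling are all hypothetical, and the elastic net fitting step is omitted.

```python
import random


def repeated_kfold(n_samples, n_folds=5, n_repeats=100, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition on every repeat
        # Deal the shuffled indices into folds of near-equal size.
        folds = [indices[i::n_folds] for i in range(n_folds)]
        for fold, test_idx in enumerate(folds):
            train_idx = [
                idx
                for other, members in enumerate(folds)
                if other != fold
                for idx in members
            ]
            yield repeat, fold, train_idx, test_idx


# 186 is the cohort size reported in the abstract, used here only as an
# illustrative sample count.
splits = list(repeated_kfold(n_samples=186))
print(len(splits))
```

Each of the resulting train/test pairs would be used to fit and score one candidate model; averaging the scores across repeats reduces the dependence on any single random partition.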

4. Structure for catalysis

Enzymes: A Very Short Introduction ◽

10.1093/actrade/9780198824985.003.0004 ◽

2020 ◽

pp. 49-61

Author(s):

Paul Engel

Keyword(s):

Substrate Specificity ◽

Enzyme Catalysis ◽

Structural Features ◽

Enzyme Mechanism ◽

Substrate Complex ◽

Enzyme Substrate ◽

The Right ◽

Crucial Ingredient

‘Structure for catalysis’ details the various patterns of enzyme mechanism and the various structural features helping to achieve catalysis. One of the striking features of enzyme catalysis is substrate specificity. In the lock-and-key hypothesis, the enzyme is viewed as a precisely shaped lock and only the right key, the substrate, can fit and turn it. The lock-and-key combination is the enzyme–substrate complex. A crucial ingredient of the enzyme’s equipment for achieving outstanding catalysis is the ‘catalytic groups’.

A series of N-carbamoyloxyurea resistant cell lines with alterations in ribonucleotide reductase: lack of coordination in pyrimidine and purine reductase activity

Canadian Journal of Biochemistry and Cell Biology ◽

10.1139/o83-018 ◽

1983 ◽

Vol 61 (2-3) ◽

pp. 120-129 ◽

Cited By ~ 8

Author(s):

Robert G. Hards ◽

Jim A. Wright

Keyword(s):

Enzyme Activity ◽

Dna Synthesis ◽

Cell Lines ◽

Ribonucleotide Reductase ◽

Reductase Activity ◽

Resistant Cell ◽

Activity Levels ◽

Drug Resistant ◽

Enzyme Substrate ◽

Resistant Lines

N-Carbamoyloxyurea is cytotoxic for cells in culture and, like hydroxyurea and guanazole, the drug is an effective inhibitor of mammalian ribonucleotide reductase and thus DNA synthesis. In addition to ribonucleotide reductase, N-carbamoyloxyurea has a second site of action which also appears to be in the pathway of DNA synthesis. A series of drug-resistant cell lines, which contain alterations in ribonucleotide reduction, have been sequentially selected in the presence of increasing concentrations of N-carbamoyloxyurea. CDP and ADP reductase activities in these drug-resistant lines have been investigated and two types of alterations have been identified: elevated levels of enzyme activity with wild-type sensitivity to drug and altered levels of reductase with reduced drug sensitivity, probably owing to structural modification of the enzyme. Furthermore, N-carbamoyloxyurea resistant lines contain another alteration as well, presumably at a second site of drug action. They are also cross-resistant to hydroxyurea and guanazole, and studies on enzyme activity levels support our previous findings with cells selected for resistance to hydroxyurea, which showed changes in CDP reductase activity are not always coordinated with changes in ADP reductase. Although several possibilities exist, these observations are most easily explained by the existence of independent enzyme substrate binding subunits which are regulated by different mechanisms. Moreover, increases in cellular resistance were accompanied by significant increases in CDP but not ADP reductase, suggesting that an ability to maintain an adequate level of CDP reductase activity is especially important to achieve resistance to DNA synthesis inhibitors like N-carbamoyloxyurea, hydroxyurea, and guanazole.

A Novel Machine Learning Strategy for Prediction of Antihypertensive Peptides Derived from Food with High Efficiency

10.1101/2020.08.12.248955 ◽

2020 ◽

Author(s):

Liyang Wang ◽

Dantong Niu ◽

Xiaoya Wang ◽

Qun Shen ◽

Yong Xue

Keyword(s):

Machine Learning ◽

High Throughput ◽

High Efficiency ◽

Characteristic Curve ◽

Bovine Milk ◽

Structural Features ◽

Protein Docking ◽

Gradient Boosting ◽

Extreme Gradient Boosting ◽

Antihypertensive Peptides

Abstract: Strategies to screen antihypertensive peptides rapidly and with high throughput would undoubtedly contribute to the treatment of hypertension. Food-derived antihypertensive peptides can reduce blood pressure without side effects. In the present study, a novel model based on the Extreme Gradient Boosting (XGBoost) algorithm was developed using the primary structural features of food-derived peptides, and its performance in the prediction of antihypertensive peptides was compared with that of the prevailing machine learning models. To further assess the reliability of the method in a realistic setting, the optimized XGBoost model was used to predict the antihypertensive degree of k-mer peptides cut from 6 key proteins in bovine milk, and peptide-protein docking was introduced to verify the findings. The results showed that the XGBoost model achieved outstanding performance, with an accuracy of 0.9841 and an area under the receiver operating characteristic curve of 0.9428, both better than those of the other models. Using the XGBoost model, the prediction of antihypertensive peptides derived from milk protein was consistent with the peptide-protein docking results and was more efficient. Our results indicate that using the XGBoost algorithm as a novel auxiliary tool is feasible for screening antihypertensive peptides derived from food with high throughput and high efficiency.
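
The abstract mentions scoring k-mer peptides cut from milk proteins. A minimal sketch of the enumeration step is given below, assuming a simple sliding window over the sequence; the function name `kmers` and the toy sequence are hypothetical, and the abstract does not state which values of k were used.

```python
def kmers(sequence, k):
    """Return every contiguous peptide of length k, in order of position."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # When k exceeds the sequence length the range is empty, so the
    # result is an empty list rather than an error.
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


# Made-up six-residue sequence for illustration; not a bovine milk protein.
toy_sequence = "MKVLIL"

print(kmers(toy_sequence, 3))
```

Each peptide produced this way could then be passed to a trained classifier; a sequence of length n yields n - k + 1 candidate peptides for a given k.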

Biosynthetic controls that determine the branching and microheterogeneity of protein-bound oligosaccharides

Biochemistry and Cell Biology ◽

10.1139/o86-026 ◽

1986 ◽

Vol 64 (3) ◽

pp. 163-181 ◽

Cited By ~ 406

Author(s):

Harry Schachter

Keyword(s):

Enzyme Activity ◽

Substrate Specificity ◽

Substrate Availability ◽

Enzyme Substrate ◽

General Rules ◽

Branching Patterns ◽

Common Substrate ◽

Branch Specificity

Detailed studies on the enzyme machinery responsible for the biosynthesis of protein-bound oligosaccharides of the Asn-GlcNAc and Ser(Thr)-GalNAc linkage types have allowed the formulation of some general rules which explain, at least in part, the branching patterns and microheterogeneity of these structures. These rules are discussed under the following headings: (i) competition of two or more enzymes for a common substrate; (ii) controls at the level of enzyme substrate specificity (e.g., critical sugar residues which turn enzyme activity on or off, branch specificity, and the role of the polypeptide in the glycoprotein substrate); (iii) substrate availability.

Machine Learning Prediction of SARS-CoV-2 Polymerase Chain Reaction Results with Routine Blood Tests

Laboratory Medicine ◽

10.1093/labmed/lmaa111 ◽

2020 ◽

Author(s):

Thomas Tschoellitsch ◽

Martin Dünser ◽

Carl Böck ◽

Karin Schwarzbauer ◽

Jens Meier

Keyword(s):

Machine Learning ◽

Polymerase Chain Reaction ◽

Characteristic Curve ◽

Cohort Analysis ◽

Rt Pcr ◽

Chain Reaction ◽

Blood Tests ◽

Routine Blood ◽

Machine Learning Model ◽

Polymerase Chain

Abstract. Objective: The diagnosis of COVID-19 is based on the detection of SARS-CoV-2 in respiratory secretions, blood, or stool. Currently, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used method to test for SARS-CoV-2. Methods: In this retrospective cohort analysis, we evaluated whether machine learning could exclude SARS-CoV-2 infection using routinely available laboratory values. A Random Forests algorithm with 1353 unique features was trained to predict the RT-PCR results. Results: Out of 12,848 patients undergoing SARS-CoV-2 testing, routine blood tests were simultaneously performed in 1528 patients. The machine learning model could predict SARS-CoV-2 test results with an accuracy of 86% and an area under the receiver operating characteristic curve of 0.90. Conclusion: Machine learning methods can reliably predict a negative SARS-CoV-2 RT-PCR test result using standard blood tests.
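
Several abstracts on this page report an area under the receiver operating characteristic curve. As a hedged illustration of what that number measures, the sketch below computes it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` and the labels and scores are hypothetical, not data from any listed study.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical test outcomes (1 = positive) and model scores.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.6]

print(round(auroc(example_labels, example_scores), 3))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a small illustration; a rank-based computation would be preferable for large datasets.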

Descriptors of Cytochrome Inhibitors and Useful Machine Learning Based Methods for the Design of Safer Drugs

Pharmaceuticals ◽

10.3390/ph14050472 ◽

2021 ◽

Vol 14 (5) ◽

pp. 472

Author(s):

Tyler C. Beck ◽

Kyle R. Beck ◽

Jordan Morningstar ◽

Menny M. Benjamin ◽

Russell A. Norris

Keyword(s):

United States ◽

Machine Learning ◽

Drug Interactions ◽

The United States ◽

Structural Features ◽

Physiochemical Properties ◽

Drug Dosing ◽

Therapeutic Outcomes ◽

Cyp Inhibition ◽

Cyp Inhibitors

Roughly 2.8% of annual hospitalizations are a result of adverse drug interactions in the United States, representing more than 245,000 hospitalizations. Drug–drug interactions commonly arise from major cytochrome P450 (CYP) inhibition. Various approaches are routinely employed in order to reduce the incidence of adverse interactions, such as altering drug dosing schemes and/or minimizing the number of drugs prescribed; however, often, a reduction in the number of medications cannot be achieved without impacting therapeutic outcomes. Nearly 80% of drugs fail in development due to pharmacokinetic issues, outlining the importance of examining cytochrome interactions during preclinical drug design. In this review, we examined the physiochemical and structural properties of small molecule inhibitors of CYPs 3A4, 2D6, 2C19, 2C9, and 1A2. Although CYP inhibitors tend to have distinct physiochemical properties and structural features, these descriptors alone are insufficient to predict major cytochrome inhibition probability and affinity. Machine learning based in silico approaches may be employed as a more robust and accurate way of predicting CYP inhibition. These various approaches are highlighted in the review.

Machine learning approach for differentiating cytomegalovirus esophagitis from herpes simplex virus esophagitis

Scientific Reports ◽

10.1038/s41598-020-78556-z ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Jung Su Lee ◽

Jihye Yun ◽

Sungwon Ham ◽

Hyunjung Park ◽

Hyunsu Lee ◽

...

Keyword(s):

Machine Learning ◽

Differential Diagnosis ◽

Herpes Simplex Virus ◽

Herpes Simplex ◽

Predictive Value ◽

Characteristic Curve ◽

Previous History ◽

Clinical Factor ◽

Endoscopic Images ◽

Simplex Virus

Abstract: The endoscopic features of herpes simplex virus (HSV) and cytomegalovirus (CMV) esophagitis overlap significantly, and hence the differential diagnosis between HSV and CMV esophagitis is sometimes difficult. Therefore, we developed a machine-learning-based classifier to discriminate between CMV and HSV esophagitis. We analyzed 87 patients with HSV esophagitis and 63 patients with CMV esophagitis and developed a machine-learning-based artificial intelligence (AI) system using a total of 666 endoscopic images with HSV esophagitis and 416 endoscopic images with CMV esophagitis. In the five repeated five-fold cross-validations based on the hue–saturation–brightness color model, logistic regression with a least absolute shrinkage and selection operator showed the best performance (sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and area under the receiver operating characteristic curve: 100%, 100%, 100%, 100%, 100%, and 1.0, respectively). Previous history of transplantation was included in classifiers as a clinical factor; the lower the performance of these classifiers, the greater the effect of including this clinical factor. Our machine-learning-based AI system for differential diagnosis between HSV and CMV esophagitis showed high accuracy, which could help clinicians with diagnoses.

Deep Learning Classification of Canine Behavior Using a Single Collar-Mounted Accelerometer: Real-World Validation

Animals ◽

10.3390/ani11061549 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1549

Author(s):

Robert D. Chambers ◽

Nathanael C. Yoder ◽

Aletha B. Carson ◽

Christian Junge ◽

David E. Allen ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Real World ◽

Learning Algorithm ◽

Drinking Behavior ◽

True Positive Rate ◽

Training Dataset ◽

Activity Levels ◽

Accelerometer Data ◽

Activity Monitors

Collar-mounted canine activity monitors can use accelerometer data to estimate dog activity levels, step counts, and distance traveled. With recent advances in machine learning and embedded computing, much more nuanced and accurate behavior classification has become possible, giving these affordable consumer devices the potential to improve the efficiency and effectiveness of pet healthcare. Here, we describe a novel deep learning algorithm that classifies dog behavior at sub-second resolution using commercial pet activity monitors. We built machine learning training databases from more than 5000 videos of more than 2500 dogs and ran the algorithms in production on more than 11 million days of device data. We then surveyed project participants representing 10,550 dogs, which provided 163,110 event responses to validate real-world detection of eating and drinking behavior. The resultant algorithm displayed a sensitivity and specificity for detecting drinking behavior (0.949 and 0.999, respectively) and eating behavior (0.988, 0.983). We also demonstrated detection of licking (0.772, 0.990), petting (0.305, 0.991), rubbing (0.729, 0.996), scratching (0.870, 0.997), and sniffing (0.610, 0.968). We show that the devices’ position on the collar had no measurable impact on performance. In production, users reported a true positive rate of 95.3% for eating (among 1514 users), and of 94.9% for drinking (among 1491 users). The study demonstrates the accurate detection of important health-related canine behaviors using a collar-mounted accelerometer. We trained and validated our algorithms on a large and realistic training dataset, and we assessed and confirmed accuracy in production via user validation.
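
The sensitivity and specificity pairs reported above follow from confusion-matrix counts. A minimal sketch under stated assumptions: the function names and the counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def sensitivity(true_positives, false_negatives):
    """Fraction of actual positive events that were detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Fraction of actual negative events that were correctly rejected."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts for one behavior class (illustration only).
tp, fn = 95, 5      # behavior occurred: detected vs missed
tn, fp = 990, 10    # behavior absent: correctly ignored vs falsely flagged

print(round(sensitivity(tp, fn), 3))
print(round(specificity(tn, fp), 3))
```

A detector can score high on specificity and low on sensitivity at the same time, as the petting figures in the abstract show (0.305 and 0.991), which is why both numbers are reported for every behavior.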

Development of Machine Learning Models to Predict Probabilities and Types of Stroke at Prehospital Stage: the Japan Urgent Stroke Triage Score Using Machine Learning (JUST-ML)

Translational Stroke Research ◽

10.1007/s12975-021-00937-x ◽

2021 ◽

Author(s):

Kazutaka Uchida ◽

Junichi Kouno ◽

Shinichi Yoshimura ◽

Norito Kinjo ◽

Fumihiro Sakakibara ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Random Forests ◽

Prediction Models ◽

Characteristic Curve ◽

Predictive Performance ◽

Vessel Occlusion ◽

Predictive Values ◽

Training Cohort ◽

Sensitivity Specificity

Abstract: In conjunction with recent advancements in machine learning (ML), such technologies have been applied in various fields owing to their high predictive performance. We aimed to develop a prehospital stroke scale with ML. We conducted a multi-center retrospective and prospective cohort study. The training cohort comprised eight centers in Japan from June 2015 to March 2018, and the test cohort comprised 13 centers from April 2019 to March 2020. We used three different ML algorithms (logistic regression, random forests, XGBoost) to develop models. Main outcomes were large vessel occlusion (LVO), intracranial hemorrhage (ICH), subarachnoid hemorrhage (SAH), and cerebral infarction (CI) other than LVO. The predictive abilities were validated in the test cohort with accuracy, positive predictive value, sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and F score. The training cohort included 3178 patients with 337 LVO, 487 ICH, 131 SAH, and 676 CI cases, and the test cohort included 3127 patients with 183 LVO, 372 ICH, 90 SAH, and 577 CI cases. The overall accuracies were 0.65, and the positive predictive values, sensitivities, specificities, AUCs, and F scores were stable in the test cohort. The classification abilities were also fair for all ML models. The AUCs for LVO of logistic regression, random forests, and XGBoost were 0.89, 0.89, and 0.88, respectively, in the test cohort, and these values were higher than those of previously reported prediction models for LVO. The ML models developed to predict the probability and types of stroke at the prehospital stage had superior predictive abilities.
