Machine learning on drug-specific data to predict small molecule teratogenicity

2019 ◽  
Author(s):  
Anup P. Challa ◽  
Andrew L. Beam ◽  
Min Shen ◽  
Tyler Peryea ◽  
Robert R. Lavieri ◽  
...  

Abstract Pregnant women are an especially vulnerable population, given the sensitivity of a developing fetus to chemical exposures. However, prescribing behavior for the gravid patient is guided by limited human data and conflicting case reports of adverse outcomes, owing to the exclusion of pregnant populations from randomized controlled trials. These factors increase the risk of adverse drug outcomes and reduce quality of care for pregnant populations. Herein, we propose the application of artificial intelligence to systematically predict the teratogenicity of a prescribable small molecule from information inherent to the drug. Using unsupervised and supervised machine learning, our model probes all small molecules with known structure and teratogenicity data published in research-amenable formats to identify patterns among structural, meta-structural, and in vitro bioactivity data for each drug and its teratogenicity score. With this workflow, we discovered three chemical functionalities that predispose a drug towards increased teratogenicity and two moieties with potentially protective effects. Our models predict three clinically relevant classes of teratogenicity with AUC = 0.8 and nearly double the predictive accuracy of a blind control for the same task, suggesting successful modeling. We also describe extensive barriers to translational research that restrict data-driven studies in pregnancy and therapeutically “orphan” pregnant populations. Collectively, this work represents a first-in-kind platform for the application of computing to study and predict teratogenicity.
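The workflow described above can be sketched schematically: a supervised classifier maps per-drug feature vectors (structural fingerprints and bioactivity data in the study) to one of three teratogenicity classes, scored by one-vs-rest ROC AUC. The sketch below uses synthetic features and labels, not the study's data, and a random forest as an illustrative stand-in for the authors' models.

```python
# Illustrative sketch only: 3-class prediction from drug-like feature
# vectors, scored with one-vs-rest ROC AUC. Data here are synthetic
# stand-ins for structural fingerprints / bioactivity features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))  # stand-in for per-drug feature vectors
# toy 3-class label (0, 1, 2) driven by the first few features
y = (X[:, :3].sum(axis=1) > 0).astype(int) + (X[:, 3] > 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)

# one-vs-rest multiclass AUC, the metric quoted in the abstract
auc = roc_auc_score(y_te, proba, multi_class="ovr")
print(round(auc, 2))
```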

2021 ◽  
Vol 10 (7) ◽  
pp. 436
Author(s):  
Amerah Alghanim ◽  
Musfira Jilani ◽  
Michela Bertolotto ◽  
Gavin McArdle

Volunteered Geographic Information (VGI) is often collected by non-expert users. This raises concerns about the quality and veracity of such data, and there has been much effort to understand and quantify the quality of VGI. Extrinsic measures, which compare VGI to authoritative data sources such as National Mapping Agencies, are common, but the cost and slow update frequency of such data hinder the task. On the other hand, intrinsic measures, which compare the data to heuristics or models built from the VGI data itself, are becoming increasingly popular. Supervised machine learning techniques are particularly suitable for intrinsic measures of quality, as they can infer and predict the properties of spatial data. In this article we are interested in assessing the quality of semantic information, such as the road type, associated with data in OpenStreetMap (OSM). We have developed a machine learning approach which utilises new intrinsic input features collected from the VGI dataset. Specifically, using our proposed novel approach we obtained an average classification accuracy of 84.12%. This result outperforms existing techniques on the same semantic inference task. The trustworthiness of the data used for developing and training machine learning models is also important. To address this issue, we have developed a new trustworthiness measure using direct and indirect characteristics of OSM data, such as its edit history, along with an assessment of the users who contributed the data. An evaluation of the impact of data determined to be trustworthy within the machine learning model shows that the trusted data collected with the new approach improves the prediction accuracy of our machine learning technique. Specifically, our results demonstrate that the classification accuracy of our developed model is 87.75% when applied to a trusted dataset and 57.98% when applied to an untrusted dataset. Consequently, such results can be used to assess the quality of OSM and suggest improvements to the data set.
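The semantic-inference setup can be illustrated with a small sketch: a classifier predicts a road-type label from intrinsic, geometry-derived features of a way. The feature names (length, node count, junction degree) and the toy labelling rule below are invented for illustration and are not the authors' feature set.

```python
# Hedged sketch: inferring a road-type class from intrinsic OSM-style
# features. Features and labels are simulated, not real OSM data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 400
length_m   = rng.exponential(500, n)   # segment length in metres
node_count = rng.poisson(8, n)         # nodes per way
degree     = rng.integers(1, 6, n)     # junction connectivity

# toy rule: long, well-connected ways tend to be "major" roads (class 1)
road_type = ((length_m > 400) & (degree >= 3)).astype(int)

X = np.column_stack([length_m, node_count, degree])
acc = cross_val_score(RandomForestClassifier(random_state=1),
                      X, road_type, cv=5).mean()
print(round(acc, 3))
```

A trustworthiness filter, as in the article, would simply restrict `X` to rows whose contributors and edit histories pass the trust measure before training.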


2017 ◽  
Vol 48 (3) ◽  
pp. 608-641 ◽  
Author(s):  
Akos Rona-Tas ◽  
Antoine Cornuéjols ◽  
Sandrine Blanchemanche ◽  
Antonin Duroy ◽  
Christine Martin

Recently, both sociology of science and policy research have shown increased interest in scientific uncertainty. To contribute to these debates and create an empirical measure of scientific uncertainty, we inductively devised two systems of classification, or ontologies, to describe scientific uncertainty in a large corpus of food safety risk assessments with the help of machine learning (ML). We ask three questions: (1) Can we use ML to assist with coding complex documents such as food safety risk assessments on a difficult topic like scientific uncertainty? (2) Can we use ML to assess the quality of the ontologies we devised? (3) And, finally, does the quality of our ontologies depend on social factors? We found that ML in its simplest form can do surprisingly well at identifying complex meanings, and that it does not benefit from adding certain types of complexity to the analysis. Our ML experiments show that in one ontology, a simple typology, semantic opposites, against expectations, attract each other and support the taxonomic structure of the other ontology. And finally, we found some evidence that institutional factors do influence how well our taxonomy of uncertainty performs, but its ability to capture meaning does not vary greatly across the time, institutional context, and cultures we investigated.


2018 ◽  
Vol 210 ◽  
pp. 02016 ◽  
Author(s):  
Tomasz Rymarczyk ◽  
Grzegorz Kłosowski

The article presents four selected methods of supervised machine learning, which can be successfully used in the tomography of flood embankments, walls, tanks, reactors and pipes. A comparison of the following methods was made: Artificial Neural Networks (ANN), Support Vector Machine (SVM), K-Nearest Neighbour (KNN) and Multivariate Adaptive Regression Splines (MARS). All analysed methods concerned regression problems. The analysis quantified the differences between the methods using indicators such as the regression coefficient and the mean squared error. Moreover, an innovative method of denoising tomographic output images with the use of convolutional auto-encoders was presented. Thanks to the use of a convolutional structure composed of two auto-encoders, a significant improvement in the quality of the output image from the ECT tomography was achieved.
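The comparison described above can be sketched in a few lines: fit several regressors on one dataset and score each by mean squared error. MARS has no scikit-learn implementation, so the sketch below covers only the ANN (an MLP), SVM, and KNN analogues, on synthetic data rather than tomographic measurements.

```python
# Minimal regressor comparison in the spirit of the article, scored by
# mean squared error. Data are synthetic; MARS is omitted (not in sklearn).
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(500, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
models = {
    "ANN": MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=2),
    "SVM": SVR(),
    "KNN": KNeighborsRegressor(),
}
mse = {name: mean_squared_error(y_te, m.fit(X_tr, y_tr).predict(X_te))
       for name, m in models.items()}
for name, err in sorted(mse.items(), key=lambda kv: kv[1]):
    print(name, round(err, 3))
```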


2022 ◽  
Vol 12 (1) ◽  
pp. 514
Author(s):  
Raheel Nawaz ◽  
Quanbin Sun ◽  
Matthew Shardlow ◽  
Georgios Kontonatsios ◽  
Naif R. Aljohani ◽  
...  

Students’ evaluation of teaching, for instance, through feedback surveys, constitutes an integral mechanism for quality assurance and enhancement of teaching and learning in higher education. These surveys usually comprise both the Likert scale and free-text responses. Since the discrete Likert scale responses are easy to analyze, they feature more prominently in survey analyses. However, the free-text responses often contain richer, detailed, and nuanced information with actionable insights. Mining these insights is more challenging, as it requires a higher degree of processing by human experts, making the process time-consuming and resource intensive. Consequently, the free-text analyses are often restricted in scale, scope, and impact. To address these issues, we propose a novel automated analysis framework for extracting actionable information from free-text responses to open-ended questions in student feedback questionnaires. By leveraging state-of-the-art supervised machine learning techniques and unsupervised clustering methods, we implemented our framework as a case study to analyze a large-scale dataset of 4400 open-ended responses to the National Student Survey (NSS) at a UK university. These analyses then led to the identification, design, implementation, and evaluation of a series of teaching and learning interventions over a two-year period. The highly encouraging results demonstrate our approach’s validity and broad (national and international) application potential—covering tertiary education, commercial training, and apprenticeship programs, etc., where textual feedback is collected to enhance the quality of teaching and learning.
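The unsupervised half of such a framework can be sketched as TF-IDF vectorisation followed by clustering of the free-text comments, so that related feedback themes surface together. The comments below are invented examples, not NSS data, and the pipeline is a generic illustration rather than the authors' implementation.

```python
# Hedged sketch: clustering free-text survey comments into themes via
# TF-IDF + k-means. Comments are invented for illustration.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

comments = [
    "lectures were well organised and engaging",
    "feedback on assignments arrived far too late",
    "great lecturers, clear and engaging teaching",
    "assignment feedback was slow and unhelpful",
    "the library lacked enough study space",
    "not enough quiet study space on campus",
]
X = TfidfVectorizer(stop_words="english").fit_transform(comments)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)
```

In a real deployment the cluster contents would then be reviewed by staff to design interventions, as the study did over its two-year period.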


Author(s):  
Ryan J. Farr ◽  
Nathan Godde ◽  
Christopher Cowled ◽  
Vinod Sundaramoorthy ◽  
Diane Green ◽  
...  

Despite being vaccine preventable, rabies (lyssavirus) still has a significant impact on global mortality, disproportionately affecting children under 15 years of age. This neurotropic virus is deft at avoiding the immune system while travelling through neurons to the brain. Until recently, research efforts into the role of non-coding RNAs in rabies pathogenicity and detection have been hampered by a lack of human in vitro neuronal models. Here, we utilized our previously described human stem cell-derived neural model to investigate the effect of lyssavirus infection on microRNA (miRNA) expression in human neural cells and their secreted exosomes. Conventional differential expression analysis identified 25 cellular and 16 exosomal miRNAs that were significantly altered (FDR adjusted P-value <0.05) in response to different lyssavirus strains. Supervised machine learning algorithms determined that 6 cellular miRNAs (miR-99b-5p, miR-346, miR-5701, miR-138-2-3p, miR-651-5p, and miR-7977) were indicative of lyssavirus infection (100% accuracy), with the first four miRNAs having previously established roles in neuronal function, or panic and impulsivity-related behaviors. A further 4-miRNA signature in exosomes (miR-25-3p, miR-26b-5p, miR-218-5p, miR-598-3p) can independently predict lyssavirus-infected cells with >99% accuracy. Identification of these robust lyssavirus miRNA signatures offers further insight into neural lineage responses to infection and provides a foundation for utilizing exosome miRNAs in the development of next-generation molecular diagnostics for rabies.
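The signature-classification idea can be illustrated with a small sketch: a classifier over a four-feature miRNA expression panel predicts infection status. The expression values and effect sizes below are simulated, and the logistic-regression model is a generic stand-in, not the algorithms used in the study.

```python
# Illustrative only: predicting infection status from a hypothetical
# 4-miRNA expression panel. Values are simulated, not the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 200
infected = rng.integers(0, 2, n)  # 1 = infected culture

# simulate up-/down-regulation of four panel miRNAs under infection
expr = rng.normal(0, 1, (n, 4)) + np.outer(infected, [1.5, -1.2, 0.8, -0.9])

acc = cross_val_score(LogisticRegression(), expr, infected, cv=5).mean()
print(round(acc, 2))
```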


Mathematics ◽  
2020 ◽  
Vol 8 (5) ◽  
pp. 662 ◽  
Author(s):  
Husein Perez ◽  
Joseph H. M. Tah

In the field of supervised machine learning, the quality of a classifier model is directly correlated with the quality of the data used to train the model. The presence of unwanted outliers in the data could significantly reduce the accuracy of a model or, even worse, result in a biased model leading to an inaccurate classification. Identifying the presence of outliers and eliminating them is, therefore, crucial for building good quality training datasets. Pre-processing procedures for dealing with missing and outlier data, commonly known as feature engineering, are standard practice in machine learning problems. They help to make better assumptions about the data and also prepare datasets in a way that best exposes the underlying problem to the machine learning algorithms. In this work, we propose a multistage method for detecting and removing outliers in high-dimensional data. Our proposed method is based on utilising a technique called t-distributed stochastic neighbour embedding (t-SNE) to reduce a high-dimensional map of features into a lower, two-dimensional probability density distribution, and then using a simple descriptive statistical method, the interquartile range (IQR), to identify any outlier values from the density distribution of the features. t-SNE is a machine learning algorithm and a nonlinear dimensionality reduction technique well-suited for embedding high-dimensional data for visualisation in a low-dimensional space of two or three dimensions. We applied this method on a dataset containing images for training a convolutional neural network model (ConvNet) for an image classification problem. The dataset contains four different classes of images: three classes contain defects in construction (mould, stain, and paint deterioration) and a no-defect class (normal). We used the transfer learning technique to modify a pre-trained VGG-16 model. We used this model as a feature extractor and as a benchmark to evaluate our method. We have shown that, when using this method, we can identify and remove the outlier images in the dataset. After removing the outlier images from the dataset and re-training the VGG-16 model, the results have also shown that the accuracy of the classification has significantly improved and the number of misclassified cases has also dropped. While many feature engineering techniques for handling missing and outlier data are common in predictive machine learning problems involving numerical or categorical data, there is little work on developing techniques for handling outliers in high-dimensional data which can be used to improve the quality of machine learning problems involving images, such as ConvNet models for image classification and object detection problems.
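The two-stage method reads naturally as code: embed the high-dimensional feature vectors with t-SNE, then flag points whose embedded coordinates fall outside the 1.5×IQR fences. The sketch below runs on synthetic vectors standing in for VGG-16 features; the fence multiplier and data are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a t-SNE + IQR outlier filter on synthetic "deep features".
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(4)
inliers  = rng.normal(0, 1, (95, 50))   # stand-in for extracted features
outliers = rng.normal(8, 1, (5, 50))    # a small, distant cluster
X = np.vstack([inliers, outliers])

# stage 1: reduce to two dimensions with t-SNE
emb = TSNE(n_components=2, perplexity=20, random_state=4).fit_transform(X)

# stage 2: flag values outside the 1.5*IQR fences on either axis
def iqr_mask(col):
    q1, q3 = np.percentile(col, [25, 75])
    fence = 1.5 * (q3 - q1)
    return (col < q1 - fence) | (col > q3 + fence)

flagged = iqr_mask(emb[:, 0]) | iqr_mask(emb[:, 1])
print(flagged.sum(), "images flagged for removal")
```

Rows where `flagged` is true would be dropped before re-training the classifier.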


2019 ◽  
Vol 6 (Supplement_2) ◽  
pp. S744-S744
Author(s):  
Carolin Jakob ◽  
Annika Classen ◽  
Melanie Stecher ◽  
Sandra Fuhrmann ◽  
Bernd Franke ◽  
...  

Abstract Background Clinical management of prolonged febrile neutropenia despite broad-spectrum empirical antibacterial treatment is a clinical challenge, as standard empirical treatment has failed and a broad spectrum of differential diagnoses has to be considered. Growing prevalence of multi-resistant bacteria and fungi has made a balanced choice of effective anti-infective treatment more difficult. A reliable prediction of complications could indicate options for treatment optimization. Methods We implemented a supervised machine learning approach to predict death or admission to intensive care unit within 28 days in cancer patients with prolonged febrile neutropenia (neutrophils < 500/mm3 and body temperature ≥ 38°C longer than 3 days). We analyzed highly granular retrospective medical data of the Cologne Cohort of Neutropenic Patients (CoCoNut) between 2008 and 2014. Random forest and 10-fold cross-validation were used for classification. The neutropenic episodes from 2014 were used for evaluation of prediction. Results In total, 927 episodes of prolonged febrile neutropenia (median age 52 years, interquartile range 42–62; 562/927 [61%] male; 390/927 [42%] acute myeloid leukemia; 297/927 [32%] lymphoma) with 211/927 (23%) adverse outcomes were processed. We computed 226 features including patient characteristics, medication, clinical signs, as well as laboratory results describing changes of state and interactions of medical parameters. Feature selection revealed 65 features with an area under the receiver operating characteristic curve (AUC) of 0.75. In the validation data set the optimized model had a sensitivity/specificity of 36% and 99% (AUC: 0.68; misclassification error: 0.12) and positive/negative predictive values of 89% and 88%, respectively. The most important features were albumin, age, and procalcitonin. 
Conclusion Structured granular medical data and machine learning approaches are an innovative tool that can be used in a retrospective setting for prediction of adverse outcomes in patients with prolonged febrile neutropenia. This study is the first important step toward clinical decision support based on predictive models in high-risk cancer patients. Disclosures All authors: No reported disclosures.
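The modelling setup in this abstract, a random forest with 10-fold cross-validation scored by AUC, can be sketched on simulated data. The three named features (albumin, age, procalcitonin) are used as placeholders below; the risk function, distributions, and values are invented and carry no clinical meaning.

```python
# Schematic reconstruction on synthetic data: random forest + 10-fold CV
# to predict an adverse outcome, scored by ROC AUC as in the abstract.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n = 600
albumin = rng.normal(38, 6, n)     # g/L (placeholder distribution)
age     = rng.normal(52, 12, n)    # years
pct     = rng.lognormal(0, 1, n)   # procalcitonin, ng/mL

# invented risk model: low albumin and high PCT raise the outcome risk
risk = 1 / (1 + np.exp(0.2 * (albumin - 35) - 0.6 * np.log(pct)))
outcome = (rng.uniform(size=n) < risk).astype(int)

X = np.column_stack([albumin, age, pct])
auc = cross_val_score(RandomForestClassifier(random_state=5), X, outcome,
                      cv=10, scoring="roc_auc").mean()
print(round(auc, 2))
```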


2018 ◽  
Vol 34 (1) ◽  
pp. 6-13 ◽  
Author(s):  
Sebastian Sauer ◽  
Ricardo Buettner ◽  
Thomas Heidenreich ◽  
Jana Lemke ◽  
Christoph Berg ◽  
...  

Abstract. Mindfulness refers to a stance of nonjudgmental awareness of present-moment experiences. A growing body of research suggests that mindfulness may increase cognitive resources, thereby buffering stress. However, existing models have not achieved a consensus on how mindfulness should be operationalized. As the sound measurement of mindfulness is the foundation needed before substantial hypotheses can be supported, we propose a novel way of gauging the psychometric quality of a mindfulness measurement instrument (the Freiburg Mindfulness Inventory; FMI). Specifically, we employed 10 predictive algorithms to scrutinize the measurement quality of the FMI. Our criterion of measurement quality was the degree to which an algorithm separated mindfulness practitioners from nonpractitioners in a sample of N = 276. A high predictive accuracy of class membership can be taken as an indicator of the psychometric quality of the instrument. In sum, two findings are of interest. First, a number of FMI items were able to reliably predict class membership; however, some items appeared to be uninformative. Second, from an applied methodological point of view, it appears that machine learning algorithms can outperform traditional predictive methods such as logistic regression. This finding may generalize to other branches of research.
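The setup compares predictive algorithms on one task: classifying practitioner versus nonpractitioner group membership from questionnaire item scores. The sketch below simulates Likert-style item data with a few informative items and pits logistic regression against one ensemble method; the item count, effect sizes, and model choices are illustrative assumptions, not the study's design.

```python
# Toy illustration: predicting group membership (practitioner vs not)
# from simulated questionnaire item scores, comparing two algorithms.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
n = 276                                 # sample size reported in the study
group = rng.integers(0, 2, n)           # 1 = mindfulness practitioner
items = rng.normal(3, 1, (n, 14))       # 14 Likert-style item scores
items[:, :6] += 0.8 * group[:, None]    # some items carry group signal

results = {}
for model in (LogisticRegression(max_iter=1000),
              GradientBoostingClassifier(random_state=6)):
    results[type(model).__name__] = cross_val_score(model, items, group,
                                                    cv=5).mean()
for name, acc in results.items():
    print(name, round(acc, 2))
```

Uninformative items, like columns 6 to 13 here, contribute little to either model's accuracy, which mirrors the paper's first finding.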

