Hybrid localized graph kernel for machine learning energy‐related properties of molecules and solids

Automated recognition of functional compound-protein relationships in literature

10.1101/718205 ◽

2019 ◽

Author(s):

Kersten Döring ◽

Ammar Qaseem ◽

Kiran K Telukunta ◽

Michael Becer ◽

Philippe Thomas ◽

...

Keyword(s):

Machine Learning ◽

Text Mining ◽

Information Extraction ◽

Small Molecules ◽

Benchmark Dataset ◽

Learning Methods ◽

Graph Kernel ◽

Functional Relationships ◽

Machine Learning Methods ◽

Auc Value

AbstractMotivationMuch effort has been invested in the identification of protein-protein interactions using text mining and machine learning methods. The extraction of functional relationships between chemical compounds and proteins from literature has received much less attention, and no ready-to-use open-source software is so far available for this task.MethodWe created a new benchmark dataset of 2,753 sentences from abstracts containing annotations of proteins, small molecules, and their relationships. Two kernel methods were applied to classify these relationships as functional or non-functional, named shallow linguistic and all-paths graph kernel. Furthermore, the benefit of interaction verbs in sentences was evaluated.ResultsThe cross-validation of the all-paths graph kernel (AUC value: 84.2%, F1 score: 81.8%) shows slightly better results than the shallow linguistic kernel (AUC value: 81.6%, F1 score: 79.7%) on our benchmark dataset. Both models achieve state-of-the-art performance in the research area of relation extraction. Furthermore, the combination of shallow linguistic and all-paths graph kernel could further increase the overall performance. We used each of the two kernels to identify functional relationships in all PubMed abstracts (28 million) and provide the results, including recorded processing time.AvailabilityThe software for the tested kernels, the benchmark, the processed 28 million PubMed abstracts, all evaluation scripts, as well as the scripts for processing the complete PubMed database are freely available at https://github.com/KerstenDoering/CPI-Pipeline.Author summaryText mining aims at organizing large sets of unstructured text data to provide efficient information extraction. Particularly in the area of drug discovery, the knowledge about small molecules and their interactions with proteins is of crucial importance to understand the drug effects on cells, tissues, and organisms. This data is normally hidden in written articles, which are published in journals with a focus on life sciences. In this publication, we show how text mining methods can be used to extract data about functional interactions between small molecules and proteins from texts. We created a new dataset with annotated sentences of scientific abstracts for the purpose of training two diverse machine learning methods (kernels), and successfully classified compound-protein pairs as functional and non-functional relations, i.e. no interactions. Our newly developed benchmark dataset and the pipeline for information extraction are freely available for download. Furthermore, we show that the software can be easily up-scaled to process large datasets by applying the approach to 28 million abstracts.

Download Full-text

Recommendation as link prediction in bipartite graphs: A graph kernel-based machine learning approach

Decision Support Systems ◽

10.1016/j.dss.2012.09.019 ◽

2013 ◽

Vol 54 (2) ◽

pp. 880-890 ◽

Cited By ~ 111

Author(s):

Xin Li ◽

Hsinchun Chen

Keyword(s):

Machine Learning ◽

Link Prediction ◽

Bipartite Graphs ◽

Learning Approach ◽

Graph Kernel ◽

Machine Learning Approach

Download Full-text

A novel graph kernel on chemical compound classification

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720018500269 ◽

2018 ◽

Vol 16 (06) ◽

pp. 1850026

Author(s):

Qiangrong Jiang ◽

Jiajia Ma

Keyword(s):

Machine Learning ◽

Graph Theory ◽

Kernel Methods ◽

Good Choice ◽

Graph Kernel ◽

Graph Kernels ◽

Machine Learning Methods ◽

Compound Classification ◽

Local Block

Considering the classification of compounds as a nonlinear problem, the use of kernel methods is a good choice. Graph kernels provide a nice framework combining machine learning methods with graph theory, whereas the essence of graph kernels is to compare the substructures of two graphs, how to extract the substructures is a question. In this paper, we propose a novel graph kernel based on matrix named the local block kernel, which can compare the similarity of partial substructures that contain any number of vertexes. The paper finally tests the efficacy of this novel graph kernel in comparison with a number of published mainstream methods and results with two datasets: NCI1 and NCI109 for the convenience of comparison.

Download Full-text

Mind wandering as data augmentation: How mental travel supports abstraction

Behavioral and Brain Sciences ◽

10.1017/s0140525x1900311x ◽

2020 ◽

Vol 43 ◽

Author(s):

Myrthe Faber

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Mental Content ◽

Mind Wandering ◽

Theoretical Framework ◽

Important Addition

Abstract Gilead et al. state that abstraction supports mental travel, and that mental travel critically relies on abstraction. I propose an important addition to this theoretical framework, namely that mental travel might also support abstraction. Specifically, I argue that spontaneous mental travel (mind wandering), much like data augmentation in machine learning, provides variability in mental content and context necessary for abstraction.

Download Full-text