Vec2SPARQL: integrating SPARQL queries and knowledge graph embeddings

2018 ◽  
Author(s):  
Maxat Kulmanov ◽  
Senay Kafkas ◽  
Andreas Karwath ◽  
Alexander Malic ◽  
Georgios V Gkoutos ◽  
...  

Abstract: Recent developments in machine learning have led to a large number of methods for extracting features from structured data. The features are represented as vectors and may encode some semantic aspects of the data. They can be used in machine learning models for different tasks or to compute similarities between the entities in the data. SPARQL is a query language for structured data, originally developed for querying Resource Description Framework (RDF) data. It has been in use for over a decade as a standardized NoSQL query language. Many different tools have been developed to enable data sharing with SPARQL. For example, SPARQL endpoints make data interoperable and available to the world, and SPARQL queries can be executed across multiple endpoints. We have developed Vec2SPARQL, a general framework for integrating structured data and their vector space representations. Vec2SPARQL allows vector functions, such as computing similarities (cosine, correlations) or classifications with machine learning models, to be queried jointly within a single SPARQL query. We demonstrate applications of our approach for biomedical and clinical use cases. Our source code is freely available at https://github.com/bio-ontology-research-group/vec2sparql and we make a Vec2SPARQL endpoint available at http://sparql.bio2vec.net/.
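The cosine similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not Vec2SPARQL's implementation or API; the entity names, the toy two-dimensional vectors, and the helper names `cosine_similarity` and `most_similar` are all hypothetical and serve only to show how a similarity ranking over embedding vectors might be computed.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, k=2):
    """Return the k entities most similar to `query`, best first.

    `embeddings` maps an entity identifier to its vector. The query
    entity itself is excluded from the result.
    """
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical toy embeddings (assumption: real embeddings would have
# many more dimensions and come from a trained model).
EMBEDDINGS = {
    "entity_a": [1.0, 0.0],
    "entity_b": [0.8, 0.6],
    "entity_c": [0.0, 1.0],
}

if __name__ == "__main__":
    for name, score in most_similar("entity_a", EMBEDDINGS):
        print(name, round(score, 3))
```

In a framework like the one described, a function of this kind would be exposed inside the query language, so that graph patterns select the entities and the similarity function ranks them.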

Author(s):  
Maxat Kulmanov ◽  
Fatima Zohra Smaili ◽  
Xin Gao ◽  
Robert Hoehndorf

Ontologies have long been employed in the life sciences to formally represent and reason over domain knowledge, and they are employed in almost every major biological database. Recently, ontologies have increasingly been used to provide background knowledge in similarity-based analysis and machine learning models. The methods employed to combine ontologies and machine learning are still novel and actively being developed. We provide an overview of the methods that use ontologies to compute similarity and incorporate them in machine learning methods; in particular, we outline how semantic similarity measures and ontology embeddings can exploit the background knowledge in biomedical ontologies, and how ontologies can provide constraints that improve machine learning models. The methods and experiments we describe are available as a set of executable notebooks, and we also provide a set of slides and additional resources at https://github.com/bio-ontology-research-group/machine-learning-with-ontologies.

Key points
Ontologies provide background knowledge that can be exploited in machine learning models.
Ontology embeddings are structure-preserving maps from ontologies into vector spaces and provide an important method for utilizing ontologies in machine learning. Embeddings can preserve different structures in ontologies, including their graph structures, syntactic regularities, or their model-theoretic semantics.
Axioms in ontologies, in particular those involving negation, can be used as constraints in optimization and machine learning to reduce the search space.


Energies ◽  
2019 ◽  
Vol 12 (24) ◽  
pp. 4745 ◽  
Author(s):  
Dana-Mihaela Petroșanu ◽  
George Căruțașu ◽  
Nicoleta Luminița Căruțașu ◽  
Alexandru Pîrjan

Lately, many scientists have focused their research on subjects like smart buildings, sensor devices, virtual sensing, buildings management, Internet of Things (IoT), artificial intelligence in the smart buildings sector, improving life quality within smart homes, assessing the occupancy status information, detecting human behavior with a view to assisted living, maintaining environmental health, and preserving natural resources. The main purpose of our review consists of surveying the current state of the art regarding the recent developments in integrating supervised and unsupervised machine learning models with sensor devices in the smart building sector with a view to attaining enhanced sensing, energy efficiency and optimal building management. We have devised the research methodology with a view to identifying, filtering, categorizing, and analyzing the most important and relevant scientific articles regarding the targeted topic. To this end, we have used reliable sources of scientific information, namely the Elsevier Scopus and the Clarivate Analytics Web of Science international databases, in order to assess the interest regarding the above-mentioned topic within the scientific literature. After processing the obtained papers, we finally obtained, on the basis of our devised methodology, a reliable, eloquent and representative pool of 146 scientific works that would be useful for developing our survey. Our approach provides a useful up-to-date overview for researchers from different fields, which can be helpful when submitting project proposals or when studying complex topics such as those reviewed in this paper. Meanwhile, the current study offers scientists the possibility of identifying future research directions that have not yet been addressed in the scientific literature or improving the existing approaches based on the body of knowledge.
Moreover, the conducted review creates the premises for identifying in the scientific literature the main purposes for integrating Machine Learning techniques with sensing devices in smart environments, as well as purposes that have not been investigated yet.


Author(s):  
Wenjuan Wang ◽  
Martin Kiik ◽  
Niels Peek ◽  
Vasa Curcin ◽  
Iain J. Marshall ◽  
...  

2019 ◽  
Author(s):  
KC Govinda ◽  
Md Mahmudulla Hassan ◽  
Suman Sirimulla

Abstract: Kinases are one of the most important classes of drug targets for therapeutic use. Algorithms that can accurately predict the drug-kinase inhibitor constant (pKi) of kinases can considerably accelerate the drug discovery process. In this study, we have developed computational models, leveraging machine learning techniques, to predict ligand-kinase pKi values. Kinase-ligand inhibitor constant (Ki) data was retrieved from the Drug Target Commons (DTC) and Metz databases. Machine learning models were developed based on structural and physicochemical features of the protein and topological pharmacophore atomic triplet fingerprints of the ligands. Three machine learning models [random forest (RFR), extreme gradient boosting (XGBoost) and artificial neural network (ANN)] were tested for model development. The performance of our models was evaluated using several metrics with 95% confidence intervals. The RFR model was finally selected based on the evaluation metrics on test datasets and used for web implementation. The selected model achieved a Pearson correlation coefficient (R) of 0.887 (0.881, 0.893), a root-mean-square error (RMSE) of 0.475 (0.465, 0.486), a concordance index (Con. Index) of 0.854 (0.851, 0.858), and an area under the receiver operating characteristic curve (AUC-ROC) of 0.957 (0.954, 0.960) during the internal 5-fold cross validation.
Availability: GitHub: https://github.com/sirimullalab/KinasepKipred, Docker: sirimullalab/kinasepkipred
Implementation: https://drugdiscovery.utep.edu/pki/
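Three of the regression metrics named in the abstract can be sketched in a few lines. This is not the authors' evaluation code: the function names and the toy values are hypothetical, and the concordance index below uses one common pairwise definition (prediction ties count as half), which may differ from the variant used in the study.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs whose predicted order matches the true order.

    Pairs with equal true values are skipped; pairs with equal predictions
    contribute 0.5 (an assumption, see the note above).
    """
    concordant = 0.0
    comparable = 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            dt = y_true[i] - y_true[j]
            if dt == 0:
                continue  # not comparable
            comparable += 1
            dp = y_pred[i] - y_pred[j]
            if dp == 0:
                concordant += 0.5
            elif (dt > 0) == (dp > 0):
                concordant += 1.0
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Hypothetical pKi values, for illustration only.
    measured = [5.0, 6.0, 7.0, 8.0]
    predicted = [5.2, 5.9, 7.4, 7.8]
    print(pearson_r(measured, predicted))
    print(rmse(measured, predicted))
    print(concordance_index(measured, predicted))
```

The confidence intervals reported in the abstract would require an additional resampling or analytic step that is not shown here.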


PLoS ONE ◽  
2020 ◽  
Vol 15 (6) ◽  
pp. e0234722
Author(s):  
Wenjuan Wang ◽  
Martin Kiik ◽  
Niels Peek ◽  
Vasa Curcin ◽  
Iain J. Marshall ◽  
...  

Risks ◽  
2019 ◽  
Vol 7 (3) ◽  
pp. 82 ◽  
Author(s):  
Taylor

The purpose of this paper is to survey recent developments in granular models and machine learning models for loss reserving, and to compare the two families with a view to assessing their potential for future development. This is best understood in the context of the evolution of these models from their predecessors, and the early sections recount relevant archaeological vignettes from the history of loss reserving. However, the larger part of the paper is concerned with the granular models and machine learning models. Their relative merits are discussed, as are the factors governing the choice between them and the older, more primitive models. Concluding sections briefly consider the possible further development of these models in the future.


2021 ◽  
Author(s):  
Zachary Arnold ◽  
Joanne Boisson ◽  
Lorenzo Bongiovanni ◽  
Daniel Chou ◽  
Carrie Peelman ◽  
...  

In this proof-of-concept project, CSET and Amplyfi Ltd. used machine learning models and Chinese-language web data to identify Chinese companies active in artificial intelligence. Most of these companies were not labeled or described as AI-related in two high-quality commercial datasets. The authors' findings show that using structured data alone—even from the best providers—will yield an incomplete picture of the Chinese AI landscape.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the usage of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling for obtaining multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.

