FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases

Author(s): M Delmas, O Filangi, N Paulhe, F Vinson, C Duperier, ...

Motivation: Metabolomics studies aim to report a metabolic signature (a list of metabolites) related to a particular experimental condition. These signatures are instrumental in identifying biomarkers or classifying individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories.
Results: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate these relations as new knowledge in the KG to support the interpretation of results and further inquiries.
Availability: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM knowledge graph, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem.
Supplementary information: Supplementary data are available at Bioinformatics online.
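The enrichment analysis mentioned in the FORUM abstract can be illustrated with a minimal sketch, assuming that each chemical-concept relation is scored by a one-sided hypergeometric test on article counts: N articles in the corpus, K annotated with the concept, n mentioning the chemical, and k mentioning both. The function name and the toy counts are hypothetical, and FORUM's exact statistic and multiple-testing correction may differ.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    Hypothetical mapping to literature counts:
      N = articles in the corpus
      K = articles annotated with the biomedical concept (e.g. a disease)
      n = articles mentioning the chemical
      k = articles mentioning both
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # Exact upper-tail sum; the overlap cannot exceed min(n, K).
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Toy counts, not real data: a corpus of 10 articles.
p = hypergeom_upper_tail(k=5, n=5, K=5, N=10)  # 1/252, about 0.004
```

Python integers are unbounded, so the exact sum stays correct for large corpora, though it becomes slow. With millions of candidate pairs, the resulting p-values would also need a multiple-testing correction before a relation is treated as significant.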

2021
Author(s): M. Delmas, O. Filangi, N. Paulhe, F. Vinson, C. Duperier, ...

Metabolomics studies aim to report a metabolic signature (a list of metabolites) related to a particular experimental condition. These signatures are instrumental in identifying biomarkers or classifying individuals; however, their biological and physiological interpretation remains a challenge. Overcoming this challenge is critical when aiming to associate metabolic signatures with potential pathological outcomes. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. A large number of scientific articles discuss relations between chemical compounds and biomedical concepts in various contexts, from biomarkers to therapeutic uses. Extracting these statements and interconnecting them in a graph structure can thus allow us to identify and explore relations that are strongly supported in the scientific literature.
The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate these relations as new knowledge in the KG to support the interpretation of results and further inquiries. Beyond this result, FORUM can also provide insights into complex biological questions, and the extracted information could then be used for further developments.
Containing more than 8 billion triples and providing more than 8 million relations, FORUM leverages the increasing availability of linked datasets in life science and is built in agreement with the FAIR principles.
A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM knowledge graph, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem.
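One simple form of the ontology-based reasoning described in the FORUM abstract is to propagate each direct chemical-concept link to every broader concept in the hierarchy, so that a relation with a specific disease also yields candidate relations at more abstract levels. The sketch below assumes a hypothetical child-to-parent map with a simplified, illustrative hierarchy; it is not FORUM's actual implementation, which works over RDF data.

```python
def ancestors(concept: str, parents: dict[str, set[str]]) -> set[str]:
    """Return every broader concept reachable by following 'parents' links."""
    seen: set[str] = set()
    stack = list(parents.get(concept, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def infer_relations(direct: dict[str, set[str]],
                    parents: dict[str, set[str]]) -> dict[str, set[str]]:
    """Extend each chemical's direct concept links with all ancestor concepts."""
    inferred: dict[str, set[str]] = {}
    for chemical, concepts in direct.items():
        expanded = set(concepts)
        for concept in concepts:
            expanded |= ancestors(concept, parents)
        inferred[chemical] = expanded
    return inferred


# Illustrative, simplified hierarchy (child -> set of broader concepts).
parents = {
    "Diabetes Mellitus, Type 2": {"Diabetes Mellitus"},
    "Diabetes Mellitus": {"Endocrine System Diseases"},
}
direct = {"chemical_X": {"Diabetes Mellitus, Type 2"}}
relations = infer_relations(direct, parents)
```

Each inferred pair would still need its own statistical assessment, as for the explicit relations; propagation only proposes candidates at broader levels of abstraction.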


2010, Vol 08 (03), pp. 593-606
Author(s): Ekaterina Kotelnikova, Anton Yuryev, Ilya Mazo, Nikolai Daraselia

Heterogeneous high-throughput biological data are becoming readily available for various diseases. The number of data points generated by such experiments precludes manual integration of the information to design the optimal therapy for a disease. We describe a novel computational workflow for designing therapy using the Ariadne Genomics Pathway Studio software. We use publicly available microarray experiments for glioblastoma and the automatically constructed ResNet and ChemEffect databases to exemplify how to find potentially effective chemicals for glioblastoma, a disease that still lacks an effective treatment. Our first approach involved constructing the signaling pathway affected in glioblastoma using the scientific literature and data available in the ResNet database. Compounds known to affect multiple proteins in this pathway were found in the ChemEffect database. Another approach involved analyzing differential expression in glioblastoma patients using Sub-Network Enrichment Analysis (SNEA). SNEA identified the angiogenesis-related protein Cyr61 as the major positive regulator upstream of the genes differentially expressed in glioblastoma. Using these findings, we then identified the breast cancer drug Fulvestrant as a major inhibitor of both the glioblastoma pathway and Cyr61, which suggests Fulvestrant as a potential treatment for glioblastoma. We further show how to increase the efficacy of glioblastoma treatment by finding optimal combinations of Fulvestrant with other drugs.
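The idea behind a sub-network enrichment test can be sketched as asking whether the downstream targets of a candidate regulator change more than the other measured genes. The sketch below assumes a rank-based Mann-Whitney U test on absolute log fold changes with a normal approximation (no tie correction in the variance); it is a simplified illustration, not Pathway Studio's SNEA implementation, and the regulators, genes and values are hypothetical.

```python
from math import erfc, sqrt


def mann_whitney_u(sub: list[float], rest: list[float]) -> tuple[float, float]:
    """Return (U, one-sided p) for 'sub' tending to exceed 'rest'.

    Ties get average ranks; the p-value uses a normal approximation
    without continuity or tie correction, so it is only indicative.
    """
    if not sub or not rest:
        raise ValueError("both samples must be non-empty")
    values = sub + rest
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tied block
        for t in range(i, j + 1):
            ranks[order[t]] = avg_rank
        i = j + 1
    n1, n2 = len(sub), len(rest)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return u1, 0.5 * erfc(z / sqrt(2))  # upper-tail probability of N(0, 1)


def snea_like_scores(targets: dict[str, set[str]],
                     log_fc: dict[str, float]) -> dict[str, float]:
    """One-sided p-value per regulator: are its targets more changed than other genes?"""
    scores: dict[str, float] = {}
    for regulator, genes in targets.items():
        sub = [abs(log_fc[g]) for g in genes if g in log_fc]
        rest = [abs(v) for g, v in log_fc.items() if g not in genes]
        if sub and rest:
            scores[regulator] = mann_whitney_u(sub, rest)[1]
    return scores


# Hypothetical regulators, targets and fold changes.
targets = {"REG_A": {"g1", "g2", "g3"}, "REG_B": {"g4", "g5", "g6"}}
log_fc = {"g1": 2.5, "g2": -3.0, "g3": 2.0, "g4": 0.1, "g5": -0.2, "g6": 0.3}
scores = snea_like_scores(targets, log_fc)
```

Here REG_A's targets are the strongly changed genes, so it receives the smaller p-value; a real analysis would use thousands of genes and correct for testing many regulators.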


2019, Vol 47 (W1), pp. W191-W198
Author(s): Uku Raudvere, Liis Kolberg, Ivan Kuzmin, Tambet Arak, Priit Adler, ...

Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, converting between gene identifiers and mapping genes to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date, high-quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as its primary data source and follows Ensembl's quarterly release cycle, updating the other data sources simultaneously. The current update provides a better user experience thanks to a modern, responsive web interface, a standardised API and libraries. The results are delivered through an interactive and configurable web design and can be downloaded as publication-ready visualisations or delimited text files. In the current update we have extended support to 467 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user-uploaded custom GMT files, g:Profiler is now capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. The 2019 update introduces an extensive technical rewrite that makes the services faster and more flexible. g:Profiler is freely available at https://biit.cs.ut.ee/gprofiler.
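Custom GMT files follow the tab-delimited Gene Matrix Transposed layout: one gene set per line, giving the set name, a description, and then the member genes. The minimal parser below is a hypothetical helper for inspecting such a file before upload; it is not part of g:Profiler's own libraries, and the set names and genes in the example are placeholders.

```python
import io


def parse_gmt(handle) -> dict[str, dict]:
    """Parse GMT text into {set_name: {"description": str, "genes": set[str]}}."""
    gene_sets: dict[str, dict] = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected name, description and genes")
        name, description, *genes = fields
        # Drop empty fields produced by trailing tabs.
        gene_sets[name] = {"description": description, "genes": {g for g in genes if g}}
    return gene_sets


# Placeholder content in GMT layout (name, description, genes...).
example = "CUSTOM:0001\tcustom set one\tTP53\tMDM2\tCDKN1A\n" \
          "CUSTOM:0002\tcustom set two\tEGFR\tKRAS\t\n"
sets = parse_gmt(io.StringIO(example))
```

Checking that every line has at least three fields catches the most common malformed-file problem, a space-separated line that collapses into a single field.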


Author(s): Daniel Domingo-Fernández, Shounak Baksi, Bruce Schultz, Yojana Gadiya, Reagon Karki, ...

Summary: The COVID-19 crisis has elicited a global response from the scientific community that has led to a burst of publications on the pathophysiology of the virus. However, without coordinated efforts to organize this knowledge, it can remain hidden from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, such as a knowledge graph (KG), researchers can readily reason over and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats.
Availability and implementation: The COVID-19 Knowledge Graph is publicly available under a CC0 license at https://github.com/covid19kg and https://bikmi.covid19-knowledgespace.de.
Supplementary information: Supplementary data are available at Bioinformatics online.
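Once literature statements are formalized as cause-and-effect triples, simple graph queries become possible. As a hedged sketch, assuming a minimal (subject, relation, object) form with hypothetical relation labels "increases" and "decreases" and made-up node names, a breadth-first search collects every node reachable downstream of a starting entity together with the net sign of the effect along the path found. The released COVID-19 Knowledge Graph uses its own, richer formats, which this sketch does not reproduce.

```python
from collections import deque

# Hypothetical causal relation labels mapped to an effect sign.
SIGN = {"increases": 1, "decreases": -1}


def downstream_effects(triples: list[tuple[str, str, str]], source: str) -> dict[str, int]:
    """BFS over causal edges; return {node: net sign along the first (shortest) path found}."""
    graph: dict[str, list[tuple[str, int]]] = {}
    for subj, rel, obj in triples:
        if rel in SIGN:  # non-causal relations are ignored
            graph.setdefault(subj, []).append((obj, SIGN[rel]))
    effects: dict[str, int] = {}
    visited = {source}
    queue = deque([(source, 1)])
    while queue:
        node, sign = queue.popleft()
        for nxt, edge_sign in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                effects[nxt] = sign * edge_sign
                queue.append((nxt, sign * edge_sign))
    return effects


triples = [
    ("virus_entry", "increases", "protein_X"),
    ("protein_X", "decreases", "process_Y"),
    ("process_Y", "increases", "marker_Z"),
    ("protein_X", "associatedWith", "marker_Q"),
]
effects = downstream_effects(triples, "virus_entry")
```

Keeping only the first path's sign is a deliberate simplification; when several paths with opposite signs reach the same node, a fuller analysis would report the conflict instead.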


2021
Author(s): Jie Wang, Min Wu, Xuhui Huang, Li Wang, Sophia Zhang, ...

Two genes are synthetic lethal if mutations in both genes result in impaired cell viability, while mutation of either gene alone does not affect cell survival. The potential use of synthetic lethality (SL) in anticancer therapeutics has motivated many researchers to identify synthetic lethal gene pairs. To include newly identified SLs and more related knowledge, we present a new version of the SynLethDB database to facilitate the discovery of clinically relevant SLs. We significantly extended the first version of SynLethDB by including new SLs identified through CRISPR screening, a knowledge graph about human SLs, and a new web interface, among other additions. Over 16,000 new SLs and 26 other types of relationships have been added, encompassing relationships among 14,100 genes, 53 cancers, 1,898 drugs and other entities. Moreover, a brand-new web interface has been developed that includes modules such as SL queries by disease or compound, SL partner gene set enrichment analysis, and knowledge graph browsing through a dynamic graph viewer. The data can be downloaded directly from the website or through the RESTful APIs. The database is accessible online at http://synlethdb.sist.shanghaitech.edu.cn/v2.
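The definition of synthetic lethality in the SynLethDB abstract translates directly into a rule over viability measurements. The sketch below assumes hypothetical relative-viability scores (1.0 = wild type) and an arbitrary cutoff of 0.5; real CRISPR screens use their own scoring and statistics, and SynLethDB's curation criteria are not reproduced here.

```python
from itertools import combinations


def synthetic_lethal_pairs(single: dict[str, float],
                           double: dict[frozenset, float],
                           viable_cutoff: float = 0.5) -> list[tuple[str, str]]:
    """Return pairs whose double knockout is non-viable while each single knockout is viable.

    single: {gene: relative viability of the single knockout}
    double: {frozenset({gene_a, gene_b}): relative viability of the double knockout}
    """
    hits = []
    for a, b in combinations(sorted(single), 2):
        pair = frozenset((a, b))
        if pair not in double:
            continue  # pair was not measured
        singles_viable = single[a] >= viable_cutoff and single[b] >= viable_cutoff
        if singles_viable and double[pair] < viable_cutoff:
            hits.append((a, b))
    return hits


# Hypothetical measurements.
single = {"GENE_A": 0.9, "GENE_B": 0.95, "GENE_C": 0.3}
double = {frozenset({"GENE_A", "GENE_B"}): 0.1, frozenset({"GENE_A", "GENE_C"}): 0.05}
hits = synthetic_lethal_pairs(single, double)
```

GENE_A with GENE_C is excluded because GENE_C alone is already non-viable, which is exactly the "either gene alone does not affect survival" condition in the definition.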


2020, Vol 36 (11), pp. 3620-3622
Author(s): Alexander Auer, Maximilian T Strauss, Sebastian Strauss, Ralf Jungmann

Motivation: Classification of images is an essential task in the higher-level analysis of biological data. By bypassing the diffraction limit of light, super-resolution microscopy has opened up a new way to look at molecular details using light microscopy, producing large amounts of data with exquisite spatial detail. Statistical exploration of such data usually requires an initial classification, which until now has often been performed manually.
Results: We introduce nanoTRON, an interactive open-source tool that allows super-resolution data classification based on image recognition. It extends the software package Picasso with the first deep learning tool with a graphical user interface.
Availability and implementation: nanoTRON is written in Python and freely available under the MIT license as part of the software collection Picasso on GitHub (http://www.github.com/jungmannlab/picasso). All raw data can be obtained from the authors upon reasonable request.
Supplementary information: Supplementary data are available at Bioinformatics online.


2020, pp. 65-75
Author(s): S. N. Smirnov

The author examines the problem of classifying societies by type. The article reviews several concepts for typifying social stratification models in different countries that have been formulated and justified in the historical-legal, historical, sociological and economic literature. It identifies the circumstances that make it difficult to formulate universal concepts applicable across the social sciences. These circumstances include insufficient consideration of legal factors, including the position of the legislator, the specifics of corporate legal status, and the characteristics of the mechanism for changing an individual's legal status. The author offers a variant of the classification of society types from the point of view of the legal formalization of their structure. The possibility of distinguishing types such as consolidated societies and segmented societies is justified.


Author(s): John Zobolas, Vasundra Touré, Martin Kuiper, Steven Vercruysse

Summary: We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search for and retrieve the right query terms.
Availability and implementation: The UBDs are available through npm, and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license.
Supplementary information: Supplementary data are available at Bioinformatics online.
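The single query interface described in the UBD abstract can be pictured as one abstract search method that every source-specific adapter implements, plus a helper that merges their results. This is a hypothetical Python analogue: the actual UBDs are JavaScript packages distributed through npm, and the class names, fields and in-memory sources below are invented for illustration; they do not call UniProt, PSI-MI or BioPortal.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One result unit: term string, identifier and source dictionary."""
    term: str
    identifier: str
    source: str


class Dictionary(ABC):
    """Common interface that every (hypothetical) source adapter implements."""
    name: str

    @abstractmethod
    def search(self, query: str) -> list[Match]: ...


class InMemoryDictionary(Dictionary):
    """Toy adapter over a local {identifier: term} table, standing in for a web API."""

    def __init__(self, name: str, entries: dict[str, str]):
        self.name = name
        self.entries = entries

    def search(self, query: str) -> list[Match]:
        q = query.lower()
        # Case-insensitive prefix match, as an autocomplete field would use.
        return [Match(term, ident, self.name)
                for ident, term in self.entries.items()
                if term.lower().startswith(q)]


def unified_search(query: str, dictionaries: list[Dictionary]) -> list[Match]:
    """Send the same query to every dictionary and merge the results, sorted by term."""
    results = [m for d in dictionaries for m in d.search(query)]
    return sorted(results, key=lambda m: (m.term.lower(), m.source))


d1 = InMemoryDictionary("toy_proteins", {"P:001": "kinase alpha", "P:002": "phosphatase"})
d2 = InMemoryDictionary("toy_terms", {"T:010": "kinase activity"})
results = unified_search("kin", [d1, d2])
```

Because callers depend only on the shared interface, a new source can be added by writing one more adapter, without changing the merging code or the input field that consumes it.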


2021, Vol 22 (1)
Author(s): Yi Chen, Fons J. Verbeek, Katherine Wolstencroft

Background: The hallmarks of cancer provide a highly cited and widely used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high-throughput analysis) vary widely between studies. Examining different strategies to associate and map cancer hallmarks reveals significant differences, but also consensus.
Results: Here we present the results of a comparative analysis of cancer hallmark mapping strategies from different studies, based on Gene Ontology and biological pathway annotation. By analysing the semantic similarity between annotations and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis.
Conclusions: Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses, and we discuss the challenges of annotating changing and accumulating biological data using intermediate knowledge resources that are themselves changing over time.
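The gene set overlap used to compare hallmark mapping strategies can be quantified in several ways; a common choice, assumed here rather than taken from the study, is the Jaccard index |A ∩ B| / |A ∪ B| between the genes that two strategies assign to the same hallmark. Strategy names and gene symbols below are placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard index of two gene sets; defined here as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def pairwise_overlap(mappings: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard index for every pair of strategies mapping the same hallmark."""
    return {(s1, s2): jaccard(mappings[s1], mappings[s2])
            for s1, s2 in combinations(sorted(mappings), 2)}


# Hypothetical gene sets produced by three mapping strategies for one hallmark.
mappings = {
    "strategy_A": {"G1", "G2", "G3", "G4"},
    "strategy_B": {"G3", "G4", "G5"},
    "strategy_C": {"G6"},
}
overlap = pairwise_overlap(mappings)
```

Here strategies A and B share 2 of 5 genes (0.4), while strategy C overlaps with neither; such low values flag hallmarks where mapping choices could change the downstream biological interpretation.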


Author(s): Zhuohang Yu, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang

Summary: MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers; however, it had not been updated for more than seven years. Here, we report its second version, built by collecting more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and the Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, an increase of over 40% in drug–ADE associations compared with the previous version. We also developed a new, user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 will provide a useful tool for drug safety assessment and related studies in drug discovery and development.
Availability and implementation: The database is freely available at http://lmmd.ecust.edu.cn/metaadedb/.
Supplementary information: Supplementary data are available at Bioinformatics online.

