FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases

Bioinformatics | 10.1093/bioinformatics/btab627 | 2021

Author(s): M Delmas, O Filangi, N Paulhe, F Vinson, C Duperier, ...

Keyword(s): Scientific Literature, Semantic Representation, Enrichment Analysis, Biological Data, Supplementary Information, Knowledge Graph, Web Interface, Levels Of Abstraction, Statistical Relevance

Motivation: Metabolomics studies aim to report a metabolic signature (a list of metabolites) related to a particular experimental condition. These signatures are instrumental in identifying biomarkers or classifying individuals; however, their biological and physiological interpretation remains a challenge. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. Results: The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support the interpretation of results and further inquiries. Availability: A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM knowledge graph, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem. Supplementary information: Supplementary data are available at Bioinformatics online.
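The abstract does not specify which enrichment test FORUM uses, so the following is only a minimal sketch, assuming a one-sided hypergeometric (Fisher-type) over-representation test on article counts: how surprising is it that k articles mention both a chemical and a biomedical concept, given how often each appears in the corpus? The function name, parameter names and toy counts are hypothetical. For corpus-scale counts, `scipy.stats.hypergeom.sf(k - 1, total, with_concept, with_chemical)` should give the same tail probability more efficiently.

```python
from math import comb


def hypergeom_sf(k: int, total: int, with_concept: int, with_chemical: int) -> float:
    """Return P(X >= k) for X ~ Hypergeometric(total, with_concept, with_chemical).

    total         -- number of articles in the corpus
    with_concept  -- articles annotated with the biomedical concept ("successes")
    with_chemical -- articles mentioning the chemical ("draws")
    k             -- articles mentioning both (observed co-occurrences)
    """
    upper = min(with_concept, with_chemical)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(
        comb(with_concept, i) * comb(total - with_concept, with_chemical - i)
        for i in range(k, upper + 1)
    )
    # Exact integer arithmetic until the final division.
    return tail / comb(total, with_chemical)


# Toy counts (hypothetical): 10 articles, 5 about the concept,
# 3 mention the chemical, and all 3 of those are about the concept.
p_value = hypergeom_sf(3, total=10, with_concept=5, with_chemical=3)
print(f"p = {p_value:.4f}")  # 10 / 120, printed as p = 0.0833
```

In a real pipeline, the resulting p-values would also need a multiple-testing correction, since one test is run per chemical-concept pair.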

FORUM: Building a Knowledge Graph from public databases and scientific literature to extract associations between chemicals and diseases

10.1101/2021.02.12.430944 | 2021

Author(s): M. Delmas, O. Filangi, N. Paulhe, F. Vinson, C. Duperier, ...

Keyword(s): Life Science, Scientific Literature, Semantic Representation, Enrichment Analysis, Biological Data, Knowledge Graph, Levels Of Abstraction, Link Type, Therapeutic Uses, Statistical Relevance

Metabolomics studies aim to report a metabolic signature (a list of metabolites) related to a particular experimental condition. These signatures are instrumental in identifying biomarkers or classifying individuals; however, their biological and physiological interpretation remains a challenge. Overcoming this challenge is critical when aiming to associate metabolic signatures with potential pathological outcomes. To support this task, we introduce FORUM: a Knowledge Graph (KG) providing a semantic representation of relations between chemicals and biomedical concepts, built from a federation of life science databases and scientific literature repositories. A large number of scientific articles discuss relations between chemical compounds and biomedical concepts in various contexts, from biomarkers to therapeutic uses. Extracting these statements and interconnecting them in a graph structure can thus allow us to identify and explore relations strongly supported in the scientific literature. The use of a Semantic Web framework on biological data allows us to apply ontology-based reasoning to infer new relations between entities. We show that these new relations provide different levels of abstraction and could open the path to new hypotheses. We estimate the statistical relevance of each extracted relation, explicit or inferred, using an enrichment analysis, and instantiate them as new knowledge in the KG to support the interpretation of results and further inquiries. Beyond this result, FORUM can also provide insights into complex biological questions, and the extracted information could then be used for further developments. Containing more than 8 billion triples and providing more than 8 million relations, FORUM leverages the increasing availability of linked datasets in life science and is built in agreement with the FAIR principles.
A web interface to browse and download the extracted relations, as well as a SPARQL endpoint to directly probe the whole FORUM knowledge graph, are available at https://forum-webapp.semantic-metabolomics.fr. The code needed to reproduce the triplestore is available at https://github.com/eMetaboHUB/Forum-DiseasesChem.
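To make "ontology-based reasoning ... different levels of abstraction" concrete, here is a minimal sketch, assuming the inference amounts to lifting each explicit chemical-concept link to every ancestor of that concept in an is-a hierarchy (the transitive closure of "broader than"). This is not FORUM's actual implementation, which reasons over RDF data in a triplestore; the hierarchy below is a hypothetical fragment loosely modeled on MeSH labels, not an exact excerpt, and `chemical_X` is a placeholder.

```python
from collections import defaultdict

# Hypothetical is-a hierarchy: child concept -> its direct parent concepts.
PARENTS = {
    "Breast Neoplasms": ["Neoplasms by Site", "Breast Diseases"],
    "Neoplasms by Site": ["Neoplasms"],
    "Breast Diseases": ["Skin and Connective Tissue Diseases"],
}


def ancestors(concept, parents):
    """Return every ancestor of `concept` (transitive closure of is-a)."""
    seen, stack = set(), list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:  # the seen-set also guards against cycles
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def infer_links(explicit, parents):
    """Lift explicit (chemical, concept) pairs to all ancestor concepts.

    Returns a dict: chemical -> set of concepts (explicit plus inferred).
    """
    links = defaultdict(set)
    for chem, concept in explicit:
        links[chem].add(concept)
        links[chem] |= ancestors(concept, parents)
    return dict(links)


links = infer_links([("chemical_X", "Breast Neoplasms")], PARENTS)
print(sorted(links["chemical_X"]))
```

Each inferred link is a candidate at a coarser level of abstraction; per the abstract, FORUM then scores explicit and inferred relations alike with an enrichment analysis rather than accepting them outright.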

Computational Approaches for Drug Repositioning and Combination Therapy Design

Journal of Bioinformatics and Computational Biology | 10.1142/s0219720010004732 | 2010 | Vol 08 (03) | pp. 593-606 | Cited By ~ 48

Author(s): Ekaterina Kotelnikova, Anton Yuryev, Ilya Mazo, Nikolai Daraselia

Keyword(s): Scientific Literature, Drug Repositioning, Enrichment Analysis, Biological Data, Cancer Drug, Related Protein, Microarray Experiments, Data Points, Computational Workflow, Therapy Design

Heterogeneous high-throughput biological data are becoming readily available for various diseases. The number of data points generated by such experiments does not allow manual integration of the information to design the optimal therapy for a disease. We describe a novel computational workflow for designing therapy using Ariadne Genomics Pathway Studio software. We use publicly available microarray experiments for glioblastoma and the automatically constructed ResNet and ChemEffect databases to exemplify how to find potentially effective chemicals for glioblastoma, a disease that still lacks effective treatment. Our first approach involved constructing a signaling pathway affected in glioblastoma using the scientific literature and data available in the ResNet database. Compounds known to affect multiple proteins in this pathway were found in the ChemEffect database. Another approach involved analyzing differential expression in glioblastoma patients using Sub-Network Enrichment Analysis (SNEA). SNEA identified the angiogenesis-related protein Cyr61 as the major positive regulator upstream of genes differentially expressed in glioblastoma. Using these findings, we then identified the breast cancer drug Fulvestrant as a major inhibitor of the glioblastoma pathway as well as of Cyr61. This suggested Fulvestrant as a potential treatment against glioblastoma. We further show how to increase the efficacy of glioblastoma treatment by finding optimal combinations of Fulvestrant with other drugs.
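The step "compounds known to affect multiple proteins in this pathway" and the search for drug combinations can be illustrated with plain set operations. The sketch below is not Pathway Studio's algorithm or API; it assumes a compound's value is simply the number of pathway proteins it affects, and that a pair is better when the union of its targets covers more of the pathway. All compound names and protein identifiers other than CYR61 are hypothetical placeholders.

```python
from itertools import combinations


def rank_compounds(pathway, targets):
    """Sort compounds by how many pathway proteins each one affects (ties by name)."""
    scores = {compound: len(prots & pathway) for compound, prots in targets.items()}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def best_pair(pathway, targets):
    """Return the compound pair whose combined targets cover the most pathway proteins."""
    return max(
        combinations(sorted(targets), 2),
        key=lambda pair: len((targets[pair[0]] | targets[pair[1]]) & pathway),
    )


# Hypothetical pathway and compound-target annotations.
pathway = {"CYR61", "PROT_B", "PROT_C", "PROT_D"}
targets = {
    "drug_a": {"CYR61", "PROT_B"},
    "drug_b": {"PROT_C"},
    "drug_c": {"PROT_B"},
    "drug_d": {"PROT_D", "PROT_C"},
}

print(rank_compounds(pathway, targets))
print(best_pair(pathway, targets))
```

Pure coverage ignores effect direction (activation vs. inhibition) and dose; a real workflow would weight edges by those annotations.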

g:Profiler: a web server for functional enrichment analysis and conversions of gene lists (2019 update)

Nucleic Acids Research | 10.1093/nar/gkz369 | 2019 | Vol 47 (W1) | pp. W191-W198 | Cited By ~ 434

Author(s): Uku Raudvere, Liis Kolberg, Ivan Kuzmin, Tambet Arak, Priit Adler, ...

Keyword(s): Web Design, Enrichment Analysis, Functional Enrichment Analysis, Biological Data, Functional Enrichment, Quality Data, Primary Data, Web Interface, Biological Data Analysis, Data Source

Biological data analysis often deals with lists of genes arising from various studies. The g:Profiler toolset is widely used for finding biological categories enriched in gene lists, converting between gene identifiers and mapping genes to their orthologs. The mission of g:Profiler is to provide a reliable service based on up-to-date, high-quality data in a convenient manner across many evidence types, identifier spaces and organisms. g:Profiler relies on Ensembl as its primary data source and follows its quarterly release cycle while updating the other data sources simultaneously. The current update provides a better user experience thanks to a modern, responsive web interface and a standardised API and libraries. The results are delivered through an interactive and configurable web design. Results can be downloaded as publication-ready visualisations or delimited text files. In the current update we have extended support to 467 species and strains, including vertebrates, plants, fungi, insects and parasites. By supporting user-uploaded custom GMT files, g:Profiler is now capable of analysing data from any organism. All past releases are maintained for reproducibility and transparency. The 2019 update introduces an extensive technical rewrite that makes the services faster and more flexible. g:Profiler is freely available at https://biit.cs.ut.ee/gprofiler.
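Custom GMT files are what let g:Profiler handle any organism. In the widely used GMT (Gene Matrix Transposed) layout, each line is tab-separated: a term identifier, a description, then the member gene identifiers. The sketch below parses that general layout; it does not reproduce g:Profiler's own validation rules, which the abstract does not describe, and the `CUSTOM:` term IDs and gene names are hypothetical.

```python
def parse_gmt(text):
    """Parse GMT text into {term_id: (description, set_of_genes)}.

    Each non-empty line is tab-separated: term ID, description, then gene IDs.
    Empty trailing fields (e.g. from a trailing tab) are ignored.
    """
    gene_sets = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_no}: expected at least 3 tab-separated fields")
        term_id, description, *genes = fields
        gene_sets[term_id] = (description, {g for g in genes if g})
    return gene_sets


sample = (
    "CUSTOM:0001\tToy set one\tGENE1\tGENE2\n"
    "CUSTOM:0002\tToy set two\tGENE2\tGENE3\tGENE4\t\n"
)
gene_sets = parse_gmt(sample)
print(gene_sets["CUSTOM:0002"][1])
```

Using sets for the members makes the usual next step, intersecting a query gene list with each term, a single `&` operation.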

COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology

Bioinformatics | 10.1093/bioinformatics/btaa834 | 2020 | Cited By ~ 5

Author(s): Daniel Domingo-Fernández, Shounak Baksi, Bruce Schultz, Yojana Gadiya, Reagon Karki, ...

Keyword(s): Web Application, Scientific Community, Scientific Literature, Research Community, Supplementary Information, Knowledge Graph, Research Groups, Knowledge Model, Global Response, Cause And Effect

Summary: The COVID-19 crisis has elicited a global response from the scientific community that has led to a burst of publications on the pathophysiology of the virus. However, without coordinated efforts to organize this knowledge, it can remain hidden from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, such as a knowledge graph, researchers can readily reason over and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats. Availability and implementation: The COVID-19 Knowledge Graph is publicly available under a CC-0 license at https://github.com/covid19kg and https://bikmi.covid19-knowledgespace.de. Supplementary information: Supplementary data are available at Bioinformatics online.

SynLethDB 2.0: A web-based knowledge graph database on synthetic lethality for novel anticancer drug discovery

10.1101/2021.12.28.474346 | 2021

Author(s): Jie Wang, Min Wu, Xuhui Huang, Li Wang, Sophia Zhang, ...

Keyword(s): Synthetic Lethality, Enrichment Analysis, Gene Set Enrichment Analysis, Knowledge Graph, Lethal Gene, Web Interface, Dynamic Graph, Web Based, Synthetic Lethal, Graph Viewer

Two genes are synthetic lethal if mutations in both genes result in impaired cell viability, while mutation of either gene alone does not affect cell survival. The potential use of synthetic lethality (SL) in anticancer therapeutics has led many researchers to identify synthetic lethal gene pairs. To include newly identified SLs and more related knowledge, we present a new version of the SynLethDB database to facilitate the discovery of clinically relevant SLs. We significantly extended the first version of SynLethDB by including new SLs identified through CRISPR screening, a knowledge graph about human SLs, and a new web interface, among other additions. Over 16,000 new SLs and 26 types of other relationships have been added, encompassing relationships among 14,100 genes, 53 cancers, 1,898 drugs and other entities. Moreover, a brand-new web interface has been developed that includes modules such as SL query by disease or compound, SL partner gene set enrichment analysis, and knowledge graph browsing through a dynamic graph viewer. The data can be downloaded directly from the website or through the RESTful APIs. The database is accessible online at http://synlethdb.sist.shanghaitech.edu.cn/v2.
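The definition in the first sentence can be written as a small predicate. This is only an illustration under stated assumptions: viability is expressed relative to wild type (1.0), and a single hypothetical cutoff separates "viable" from "impaired". Real CRISPR screens use statistical scoring rather than a fixed threshold, and nothing here reflects SynLethDB's own criteria.

```python
def is_synthetic_lethal(viability, threshold=0.5):
    """Apply the SL definition to relative viability scores (wild type = 1.0).

    viability -- dict with keys "a" (only gene A mutated), "b" (only gene B
                 mutated) and "ab" (both genes mutated)
    threshold -- hypothetical cutoff: scores below it count as impaired viability
    """
    single_a_viable = viability["a"] >= threshold
    single_b_viable = viability["b"] >= threshold
    double_impaired = viability["ab"] < threshold
    # SL requires both single mutants to survive and the double mutant to fail.
    return single_a_viable and single_b_viable and double_impaired


print(is_synthetic_lethal({"a": 0.9, "b": 0.95, "ab": 0.1}))  # both singles viable
print(is_synthetic_lethal({"a": 0.2, "b": 0.95, "ab": 0.1}))  # gene A alone is lethal
```

The second call fails the definition because mutating gene A alone already impairs viability, so the double-mutant effect cannot be attributed to the interaction.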

nanoTRON: a Picasso module for MLP-based classification of super-resolution data

Bioinformatics | 10.1093/bioinformatics/btaa154 | 2020 | Vol 36 (11) | pp. 3620-3622

Author(s): Alexander Auer, Maximilian T Strauss, Sebastian Strauss, Ralf Jungmann

Keyword(s): Super Resolution, Biological Data, Supplementary Information, Graphic User Interface, Classification Of Images, Super Resolution Microscopy, Level Analysis, Initial Classification, Resolution Data

Motivation: Classification of images is an essential task in the higher-level analysis of biological data. By bypassing the diffraction limit of light, super-resolution microscopy has opened up a new way to look at molecular details using light microscopy, producing large amounts of data with exquisite spatial detail. Statistical exploration of the data usually requires an initial classification, which until now has often been performed manually. Results: We introduce nanoTRON, an interactive open-source tool that allows super-resolution data classification based on image recognition. It extends the software package Picasso with the first deep learning tool with a graphical user interface. Availability and implementation: nanoTRON is written in Python and freely available under the MIT license as part of the software collection Picasso on GitHub (http://www.github.com/jungmannlab/picasso). All raw data can be obtained from the authors upon reasonable request. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

To the Question of the Classification Patterns of Social Stratification in the Countries of the World (Brief Overview of Some Concepts and Author’s Comment)

Sociology and Law | 10.35854/2219-6242-2020-3-65-75 | 2020 | pp. 65-75

Author(s): S. N. Smirnov

Keyword(s): Social Sciences, Social Stratification, Legal Status, Scientific Literature, Point Of View, The World

The author considers the problems of typifying societies. Several concepts for typifying models of social stratification in different countries, formulated and justified in the historical-legal, historical, sociological, and economic scientific literature, are reviewed. The circumstances that make it difficult to formulate universal concepts applicable across the social sciences are identified. These circumstances include insufficient consideration of legal factors, including the position of the legislator, the specifics of corporate legal status, and the characteristics of the mechanism for changing individual legal status. The author offers a variant classification of society types from the point of view of the legal formalization of their structure. The possibility of distinguishing types such as consolidated societies and segmented societies is justified.

UniBioDicts: Unified access to Biological Dictionaries

Bioinformatics | 10.1093/bioinformatics/btaa1065 | 2020

Author(s): John Zobolas, Vasundra Touré, Martin Kuiper, Steven Vercruysse

Keyword(s): User Interface, Life Science, Biological Data, Supplementary Information, Supplementary Data, Query Interface, Controlled Vocabularies, Search String, Software Packages, The Right

Summary: We present a set of software packages that provide uniform access to diverse biological vocabulary resources that are instrumental for current biocuration efforts and tools. The Unified Biological Dictionaries (UniBioDicts or UBDs) provide a single query interface for accessing the online API services of leading biological data providers. Given a search string, UBDs return a list of matching term, identifier and metadata units from databases (e.g. UniProt), controlled vocabularies (e.g. PSI-MI) and ontologies (e.g. GO, via BioPortal). This functionality can be connected to input fields (user-interface components) that offer autocomplete lookup for these dictionaries. UBDs create a unified gateway for accessing life science concepts, helping curators find annotation terms across resources (based on descriptive metadata and unambiguous identifiers), and helping data users search for and retrieve the right query terms. Availability and implementation: The UBDs are available through npm and the code is available in the GitHub organisation UniBioDicts (https://github.com/UniBioDicts) under the Affero GPL license. Supplementary information: Supplementary data are available at Bioinformatics online.
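The core design idea, one search interface over many back-end vocabularies that all return the same (term, identifier, source) units, can be sketched independently of the real packages. The actual UBDs are JavaScript packages distributed through npm; the Python below is a hypothetical illustration, not their API. `InMemoryDictionary` stands in for a remote API client, and prefix matching is an assumed, simplified autocomplete rule.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Match:
    """One uniform result unit: term, identifier and originating resource."""
    term: str
    identifier: str
    source: str


class Dictionary(Protocol):
    """Any backend that can answer a search string with Match units."""
    def search(self, query: str) -> list[Match]: ...


class InMemoryDictionary:
    """Toy backend standing in for a remote vocabulary API (hypothetical)."""

    def __init__(self, source: str, entries: dict[str, str]):
        self.source = source
        self.entries = entries  # identifier -> term label

    def search(self, query: str) -> list[Match]:
        q = query.lower()
        return [
            Match(term, ident, self.source)
            for ident, term in self.entries.items()
            if term.lower().startswith(q)  # simple autocomplete-style prefix match
        ]


def unified_search(query: str, dictionaries: list[Dictionary]) -> list[Match]:
    """Fan the query out to every backend and merge results in a stable order."""
    results: list[Match] = []
    for d in dictionaries:
        results.extend(d.search(query))
    return sorted(results, key=lambda m: (m.term.lower(), m.source))


go = InMemoryDictionary("GO", {"GO:0006915": "apoptotic process"})
mi = InMemoryDictionary("PSI-MI", {"MI:0915": "physical association"})
print(unified_search("apo", [go, mi]))
```

Because the caller depends only on `search` returning `Match` objects, adding a new resource means writing one adapter without touching the autocomplete front end.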

Establishing a consensus for the hallmarks of cancer based on gene ontology and pathway annotations

BMC Bioinformatics | 10.1186/s12859-021-04105-8 | 2021 | Vol 22 (1)

Author(s): Yi Chen, Fons J. Verbeek, Katherine Wolstencroft

Keyword(s): Gene Ontology, Enrichment Analysis, Biological Data, Hallmarks Of Cancer, High Throughput Analysis, Knowledge Resources, Gene Set, Cancer Hallmarks, Starting Point, High Level

Background: The hallmarks of cancer provide a highly cited and well-used conceptual framework for describing the processes involved in cancer cell development and tumourigenesis. However, methods for translating these high-level concepts into data-level associations between hallmarks and genes (for high-throughput analysis) vary widely between studies. The examination of different strategies to associate and map cancer hallmarks reveals significant differences, but also consensus. Results: Here we present the results of a comparative analysis of cancer hallmark mapping strategies from different studies, based on Gene Ontology and biological pathway annotation. By analysing the semantic similarity between annotations, and the resulting gene set overlap, we identify emerging consensus knowledge. In addition, we analyse the differences between hallmark and gene set associations using Weighted Gene Co-expression Network Analysis and enrichment analysis. Conclusions: Reaching a community-wide consensus on how to identify cancer hallmark activity from research data would enable more systematic data integration and comparison between studies. These results highlight the current state of the consensus and offer a starting point for further convergence. In addition, we show how a lack of consensus can lead to large differences in the biological interpretation of downstream analyses, and we discuss the challenges of annotating changing and accumulating biological data using intermediate knowledge resources that are themselves changing over time.
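"Gene set overlap" between the hallmark mappings of different studies can be quantified in several ways. The sketch below uses the Jaccard index, a common and simple choice; the abstract does not state which overlap measure the authors used (and their semantic-similarity analysis is a separate step not shown here). The study names and gene identifiers are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def pairwise_overlap(mappings):
    """Jaccard index for every unordered pair of studies' gene sets for one hallmark."""
    names = sorted(mappings)
    return {
        (x, y): jaccard(mappings[x], mappings[y])
        for i, x in enumerate(names)
        for y in names[i + 1:]
    }


# Hypothetical gene sets assigned to the same hallmark by three studies.
mappings = {
    "study_1": {"GENE_A", "GENE_B", "GENE_C"},
    "study_2": {"GENE_B", "GENE_C", "GENE_D"},
    "study_3": {"GENE_E"},
}
for pair, score in pairwise_overlap(mappings).items():
    print(pair, round(score, 2))
```

Low pairwise scores, such as the zero overlap with `study_3` here, are the kind of disagreement the authors report can change downstream biological interpretation.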

MetaADEDB 2.0: a comprehensive database on adverse drug events

Bioinformatics | 10.1093/bioinformatics/btaa973 | 2020

Author(s): Zhuohang Yu, Zengrui Wu, Weihua Li, Guixia Liu, Yun Tang

Keyword(s): Safety Assessment, Adverse Drug Events, Adverse Event Reporting System, Adverse Event Reporting, Supplementary Information, Online Database, Web Interface, Drug Discovery And Development, Comprehensive Information, User Friendly

Summary: MetaADEDB is an online database we developed to integrate comprehensive information on adverse drug events (ADEs). The first version of MetaADEDB was released in 2013 and has been widely used by researchers. However, it had not been updated for more than seven years. Here, we report its second version, built by collecting more and newer data from the U.S. FDA Adverse Event Reporting System (FAERS) and the Canada Vigilance Adverse Reaction Online Database, in addition to the original three sources. The new version consists of 744 709 drug–ADE associations between 8498 drugs and 13 193 ADEs, an increase of over 40% in drug–ADE associations compared with the previous version. We also developed a new, user-friendly web interface for data search and analysis. We hope that MetaADEDB 2.0 will provide a useful tool for drug safety assessment and related studies in drug discovery and development. Availability and implementation: The database is freely available at http://lmmd.ecust.edu.cn/metaadedb/. Supplementary information: Supplementary data are available at Bioinformatics online.
