PharmKG: a dedicated knowledge graph benchmark for biomedical data mining

Author(s):  
Shuangjia Zheng ◽  
Jiahua Rao ◽  
Ying Song ◽  
Jixian Zhang ◽  
Xianglu Xiao ◽  
...  

Abstract Biomedical knowledge graphs (KGs), which can help with the understanding of complex biological systems and pathologies, have begun to play a critical role in medical practice and research. However, challenges remain in their embedding and use due to their complex nature and the specific demands of their construction. Existing studies often suffer from problems such as sparse and noisy datasets, insufficient modeling methods and non-uniform evaluation metrics. In this work, we establish a comprehensive KG system for the biomedical field in an attempt to bridge this gap. Here, we introduce PharmKG, a multi-relational, attributed biomedical KG composed of more than 500 000 individual interconnections between genes, drugs and diseases, with 29 relation types over a vocabulary of ~8000 disambiguated entities. Each entity in PharmKG is annotated with heterogeneous, domain-specific information obtained from multi-omics data, i.e. gene expression, chemical structure and disease word embedding, while preserving the semantic and biomedical features. As baselines, we offer nine state-of-the-art KG embedding (KGE) approaches and a new, biologically intuitive, graph neural network-based KGE method that combines global network structure with heterogeneous domain features. Based on the proposed benchmark, we conduct extensive experiments to assess these KGE models using multiple evaluation metrics. Finally, we discuss our observations across various downstream biological tasks and provide insights and guidelines for how to use a KG in biomedicine. We hope that the unprecedented quality and diversity of PharmKG will lead to advances in biomedical KG construction, embedding and application.
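
The benchmark assesses KGE models with ranking-based evaluation. As a minimal sketch, the snippet below scores triples with DistMult (a classic KGE scoring function, used here only as an example) and computes mean reciprocal rank (MRR) and Hits@k from the rank of the true tail among all candidate entities. The embeddings are random stand-ins for trained ones, the entity and relation IDs are hypothetical, and ranking uses the raw setting; the commonly used filtered setting would additionally skip other known true triples when counting.

```python
import random


def distmult_score(h, r, t):
    """DistMult plausibility score: sum_i h_i * r_i * t_i (higher means more plausible)."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def tail_rank(entity_emb, rel_emb, head, rel, true_tail):
    """Rank (1 = best) of the true tail among all entities, raw setting (no filtering)."""
    h, r = entity_emb[head], rel_emb[rel]
    true_score = distmult_score(h, r, entity_emb[true_tail])
    # Count candidate tails that score strictly higher than the true tail.
    better = sum(
        1
        for e, vec in entity_emb.items()
        if e != true_tail and distmult_score(h, r, vec) > true_score
    )
    return better + 1


def mrr_and_hits(ranks, ks=(1, 3, 10)):
    """Mean reciprocal rank and Hits@k (fraction of ranks <= k)."""
    mrr = sum(1.0 / rank for rank in ranks) / len(ranks)
    hits = {k: sum(1 for rank in ranks if rank <= k) / len(ranks) for k in ks}
    return mrr, hits


if __name__ == "__main__":
    random.seed(0)
    dim = 8
    entities = ["drug:A", "gene:B", "disease:C", "gene:D"]  # hypothetical IDs
    entity_emb = {e: [random.gauss(0, 1) for _ in range(dim)] for e in entities}
    rel_emb = {"targets": [random.gauss(0, 1) for _ in range(dim)]}  # hypothetical relation
    test_triples = [("drug:A", "targets", "gene:B"), ("drug:A", "targets", "gene:D")]
    ranks = [tail_rank(entity_emb, rel_emb, h, r, t) for h, r, t in test_triples]
    print(ranks, mrr_and_hits(ranks))
```

Swapping `distmult_score` for another scoring function (for example a translational one) leaves the ranking and metric code unchanged, which is what makes a shared benchmark with uniform metrics possible.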

2021 ◽  
Vol 18 (4) ◽  
pp. 1-23
Author(s):  
Tobias Gysi ◽  
Christoph Müller ◽  
Oleksandr Zinenko ◽  
Stephan Herhut ◽  
Eddie Davis ◽  
...  

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.
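
The idea of rewriting at the most suitable level can be illustrated with a toy Python example. This is not MLIR and does not use the authors' stencil or GPU dialects; it only mimics the idea with two hypothetical levels: a "stencil level", where a 1D linear stencil is an {offset: weight} map, and a "loop level" of explicit index/weight accesses. Fusing two consecutive stencils is a simple rewrite at the stencil level (composing the offset maps), whereas after lowering the same optimization would first require recovering the access pattern from the loops.

```python
from collections import defaultdict

# Stencil level: out[i] = sum(w * inp[i + off] for off, w in stencil.items())


def fuse(first, second):
    """Rewrite second(first(x)) into one stencil: a domain-level optimization."""
    fused = defaultdict(float)
    for o2, w2 in second.items():
        for o1, w1 in first.items():
            fused[o1 + o2] += w1 * w2
    return dict(fused)


def lower_to_loops(stencil, n):
    """Lower to a loop-level form: [(i, [(index, weight), ...]), ...] for in-bounds i."""
    lo = max(0, -min(stencil))       # first i whose leftmost access is in bounds
    hi = min(n, n - max(stencil))    # one past the last i whose rightmost access is in bounds
    return [(i, [(i + off, w) for off, w in sorted(stencil.items())]) for i in range(lo, hi)]


def execute(loops, x):
    """Interpret the loop-level form on input x; returns {i: value}."""
    return {i: sum(w * x[j] for j, w in accesses) for i, accesses in loops}


if __name__ == "__main__":
    laplacian = {-1: 1.0, 0: -2.0, 1: 1.0}
    fused = fuse(laplacian, laplacian)  # one pass instead of two sweeps over memory
    x = [float(i * i) for i in range(7)]
    print(fused, execute(lower_to_loops(fused, len(x)), x))
```

Applying the fused stencil once gives the same interior values as applying the Laplacian twice, but with a single sweep over the data; at the loop level this fact is no longer explicit.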


2020 ◽  
Vol 153 (20) ◽  
pp. 201103
Author(s):  
Yoshifumi Noguchi ◽  
Miyabi Hiyama ◽  
Motoyuki Shiga ◽  
Hidefumi Akiyama ◽  
Osamu Sugino

Energies ◽  
2021 ◽  
Vol 14 (13) ◽  
pp. 3800
Author(s):  
Sebastian Krapf ◽  
Nils Kemmerzell ◽  
Syed Khawaja Haseeb Uddin ◽  
Manuel Hack Vázquez ◽  
Fabian Netzler ◽  
...  

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. State-of-the-art methods use 3D data to conduct potential analyses with high spatial resolution, which limits the study area to places where 3D data are available. Recent advances in deep learning allow the required roof information to be extracted from aerial images. Furthermore, most publications consider the technical photovoltaic potential, and only a few determine the economic photovoltaic potential. Therefore, this paper extends the state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union (IoU) values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper’s methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, such scaling poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.
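
The two quantities this methodology reports can be illustrated with a short sketch. IoU compares a predicted segmentation mask with the ground truth; the internal rate of return (IRR) is the discount rate at which a roof segment's net present value (NPV) is zero. The flat 0/1 masks, the cash flows and the bisection solver below are illustrative assumptions, not the paper's data or implementation.

```python
def iou(pred, truth):
    """Intersection over Union of two binary masks given as equal-length flat 0/1 sequences."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    union = sum(1 for p, t in zip(pred, truth) if p or t)
    return inter / union if union else 1.0


def npv(rate, cash_flows):
    """Net present value; cash_flows[0] is the (negative) investment in year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-9, max_iter=200):
    """IRR by bisection; assumes NPV changes sign exactly once on [lo, hi]."""
    f_lo = npv(lo, cash_flows)
    if f_lo * npv(hi, cash_flows) > 0:
        raise ValueError("NPV does not change sign on the bracket")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = npv(mid, cash_flows)
        if abs(f_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid              # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid  # root lies in the upper half
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical roof segment: 5000 upfront, 450 net savings per year for 20 years.
    flows = [-5000.0] + [450.0] * 20
    print(iou([1, 1, 0, 0], [1, 0, 1, 0]), irr(flows))
```

Bisection is chosen here for robustness: it needs only a sign change, not a derivative, which suits cash-flow series with a single upfront investment.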


Author(s):  
Yufei Li ◽  
Xiaoyong Ma ◽  
Xiangyu Zhou ◽  
Pengzhen Cheng ◽  
Kai He ◽  
...  

Abstract Motivation Bio-entity coreference resolution focuses on identifying coreferential links in biomedical texts, which is crucial for completing bio-events’ attributes and interconnecting events into bio-networks. Previously, deep neural network-based general-domain systems, among the most powerful tools available, have been applied to the biomedical domain with domain-specific information integrated. However, such methods may introduce considerable noise because they do not adequately combine context with complex domain-specific information. Results In this paper, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which can encode knowledge information inside the LSTM more flexibly. Moreover, we propose a knowledge attention module to extract informative knowledge effectively based on contexts. Experiments on the BioNLP and CRAFT datasets show state-of-the-art performance, with gains of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.
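
The abstract does not give the exact formulation of the knowledge attention module. As a hedged sketch, assume a context vector (for example an LSTM hidden state) and a set of candidate knowledge embeddings retrieved from the knowledge base for the current mention. Scaled dot-product attention then scores each candidate against the context, normalizes the scores with softmax, and returns a weighted sum, so knowledge that fits the context dominates. Dimensions and values are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_attention(context, knowledge):
    """Scaled dot-product attention over knowledge embeddings given a context vector.

    context:   list[float], e.g. a hidden state (assumed input)
    knowledge: list of list[float], one embedding per retrieved fact (assumed input)
    Returns (attention weights, attended knowledge vector).
    """
    scale = math.sqrt(len(context))
    scores = [sum(c * k for c, k in zip(context, vec)) / scale for vec in knowledge]
    weights = softmax(scores)
    dim = len(knowledge[0])
    attended = [sum(w * vec[d] for w, vec in zip(weights, knowledge)) for d in range(dim)]
    return weights, attended


if __name__ == "__main__":
    hidden = [0.9, 0.1, -0.2]             # hypothetical context state
    facts = [[1.0, 0.0, 0.0],             # hypothetical KB fact embeddings
             [0.0, 1.0, 0.0],
             [0.0, 0.0, 1.0]]
    print(knowledge_attention(hidden, facts))
```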


2004 ◽  
Vol 02 (01) ◽  
pp. 215-239 ◽  
Author(s):  
Tolga Can ◽  
Yuan-Fang Wang

We present a new method for conducting protein structure similarity searches, which improves on the efficiency of some existing techniques. Our method is grounded in the theory of differential geometry on 3D space curve matching. We generate shape signatures for proteins that are invariant, localized, robust, compact, and biologically meaningful. The invariance of the shape signatures allows us to improve similarity search efficiency by adopting a hierarchical coarse-to-fine strategy. We index the shape signatures using an efficient hashing-based technique. With the help of this technique, we screen out unlikely candidates and perform detailed pairwise alignments only for the small number of candidates that survive the screening process. Unlike other hashing-based techniques, our technique employs domain-specific information (not just geometric information) in constructing the hash key and is therefore better tuned to the domain of biology. Furthermore, the invariance, localization, and compactness of the shape signatures allow us to use a well-known local sequence alignment algorithm for aligning two protein structures. One measure of the efficacy of the proposed technique is that we were able to perform structure alignment queries 36 times faster (on average) than a well-known method while keeping the quality of the query results at approximately the same level.
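
Because the shape signatures are compact sequences, a local sequence alignment algorithm can align two structures once they pass the hash-based screening. The abstract does not name the algorithm; the classic local aligner is Smith-Waterman, sketched below with a linear gap penalty over symbol strings. Treating each residue's quantized signature as a single letter, and the scoring parameters, are assumptions made for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, aligned part of a, aligned part of b)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical quantized shape-signature strings for two protein backbones.
    print(smith_waterman("HHLLEEHH", "LLEEHL"))
```

The quadratic cost of this alignment is why the coarse-to-fine strategy matters: hashing discards most candidates cheaply, and only the survivors are aligned.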


2021 ◽  
Vol 10 (2) ◽  
pp. 42-60
Author(s):  
Khadidja Chettah ◽  
Amer Draa

Automatic text summarization has recently become a key instrument for condensing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to include in the final summary. The presented approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques. The authors evaluate the performance of the proposed approach on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The results show that the proposal is competitive with other state-of-the-art methods: it ranks first out of 12, outperforming all other algorithms.
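
ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a candidate summary and a reference summary. A minimal sketch of recall-oriented ROUGE-N follows, assuming whitespace tokenization, lower-casing and a single reference; the official ROUGE toolkit adds options such as stemming and stopword removal, which are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # '&' keeps the minimum count per n-gram (clipping)
    total = sum(ref.values())
    return overlap / total if total else 0.0


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, 1), rouge_n_recall(candidate, reference, 2))
```

In an extractive optimizer such as a QGA, a score of this kind is only available at evaluation time; the search itself must rely on an unsupervised fitness, since reference summaries are not known for new documents.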


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Wei Zhou ◽  
Chengdong Wu ◽  
Dali Chen ◽  
Zhenzhu Wang ◽  
Yugen Yi ◽  
...  

Recently, microaneurysm (MA) detection has attracted considerable attention in the medical image processing community. Since MAs can be considered the earliest lesions in diabetic retinopathy, their detection plays a critical role in diabetic retinopathy diagnosis. In this paper, we propose a novel MA detection approach named multifeature fusion dictionary learning (MFFDL). The proposed method consists of four steps: preprocessing, candidate extraction, multifeature dictionary learning, and classification. The novelty of our approach lies in incorporating the semantic relationships among multiple features and dictionary learning into a unified framework for the automatic detection of MAs. We evaluate the proposed algorithm against state-of-the-art approaches, and the experimental results validate its effectiveness.
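
The abstract does not spell out how the learned dictionaries drive the final classification step. One common scheme in dictionary-learning classifiers, shown here as an assumption rather than the MFFDL formulation, is to sparse-code a candidate's fused feature vector over each class's dictionary with orthogonal matching pursuit (OMP) and assign the class with the smallest reconstruction residual. The dictionaries below are fixed toy matrices rather than learned ones, and feature fusion is reduced to simple concatenation.

```python
import numpy as np


def omp(D, x, k):
    """Orthogonal matching pursuit: approximate x with at most k atoms (columns) of D."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        if np.linalg.norm(residual) < 1e-12:
            break  # x is already reconstructed exactly
        # Greedily pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return support, coef, residual


def classify(x, dictionaries, k=2):
    """Index of the class dictionary with the smallest OMP residual norm, plus all residuals."""
    residuals = [float(np.linalg.norm(omp(D, x, k)[2])) for D in dictionaries]
    return int(np.argmin(residuals)), residuals


if __name__ == "__main__":
    # Two hypothetical feature groups for one MA candidate, fused by concatenation.
    shape_feat = np.array([1.0, 2.0])
    intensity_feat = np.array([0.0, 0.0])
    fused = np.concatenate([shape_feat, intensity_feat])
    D_ma = np.eye(4)[:, :2]      # toy "MA" dictionary
    D_non_ma = np.eye(4)[:, 2:]  # toy "non-MA" dictionary
    print(classify(fused, [D_ma, D_non_ma]))
```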


Sensors ◽  
2018 ◽  
Vol 18 (12) ◽  
pp. 4155 ◽  
Author(s):  
Pedro Cumino ◽  
Wellington Lobato Junior ◽  
Thais Tavares ◽  
Hugo Santos ◽  
Denis Rosário ◽  
...  

Collaboration between multiple Unmanned Aerial Vehicles (UAVs) to set up a Flying Ad Hoc Network (FANET) is a growing trend, since future applications demand more autonomous and rapidly deployable systems. The user experience of watching videos transmitted over FANETs should always be satisfactory, even under topology changes caused by the energy consumption of UAVs. In addition, the FANET must keep the UAVs cooperating as much as possible during a mission. However, one of the main challenges in FANETs is how to mitigate the impact of the UAVs' limited energy resources on FANET operation in order to monitor the environment for a long period of time. In this sense, UAV replacement is required to avoid the premature death of nodes, network disconnections, route failures, void areas, and low-quality video transmissions. Moreover, decision-making must take into account the energy consumption associated with UAV movements, since they are generally quite energy-intensive. This article proposes a cooperative UAV scheme for enhancing video transmission and global energy efficiency, called VOEI. The main goal of VOEI is to maintain video delivery with Quality of Experience (QoE) support while keeping the nodes at a good connectivity quality level and flying for a long period of time. Based on the Software-Defined Networking (SDN) paradigm, VOEI assumes the existence of a centralized controller node that computes reliable and energy-efficient routes and detects the appropriate moment for UAV replacement by considering global FANET context information, providing energy-efficient operation. Based on simulation results, we conclude that VOEI can effectively mitigate the energy challenges of FANETs, since it provides energy-efficient operation and avoids network death, route failures, void areas, and network partitioning compared to a state-of-the-art algorithm.
In addition, VOEI delivers videos with suitable QoE to end-users at any time, which is not achieved by the state-of-the-art algorithm.
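
The controller's route computation and replacement decision are described only at a high level. As an illustrative sketch, assuming the SDN controller knows every UAV's residual energy and link quality, it could run Dijkstra's algorithm with a hypothetical link cost that grows as link quality or the next hop's residual energy drops, and flag for replacement any UAV whose residual energy falls below a threshold. The cost formula, the 20% threshold and all node names are assumptions, not VOEI's actual design.

```python
import heapq


def link_cost(quality, residual_energy, alpha=0.5):
    """Hypothetical cost penalizing poor links and low residual energy (both in (0, 1])."""
    return alpha / quality + (1 - alpha) / residual_energy


def energy_aware_route(links, energy, src, dst):
    """Dijkstra over links {u: {v: quality}}, using the next hop's energy in the cost."""
    dist = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == dst:
            break
        for v, quality in links.get(u, {}).items():
            nd = d + link_cost(quality, energy[v])
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None  # destination unreachable
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def uavs_to_replace(energy, threshold=0.2):
    """UAVs whose residual energy fraction is below the (assumed) replacement threshold."""
    return sorted(u for u, e in energy.items() if e < threshold)


if __name__ == "__main__":
    links = {"gs": {"u1": 1.0, "u2": 0.8}, "u1": {"sink": 1.0}, "u2": {"sink": 0.9}}
    energy = {"gs": 1.0, "u1": 0.15, "u2": 0.7, "sink": 1.0}  # hypothetical fractions
    print(energy_aware_route(links, energy, "gs", "sink"), uavs_to_replace(energy))
```

Putting residual energy into the route cost steers traffic away from depleted UAVs before they fail, while the threshold check gives the controller a simple trigger for dispatching a replacement.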


2006 ◽  
Vol 38 (1) ◽  
pp. 45-59
Author(s):  
Zora Krnjaic

The paper starts from the assumption that expert thinking is a complex, higher-order manner of thinking, comprising higher mental functions and complex capabilities based on deep structures and knowledge patterns. It is domain-determined, specialized thinking developed through systematic education. The particular aspects of ability selected for this study primarily concern the relation between abilities and knowledge and the relation between general and specific abilities. Particular emphasis was laid on the key concepts of the theories presented that are relevant for studying the complex nature of expert thinking. Special attention was paid to mediated intelligence and the process of systemogenesis of knowledge, Cattell's definition of crystallized intelligence, Gardner's work on multiple intelligences in the context of knowledge and experience, and Sternberg's two-facet subtheory. The capability for abstract thought, the ability to select what is important, and the domain of relevant specific capability are assumed to be of special relevance for understanding expert thinking, and as such they were articulated and examined. Expert thinking, which is abstract, specialized and domain-specific, seems to be based on general and specific capabilities and their interaction.

