Drug Disease Relation Extraction from Biomedical Literature Using NLP and Machine Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Wahiba Ben Abdessalem Karaa ◽  
Eman H. Alkhammash ◽  
Aida Bchir

Extracting the relations between medical concepts is very valuable in the medical domain. Scientists need to extract relevant information and semantic relations between medical concepts, including protein-protein, gene-protein, drug-drug, and drug-disease relations. These relations can be extracted from the biomedical literature available in various databases. This study examines the extraction of semantic relations that can occur between diseases and drugs. The findings will help specialists make sound decisions when administering a medication to a patient and will allow them to stay continuously up to date in their field. The objective of this work is to identify different features related to drugs and diseases in medical texts by applying Natural Language Processing (NLP) techniques and the UMLS ontology. A Support Vector Machine classifier uses these features to extract valuable semantic relationships among text entities. The contribution of this research is the combination of a suggested NLP technique, which takes advantage of the UMLS ontology and enables the extraction of correct and adequate features (frequency, lexical, morphological, syntactic, and semantic features), with a Support Vector Machine using a polynomial kernel function. These features are used to pinpoint the relations between drug and disease. The proposed approach was evaluated on a standard corpus extracted from MEDLINE. The findings considerably improve performance and outperform similar works; in particular, the F-score for the most important relation, "cure," reaches 98.19%. The accuracy is higher than that of all existing works for all relations.
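As a rough illustration of the classification step only (not the authors' pipeline: the sentences, labels, and bag-of-words features below are toy assumptions standing in for the paper's MEDLINE corpus and richer feature set), an SVM with a polynomial kernel can be trained on sentence-level features to predict a drug-disease relation type:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training sentences with drug-disease relation labels.
sentences = [
    "aspirin cures headache in most patients",
    "metformin treats type 2 diabetes effectively",
    "drug X only prevents migraine onset",
    "vaccine Y prevents influenza infection",
    "compound Z has no effect on arthritis",
    "placebo showed no effect on asthma",
]
labels = ["cure", "cure", "prevent", "prevent", "no_effect", "no_effect"]

# Bag-of-words counts stand in for the paper's frequency, lexical,
# morphological, syntactic, and semantic features.
model = make_pipeline(CountVectorizer(), SVC(kernel="poly", degree=2))
model.fit(sentences, labels)

print(model.predict(["ibuprofen cures back pain quickly"])[0])
```

On real data, each sentence would instead be mapped to the UMLS-informed feature vector the paper describes before being passed to the polynomial-kernel SVM.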

2018 ◽  
Vol 25 (6) ◽  
pp. 726-733
Author(s):  
Maria S. Karyaeva ◽  
Pavel I. Braslavski ◽  
Valery A. Sokolov

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule: two words reach higher similarity if they occur in similar contexts. Each word is represented as a vector, so vectors with close coordinates can be interpreted as similar words. This makes it possible to establish semantic relations (synonymy, hypernymy and hyponymy relations, and others) by automatic extraction. Manual extraction of semantic relations is a time-consuming and biased task, requiring a large amount of time and the help of experts. Unfortunately, the word2vec model provides an associative list of words that does not consist of related words only. In this paper, we present additional criteria that may be applicable to solving this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, might be useful for improving results on the task of extracting semantic relations for the Russian language using word embeddings. In the experiments, a word2vec model trained on Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies, and intelligent systems for natural language processing.
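The core mechanics can be sketched with plain NumPy (the tiny vectors, frequencies, and thresholds here are illustrative assumptions, not the paper's trained Flibusta model): cosine similarity ranks candidate neighbours, and extra criteria such as corpus frequency then filter the associative list:

```python
import numpy as np

# Toy word vectors; a trained word2vec model would supply these.
vectors = {
    "врач":   np.array([0.9, 0.1, 0.3]),
    "доктор": np.array([0.85, 0.15, 0.35]),
    "стол":   np.array([0.1, 0.9, 0.2]),
}
# Toy corpus frequencies, used as an additional filtering criterion.
freq = {"врач": 5000, "доктор": 4200, "стол": 8000}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def neighbours(word, top_n=2, min_sim=0.9, max_freq_ratio=3.0):
    """Rank candidates by cosine similarity, then keep only those that
    are similar enough and whose frequency is close to the query's."""
    ranked = sorted(
        ((cosine(vectors[word], vectors[w]), w) for w in vectors if w != word),
        reverse=True,
    )[:top_n]
    return [w for sim, w in ranked
            if sim >= min_sim
            and max(freq[w], freq[word]) / min(freq[w], freq[word]) <= max_freq_ratio]

print(neighbours("врач"))  # the synonym "доктор" survives the filters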


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatif Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread in order to make natural language processing more effective. Unlike the bag-of-words approach, the proposed semantic spaces give the distances between words, but they do not express relation types. In this study, we show how semantic spaces can be used to find the type of relationship, and we compare them with the template method. According to results obtained at a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
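One way to read a relation type out of a semantic space (a hedged sketch with made-up 2-D vectors and seed pairs; the study's spaces are high-dimensional and corpus-learned) is to compare a candidate pair's offset vector against the average offset of seed pairs for each relation:

```python
import numpy as np

# Toy 2-D word vectors; a real semantic space would be learned from text.
vec = {
    "hot":  np.array([1.0, 0.0]), "cold":   np.array([-1.0, 0.0]),
    "big":  np.array([0.8, 0.2]), "small":  np.array([-0.8, 0.2]),
    "dog":  np.array([0.1, 1.0]), "animal": np.array([0.1, 2.0]),
    "rose": np.array([0.3, 1.1]), "flower": np.array([0.3, 2.1]),
}

# Seed pairs define each relation type by its average offset vector.
seeds = {
    "opposite": [("hot", "cold"), ("big", "small")],
    "is_a":     [("dog", "animal"), ("rose", "flower")],
}
proto = {rel: np.mean([vec[b] - vec[a] for a, b in pairs], axis=0)
         for rel, pairs in seeds.items()}

def relation_type(a, b):
    """Assign the relation whose prototype offset is nearest."""
    offset = vec[b] - vec[a]
    return min(proto, key=lambda rel: np.linalg.norm(offset - proto[rel]))

print(relation_type("hot", "cold"))  # nearest prototype: "opposite"
```

The template method would instead match surface patterns such as "X is a kind of Y" in text; the offset comparison above is what the semantic-space side of that contrast amounts to.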


2017 ◽  
Author(s):  
A. S. M. Ashique Mahmood ◽  
Shruti Rao ◽  
Peter McGarvey ◽  
Cathy Wu ◽  
Subha Madhavan ◽  
...  

Abstract: Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in the biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially information that describes therapeutic implications of biomarkers and is therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and readily generate hypotheses for new clinical trials.
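eGARD itself works from sentence syntax, but the flavour of the relation step can be suggested with a far simpler surface-pattern sketch (the entity lists, pattern, and sentence below are invented for illustration; the real system uses proper named-entity recognition and parse-based rules):

```python
import re

# Hypothetical entity lexicons; eGARD uses named-entity recognition instead.
anomalies = {"EGFR mutation", "KRAS mutation"}
drugs = {"gefitinib", "erlotinib"}

# One naive surface pattern: "<anomaly> ... predicts/confers ... response to <drug>"
pattern = re.compile(
    r"(?P<anomaly>EGFR mutation|KRAS mutation).{0,60}?"
    r"(predicts|confers).{0,60}?response to (?P<drug>gefitinib|erlotinib)",
    re.IGNORECASE,
)

sentence = ("In this cohort, EGFR mutation predicts improved "
            "response to gefitinib in lung adenocarcinoma.")
m = pattern.search(sentence)
if m:
    print((m.group("anomaly"), m.group("drug")))
```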


2019 ◽  
Vol 5 (5) ◽  
pp. 212-215
Author(s):  
Abeer AlArfaj

Semantic relation extraction is an important component of ontologies that can support many applications, e.g., text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and is one of the main challenges in the Natural Language Processing (NLP) field. The Arabic language has complex morphological, grammatical, and semantic aspects, since it is a highly inflectional and derivational language, which makes the task even more challenging. In this paper, we present a review of the state of the art in relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering both taxonomic and non-taxonomic relation extraction methods. The majority of relation extraction approaches implement a combination of statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of work on relation extraction from Arabic texts, which needs further progress.


2019 ◽  
Author(s):  
Peng Su ◽  
Gang Li ◽  
Cathy Wu ◽  
K. Vijay-Shanker

Abstract: Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require a large amount of annotated training data, while often only small labeled datasets are available for many natural language processing tasks in the biomedical literature. Building large datasets for deep learning is expensive, since it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained by distant supervision. However, because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
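A minimal sketch of the distant-supervision idea (the knowledge-base pair, sentences, and the two filtering heuristics below are illustrative assumptions, not the authors' exact rules): sentences mentioning a known interacting pair are auto-labeled positive, then noisy positives are dropped by simple checks such as negation cues and entity distance:

```python
# A known interacting pair from a knowledge base (hypothetical).
kb_pair = ("proteinA", "proteinB")

sentences = [
    "proteinA directly phosphorylates proteinB in vitro",
    "proteinA was measured in one assay, while, in an entirely different "
    "experiment reported elsewhere, proteinB was profiled",
    "proteinA does not interact with proteinB",
]

def distant_label(sentence, pair):
    """Auto-label positive if both entities co-occur in the sentence."""
    return all(e in sentence for e in pair)

def keep(sentence, pair, max_gap=40):
    """Heuristics to drop likely-noisy positives:
    a negation cue, or entities too far apart in the text."""
    if " not " in f" {sentence} ":
        return False
    i, j = (sentence.index(e) for e in pair)
    return abs(i - j) <= max_gap

clean = [s for s in sentences if distant_label(s, kb_pair) and keep(s, kb_pair)]
print(len(clean))  # only the first sentence survives
```

The surviving auto-labeled sentences would then be combined with the small manually annotated set, in the transfer-learning spirit the abstract describes.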


2021 ◽  
Author(s):  
Ziheng Zhang ◽  
Feng Han ◽  
Hongjian Zhang ◽  
Tomohiro Aoki ◽  
Katsuhiko Ogasawara

BACKGROUND: Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems.

OBJECTIVE: The objective of this study is to examine how changes in the ratio of biomedical-domain to general-domain data in the corpus affect the extraction of similar biomedical terms using Word2vec.

METHODS: We downloaded the abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and Word2vec models were then trained on these corpora. The cosine similarities between the biomedical terms obtained from the Word2vec models were then compared for each model.

RESULTS: The models trained on both BW and PMC data outperformed the model trained only on medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of biomedical-domain to general-domain data was between 3:7 and 5:5.

CONCLUSIONS: This study enables NLP researchers to apply Word2vec on a more informed basis and to increase the similarity of extracted biomedical terms, improving their effectiveness in NLP applications such as biomedical information extraction.
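The corpus-construction step can be sketched as follows (toy line counts; the study used 214,892 PMC abstracts and the Billion Word corpus): for each ratio r:(10-r), sample proportionally from general-domain and biomedical lines before training a Word2vec model on the result:

```python
import random

random.seed(0)
# Stand-ins for the Billion Word (general) and PMC (biomedical) corpora.
bw_lines = [f"general sentence {i}" for i in range(1000)]
pmc_lines = [f"biomedical sentence {i}" for i in range(1000)]

def mixed_corpus(bw_parts, pmc_parts, total=100):
    """Build a corpus with a bw_parts:pmc_parts ratio (parts sum to 10)."""
    n_bw = total * bw_parts // 10
    n_pmc = total * pmc_parts // 10
    return random.sample(bw_lines, n_bw) + random.sample(pmc_lines, n_pmc)

# The 11 corpora, from 0:10 (pure PMC) to 10:0 (pure BW).
corpora = {f"{r}:{10 - r}": mixed_corpus(r, 10 - r) for r in range(11)}
print(len(corpora), len(corpora["3:7"]))
```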


Author(s):  
Olga Acosta ◽  
César Aguilar

This article sketches the development of a method for mining concepts from medical corpora in Spanish. The method is based on the approach formulated by Ananiadou and McNaught, who emphasize the need to create and use natural language processing (NLP) tools in order to extract information from large collections of documents, such as PubMed (www.ncbi.nlm.nih.gov/pubmed/). This repository has enabled projects such as the Genia Corpus (www.geniaproject.org), the MEDIE search engine (www.nactem.ac.uk/medie/), which uses syntactic and semantic criteria to extract medical concepts, and the Open Biological and Biomedical Ontology Project (http://obofoundry.org/), which focuses on developing ontologies that provide an organized knowledge system in biomedicine. In particular, this proposal focuses on two objectives: (1) the extraction of specialized terms and (2) the identification of lexical-semantic relationships, specifically hyponymy/hypernymy and meronymy.
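The second objective, spotting lexical-semantic relations, is commonly approached with lexical-syntactic (Hearst-style) patterns; a minimal sketch for a Spanish "X tales como Y" hypernym-hyponym pattern, with an invented example sentence (real systems use much richer grammars than one regular expression):

```python
import re

# Spanish Hearst-style pattern: "<hypernym> tales como <hyponym>".
pattern = re.compile(r"(\w+) tales como (\w+)")

sentence = "El tratamiento incluye antibióticos tales como amoxicilina y penicilina."
for hypernym, hyponym in pattern.findall(sentence):
    print(f"hiponimia: {hyponym} es_un {hypernym}")
```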


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Shengyu Liu ◽  
Buzhou Tang ◽  
Qingcai Chen ◽  
Xiaolong Wang

Drug-drug interaction (DDI) extraction, a typical relation extraction task in natural language processing (NLP), has always attracted great attention. Most state-of-the-art DDI extraction systems are based on support vector machines (SVM) with a large number of manually defined features. Recently, convolutional neural networks (CNN), a robust machine learning method that requires almost no manually defined features, have exhibited great potential for many NLP tasks. It is therefore worth employing CNN for DDI extraction, which had never been investigated. We propose a CNN-based method for DDI extraction. Experiments conducted on the 2013 DDIExtraction challenge corpus demonstrate that CNN is a good choice for DDI extraction. The CNN-based DDI extraction method achieves an F-score of 69.75%, which outperforms the existing best-performing method by 2.75%.
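The CNN's core operation can be caricatured in NumPy (a hedged sketch with random toy embeddings and weights, not the paper's trained network): convolve filters over the sentence's word-embedding matrix, apply ReLU, max-pool over positions, then classify:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_dim = 7, 8        # toy sentence length and embedding size
n_filters, width = 4, 3        # four convolution filters of width 3
n_classes = 5                  # e.g., the DDI types in the 2013 corpus

X = rng.normal(size=(seq_len, emb_dim))            # word embeddings (random here)
filters = rng.normal(size=(n_filters, width, emb_dim))
W = rng.normal(size=(n_filters, n_classes))        # classifier weights

# Convolution: slide each filter over word windows, ReLU, then max-pool.
conv = np.array([[np.maximum(np.sum(X[i:i + width] * f), 0)
                  for i in range(seq_len - width + 1)]
                 for f in filters])                # (n_filters, positions)
pooled = conv.max(axis=1)                          # (n_filters,)

# Softmax over class scores.
logits = pooled @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)
```

Training would learn the embeddings, filters, and classifier weights end to end, which is exactly what replaces the manually defined SVM features.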


2021 ◽  
Author(s):  
Peng Su ◽  
K. Vijay-Shanker

Abstract Background: Recently, automatically extracting biomedical relations has become a significant subject in biomedical research due to the rapid growth of the biomedical literature. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we explore approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages of its application. In the pre-training stage, we add another level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. We also propose methods to incorporate the knowledge ignored in the last layer of BERT to improve its fine-tuning. Results: The experimental results demonstrate that our approaches for pre-training and fine-tuning can improve BERT model performance. After combining the two proposed techniques, our approach outperforms the original BERT models with an average F1-score improvement of 2.1% on relation extraction tasks. Moreover, our approach achieves state-of-the-art performance on three relation extraction benchmark datasets. Conclusions: The extra pre-training step on sub-domain data can help the BERT model generalize on specific tasks, and our proposed fine-tuning mechanism can utilize the knowledge in the last layer of BERT to boost model performance. Furthermore, the combination of these two approaches further improves the performance of the BERT model on relation extraction tasks.
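The fine-tuning idea of using more of the last layer than a single sentence vector can be caricatured with NumPy (the shapes and the mean-pooling choice are assumptions for illustration; the paper's mechanism is defined on actual BERT hidden states):

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, hidden = 6, 16

# Stand-in for BERT's last-layer hidden states for one sentence.
last_layer = rng.normal(size=(seq_len, hidden))

# Conventional fine-tuning uses only the [CLS] vector (position 0) ...
cls_only = last_layer[0]

# ... whereas pooling over all tokens also exploits the otherwise
# ignored last-layer states; concatenate both views as the feature.
mean_pooled = last_layer.mean(axis=0)
features = np.concatenate([cls_only, mean_pooled])

print(features.shape)  # (32,)
```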


TecnoLógicas ◽  
2019 ◽  
Vol 22 ◽  
pp. 49-62 ◽  
Author(s):  
Jefferson A. Peña-Torres ◽  
Raúl E. Gutiérrez ◽  
Víctor A. Bucheli ◽  
Fabio A. González

In this article, we study the relation extraction problem in Natural Language Processing (NLP), implementing a domain adaptation setting without external resources. We trained a Deep Learning (DL) model for Relation Extraction (RE) that extracts semantic relations in the biomedical domain. However, can the model be applied to different domains? The model should be able to automatically extract relationships across different domains using the DL network. Fully retraining DL models for each new dataset is impractical, because the models should quickly adapt to different datasets in several domains without delay. Adaptation is therefore crucial for intelligent systems, where changing factors and unanticipated perturbations are common. In this study, we present a detailed analysis of the problem, as well as preliminary experiments, results, and their evaluation.

