Drug Disease Relation Extraction from Biomedical Literature Using NLP and Machine Learning

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Wahiba Ben Abdessalem Karaa ◽  
Eman H. Alkhammash ◽  
Aida Bchir

Extracting the relations between medical concepts is very valuable in the medical domain. Scientists need to extract relevant information and semantic relations between medical concepts, including protein-protein, gene-protein, drug-drug, and drug-disease relations. These relations can be extracted from the biomedical literature available in various databases. This study examines the extraction of semantic relations that can occur between diseases and drugs. The findings will help specialists make sound decisions when administering a medication to a patient and will allow them to stay continuously up to date in their field. The objective of this work is to identify different features related to drugs and diseases in medical texts by applying Natural Language Processing (NLP) techniques and the UMLS ontology. A Support Vector Machine classifier uses these features to extract valuable semantic relationships among text entities. The contribution of this research is the combination of a suggested NLP technique, which takes advantage of the UMLS ontology and enables the extraction of correct and adequate features (frequency, lexical, morphological, syntactic, and semantic features), with a Support Vector Machine using a polynomial kernel function. These features are used to pinpoint the relations between drug and disease. The proposed approach was evaluated on a standard corpus extracted from MEDLINE. The findings considerably improve performance and outperform similar works; in particular, the F-score for the most important relation, "cure," reaches 98.19%. The accuracy is higher than that of all existing works for all relations.
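As a rough illustration of the classification step only (not the authors' pipeline: the sentences, labels, and bag-of-words features below are toy assumptions standing in for the paper's MEDLINE corpus and richer feature set), an SVM with a polynomial kernel can be trained on sentence-level features to predict a drug-disease relation type:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical training sentences with drug-disease relation labels.
sentences = [
    "aspirin cures headache in most patients",
    "metformin treats type 2 diabetes effectively",
    "drug X only prevents migraine onset",
    "vaccine Y prevents influenza infection",
    "compound Z has no effect on arthritis",
    "placebo showed no effect on asthma",
]
labels = ["cure", "cure", "prevent", "prevent", "no_effect", "no_effect"]

# Bag-of-words counts stand in for the paper's frequency, lexical,
# morphological, syntactic, and semantic features.
model = make_pipeline(CountVectorizer(), SVC(kernel="poly", degree=2))
model.fit(sentences, labels)

print(model.predict(["ibuprofen cures back pain quickly"])[0])
```

On real data, each sentence would instead be mapped to the UMLS-informed feature vector the paper describes before being passed to the polynomial-kernel SVM.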

2018 ◽  
Vol 25 (6) ◽  
pp. 726-733
Author(s):  
Maria S. Karyaeva ◽  
Pavel I. Braslavski ◽  
Valery A. Sokolov

The ability to identify semantic relations between words has made the word2vec model widely used in NLP tasks. The idea of word2vec is based on a simple rule: two words reach higher similarity if they occur in similar contexts. Each word is represented as a vector, so vectors with close coordinates can be interpreted as similar words. This makes it possible to establish semantic relations (synonymy, hypernymy and hyponymy relations, and others) by automatic extraction. Manual extraction of semantic relations is a time-consuming and biased task, requiring a large amount of time and the help of experts. Unfortunately, the word2vec model provides an associative list of words that does not consist of related words only. In this paper, we present additional criteria that may be applicable to solving this problem. Observations and experiments with well-known characteristics, such as word frequency and position in the associative list, might be useful for improving results on the task of extracting semantic relations for the Russian language using word embeddings. In the experiments, a word2vec model trained on Flibusta and pairs from Wiktionary are used as examples with semantic relationships. Semantically related words are applicable to thesauri, ontologies, and intelligent systems for natural language processing.
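The core mechanics can be sketched with plain NumPy (the tiny vectors, frequencies, and thresholds here are illustrative assumptions, not the paper's trained Flibusta model): cosine similarity ranks candidate neighbours, and extra criteria such as corpus frequency then filter the associative list:

```python
import numpy as np

# Toy word vectors; a trained word2vec model would supply these.
vectors = {
    "врач":   np.array([0.9, 0.1, 0.3]),
    "доктор": np.array([0.85, 0.15, 0.35]),
    "стол":   np.array([0.1, 0.9, 0.2]),
}
# Toy corpus frequencies, used as an additional filtering criterion.
freq = {"врач": 5000, "доктор": 4200, "стол": 8000}

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def neighbours(word, top_n=2, min_sim=0.9, max_freq_ratio=3.0):
    """Rank candidates by cosine similarity, then keep only those that
    are similar enough and whose frequency is close to the query's."""
    ranked = sorted(
        ((cosine(vectors[word], vectors[w]), w) for w in vectors if w != word),
        reverse=True,
    )[:top_n]
    return [w for sim, w in ranked
            if sim >= min_sim
            and max(freq[w], freq[word]) / min(freq[w], freq[word]) <= max_freq_ratio]

print(neighbours("врач"))  # the synonym "доктор" survives the filters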


Author(s):  
Kaan Ant ◽  
Ugur Sogukpinar ◽  
Mehmet Fatif Amasyali

The use of databases containing semantic relationships between words is becoming increasingly widespread in order to make natural language processing more effective. Unlike the bag-of-words approach, the proposed semantic spaces give the distances between words, but they do not express relation types. In this study, we show how semantic spaces can be used to find the type of relationship, and we compare them with the template method. According to results obtained at a very large scale, semantic spaces are more successful for the is_a and opposite relations, while the template approach is more successful for the at_location, made_of, and non-relational types.
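One way to read a relation type out of a semantic space (a hedged sketch with made-up 2-D vectors and seed pairs; the study's spaces are high-dimensional and corpus-learned) is to compare a candidate pair's offset vector against the average offset of seed pairs for each relation:

```python
import numpy as np

# Toy 2-D word vectors; a real semantic space would be learned from text.
vec = {
    "hot":  np.array([1.0, 0.0]), "cold":   np.array([-1.0, 0.0]),
    "big":  np.array([0.8, 0.2]), "small":  np.array([-0.8, 0.2]),
    "dog":  np.array([0.1, 1.0]), "animal": np.array([0.1, 2.0]),
    "rose": np.array([0.3, 1.1]), "flower": np.array([0.3, 2.1]),
}

# Seed pairs define each relation type by its average offset vector.
seeds = {
    "opposite": [("hot", "cold"), ("big", "small")],
    "is_a":     [("dog", "animal"), ("rose", "flower")],
}
proto = {rel: np.mean([vec[b] - vec[a] for a, b in pairs], axis=0)
         for rel, pairs in seeds.items()}

def relation_type(a, b):
    """Assign the relation whose prototype offset is nearest."""
    offset = vec[b] - vec[a]
    return min(proto, key=lambda rel: np.linalg.norm(offset - proto[rel]))

print(relation_type("hot", "cold"))  # nearest prototype: "opposite"
```

The template method would instead match surface patterns such as "X is a kind of Y" in text; the offset comparison above is what the semantic-space side of that contrast amounts to.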


2017 ◽  
Author(s):  
A. S. M. Ashique Mahmood ◽  
Shruti Rao ◽  
Peter McGarvey ◽  
Cathy Wu ◽  
Subha Madhavan ◽  
...  

Abstract: Tumor molecular profiling plays an integral role in identifying genomic anomalies which may help in personalizing cancer treatments, improving patient outcomes and minimizing risks associated with different therapies. However, critical information regarding the evidence of clinical utility of such anomalies is largely buried in the biomedical literature. It is becoming prohibitive for biocurators, clinical researchers and oncologists to keep up with the rapidly growing volume and breadth of information, especially information that describes therapeutic implications of biomarkers and is therefore relevant for treatment selection. In an effort to improve and speed up the process of manually reviewing and extracting relevant information from literature, we have developed a natural language processing (NLP)-based text mining (TM) system called eGARD (extracting Genomic Anomalies association with Response to Drugs). This system relies on the syntactic nature of sentences coupled with various textual features to extract relations between genomic anomalies and drug response from MEDLINE abstracts. Our system achieved high precision, recall and F-measure of up to 0.95, 0.86 and 0.90, respectively, on annotated evaluation datasets created in-house and obtained externally from PharmGKB. Additionally, the system extracted information that helps determine the confidence level of extraction to support prioritization of curation. Such a system will enable clinical researchers to explore the use of published markers to stratify patients upfront for 'best-fit' therapies and readily generate hypotheses for new clinical trials.
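eGARD itself works from sentence syntax, but the flavour of the relation step can be suggested with a far simpler surface-pattern sketch (the entity lists, pattern, and sentence below are invented for illustration; the real system uses proper named-entity recognition and parse-based rules):

```python
import re

# Hypothetical entity lexicons; eGARD uses named-entity recognition instead.
anomalies = {"EGFR mutation", "KRAS mutation"}
drugs = {"gefitinib", "erlotinib"}

# One naive surface pattern: "<anomaly> ... predicts/confers ... response to <drug>"
pattern = re.compile(
    r"(?P<anomaly>EGFR mutation|KRAS mutation).{0,60}?"
    r"(predicts|confers).{0,60}?response to (?P<drug>gefitinib|erlotinib)",
    re.IGNORECASE,
)

sentence = ("In this cohort, EGFR mutation predicts improved "
            "response to gefitinib in lung adenocarcinoma.")
m = pattern.search(sentence)
if m:
    print((m.group("anomaly"), m.group("drug")))
```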


2019 ◽  
Vol 5 (5) ◽  
pp. 212-215
Author(s):  
Abeer AlArfaj

Semantic relation extraction is an important component of ontologies that can support many applications, e.g., text mining, question answering, and information extraction. However, extracting semantic relations between concepts is not trivial and is one of the main challenges in the Natural Language Processing (NLP) field. The Arabic language has complex morphological, grammatical, and semantic aspects, since it is a highly inflectional and derivational language, which makes the task even more challenging. In this paper, we present a review of the state of the art in relation extraction from texts, addressing the progress and difficulties in this field. We discuss several aspects related to this task, considering both taxonomic and non-taxonomic relation extraction methods. The majority of relation extraction approaches implement a combination of statistical and linguistic techniques to extract semantic relations from text. We also give special attention to the state of work on relation extraction from Arabic texts, which needs further progress.


2019 ◽  
Author(s):  
Peng Su ◽  
Gang Li ◽  
Cathy Wu ◽  
K. Vijay-Shanker

Abstract: Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require a large amount of annotated training data, while often only small labeled datasets are available for many natural language processing tasks in the biomedical literature. Building large datasets for deep learning is expensive, since it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained by distant supervision. However, because data obtained by distant supervision is often noisy, we first apply heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
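A minimal sketch of the distant-supervision idea (the knowledge-base pair, sentences, and the two filtering heuristics below are illustrative assumptions, not the authors' exact rules): sentences mentioning a known interacting pair are auto-labeled positive, then noisy positives are dropped by simple checks such as negation cues and entity distance:

```python
# A known interacting pair from a knowledge base (hypothetical).
kb_pair = ("proteinA", "proteinB")

sentences = [
    "proteinA directly phosphorylates proteinB in vitro",
    "proteinA was measured in one assay, while, in an entirely different "
    "experiment reported elsewhere, proteinB was profiled",
    "proteinA does not interact with proteinB",
]

def distant_label(sentence, pair):
    """Auto-label positive if both entities co-occur in the sentence."""
    return all(e in sentence for e in pair)

def keep(sentence, pair, max_gap=40):
    """Heuristics to drop likely-noisy positives:
    a negation cue, or entities too far apart in the text."""
    if " not " in f" {sentence} ":
        return False
    i, j = (sentence.index(e) for e in pair)
    return abs(i - j) <= max_gap

clean = [s for s in sentences if distant_label(s, kb_pair) and keep(s, kb_pair)]
print(len(clean))  # only the first sentence survives
```

The surviving auto-labeled sentences would then be combined with the small manually annotated set, in the transfer-learning spirit the abstract describes.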


2021 ◽  
Author(s):  
Ziheng Zhang ◽  
Feng Han ◽  
Hongjian Zhang ◽  
Tomohiro Aoki ◽  
Katsuhiko Ogasawara

BACKGROUND: Biomedical terms extracted using Word2vec, the most popular word embedding model in recent years, serve as the foundation for various natural language processing (NLP) applications, such as biomedical information retrieval, relation extraction, and recommendation systems.

OBJECTIVE: The objective of this study is to examine how changes in the ratio of biomedical-domain to general-domain data in the corpus affect the extraction of similar biomedical terms using Word2vec.

METHODS: We downloaded the abstracts of 214,892 articles from PubMed Central (PMC) and the 3.9 GB Billion Word (BW) benchmark corpus from the computer science community. The datasets were preprocessed and grouped into 11 corpora based on the ratio of BW to PMC, ranging from 0:10 to 10:0, and Word2vec models were then trained on these corpora. The cosine similarities between the biomedical terms obtained from the Word2vec models were then compared for each model.

RESULTS: The models trained on both BW and PMC data outperformed the model trained only on medical data. The similarity between the biomedical terms extracted by the Word2vec model increased when the ratio of biomedical-domain to general-domain data was between 3:7 and 5:5.

CONCLUSIONS: This study enables NLP researchers to apply Word2vec on a more informed basis and to increase the similarity of extracted biomedical terms, improving their effectiveness in NLP applications such as biomedical information extraction.
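The corpus-construction step can be sketched as follows (toy line counts; the study used 214,892 PMC abstracts and the Billion Word corpus): for each ratio r:(10-r), sample proportionally from general-domain and biomedical lines before training a Word2vec model on the result:

```python
import random

random.seed(0)
# Stand-ins for the Billion Word (general) and PMC (biomedical) corpora.
bw_lines = [f"general sentence {i}" for i in range(1000)]
pmc_lines = [f"biomedical sentence {i}" for i in range(1000)]

def mixed_corpus(bw_parts, pmc_parts, total=100):
    """Build a corpus with a bw_parts:pmc_parts ratio (parts sum to 10)."""
    n_bw = total * bw_parts // 10
    n_pmc = total * pmc_parts // 10
    return random.sample(bw_lines, n_bw) + random.sample(pmc_lines, n_pmc)

# The 11 corpora, from 0:10 (pure PMC) to 10:0 (pure BW).
corpora = {f"{r}:{10 - r}": mixed_corpus(r, 10 - r) for r in range(11)}
print(len(corpora), len(corpora["3:7"]))
```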


Author(s):  
Olga Acosta ◽  
César Aguilar

This article sketches the development of a method for mining concepts from medical corpora in Spanish. The method is based on the approach formulated by Ananiadou and McNaught, who emphasize the need to create and use natural language processing (NLP) tools in order to extract information from large collections of documents, such as PubMed (www.ncbi.nlm.nih.gov/pubmed/). This repository has enabled projects such as the Genia Corpus (www.geniaproject.org), the MEDIE search engine (www.nactem.ac.uk/medie/), which uses syntactic and semantic criteria to extract medical concepts, and the Open Biological and Biomedical Ontology Project (http://obofoundry.org/), which focuses on developing ontologies that provide an organized knowledge system in biomedicine. In particular, this proposal focuses on two objectives: (1) the extraction of specialized terms and (2) the identification of lexical-semantic relationships, specifically hyponymy/hypernymy and meronymy.
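The second objective, spotting lexical-semantic relations, is commonly approached with lexical-syntactic (Hearst-style) patterns; a minimal sketch for a Spanish "X tales como Y" hypernym-hyponym pattern, with an invented example sentence (real systems use much richer grammars than one regular expression):

```python
import re

# Spanish Hearst-style pattern: "<hypernym> tales como <hyponym>".
pattern = re.compile(r"(\w+) tales como (\w+)")

sentence = "El tratamiento incluye antibióticos tales como amoxicilina y penicilina."
for hypernym, hyponym in pattern.findall(sentence):
    print(f"hiponimia: {hyponym} es_un {hypernym}")
```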


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Shengyu Liu ◽  
Buzhou Tang ◽  
Qingcai Chen ◽  
Xiaolong Wang

Drug-drug interaction (DDI) extraction, a typical relation extraction task in natural language processing (NLP), has always attracted great attention. Most state-of-the-art DDI extraction systems are based on support vector machines (SVM) with a large number of manually defined features. Recently, convolutional neural networks (CNN), a robust machine learning method that requires almost no manually defined features, have exhibited great potential for many NLP tasks. It is therefore worth employing CNN for DDI extraction, which had never been investigated. We propose a CNN-based method for DDI extraction. Experiments conducted on the 2013 DDIExtraction challenge corpus demonstrate that CNN is a good choice for DDI extraction. The CNN-based DDI extraction method achieves an F-score of 69.75%, which outperforms the existing best-performing method by 2.75%.
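The CNN's core operation can be caricatured in NumPy (a hedged sketch with random toy embeddings and weights, not the paper's trained network): convolve filters over the sentence's word-embedding matrix, apply ReLU, max-pool over positions, then classify:

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, emb_dim = 7, 8        # toy sentence length and embedding size
n_filters, width = 4, 3        # four convolution filters of width 3
n_classes = 5                  # e.g., the DDI types in the 2013 corpus

X = rng.normal(size=(seq_len, emb_dim))            # word embeddings (random here)
filters = rng.normal(size=(n_filters, width, emb_dim))
W = rng.normal(size=(n_filters, n_classes))        # classifier weights

# Convolution: slide each filter over word windows, ReLU, then max-pool.
conv = np.array([[np.maximum(np.sum(X[i:i + width] * f), 0)
                  for i in range(seq_len - width + 1)]
                 for f in filters])                # (n_filters, positions)
pooled = conv.max(axis=1)                          # (n_filters,)

# Softmax over class scores.
logits = pooled @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(probs.shape)
```

Training would learn the embeddings, filters, and classifier weights end to end, which is exactly what replaces the manually defined SVM features.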


2021 ◽  
Author(s):  
Peng Su ◽  
K. Vijay-Shanker

Abstract Background: Recently, automatically extracting biomedical relations has become a significant subject in biomedical research due to the rapid growth of the biomedical literature. Since their adaptation to the biomedical domain, transformer-based BERT models have produced leading results on many biomedical natural language processing tasks. In this work, we explore approaches to improve the BERT model for relation extraction tasks in both the pre-training and fine-tuning stages of its application. In the pre-training stage, we add another level of BERT adaptation on sub-domain data to bridge the gap between domain knowledge and task-specific knowledge. We also propose methods to incorporate the knowledge ignored in the last layer of BERT to improve its fine-tuning. Results: The experimental results demonstrate that our approaches for pre-training and fine-tuning can improve BERT model performance. After combining the two proposed techniques, our approach outperforms the original BERT models with an average F1-score improvement of 2.1% on relation extraction tasks. Moreover, our approach achieves state-of-the-art performance on three relation extraction benchmark datasets. Conclusions: The extra pre-training step on sub-domain data can help the BERT model generalize on specific tasks, and our proposed fine-tuning mechanism can utilize the knowledge in the last layer of BERT to boost model performance. Furthermore, the combination of these two approaches further improves the performance of the BERT model on relation extraction tasks.
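The fine-tuning idea of using more of the last layer than a single sentence vector can be caricatured with NumPy (the shapes and the mean-pooling choice are assumptions for illustration; the paper's mechanism is defined on actual BERT hidden states):

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, hidden = 6, 16

# Stand-in for BERT's last-layer hidden states for one sentence.
last_layer = rng.normal(size=(seq_len, hidden))

# Conventional fine-tuning uses only the [CLS] vector (position 0) ...
cls_only = last_layer[0]

# ... whereas pooling over all tokens also exploits the otherwise
# ignored last-layer states; concatenate both views as the feature.
mean_pooled = last_layer.mean(axis=0)
features = np.concatenate([cls_only, mean_pooled])

print(features.shape)  # (32,)
```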


TecnoLógicas ◽  
2019 ◽  
Vol 22 ◽  
pp. 49-62 ◽  
Author(s):  
Jefferson A. Peña-Torres ◽  
Raúl E. Gutiérrez ◽  
Víctor A. Bucheli ◽  
Fabio A. González

In this article, we study the relation extraction problem in Natural Language Processing (NLP), implementing a domain adaptation setting without external resources. We trained a Deep Learning (DL) model for Relation Extraction (RE) that extracts semantic relations in the biomedical domain. However, can the model be applied to different domains? The model should be able to automatically extract relationships across different domains using the DL network. Fully retraining DL models for each new dataset is impractical, because the models should quickly adapt to different datasets in several domains without delay. Adaptation is therefore crucial for intelligent systems, where changing factors and unanticipated perturbations are common. In this study, we present a detailed analysis of the problem, as well as preliminary experiments, results, and their evaluation.

