Using distant supervision to augment manually annotated data for relation extraction

Mapping Intimacies ◽

10.1101/626226 ◽

2019 ◽

Author(s):

Peng Su ◽

Gang Li ◽

Cathy Wu ◽

K. Vijay-Shanker

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Relation Extraction ◽

Biomedical Literature ◽

Training Data ◽

Distant Supervision ◽

Large Size ◽

Domain Expertise

Abstract: Significant progress has recently been made in applying deep learning to natural language processing tasks. However, deep learning models typically require a large amount of annotated training data, while often only small labeled datasets are available for many natural language processing tasks in biomedical literature. Building large datasets for deep learning is expensive since it involves considerable human effort and usually requires domain expertise in specialized fields. In this work, we consider augmenting manually annotated data with large amounts of data obtained using distant supervision. Because data obtained by distant supervision is often noisy, we first apply some heuristics to remove some of the incorrect annotations. Then, using methods inspired by transfer learning, we show that the resulting models outperform models trained on the original manually annotated sets.
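As a rough illustration of the idea in this abstract, the sketch below labels candidate sentences by looking up entity pairs in a small knowledge base and then applies one noise-reduction heuristic. The knowledge base `KNOWN_PAIRS`, the function names, and the distance-based filter are all hypothetical and chosen for illustration; the abstract does not say which heuristics the authors used, and the transfer learning step is not shown.

```python
# Hypothetical toy knowledge base of entity pairs assumed to be related.
KNOWN_PAIRS = {("aspirin", "warfarin")}


def distant_label(sentence, entity_a, entity_b):
    """Return 1 if the pair is in the knowledge base and both entities occur in the sentence."""
    text = sentence.lower()
    pair = (entity_a.lower(), entity_b.lower())
    in_kb = pair in KNOWN_PAIRS or pair[::-1] in KNOWN_PAIRS
    return int(in_kb and pair[0] in text and pair[1] in text)


def token_gap(sentence, entity_a, entity_b):
    """Count whitespace-separated tokens between the first mentions of the two entities."""
    text = sentence.lower()
    i, j = text.find(entity_a.lower()), text.find(entity_b.lower())
    if i < 0 or j < 0:
        return None  # at least one entity is not mentioned
    if i <= j:
        start, end = i + len(entity_a), j
    else:
        start, end = j + len(entity_b), i
    return len(text[start:end].split())


def build_noisy_set(candidates, max_gap=8):
    """Label candidates, dropping positives whose entities are far apart.

    The distance filter is an illustrative assumption, not the paper's heuristic.
    """
    kept = []
    for sentence, a, b in candidates:
        gap = token_gap(sentence, a, b)
        if gap is None:
            continue
        label = distant_label(sentence, a, b)
        if label == 1 and gap > max_gap:
            continue  # likely a spurious co-occurrence
        kept.append((sentence, a, b, label))
    return kept


candidates = [
    ("Aspirin increases the bleeding risk of warfarin.", "aspirin", "warfarin"),
    ("Aspirin was stopped on day one, and after two further weeks of "
     "observation the team started warfarin.", "aspirin", "warfarin"),
    ("Ibuprofen was compared with warfarin.", "ibuprofen", "warfarin"),
]
noisy_set = build_noisy_set(candidates)
```

In a setup like the one described, a set such as `noisy_set` would be combined with the manually annotated data, for example by training on the noisy data first and then continuing training on the manual data.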

Deep Learning-Based Natural Language Processing in Radiology: The Impact of Report Complexity, Disease Prevalence, Dataset Size, and Algorithm Type on Model Performance

Journal of Medical Systems ◽

10.1007/s10916-021-01761-4 ◽

2021 ◽

Vol 45 (10) ◽

Author(s):

A. W. Olthof ◽

P. M. A. van Ooijen ◽

L. J. Cornelissen

Keyword(s):

Neural Network ◽

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Predictive Value ◽

Training Data ◽

Radiology Reports ◽

Dataset Size ◽

The Impact

Abstract: In radiology, natural language processing (NLP) allows the extraction of valuable information from radiology reports. It can be used for various downstream tasks such as quality improvement, epidemiological research, and monitoring guideline adherence. Class imbalance, variation in dataset size, variation in report complexity, and algorithm type all influence NLP performance but have not yet been systematically and interrelatedly evaluated. In this study, we investigate the effect of these factors on the performance of four types of deep learning-based NLP: a fully connected neural network (Dense), a long short-term memory recurrent neural network (LSTM), a convolutional neural network (CNN), and a Bidirectional Encoder Representations from Transformers (BERT) model. Two datasets consisting of radiologist-annotated reports of trauma radiographs (n = 2469) and of chest radiographs and computed tomography (CT) studies (n = 2255) were split into training sets (80%) and testing sets (20%). The training data was used to train all four model types in 84 experiments (Fracture-data) and 45 experiments (Chest-data) with variation in size and prevalence. Performance was evaluated on sensitivity, specificity, positive predictive value, negative predictive value, area under the curve, and F score. When applied to radiology reports, all four model architectures demonstrated high performance, with metrics reaching above 0.90. CNN, LSTM, and Dense were outperformed by the BERT algorithm because of its stable results despite variation in training size and prevalence. Awareness of variation in prevalence is warranted because it impacts sensitivity and specificity in opposite directions.
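For readers unfamiliar with the evaluation metrics listed in this abstract, the following minimal sketch computes most of them from the four cells of a binary confusion matrix. The function name and the example counts are made up for illustration, the F score is assumed here to be F1, and area under the curve is omitted because it needs per-report scores rather than counts.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (both classes present and predicted).
    """
    sensitivity = tp / (tp + fn)   # recall on the positive class
    specificity = tn / (tn + fp)   # recall on the negative class
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": f1,
    }


# Illustrative counts only, not taken from the study.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```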

Application of Domain Ontologies to Natural Language Processing

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2015070102 ◽

2015 ◽

Vol 5 (3) ◽

pp. 19-38 ◽

Cited By ~ 2

Author(s):

María Herrero-Zazo ◽

Isabel Segura-Bedmar ◽

Janna Hastings ◽

Paloma Martínez

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Domain Knowledge ◽

Semantic Representation ◽

Relation Extraction ◽

Biomedical Literature ◽

Entity Recognition ◽

Knowledge Domain ◽

Domain Ontologies

Natural Language Processing (NLP) techniques can provide an interesting way to mine the growing biomedical literature, and a promising approach for new knowledge discovery. However, the major bottleneck in this area is that these systems rely on specific resources providing the domain knowledge. Domain ontologies provide a contextual framework and a semantic representation of the domain, and they can contribute to a better performance of current NLP systems. However, their contribution to information extraction has not been well studied yet. The aim of this paper is to provide insights into the potential role that domain ontologies can play in NLP. To do this, the authors apply the drug-drug interactions ontology (DINTO) to named entity recognition and relation extraction from pharmacological texts. The authors use the DDI corpus, a gold-standard for the development and evaluation of IE systems in this domain, and evaluate their results in the framework of the last SemEval-2013 DDI Extraction task.

UMLS-based data augmentation for natural language processing of clinical research literature

Journal of the American Medical Informatics Association ◽

10.1093/jamia/ocaa309 ◽

2020 ◽

Author(s):

Tian Kang ◽

Adler Perotte ◽

Youlan Tang ◽

Casey Ta ◽

Chunhua Weng

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Data Augmentation ◽

Training Data ◽

Entity Recognition ◽

Classification Model ◽

Learning Models ◽

Sentence Classification

Abstract: Objective: The study sought to develop and evaluate a knowledge-based data augmentation method to improve the performance of deep learning models for biomedical natural language processing by overcoming training data scarcity. Materials and Methods: We extended the easy data augmentation (EDA) method for biomedical named entity recognition (NER) by incorporating the Unified Medical Language System (UMLS) knowledge and called this method UMLS-EDA. We designed experiments to systematically evaluate the effect of UMLS-EDA on popular deep learning architectures for both NER and classification. We also compared UMLS-EDA to BERT. Results: UMLS-EDA enables substantial improvement for NER tasks from the original long short-term memory conditional random fields (LSTM-CRF) model (micro-F1 score: +5%, +17%, and +15%), helps the LSTM-CRF model (micro-F1 score: 0.66) outperform LSTM-CRF with transfer learning by BERT (0.63), and improves the performance of the state-of-the-art sentence classification model. The largest gain on micro-F1 score is 9%, from 0.75 to 0.84, better than classifiers with BERT pretraining (0.82). Conclusions: This study presents a UMLS-based data augmentation method, UMLS-EDA. It is effective at improving deep learning models for both NER and sentence classification, and contributes original insights for designing new, superior deep learning approaches for low-resource biomedical domains.
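To give a feel for knowledge-based augmentation of this kind, here is a minimal sketch of synonym replacement, one of the operations in EDA. The dictionary `TOY_SYNONYMS` is a hypothetical stand-in for a UMLS lookup, and the function is an illustration under that assumption rather than the UMLS-EDA method itself. For NER, token-level labels would also need to be realigned after replacement, which is not shown.

```python
import random

# Hypothetical stand-in for synonyms that a UMLS lookup might return.
TOY_SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
}


def replace_synonyms(sentence, synonyms, rng):
    """Return a copy of the sentence with each known term swapped for a sampled synonym.

    Assumes the sentence is already lowercased to match the dictionary keys.
    """
    augmented = sentence
    for term, alternatives in synonyms.items():
        if term in augmented:
            augmented = augmented.replace(term, rng.choice(alternatives))
    return augmented


original = "patients with heart attack and high blood pressure"
augmented = replace_synonyms(original, TOY_SYNONYMS, random.Random(0))
```

Each augmented sentence would be added to the training set alongside the original, keeping the original label for sentence classification.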

An AdaBoost Using a Weak-Learner Generating Several Weak-Hypotheses for Large Training Data of Natural Language Processing

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.130.83 ◽

2010 ◽

Vol 130 (1) ◽

pp. 83-91 ◽

Cited By ~ 1

Author(s):

Tomoya Iwakura ◽

Seishi Okamoto ◽

Kazuo Asakawa

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Training Data ◽

Weak Learner

Daily estimates of individual discharge likelihood with deep learning natural language processing in general medicine: a prospective and external validation study

Internal and Emergency Medicine ◽

10.1007/s11739-021-02816-7 ◽

2021 ◽

Author(s):

Stephen Bacchi ◽

Toby Gilbert ◽

Samuel Gluck ◽

Joy Cheng ◽

Yiran Tan ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Validation Study ◽

External Validation ◽

General Medicine ◽

External Validation Study

Deep Learning on Graphs for Natural Language Processing

Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ◽

10.1145/3404835.3462809 ◽

2021 ◽

Author(s):

Lingfei Wu ◽

Yu Chen ◽

Heng Ji ◽

Bang Liu

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Deep Learning Techniques on Text Classification Using Natural Language Processing (NLP) In Social Healthcare Network: A Comprehensive Survey

2021 3rd International Conference on Signal Processing and Communication (ICSPC) ◽

10.1109/icspc51351.2021.9451752 ◽

2021 ◽

Author(s):

PM. Lavanya ◽

E. Sasikala

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Classification ◽

Healthcare Network ◽

Learning Techniques ◽

Comprehensive Survey

A natural language processing approach based on embedding deep learning from heterogeneous compounds for quantitative structure–activity relationship modeling

Chemical Biology & Drug Design ◽

10.1111/cbdd.13742 ◽

2020 ◽

Vol 96 (3) ◽

pp. 961-972

Author(s):

Khalid Bouhedjar ◽

Abdelbasset Boukelia ◽

Abdelmalek Khorief Nacereddine ◽

Anouar Boucheham ◽

Amine Belaidi ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Quantitative Structure Activity Relationship ◽

Structure Activity Relationship ◽

Activity Relationship ◽

Quantitative Structure ◽

Structure Activity ◽

Processing Approach

Speech Master: Natural Language Processing and Deep Learning Approach for Automated Speech Evaluation

10.1109/iemcon53756.2021.9623163 ◽

2021 ◽

Author(s):

K.G.C.M Kooragama ◽

L.R.W.D. Jayashanka ◽

J.A. Munasinghe ◽

K.W. Jayawardana ◽

Muditha Tissera ◽

...

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Approach ◽

Speech Evaluation

Deep Learning Approaches for Spoken and Natural Language Processing

10.1007/978-3-030-79778-2 ◽

2021 ◽

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Learning Approaches
