Domain adaptation for semantic role labeling of clinical text

Yaoyun Zhang; Buzhou Tang; Min Jiang; Jingqi Wang; Hua Xu

doi:10.1093/jamia/ocu048

Journal of the American Medical Informatics Association ◽

2015 ◽

Vol 22 (5) ◽

pp. 967-979 ◽

Cited By ~ 13

Keyword(s):

Domain Adaptation ◽

Biomedical Literature ◽

Semantic Role ◽

Semantic Role Labeling ◽

Clinical Text ◽

Source Domain ◽

Training Samples ◽

Annotation Costs ◽

Feature Augmentation ◽

F Measure

Abstract

Objective: Semantic role labeling (SRL), which extracts a shallow semantic relation representation from the varied surface forms of free-text sentences, is important for natural language understanding. Few SRL studies have been conducted in the medical domain, primarily because of the lack of annotated clinical SRL corpora, which are time-consuming and costly to build. The goal of this study is to investigate domain adaptation techniques for clinical SRL that leverage resources built from newswire and biomedical literature, in order to improve performance and reduce annotation costs.

Materials and Methods: The Multisource Integrated Platform for Answering Clinical Questions (MiPACQ) corpus, a manually annotated clinical SRL corpus, was used as the target domain dataset. PropBank and NomBank (newswire) and BioProp (biomedical literature) were used as source domain datasets. Three state-of-the-art domain adaptation algorithms were employed: instance pruning, transfer self-training, and feature augmentation. SRL performance under each domain adaptation algorithm was evaluated using 10-fold cross-validation on the MiPACQ corpus, and learning curves were generated for each method to assess the effect of sample size.

Results and Conclusion: When all three source domain corpora were used, the feature augmentation algorithm achieved a statistically significantly higher F-measure (83.18%) than the baseline trained on the MiPACQ dataset alone (F-measure, 81.53%), indicating that domain adaptation algorithms may improve SRL performance on clinical text. To match the performance of the baseline method trained on 90% of the MiPACQ training samples, the feature augmentation algorithm required <50% of the MiPACQ training samples, demonstrating that the annotation costs of clinical SRL can be reduced significantly by leveraging existing SRL resources from other domains.
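Feature augmentation, in the style of Daumé III's "frustratingly easy" domain adaptation, gives every feature a shared copy plus a copy tagged with the instance's domain, so that a single linear model can learn which feature weights transfer across corpora and which are domain-specific. The sketch below illustrates that general idea under the assumption of sparse, string-keyed features; the helper names `augment` and `build_training_set`, the `shared::` prefix, and the toy SRL features are hypothetical, and the abstract does not state the exact variant or feature set the authors used.

```python
from typing import Dict, Iterable, List, Tuple

FeatureVec = Dict[str, float]


def augment(features: FeatureVec, domain: str) -> FeatureVec:
    """Map one instance's sparse features into the augmented feature space.

    Each feature is duplicated: a 'shared::' copy that fires for every domain
    and a '<domain>::' copy that fires only for instances from that domain.
    (Hypothetical helper illustrating the general technique.)
    """
    out: FeatureVec = {}
    for name, value in features.items():
        out[f"shared::{name}"] = value      # weight learned from all corpora
        out[f"{domain}::{name}"] = value    # weight learned from this corpus only
    return out


def build_training_set(
    corpora: Iterable[Tuple[str, List[Tuple[FeatureVec, str]]]]
) -> List[Tuple[FeatureVec, str]]:
    """Pool labelled instances from several domains into one augmented set."""
    pooled: List[Tuple[FeatureVec, str]] = []
    for domain, instances in corpora:
        for features, label in instances:
            pooled.append((augment(features, domain), label))
    return pooled


if __name__ == "__main__":
    # Toy argument-candidate features; real SRL feature sets are much richer.
    propbank = [({"pred=take": 1.0, "path=NP>S>VP": 1.0}, "ARG0")]
    mipacq = [({"pred=take": 1.0, "headword=aspirin": 1.0}, "ARG1")]
    data = build_training_set([("propbank", propbank), ("mipacq", mipacq)])
    for feats, label in data:
        print(label, sorted(feats))
```

At prediction time, clinical test instances would be augmented with the `mipacq` tag, so the classifier combines shared weights learned from every corpus with weights specific to clinical text. Any linear learner that accepts sparse string-keyed features could be trained on the pooled set.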

Domain adaptation for semantic role labeling in the biomedical domain

Bioinformatics ◽

10.1093/bioinformatics/btq075 ◽

2010 ◽

Vol 26 (8) ◽

pp. 1098-1104 ◽

Cited By ~ 18

Author(s):

D. Dahlmeier ◽

H. T. Ng

Keyword(s):

Domain Adaptation ◽

Biomedical Domain ◽

Semantic Role ◽

Semantic Role Labeling

Korean Semantic Role Labeling Using Domain Adaptation Technique

Journal of KIISE ◽

10.5626/jok.2015.42.4.475 ◽

2015 ◽

Vol 42 (4) ◽

pp. 475-482 ◽

Cited By ~ 2

Author(s):

Soojong Lim ◽

Yongjin Bae ◽

Hyunki Kim ◽

Dongyul Ra

Keyword(s):

Domain Adaptation ◽

Semantic Role ◽

Semantic Role Labeling

DEEP DOMAIN ADAPTATION BY WEIGHTED ENTROPY MINIMIZATION FOR THE CLASSIFICATION OF AERIAL IMAGES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-591-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 591-598

Author(s):

D. Wittich

Keyword(s):

Data Augmentation ◽

Domain Adaptation ◽

Aerial Images ◽

Target Domain ◽

Source Domain ◽

Entropy Minimization ◽

Training Samples ◽

Order Of Magnitude ◽

Weighted Entropy

Abstract. Fully convolutional neural networks (FCNs) are successfully used for the automated pixel-wise classification of aerial images and possibly additional data. However, they require many labelled training samples to perform well. One approach to addressing this issue is semi-supervised domain adaptation (SSDA), in which labelled training samples from a source domain and unlabelled samples from a target domain are used jointly to obtain a target domain classifier, without requiring any labelled samples from the target domain. In this paper, a two-step approach for SSDA is proposed. The first step is supervised training on the source domain, using strong data augmentation to increase the initial performance on the target domain. In the second step, the model is adapted by entropy minimization using a novel weighting strategy. The approach is evaluated on five domains, corresponding to five cities. Several training variants and adaptation scenarios are tested, indicating that proper data augmentation can already improve the initial target domain performance significantly, resulting in an average overall accuracy of 77.5%. The weighted entropy minimization improves the overall accuracy on the target domains in 19 out of 20 scenarios, by 1.8% on average. In all experiments, a novel FCN architecture is used that yields results comparable to those of the best-performing models on the ISPRS labelling challenge while having an order of magnitude fewer parameters than commonly used FCNs.
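Entropy minimization adapts a classifier to unlabelled target data by penalizing uncertain predictions: the lower the entropy of a pixel's predicted class distribution, the more confident the prediction. The minimal sketch below computes a weighted mean entropy over target-domain pixels. The abstract does not describe the paper's weighting strategy, so here the per-pixel weights are simply supplied by the caller; the function names are hypothetical, and the sketch only computes the loss value rather than performing a training step.

```python
import math
from typing import Sequence


def pixel_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one pixel's predicted class distribution."""
    # Terms with p == 0 contribute 0 by convention (p * log p -> 0 as p -> 0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def weighted_entropy_loss(
    pixel_probs: Sequence[Sequence[float]], weights: Sequence[float]
) -> float:
    """Weighted mean entropy over unlabelled target-domain pixels.

    Minimising this value pushes the classifier towards confident predictions
    on the target domain. The weights are caller-supplied; this does NOT
    reproduce the paper's specific weighting strategy.
    """
    if len(pixel_probs) != len(weights):
        raise ValueError("need exactly one weight per pixel")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * pixel_entropy(p) for p, w in zip(pixel_probs, weights))
    return weighted / total


if __name__ == "__main__":
    probs = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]  # softmax outputs for 3 pixels
    print(weighted_entropy_loss(probs, [1.0, 1.0, 1.0]))
    print(weighted_entropy_loss(probs, [0.0, 1.0, 1.0]))  # down-weight the uncertain pixel
```

In a real FCN setting this quantity would be computed on the network's softmax output and backpropagated; a weighting scheme lets the adaptation focus on pixels judged more reliable, which is the role the paper's weighting strategy plays.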

Domain Adaptation in Semantic Role Labeling Using a Neural Language Model and Linguistic Resources

IEEE/ACM Transactions on Audio Speech and Language Processing ◽

10.1109/taslp.2015.2449072 ◽

2015 ◽

Vol 23 (11) ◽

pp. 1812-1823 ◽

Cited By ~ 3

Author(s):

Quynh Thi Ngoc Do ◽

Steven Bethard ◽

Marie-Francine Moens

Keyword(s):

Domain Adaptation ◽

Language Model ◽

Semantic Role ◽

Semantic Role Labeling ◽

Linguistic Resources

Domain-Adaptation Technique for Semantic Role Labeling with Structural Learning

ETRI Journal ◽

10.4218/etrij.14.0113.0645 ◽

2014 ◽

Vol 36 (3) ◽

pp. 429-438 ◽

Cited By ~ 2

Author(s):

Soojong Lim ◽

Changki Lee ◽

Pum-Mo Ryu ◽

Hyunki Kim ◽

Sang Kyu Park ◽

...

Keyword(s):

Domain Adaptation ◽

Structural Learning ◽

Semantic Role ◽

Semantic Role Labeling

A COMPARISON OF TWO STRATEGIES FOR AVOIDING NEGATIVE TRANSFER IN DOMAIN ADAPTATION BASED ON LOGISTIC REGRESSION

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-845-2018 ◽

2018 ◽

Vol XLII-2 ◽

pp. 845-852 ◽

Cited By ~ 1

Author(s):

A. Paul ◽

K. Vogt ◽

F. Rottensteiner ◽

J. Ostermann ◽

C. Heipke

Keyword(s):

Negative Transfer ◽

Domain Adaptation ◽

Classification Performance ◽

Target Domain ◽

Maximum Mean Discrepancy ◽

Source Domain ◽

Training Samples ◽

Benchmark Datasets ◽

Consistent Performance ◽

Class Labels

In this paper, we deal with the problem of measuring the similarity between training and test datasets in the context of transfer learning (TL) for image classification. TL tries to transfer knowledge from a source domain, where labelled training samples are abundant but the data may follow a different distribution, to a target domain, where labelled training samples are scarce or even unavailable, assuming that the domains are related. This reduces the requirements w.r.t. the availability of labelled training samples in the target domain. However, if no labelled target data are available, it is inherently difficult to find a robust measure of relatedness between the source and target domains. This is crucial for the performance of TL, because knowledge transfer between unrelated data may lead to negative transfer, i.e., a decrease in classification performance after transfer. We address the problem of measuring the relatedness between source and target datasets and investigate three different strategies to predict and, consequently, avoid negative transfer. The first strategy is based on circular validation. The second relies on the Maximum Mean Discrepancy (MMD) similarity metric, whereas the third is an extension of MMD that incorporates knowledge about the class labels in the source domain. Our method is evaluated using two different benchmark datasets. The experiments highlight the strengths and weaknesses of the investigated methods. We also show that these strategies can reduce the amount of negative transfer for a TL method and yield a consistent performance improvement over the whole dataset.
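The Maximum Mean Discrepancy compares two samples through the mean kernel similarity within and between them; a value near zero suggests the source and target distributions are similar. The sketch below computes the standard biased estimate of the squared MMD with a Gaussian (RBF) kernel. The kernel choice, the `gamma` value, and the function names are assumptions for illustration; it shows only the plain MMD strategy, not circular validation or the class-label-aware extension described above.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, gamma: float) -> float:
    """Gaussian (RBF) kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2_biased(xs: Sequence[Vector], ys: Sequence[Vector], gamma: float = 1.0) -> float:
    """Biased estimate of the squared MMD between samples xs and ys.

    MMD^2 = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y).
    Larger values indicate that the two samples differ more, which could be
    read as a warning sign for negative transfer.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")

    def mean_k(us: Sequence[Vector], vs: Sequence[Vector]) -> float:
        # Average kernel value over all pairs (u, v), including u == v.
        return sum(rbf_kernel(u, v, gamma) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)


if __name__ == "__main__":
    source = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1]]
    near_target = [[0.05, 0.1], [0.15, 0.15]]
    far_target = [[3.0, 3.0], [3.1, 2.9]]
    print(mmd2_biased(source, near_target))
    print(mmd2_biased(source, far_target))
```

In practice the features would be those used by the classifier, and a threshold or comparison across candidate source domains would decide whether transfer is likely to help; how such a decision rule is set is not specified in the abstract.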

Knowledge Acquisition Through Ontologies from Medical Natural Language Texts

Journal of Information Technology Research ◽

10.4018/jitr.2017100104 ◽

2017 ◽

Vol 10 (4) ◽

pp. 56-69 ◽

Cited By ~ 2

Author(s):

José Medina-Moreira ◽

Katty Lagos-Ortiz ◽

Harry Luna-Aveiga ◽

Oscar Apolinario-Arzube ◽

María del Pilar Salas-Zárate ◽

...

Keyword(s):

Life Cycle ◽

General Solution ◽

Natural Language ◽

The Other ◽

Semantic Role ◽

Semantic Role Labeling ◽

Clinical Knowledge ◽

Biomedical Ontologies ◽

Biomedical Field ◽

F Measure

Ontologies are used to represent knowledge, and they have become very important in the Semantic Web era. Ontologies evolve continuously during their life cycle to adapt to new requirements and needs, especially in the biomedical field, where the number and complexity of ontologies have increased in recent years. At the same time, a vast amount of clinical knowledge resides in natural language texts. For these reasons, building and maintaining biomedical ontologies from natural language texts is a relevant and challenging issue. To provide a general solution and to minimize the experts' participation in the ontology enrichment process, this work proposes a methodology for extracting terms and relations from natural language texts. The framework is based on linguistic and statistical methods and on semantic role labeling technologies; it was validated in the domain of diabetes, where it obtained encouraging results, with F-measures of 82.1% for concepts and 79.9% for relations.

Unsupervised domain adaptation without source domain training samples

Proceedings of the Tenth Indian Conference on Computer Vision, Graphics and Image Processing - ICVGIP '16 ◽

10.1145/3009977.3010033 ◽

2016 ◽

Cited By ~ 2

Author(s):

Sudipan Saha ◽

Biplab Banerjee ◽

Shabbir N. Merchant

Keyword(s):

Domain Adaptation ◽

Source Domain ◽

Unsupervised Domain Adaptation ◽

Training Samples

Knowledge Acquisition Through Ontologies from Medical Natural Language Texts

Data Analytics in Medicine ◽

10.4018/978-1-7998-1204-3.ch053 ◽

2020 ◽

pp. 1023-1037

Author(s):

José Medina-Moreira ◽

Katty Lagos-Ortiz ◽

Harry Luna-Aveiga ◽

Oscar Apolinario-Arzube ◽

María del Pilar Salas-Zárate ◽

...

Keyword(s):

Life Cycle ◽

General Solution ◽

Semantic Web ◽

Natural Language ◽

Statistical Methods ◽

The Other ◽

Semantic Role ◽

Semantic Role Labeling ◽

Biomedical Field ◽

F Measure

Extending Korean PropBank for Korean Semantic Role Labeling and Applying Domain Adaptation Technique

Korean Journal of Cognitive Science ◽

10.19066/cogsci.2015.26.4.001 ◽

2015 ◽

Vol 26 (4) ◽

pp. 377-392

Author(s):

배장성 ◽

이창기

Keyword(s):

Domain Adaptation ◽

Semantic Role ◽

Semantic Role Labeling
