Named entity recognition of local adverse drug reactions in Xinjiang based on transfer learning

2021 ◽  
Vol 40 (5) ◽  
pp. 8899-8914
Author(s):  
Keming Kang ◽  
Shengwei Tian ◽  
Long Yu

Deep learning models for Chinese named entity recognition (NER) learn poorly from small amounts of data, and only a few open labeled data sets exist for the task. To address this, this paper proposes a method for named entity recognition of local adverse drug reactions based on Adversarial Transfer Learning and constructs a neural network model, ASAIBC, consisting of Adversarial Transfer Learning, Self-Attention, an independently recurrent neural network (IndRNN), bi-directional long short-term memory (BiLSTM) and a conditional random field (CRF). The Adversarial Transfer Learning network shares boundary information between the Chinese word segmentation (CWS) task and the NER task while filtering out information specific to CWS. Combined with the Self-Attention mechanism and IndRNN, the feature representation is enhanced, allowing the model to attend to important information about different entities at different levels. Together with better capture of dependencies in long sentences, this further strengthens the recognition ability of the model. The ASAIBC model outperforms traditional algorithms on the WeiBoNER and MSRA data sets. It is then evaluated on a manually labeled data set for Xinjiang local named entity recognition of adverse drug reactions (XJADRNER), where accuracy, precision, recall and F-score are 98.97%, 91.01%, 90.21% and 90.57%, respectively. These experimental results show that the ASAIBC model can significantly improve NER performance on local adverse drug reactions in Xinjiang.
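The abstract reports precision, recall and F-score but does not state the evaluation protocol. As a minimal sketch, assuming BIO tagging and exact-match entity-level scoring (a common convention for NER, not confirmed for this paper), the three metrics could be computed as follows. The tag names `DRUG` and `ADR` and the function names are hypothetical.

```python
def bio_spans(tags):
    """Collect (start, end, type) entity spans from a BIO tag sequence."""
    spans, start, label = set(), None, None
    # A trailing "O" sentinel closes any entity still open at the end.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold_tags, pred_tags):
    """Exact-match entity-level scores (assumed protocol, see lead-in)."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    true_pos = len(gold & pred)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["B-DRUG", "I-DRUG", "O", "B-ADR", "O"]
pred = ["B-DRUG", "I-DRUG", "O", "O", "B-ADR"]
print(precision_recall_f1(gold, pred))
```

In this toy example one of the two predicted entities matches a gold entity exactly, so all three scores come out at 0.5.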

2020 ◽  
Vol 10 (12) ◽  
pp. 4234 ◽  
Author(s):  
Hung-Kai Kung ◽  
Chun-Mo Hsieh ◽  
Cheng-Yu Ho ◽  
Yun-Cheng Tsai ◽  
Hao-Yung Chan ◽  
...  

This research aims to build a Mandarin named entity recognition (NER) module using transfer learning to facilitate damage information gathering and analysis in disaster management. The hybrid NER approach proposed in this research includes three modules: (1) data augmentation, which constructs a concise data set for disaster management; (2) reference model, which utilizes the bidirectional long short-term memory–conditional random field framework to implement NER; and (3) the augmented model built by integrating the first two modules via cross-domain transfer with disparate label sets. Through the combination of established rules and learned sentence patterns, the hybrid approach performs well in NER tasks for disaster management and recognizes unfamiliar words successfully. This research applied the proposed NER module to disaster management. In the application, we favorably handled the NER tasks of our related work and achieved our desired outcomes. Through proper transfer, the results of this work can be extended to other fields and consequently bring valuable advantages in diverse applications.
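To illustrate what the conditional random field contributes on top of a BiLSTM in the reference model, here is a minimal sketch of Viterbi decoding over per-token scores. The emission scores stand in for BiLSTM outputs, and the tag set and transition scores are made-up values for illustration, not taken from the paper.

```python
def viterbi(emissions, transitions):
    """Return the highest-scoring tag path.

    emissions[t][j]   : score of tag j at position t (e.g. from a BiLSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["O", "B", "I"]          # hypothetical tag set
FORBIDDEN = -1e4                # large penalty for the invalid O -> I move
transitions = [[0.0, 0.0, FORBIDDEN],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[1.0, 0.9, 0.0],   # token 1 slightly prefers O
             [0.0, 0.0, 1.0]]   # token 2 prefers I
print([TAGS[i] for i in viterbi(emissions, transitions)])
```

Picking the best tag per token independently would give the invalid sequence O, I. Joint decoding with the transition penalty yields B, I instead, which is the kind of label consistency a CRF layer is typically used for.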


2021 ◽  
Author(s):  
Lisa Langnickel ◽  
Juliane Fluck

Intense research has been done in the area of biomedical natural language processing. Since the breakthrough of transfer learning-based methods, BERT models have been used in a variety of biomedical and clinical applications. For the available data sets, these models show excellent results, in some cases exceeding inter-annotator agreement. However, biomedical named entity recognition applied to COVID-19 preprints shows a performance drop compared to the results on available test data. This raises the question of how well trained models can predict on completely new data, i.e., how well they generalize. Using disease named entity recognition as an example, we investigate the robustness of different machine learning-based methods, including transfer learning, and show that current state-of-the-art methods work well for a given training set and the corresponding test set but generalize significantly worse when applied to new data. We therefore argue that there is a need for larger annotated data sets for training and testing.


Processes ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 1178
Author(s):  
Zhenhua Wang ◽  
Beike Zhang ◽  
Dong Gao

In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text. This information can guide experts in carrying out a new round of HAZOP analysis and help practitioners address hidden dangers in the system, which is of great significance for improving the safety of the whole chemical system. However, due to the standardization and professionalism of chemical safety analysis text, it is difficult to improve the performance of traditional models. To solve this problem, this study proposes an improved method based on active learning and designs three novel sampling algorithms, Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE) and Amplification of Least Confidence (ALC), which improve the ability of the model to understand HAZOP text. In this method, a part of the data is used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The experimental results show that the VTE, HCE, and ALC algorithms perform better than random sampling. In addition, compared with other methods, the method proposed in this paper effectively improves the performance of the traditional model, which indicates that the method is reliable and advanced.
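The sampling step described above can be sketched with two standard uncertainty measures, mean token entropy and least confidence. These are the generic baselines that names such as VTE and ALC suggest as starting points. The abstract does not define the VTE, HCE or ALC formulas, so the code below does not implement them, and all function names and probabilities are hypothetical.

```python
import math


def token_entropy(probs):
    """Shannon entropy of one token's predicted label distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def mean_token_entropy(token_probs):
    """Sentence-level uncertainty: average entropy over its tokens."""
    return sum(token_entropy(p) for p in token_probs) / len(token_probs)


def least_confidence(token_probs):
    """1 minus the product of per-token maxima.

    A simplification that treats tokens as independent; a CRF-based
    model would use the probability of the best full sequence instead.
    """
    best_path_prob = math.prod(max(p) for p in token_probs)
    return 1.0 - best_path_prob


def select_batch(pool, k, score=mean_token_entropy):
    """Pick the k most uncertain unlabeled sentences for annotation."""
    ranked = sorted(pool, key=lambda sid: score(pool[sid]), reverse=True)
    return ranked[:k]


# Hypothetical model outputs: sentence id -> per-token label distributions.
pool = {
    "a": [[0.5, 0.5], [0.5, 0.5]],
    "b": [[0.99, 0.01], [0.9, 0.1]],
    "c": [[0.6, 0.4]],
}
print(select_batch(pool, k=2))
print(select_batch(pool, k=1, score=least_confidence))
```

In a full active learning loop, the selected sentences would be labeled, added to the training set, and the model retrained before the next round of selection.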


2021 ◽  
pp. 1-13
Author(s):  
Xia Li ◽  
Qinghua Wen ◽  
Zengtao Jiao ◽  
Jiangtao Zhang

The China Conference on Knowledge Graph and Semantic Computing (CCKS) 2020 Evaluation Task 3 presented clinical named entity recognition and event extraction for Chinese electronic medical records. Two annotated data sets and some additional resources for these two subtasks were provided to participants. The evaluation competition attracted 354 teams, 46 of which successfully submitted valid results. Pre-trained language models were widely applied in this evaluation task. Data augmentation and external resources were also helpful.


2016 ◽  
Vol 2016 ◽  
pp. 1-9 ◽  
Author(s):  
Abbas Akkasi ◽  
Ekrem Varoğlu ◽  
Nazife Dimililer

Named Entity Recognition (NER) from text constitutes the first step in many text mining applications. The most important preliminary step for NER systems using machine learning approaches is tokenization, where raw text is segmented into tokens. This study proposes an enhanced rule-based tokenizer, ChemTok, which utilizes rules extracted mainly from the training data set. The main novelty of ChemTok is the use of the extracted rules to merge the tokens split in the previous steps, thus producing longer and more discriminative tokens. ChemTok is compared to the tokenization methods utilized by ChemSpot and tmChem. Support Vector Machines and Conditional Random Fields are employed as the learning algorithms. The experimental results show that the classifiers trained on the output of ChemTok outperform all classifiers trained on the output of the other two tokenizers in terms of both classification performance and the number of incorrectly segmented entities.
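The split-then-merge idea can be illustrated with a small sketch. It first splits text aggressively into letter runs, digit runs and single symbols, then re-joins adjacent pieces when a merge rule fires. The two rules below are hand-written, hypothetical examples for chemical names with locants. ChemTok's actual rules are extracted from training data and are not given in the abstract.

```python
import re

# Hypothetical merge rules: each takes the text built so far (left) and
# the next piece (right) and says whether the two should be joined.
MERGE_RULES = [
    # A digit followed directly by "-" or "," (e.g. the locant in "2-").
    lambda left, right: right in ("-", ",") and left[-1].isdigit(),
    # Anything alphanumeric directly after such punctuation ("2-" + "propanol").
    lambda left, right: left[-1] in "-," and right[0].isalnum(),
]


def tokenize(text):
    """Fine-grained split followed by rule-based merging of adjacent pieces."""
    merged = []  # entries are [token_text, end_offset]
    for match in re.finditer(r"[A-Za-z]+|\d+|\S", text):
        piece = match.group()
        # Only pieces with no whitespace between them are merge candidates.
        touching = bool(merged) and merged[-1][1] == match.start()
        if touching and any(rule(merged[-1][0], piece) for rule in MERGE_RULES):
            merged[-1][0] += piece
            merged[-1][1] = match.end()
        else:
            merged.append([piece, match.end()])
    return [token for token, _ in merged]


print(tokenize("2-propanol and 1,2-dichloroethane"))
```

With these toy rules the chemical names survive as single tokens instead of being broken at every hyphen and comma. Rules this simple will also over-merge in ordinary prose (for example a number followed by a comma), which is one reason a real system would derive and validate its rules against annotated data.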

