Semi-supervised learning for named entity recognition using weakly labeled training data

External Knowledge-Based Weakly Supervised Learning Approach on Chinese Clinical Named Entity Recognition

Semantic Technology - Lecture Notes in Computer Science ◽

10.1007/978-3-030-41407-8_22 ◽

2020 ◽

pp. 336-352

Author(s):

Yeheng Duan ◽

Long-Long Ma ◽

Xianpei Han ◽

Le Sun ◽

Bin Dong ◽

...

Keyword(s):

Supervised Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Weakly Supervised Learning ◽

External Knowledge ◽

Named Entity ◽

Knowledge Based ◽

Weakly Supervised

Learning Task-Specific Representation for Novel Words in Sequence Labeling

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/715 ◽

2019 ◽

Author(s):

Minlong Peng ◽

Qi Zhang ◽

Xiaoyu Xing ◽

Tao Gui ◽

Jinlan Fu ◽

...

Keyword(s):

Empirical Studies ◽

Named Entity Recognition ◽

Learning Task ◽

Training Data ◽

Entity Recognition ◽

Named Entity ◽

Part Of Speech Tagging ◽

Sequence Labeling ◽

Part Of Speech ◽

Word Representation

Word representation is a key component in neural-network-based sequence labeling systems. However, representations of unseen or rare words learned on the end task are usually too poor to support good performance. This is commonly referred to as the out-of-vocabulary (OOV) problem. In this work, we address the OOV problem in sequence labeling using only the training data of the task. To this end, we propose a novel method to predict representations for OOV words from their surface forms (e.g., character sequences) and contexts. The method is specifically designed to avoid the error propagation problem suffered by existing approaches in the same paradigm. To evaluate its effectiveness, we performed extensive empirical studies on four part-of-speech (POS) tagging tasks and four named entity recognition (NER) tasks. Experimental results show that the proposed method achieves better or competitive performance on the OOV problem compared with existing state-of-the-art methods.
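
The idea of predicting an OOV representation from surface form and context can be illustrated with a deliberately simple baseline. This is not the method proposed in the paper, whose details the abstract does not give; it is a minimal sketch that assumes plain averaging of character n-gram vectors and context-word vectors. The function names and the toy embeddings are hypothetical.

```python
from collections import defaultdict


def char_ngrams(word, n=3):
    """Return the character n-grams of a word padded with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_ngram_table(embeddings, n=3):
    """For each n-gram, average the vectors of the known words containing it."""
    sums = {}
    counts = defaultdict(int)
    for word, vec in embeddings.items():
        for gram in set(char_ngrams(word, n)):  # count each n-gram once per word
            acc = sums.setdefault(gram, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += value
            counts[gram] += 1
    return {g: [v / counts[g] for v in acc] for g, acc in sums.items()}


def predict_oov(word, context, embeddings, ngram_table, n=3):
    """Assumed baseline: mean of known n-gram vectors and context-word vectors."""
    sources = [ngram_table[g] for g in char_ngrams(word, n) if g in ngram_table]
    sources += [embeddings[w] for w in context if w in embeddings]
    if not sources:
        return None  # nothing known about the surface form or the context
    dim = len(sources[0])
    return [sum(vec[i] for vec in sources) / len(sources) for i in range(dim)]


# Toy in-vocabulary embeddings (hypothetical values, for illustration only).
toy_embeddings = {"walking": [1.0, 0.0], "talking": [1.0, 0.0], "dog": [0.0, 1.0]}
toy_table = build_ngram_table(toy_embeddings)
vector = predict_oov("stalking", ["dog"], toy_embeddings, toy_table)
```

Here the unseen word "stalking" shares most of its n-grams with "walking" and "talking", so its predicted vector stays close to theirs, while the context word "dog" pulls it slightly in the other direction.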

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

10.21203/rs.3.rs-90025/v1 ◽

2020 ◽

Author(s):

Usman Naseem ◽

Matloob Khushi ◽

Vinay Reddy ◽

Sakthivel Rajendran ◽

Imran Razzak ◽

...

Keyword(s):

State Of The Art ◽

Language Model ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Future Research ◽

Named Entity ◽

Domain Specific ◽

Context Dependent ◽

Biomedical Named Entity Recognition

Background: In recent years, with the growing volume of biomedical documents and advances in natural language processing algorithms, research on biomedical named entity recognition (BioNER) has increased exponentially. However, BioNER is challenging because (i) training data are often limited, (ii) an entity can refer to multiple types and concepts depending on its context, and (iii) biomedical text relies heavily on sub-domain-specific acronyms. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results. Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), BioALBERT, an effective domain-specific pre-trained language model trained on a huge biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which targets modelling inter-sentence coherence, to better learn context-dependent representations, and incorporated parameter reduction strategies to minimise memory usage and improve training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets covering four different entity types. Performance increased for (i) disease corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) drug/chemical corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) gene/protein corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) species corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), yielding state-of-the-art results. Conclusions: The performance of the proposed model on four different biomedical entity types shows that our model is robust and generalizable in recognizing biomedical entities in text. We trained four different variants of BioALBERT, which are available to the research community for use in future research.
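
The abstract does not say which parameter reduction settings were used, but the two strategies ALBERT is known for are factorized embedding parameterization and cross-layer parameter sharing. The sketch below only counts parameters to show why they save memory. The helper names are hypothetical, and the sizes in the usage note are illustrative assumptions, not values reported for BioALBERT.

```python
def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Count embedding parameters.

    Without factorization the table is vocab_size x hidden_size.
    With factorization it is vocab_size x embedding_size plus a
    projection of embedding_size x hidden_size.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


def encoder_params(per_layer_params, num_layers, share_across_layers):
    """Count encoder parameters; shared layers are stored only once."""
    if share_across_layers:
        return per_layer_params
    return per_layer_params * num_layers
```

With an assumed vocabulary of 30,000 tokens, hidden size 768, and embedding size 128, `embedding_params(30000, 768)` gives 23,040,000 parameters, while `embedding_params(30000, 768, 128)` gives 3,938,304.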

A Hybrid Approach of Pattern Extraction and Semi-supervised Learning for Vietnamese Named Entity Recognition

Computational Collective Intelligence. Technologies and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-642-34630-9_9 ◽

2012 ◽

pp. 83-93 ◽

Cited By ~ 1

Author(s):

Duc-Thuan Vo ◽

Cheol-Young Ock

Keyword(s):

Supervised Learning ◽

Hybrid Approach ◽

Named Entity Recognition ◽

Entity Recognition ◽

Pattern Extraction ◽

Named Entity

Named entity chunking techniques in supervised learning for Japanese named entity recognition

10.3115/992730.992748 ◽

2000 ◽

Cited By ~ 7

Author(s):

Manabu Sassano ◽

Takehito Utsuro

Keyword(s):

Supervised Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Named Entity

Creating Training Data for Scientific Named Entity Recognition with Minimal Human Effort

Lecture Notes in Computer Science - Computational Science – ICCS 2019 ◽

10.1007/978-3-030-22734-0_29 ◽

2019 ◽

pp. 398-411 ◽

Cited By ~ 2

Author(s):

Roselyne B. Tchoua ◽

Aswathy Ajith ◽

Zhi Hong ◽

Logan T. Ward ◽

Kyle Chard ◽

...

Keyword(s):

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Named Entity ◽

Human Effort

Named Entity Recognition for Vietnamese Documents Using Semi-supervised Learning Method of CRFs with Generalized Expectation Criteria

2012 International Conference on Asian Language Processing ◽

10.1109/ialp.2012.54 ◽

2012 ◽

Author(s):

Thi-Ngan Pham ◽

Le Minh Nguyen ◽

Quang-Thuy Ha

Keyword(s):

Supervised Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Method ◽

Named Entity ◽

Generalized Expectation

Named entity recognition: a semi-supervised learning approach

International Journal of Information Technology ◽

10.1007/s41870-020-00470-4 ◽

2020 ◽

Author(s):

H. Sintayehu ◽

G. S. Lehal

Keyword(s):

Supervised Learning ◽

Named Entity Recognition ◽

Entity Recognition ◽

Learning Approach ◽

Named Entity

Named Entity Recognition for a Low Resource Language

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2085.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 587-590

Keyword(s):

Machine Learning ◽

Named Entity Recognition ◽

Training Data ◽

Entity Recognition ◽

Linguistic Knowledge ◽

Rule Based ◽

Low Resource ◽

Named Entity ◽

The North ◽

Rule Based Approach

This paper studies Kokborok named entity recognition using a rule-based approach. Named entity recognition is one of the applications of natural language processing and is considered a subtask of information extraction. It is the task of identifying named entities for some specific purpose. We have studied a named entity recognition system for the Kokborok language. Kokborok is the official language of the state of Tripura, situated in the north-eastern part of India. It is also widely spoken in other parts of the north-eastern states of India and in adjoining areas of Bangladesh. Named entities include the names of persons, organizations, locations, etc. Named entity recognition is studied using the machine learning approach, the rule-based approach, or a hybrid approach combining the two. Rule-based named entity recognition is influenced by linguistic knowledge of the language. The machine learning approach requires a large amount of training data, and Kokborok, being a low-resource language, has very limited training data. The rule-based approach requires linguistic rules, and its results do not depend on the size of the available data. We have framed heuristic rules for identifying named entities based on linguistic knowledge of the language. An encouraging result is obtained when we test our data with the rule-based approach. We also study and frame rules for the counting system in Kokborok. The rule-based approach to named entity recognition is found suitable for low-resource languages with limited digital work and no named-entity-tagged data. We have framed a suitable algorithm using the rules to solve the named entity recognition task and obtain a desirable result.
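
A rule-based tagger of the general kind described here can be sketched in a few lines. The abstract does not list the paper's actual rules, so the cue words, gazetteer entries, and suffixes below are hypothetical placeholders written with English tokens. They are not Kokborok linguistic rules.

```python
import re

# Hypothetical rule resources; a real system would derive these from
# linguistic knowledge of the target language.
PERSON_TITLES = {"mr", "mrs", "dr"}
LOCATION_GAZETTEER = {"tripura", "agartala"}
ORG_SUFFIXES = {"university", "council"}


def tag_entities(text):
    """Tag tokens as PER, LOC, ORG, or O using simple ordered rules."""
    tokens = re.findall(r"\w+", text)
    tags = ["O"] * len(tokens)
    for i, token in enumerate(tokens):
        lower = token.lower()
        if lower in LOCATION_GAZETTEER:
            # Rule 1: gazetteer lookup for locations.
            tags[i] = "LOC"
        elif lower in ORG_SUFFIXES and i > 0 and tokens[i - 1][0].isupper():
            # Rule 2: capitalized word followed by an organization suffix;
            # this overrides an earlier LOC tag on the preceding token.
            tags[i - 1] = "ORG"
            tags[i] = "ORG"
        elif i > 0 and tokens[i - 1].lower() in PERSON_TITLES and token[0].isupper():
            # Rule 3: capitalized word right after a personal title.
            tags[i] = "PER"
    return list(zip(tokens, tags))


tagged = tag_entities("Dr Debbarma visited Tripura University in Agartala")
```

Because the rules need no training data, the tagger's output depends only on the quality of the rules and word lists, which is the property the abstract highlights for low-resource settings.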

Named Entity Recognition (NER) for Tibetan and Mongolian Newspapers

10.33774/coe-2021-xhw9l-v2 ◽

2021 ◽

Author(s):

Robert Barnett ◽

Christian Faggionato ◽

Marieke Meelen ◽

Sargai Yunshaab ◽

Tsering Samdrup ◽

...

Keyword(s):

Gold Standard ◽

Named Entity Recognition ◽

The State ◽

Training Data ◽

Entity Recognition ◽

People's Republic Of China ◽

Republic Of China ◽

Named Entity ◽

Policy Analysts ◽

Standard Training

Modern Tibetan and Vertical (Traditional) Mongolian are scripts used by c.11m people, mostly within the People’s Republic of China. In terms of publicly available tools for NLP, these languages and their scripts are extremely low-resourced and under-researched. We set out firstly to survey the state of NLP for these languages, and secondly to facilitate research by historians and policy analysts working on Tibetan newspapers. Their primary need is to be able to carry out Named Entity Recognition (NER) in Modern Tibetan, a script which has no word or sentence boundaries and for which no segmenters have been developed. Working on LightTag, an online tagger using character-based modelling, we were able to produce gold-standard training data for NER for use with Modern Tibetan.
