Nested Named Entity Recognition via Second-best Sequence Learning and Decoding

2020 ◽  
Vol 8 ◽  
pp. 605-620 ◽  
Author(s):  
Takashi Shibuya ◽  
Eduard Hovy

When an entity name contains other names within it, identifying all combinations of names can become difficult and expensive. We propose a new method that recognizes not only outermost named entities but also inner nested ones. We design an objective function for training a neural model that treats the tag sequence of a nested entity as the second-best path within the span of its parent entity. In addition, we provide a decoding method for inference that extracts entities iteratively from outermost to inner ones in an outside-to-inside manner. Our method introduces no additional hyperparameters beyond those of the conditional random field (CRF) based model widely used for flat named entity recognition tasks. Experiments demonstrate that our method performs better than or at least as well as existing methods capable of handling nested entities, achieving F1-scores of 85.82%, 84.34%, and 77.36% on the ACE-2004, ACE-2005, and GENIA datasets, respectively.
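The core decoding idea, keeping the second-best tag sequence in addition to the best one, can be sketched with a small k-best Viterbi routine. This is a toy illustration under invented scores and tags, not the authors' implementation (which restricts second-best search to the span of each parent entity):

```python
# Hedged sketch: k-best (here 2-best) Viterbi decoding over a toy
# emission/transition score table. The tag set and scores are invented
# for illustration; the paper applies second-best decoding within the
# span of each detected parent entity.
import heapq

def k_best_viterbi(emissions, transitions, k=2):
    """Return the k highest-scoring tag sequences.

    emissions: list of {tag: score} dicts, one per token
    transitions: {(prev_tag, tag): score}
    """
    tags = list(emissions[0].keys())
    # beam[tag] holds up to k (score, path) candidates ending in tag
    beam = {t: [(emissions[0][t], [t])] for t in tags}
    for em in emissions[1:]:
        new_beam = {}
        for t in tags:
            cands = []
            for p in tags:
                for score, path in beam[p]:
                    cands.append((score + transitions[(p, t)] + em[t],
                                  path + [t]))
            new_beam[t] = heapq.nlargest(k, cands, key=lambda c: c[0])
        beam = new_beam
    # Merge candidates from all final tags and keep the global top k
    return heapq.nlargest(k, (c for cs in beam.values() for c in cs),
                          key=lambda c: c[0])
```

For a two-token input, the routine returns the best path plus the runner-up, which is the sequence the paper interprets as the tagging of a nested inner entity.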

2021 ◽  
Vol 11 (18) ◽  
pp. 8682
Author(s):  
Ching-Sheng Lin ◽  
Jung-Sing Jwo ◽  
Cheng-Hsiung Lee

Clinical named entity recognition (CNER) focuses on locating named entities in electronic medical records (EMRs), and the extracted results play an important role in the development of intelligent biomedical systems. In addition to research on alphabetic languages, the study of non-alphabetic languages has also attracted considerable attention. In this paper, a neural model is proposed to extract entities from EMRs written in Chinese. To avoid the noise introduced by erroneous Chinese word segmentation, we employ character embeddings as the only feature, without extra resources. In our model, concatenated n-gram character embeddings represent the context semantics. A self-attention mechanism is then applied to model long-range dependencies among the embeddings. The concatenation of the new representations produced by the attention module is fed into a bidirectional long short-term memory (BiLSTM) network, followed by a conditional random field (CRF) layer that extracts the entities. An empirical study on the CCKS-2017 Shared Task 2 dataset shows that our model outperforms other approaches.
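The self-attention step that models long-range dependencies among character embeddings can be sketched in NumPy. This is a minimal sketch assuming queries, keys, and values all equal the input embeddings (the model's learned projections are omitted):

```python
# Hedged sketch: scaled dot-product self-attention over a sequence of
# character embeddings, with Q = K = V = X for simplicity. The real
# model applies learned projections before the BiLSTM-CRF layers.
import numpy as np

def self_attention(X):
    """X: (seq_len, dim) character embeddings -> attended (seq_len, dim)."""
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                 # pairwise similarity
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # softmax over positions
    return weights @ X                            # convex mix of positions
```

Each output row is a convex combination of all input positions, which is how a character's representation can attend to distant context before the BiLSTM-CRF layers.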


Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 82
Author(s):  
SaiKiranmai Gorla ◽  
Lalita Bhanu Murthy Neti ◽  
Aruna Malapati

Named entity recognition (NER) is a fundamental step in many natural language processing tasks, so improving the performance of NER models is always desirable. With limited resources available, NER for South Asian languages like Telugu is quite a challenging problem. This paper attempts to improve NER performance for Telugu using gazetteer-related features that are automatically generated from Wikipedia pages. We use these gazetteer features along with other well-known features, such as contextual, word-level, and corpus features, to build NER models. The models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and the margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
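The way gazetteer membership becomes a classifier feature can be sketched as follows. The gazetteer entries here are hand-written placeholders; the paper derives its gazetteers automatically from Wikipedia pages:

```python
# Hedged sketch: turning small gazetteers into binary per-token features
# of the kind a CRF/SVM/MIRA classifier can consume. The entries below
# are illustrative placeholders, not the paper's Wikipedia-derived lists.
PERSON_GAZ = {"rama", "sita"}
LOCATION_GAZ = {"hyderabad", "delhi"}

def gazetteer_features(tokens):
    """Map each token to binary gazetteer-membership features."""
    feats = []
    for tok in tokens:
        t = tok.lower()
        feats.append({
            "in_person_gaz": t in PERSON_GAZ,
            "in_location_gaz": t in LOCATION_GAZ,
        })
    return feats
```

These binary indicators are simply concatenated with the contextual, word-level, and corpus features before training each classifier.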


2009 ◽  
Vol 2 ◽  
Author(s):  
Asif Ekbal ◽  
Sivaji Bandyopadhyay

This paper describes the development of named entity recognition (NER) systems for two leading Indian languages, Bengali and Hindi, using the conditional random field (CRF) framework. The systems make use of different types of contextual information along with a variety of features that are helpful in predicting the different named entity (NE) classes. This feature set includes both language-independent and language-dependent components. We used annotated corpora of 122,467 tokens for Bengali and 502,974 tokens for Hindi, tagged with a set of twelve NE classes defined as part of the IJCNLP-08 NER Shared Task for South and South East Asian Languages (SSEAL). We considered only the tags that denote person names, location names, organization names, number expressions, time expressions, and measurement expressions. A number of experiments were carried out to identify the most suitable features for NER in Bengali and Hindi. The systems were tested on gold-standard test sets of 35K tokens for Bengali and 50K tokens for Hindi, yielding overall F-score values of 81.15% for Bengali and 78.29% for Hindi. 10-fold cross-validation tests yield F-scores of 83.89% for Bengali and 80.93% for Hindi. An ANOVA analysis shows that the performance improvement due to the language-dependent features is statistically significant.
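The language-independent contextual and word-level features such a CRF tagger consumes can be sketched as a per-token feature extractor. The feature names below are illustrative, not the paper's exact feature set:

```python
# Hedged sketch: language-independent contextual/word-level features of
# the kind fed to a CRF NER tagger. Feature names are illustrative and
# not taken from the paper; language-dependent features are omitted.
def token_features(tokens, i):
    """Extract a feature dict for the token at position i."""
    tok = tokens[i]
    return {
        "word": tok,
        "prefix3": tok[:3],                                  # leading characters
        "suffix3": tok[-3:],                                 # trailing characters
        "is_digit": tok.isdigit(),                           # number expressions
        "prev_word": tokens[i - 1] if i > 0 else "<BOS>",    # left context
        "next_word": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }
```

A CRF implementation then learns weights over these feature-label pairs; language-dependent features (e.g. suffix lists for Bengali or Hindi) would be added alongside them.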

