LVBERT: Transformer-Based Model for Latvian Language Understanding

Frontiers in Artificial Intelligence and Applications - Human Language Technologies – The Baltic Perspective
DOI: 10.3233/faia200610
2020

Author(s): Artūrs Znotiņš, Guntis Barzdiņš

Keyword(s): State Of The Art, Language Model, Named Entity Recognition, Entity Recognition, Future Research, Dependency Parsing, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

This paper presents LVBERT, the first publicly available monolingual language model pre-trained for Latvian. We show that LVBERT improves the state of the art on three Latvian NLP tasks: part-of-speech tagging, named entity recognition, and Universal Dependency parsing. We release LVBERT to facilitate future research and downstream applications for Latvian NLP.

PhoNLP: A joint multi-task learning model for Vietnamese part-of-speech tagging, named entity recognition and dependency parsing

DOI: 10.18653/v1/2021.naacl-demos.1
2021

Author(s): Linh The Nguyen, Dat Quoc Nguyen

Keyword(s): Named Entity Recognition, Learning Model, Entity Recognition, Dependency Parsing, Named Entity, Part Of Speech Tagging, Task Learning, Part Of Speech, Speech Tagging

BioALBERT: A Simple and Effective Pre-trained Language Model for Biomedical Named Entity Recognition

DOI: 10.21203/rs.3.rs-90025/v1
2020

Author(s): Usman Naseem, Matloob Khushi, Vinay Reddy, Sakthivel Rajendran, Imran Razzak, ...

Keyword(s): State Of The Art, Language Model, Named Entity Recognition, Training Data, Entity Recognition, Future Research, Named Entity, Domain Specific, Context Dependent, Biomedical Named Entity Recognition

Background: In recent years, the growing volume of biomedical documents, coupled with advances in natural language processing algorithms, has led to an exponential increase in research on biomedical named entity recognition (BioNER). However, BioNER is challenging because (i) training data are often limited, (ii) an entity can refer to multiple types and concepts depending on its context, and (iii) the text relies heavily on sub-domain-specific acronyms. Existing BioNER approaches often neglect these issues and directly adopt state-of-the-art (SOTA) models trained on general corpora, which often yields unsatisfactory results.

Results: We propose biomedical ALBERT (A Lite Bidirectional Encoder Representations from Transformers for Biomedical Text Mining), BioALBERT, an effective domain-specific language model pre-trained on a large biomedical corpus and designed to capture biomedical context-dependent NER. We adopted the self-supervised loss function used in ALBERT, which models inter-sentence coherence, to better learn context-dependent representations, and incorporated parameter-reduction strategies to minimise memory usage and reduce training time in BioNER. In our experiments, BioALBERT outperformed comparative SOTA BioNER models on eight biomedical NER benchmark datasets covering four entity types. Performance increased (i) on disease corpora by 7.47% (NCBI-disease) and 10.63% (BC5CDR-disease); (ii) on drug-chem corpora by 4.61% (BC5CDR-Chem) and 3.89% (BC4CHEMD); (iii) on gene-protein corpora by 12.25% (BC2GM) and 6.42% (JNLPBA); and (iv) on species corpora by 6.19% (LINNAEUS) and 23.71% (Species-800), yielding state-of-the-art results.

Conclusions: The performance of the proposed model on four different biomedical entity types shows that it is robust and generalizable in recognizing biomedical entities in text. We trained four variants of BioALBERT, which are available to the research community for future research.

Joint Part-of-Speech Tagging and Named Entity Recognition Using Factor Graphs

Text, Speech and Dialogue - Lecture Notes in Computer Science
DOI: 10.1007/978-3-642-32790-2_28
2012
pp. 232-239
Cited By ~ 1

Author(s): György Móra, Veronika Vincze

Keyword(s): Named Entity Recognition, Entity Recognition, Factor Graphs, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Named entity recognition based on a Hidden Markov Model in part-of-speech tagging

2008 First International Conference on the Applications of Digital Information and Web Technologies (ICADIWT)
DOI: 10.1109/icadiwt.2008.4664380
2008
Cited By ~ 4

Author(s): Ryohei Ageishi, Takao Miura

Keyword(s): Markov Model, Hidden Markov Model, Hidden Markov, Named Entity Recognition, Entity Recognition, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Named entity recognition in texts with the help of part of speech tagging

Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics
DOI: 10.17721/1812-5409.2018/4.11
2018
pp. 74-83

Author(s): M. Bevza

Keyword(s): State Of The Art, Named Entity Recognition, Recognition Task, Entity Recognition, Named Entity, Part Of Speech Tagging, Pos Tagging, Part Of Speech, Recent Developments, Future Work

We analyze neural network architectures that yield state-of-the-art results on the named entity recognition (NER) task and propose a number of new architectures for improving results even further. We have analyzed a number of ideas and approaches that researchers have used to achieve state-of-the-art results in a variety of NLP tasks. In this work, we present a few architectures that we consider most likely to improve on the existing state-of-the-art solutions for the NER and part-of-speech (POS) tagging tasks. The architectures are inspired by recent developments in multi-task learning. This work tests the hypothesis that NER and POS tagging are related tasks, so that adding information about POS tags as input to the network can help achieve better NER results and, vice versa, information about NER tags can help solve the task of POS tagging. This work also contains the implementation of the network and the results of the experiments, together with conclusions and future work.

Part-of-speech Tagging and Named Entity Recognition Using Improved Hidden Markov Model and Bloom Filter

2018 International Conference on Computing, Power and Communication Technologies (GUCON)
DOI: 10.1109/gucon.2018.8674901
2018
Cited By ~ 1

Author(s): Ankita, K. A. Abdul Nazeer

Keyword(s): Markov Model, Hidden Markov Model, Hidden Markov, Named Entity Recognition, Bloom Filter, Entity Recognition, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Part of Speech Tagging and Named Entity Recognition

Text Analysis with R - Quantitative Methods in the Humanities and Social Sciences
DOI: 10.1007/978-3-030-39643-5_18
2020
pp. 237-245

Author(s): Matthew L. Jockers, Rosamond Thalken

Keyword(s): Named Entity Recognition, Entity Recognition, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Correcting Word Segmentation and Part-of-Speech Tagging Errors for Chinese Named Entity Recognition

The Internet Challenge: Technology and Applications
DOI: 10.1007/978-94-010-0494-7_4
2002
pp. 29-36
Cited By ~ 1

Author(s): Tianfang Yao, Wei Ding, Gregor Erbach

Keyword(s): Named Entity Recognition, Word Segmentation, Entity Recognition, Named Entity, Part Of Speech Tagging, Part Of Speech, Speech Tagging

Bodo Resources for NLP - An Overview of Existing Primary Resources for Bodo

Proceedings of Intelligent Computing and Technologies Conference
DOI: 10.21467/proceedings.115.12
2021

Author(s): Mwnthai Narzary, Gwmsrang Muchahary, Maharaj Brahma, Sanjib Narzary, Pranav Kumar Singh, ...

Keyword(s): Natural Language Processing, Language Processing, Named Entity Recognition, Entity Recognition, Parallel Corpus, Named Entity, Part Of Speech Tagging, Part Of Speech, Challenges And Opportunities, Speech Tagging

With over 1.4 million Bodo speakers, there is a need for automated language processing systems such as machine translation, part-of-speech tagging, speech recognition, and named entity recognition. Developing such systems requires a sufficient amount of data. In this paper we present a detailed description of the primary resources available for the Bodo language that can be used as datasets to study natural language processing and its applications. We list the resources available for Bodo: a lexicon dataset of 8,005 entries collected from the agriculture and health domains, a raw corpus of 2,915,544 words, a tagged corpus of 30,000 sentences, a parallel corpus of 28,359 sentences from the tourism, agriculture, and health domains, and a tagged and parallel corpus of 37,768 sentences. We further discuss the challenges and opportunities present in the Bodo language.

Learning Tag Dependencies for Sequence Tagging

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence
DOI: 10.24963/ijcai.2018/637
2018
Cited By ~ 1

Author(s): Yuan Zhang, Hongshen Chen, Yihong Zhao, Qun Liu, Dawei Yin

Keyword(s): Language Processing, Channel Model, Named Entity Recognition, Entity Recognition, Named Entity, Part Of Speech Tagging, Part Of Speech, Proposed Model, Speech Tagging

Sequence tagging is the basis for multiple applications in natural language processing. Despite successes in learning long-term token sequence dependencies with neural networks, tag dependencies have rarely been considered. Yet sequence tagging involves complex dependencies and interactions among the input tokens and the output tags. We propose a novel multi-channel model that handles different ranges of token-tag dependencies and their interactions simultaneously. The model is augmented with a tag LSTM to manage the output tag dependencies and word-tag interactions, while three mechanisms are presented to efficiently incorporate token context representation and tag dependency. Extensive experiments on part-of-speech tagging and named entity recognition tasks show that the proposed model outperforms the BiLSTM-CRF baseline by effectively incorporating the tag dependency feature.
