Assessing quality in live interlingual subtitling: a new challenge

Author(s):  
Isabelle S. Robert ◽  
Aline Remael

Quality-assessment models for live interlingual subtitling are virtually non-existent. In this study we investigate whether and to what extent existing models from related translation modes, more specifically the NER (Number, Edition, Recognition) model for intralingual live subtitling, provide a good starting point. Having conducted a survey of the major quality parameters in different forms of subtitling, we proceed to adapt this model. The model measures live intralingual quality on the basis of different types of recognition error by the speech-recognition software and edition errors by the respeaker, with reference to their impact on the viewer’s comprehension. To test the adapted model we conducted a context-based study comprising the observation of the live interlingual subtitling process of four episodes of Dansdate, broadcast by the Flemish commercial broadcaster VTM in 2015. The process observed involved four “subtitlers”: a respeaker/interpreter, a corrector, a speech-to-text interpreter and a broadcaster, all of whom performed different functions. The data collected allow errors in the final product and in the intermediate stages to be identified, including when and by whom they were made. The results show that the NER model can be applied to live interlingual subtitling if it is adapted to deal with errors specific to translation proper.
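As a point of reference, the accuracy formula at the heart of the NER model for live subtitling can be sketched in a few lines. The function name is ours; the formula itself is the model's standard (N − E − R) / N, scaled to a percentage:

```python
def ner_accuracy(n_words, edition_errors, recognition_errors):
    # NER accuracy = (N - E - R) / N * 100, where N is the number of words,
    # E the edition errors by the respeaker, and R the recognition errors
    # by the speech-recognition software
    return 100 * (n_words - edition_errors - recognition_errors) / n_words
```

For example, a 200-word transcript with two edition errors and two recognition errors scores 98.0.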

2014 ◽  
Vol 2014 ◽  
pp. 1-6 ◽  
Author(s):  
Buzhou Tang ◽  
Hongxin Cao ◽  
Xiaolong Wang ◽  
Qingcai Chen ◽  
Hua Xu

Biomedical Named Entity Recognition (BNER), which extracts important entities such as genes and proteins, is a crucial step of natural language processing in the biomedical domain. Various machine learning-based approaches have been applied to BNER tasks and have shown good performance. In this paper, we systematically investigated three different types of word representation (WR) features for BNER: clustering-based representation, distributional representation, and word embeddings. We selected one algorithm from each of the three types of WR features and applied them to the JNLPBA and BioCreAtIvE II BNER tasks. Our results showed that all three WR algorithms were beneficial to machine learning-based BNER systems. Moreover, combining these different types of WR features further improved BNER performance, indicating that they are complementary to each other. By combining all three types of WR features, the improvements in F-measure on the BioCreAtIvE II GM and JNLPBA corpora were 3.75% and 1.39%, respectively, compared with systems using baseline features. To the best of our knowledge, this is the first study to systematically evaluate the effect of three different types of WR features for BNER tasks.
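To illustrate how such WR features might be combined per token, here is a minimal sketch. The cluster and embedding tables are hypothetical stand-ins for real Brown-clustering and word-embedding output; the prefix lengths and function names are ours:

```python
def combine_wr_features(token, brown_clusters, embeddings, dim=3):
    # clustering-based features: prefixes of a Brown-cluster bit string
    # (the table would come from a real clustering run over a corpus)
    cluster = brown_clusters.get(token, "")
    cluster_feats = [cluster[:n] for n in (2, 4) if len(cluster) >= n]
    # embedding features: a dense vector looked up per token,
    # zero vector for out-of-vocabulary tokens
    vector = embeddings.get(token, [0.0] * dim)
    return {"cluster_prefixes": cluster_feats, "embedding": vector}
```

A downstream sequence labeler (e.g. a CRF) would then consume both feature groups side by side, which is what makes their complementarity observable.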


2019 ◽  
Vol 129 ◽  
pp. 100-106 ◽  
Author(s):  
Arantza Casillas ◽  
Nerea Ezeiza ◽  
Iakes Goenaga ◽  
Alicia Pérez ◽  
Xabier Soto

Author(s):  
Yenan Yi ◽  
Yijie Bian

In this paper, we propose a novel neural network for named entity recognition, which is improved in two aspects. On the one hand, our model uses a parallel BiLSTM structure to generate character-level word representations. By inputting the character sequences of words into several independent, parallel BiLSTMs, we can obtain word representations from different representation subspaces, because the parameters of these BiLSTMs are randomly initialized. This method enhances the expressive ability of character-level word representations. On the other hand, we use a two-layer BiLSTM with a gating mechanism to model sentences. Since the features extracted from text by each layer of a multi-layer LSTM contain different types of information, we use the gating mechanism to assign appropriate weights to the outputs of each layer and take the weighted sum of these outputs as the final output for named entity recognition. Our model changes only the network structure and needs no feature engineering or external knowledge sources, making it a complete end-to-end NER model. We used the CoNLL-2003 English and German datasets to evaluate our model and obtained better results than baseline models.
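The gated weighted sum over layer outputs can be sketched as follows; this is a generic illustration of the idea (softmax-normalized gate scores weighting each layer's output vector), not the authors' exact parameterization:

```python
import math

def gated_layer_sum(layer_outputs, gate_scores):
    # layer_outputs: one output vector per LSTM layer
    # gate_scores: one learned scalar score per layer
    # softmax-normalize the scores into weights that sum to 1
    exps = [math.exp(s) for s in gate_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # weighted sum of the layer outputs, dimension by dimension
    dim = len(layer_outputs[0])
    return [sum(w * layer[i] for w, layer in zip(weights, layer_outputs))
            for i in range(dim)]
```

With equal gate scores this reduces to a plain average of the layers; training would push the scores toward whichever layer's information helps NER most.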


Author(s):  
Sandhya P. ◽  
Mahek Laxmikant Kantesaria

Named entity recognition (NER) is a subtask of information extraction. An NER system reads text and highlights the entities, separating different entities according to the needs of the project. NER is a two-step process: detecting names and then classifying them. The first step can be further divided into segmentation; the second step consists of choosing an ontology that organizes entities into categories. Document summarization, also called automatic summarization, is a process in which software creates a summary of a text document by selecting the most important points of the original text. In this chapter, the authors explain how document summarization is performed using named entity recognition. They discuss the different types of summarization techniques, how NER works, and its applications. The libraries available for NER-based information extraction are explained, and the chapter concludes by showing how NER is applied to document summarization.
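The two NER steps (detection, then classification) feeding an extractive summarizer can be sketched in toy form. Everything here is a deliberately naive illustration: real systems use trained models rather than a capitalization heuristic, and the ranking rule (more entities = more important sentence) is just one possible scoring choice:

```python
import re

def naive_entities(sentence):
    # toy detection step: treat capitalized words (excluding the
    # sentence-initial token) as candidate named entities
    tokens = sentence.split()
    return [t.strip(".,") for t in tokens[1:] if re.match(r"^[A-Z][a-z]+", t)]

def summarize(text, k=1):
    # split into sentences, then rank them by how many entities they mention
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(sentences, key=lambda s: len(naive_entities(s)), reverse=True)
    return ranked[:k]
```

A classification step would then map each detected entity onto an ontology category (person, location, organization) before scoring.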


2021 ◽  
Vol 11 (22) ◽  
pp. 11017
Author(s):  
László Nemes ◽  
Attila Kiss

Social media platforms are increasingly being used to communicate information, something which has only intensified during the pandemic. News portals and governments are also paying increasing attention to digital communications, announcements, and response or reaction monitoring. Twitter, one of the largest social networking sites, which has become even more important for communicating information during the pandemic, provides space for many different opinions and news items, with many discussions as well. In this paper, we look at people’s sentiments and use tweets to determine how people have related to COVID-19 over a given period of time. These sentiment analyses are augmented with information extraction and named entity recognition to get an even more comprehensive picture. The sentiment analysis is based on the Bidirectional Encoder Representations from Transformers (BERT) model, which serves as the baseline measurement model for the comparisons. We consider BERT the baseline and compare its results with RNN, NLTK, and TextBlob sentiment analyses. The RNN results are significantly closer to the benchmark results given by BERT; both models are able to categorize all tweets without a single tweet falling into the neutral category. Then, via a deeper analysis of these results, we can get an even more concise picture of people’s emotional state in the given period of time. The data from these analyses further support the emotional categories and provide a deeper understanding that can offer a solid starting point for other disciplines as well, such as linguistics or psychology. Thus, sentiment analysis, supplemented with information extraction and named entity recognition analyses, can provide a supported and deeply explored picture of specific sentiment categories and user attitudes.
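To make the comparison concrete, here is a toy lexicon-based polarity classifier in the spirit of the NLTK/TextBlob-style baselines mentioned above. The lexicon is a tiny illustrative stand-in, not the one used in the paper, and real transformer or RNN models score text very differently:

```python
# illustrative sentiment lexicons; real lexicon baselines use far larger,
# weighted word lists
POSITIVE = {"good", "great", "safe", "recovered", "hope"}
NEGATIVE = {"bad", "sick", "fear", "death", "lockdown"}

def classify(tweet):
    # count positive and negative lexicon hits and compare
    words = tweet.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A lexicon classifier like this can always land on "neutral" when counts tie, which is exactly the category the BERT and RNN models in the paper never needed.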


2015 ◽  
Vol 6 (3) ◽  
pp. 1-17 ◽  
Author(s):  
Marco Spruit ◽  
Bas Vlug

Due to the explosive growth in the number of text snippets over the past few years and the sparsity of their text, organizations are unable to classify them effectively and efficiently, missing out on business opportunities. This paper presents TETSC: the Topically-Enriched Text Snippet Classification method. TETSC aims to solve the classification problem for text snippets in any domain. TETSC recognizes that there are different types of text snippets and, therefore, allows for stop word removal, named-entity recognition, and topical enrichment for the different types of text snippets. TETSC has been implemented in the production systems of a personal finance organization, which resulted in a classification error reduction of over 21%. Highlights: the authors create the TETSC method for classifying topically-enriched text snippets; the authors differentiate between different types of text snippets; the authors show a successful application of named-entity recognition to text snippets; using multiple enrichment strategies appears to reduce effectiveness.
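A minimal sketch of a TETSC-style preprocessing pipeline (stop word removal, then entity and topic lookups) might look as follows. The stop-word list, entity table, and topic table are illustrative stand-ins, not the method's actual resources:

```python
# illustrative resources; a production system would use real stop-word lists,
# a trained NER component, and a learned topic model
STOP_WORDS = {"the", "a", "an", "is", "to", "of"}
ENTITY_TABLE = {"amazon": "ORG", "paris": "LOC"}
TOPIC_TABLE = {"invoice": "finance", "payment": "finance"}

def enrich_snippet(snippet):
    # 1. stop word removal
    tokens = [t for t in snippet.lower().split() if t not in STOP_WORDS]
    # 2. named-entity recognition (lookup stands in for a real NER model)
    entities = {t: ENTITY_TABLE[t] for t in tokens if t in ENTITY_TABLE}
    # 3. topical enrichment: attach topic labels as extra features
    topics = {TOPIC_TABLE[t] for t in tokens if t in TOPIC_TABLE}
    return {"tokens": tokens, "entities": entities, "topics": sorted(topics)}
```

The enriched output (tokens plus entity and topic features) would then be fed to a downstream classifier, which is where the sparsity of the raw snippet is compensated for.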


2018 ◽  
Vol 7 (4.38) ◽  
pp. 11
Author(s):  
Sitanath Biswas ◽  
Sujata Dash

Named Entity Recognition (NER) is considered a very influential undertaking in natural language processing, relevant to question answering systems, machine translation (MT), information extraction (IE), information retrieval (IR), etc. Basically, NER identifies and classifies the different types of proper nouns present in a given file, such as location names, person names, numbers, organization names, times, etc. Although a huge amount of progress has been made for different Indian languages, NER is still a big problem for the Odiya language. Odiya is also a resource-constrained language, and to this day it is very difficult to find a large and accurate corpus for training and testing. Therefore, in this paper, we have utilized Wikipedia to develop a large Odiya corpus of annotated named entities that is well suited to serve as a training dataset. After evaluation, we obtained a very promising result with an F-score of 78.89.
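For readers unfamiliar with the reported metric, the F-score is the harmonic mean of precision and recall; the reported 78.89 is on this 0–100 scale (here computed on 0–1 inputs and scalable by 100):

```python
def f_score(precision, recall):
    # F1: harmonic mean of precision and recall, both in [0, 1]
    return 2 * precision * recall / (precision + recall)
```

The harmonic mean penalizes imbalance: a system with high precision but low recall (or vice versa) scores much lower than one that balances both.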

