Towards Brazilian Portuguese automatic text simplification systems

Author(s):  
Sandra M. Aluísio ◽  
Lucia Specia ◽  
Thiago A.S. Pardo ◽  
Erick G. Maziero ◽  
Renata P.M. Fortes
Author(s):  
Horacio Saggion

Over the past decades, information has been made available to a broad audience through texts on the Web. However, understanding the wealth of information contained in texts can be difficult for many people, including those with poor literacy, cognitive or linguistic impairments, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier to process by natural-language processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent that is easier for a target user to read and understand. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of the sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods to address these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.


2020 ◽  
Author(s):  
Tarek Sakakini ◽  
Jong Yoon Lee ◽  
Aditya Duri ◽  
Renato F.L. Azevedo ◽  
Victor Sadauskas ◽  
...  

2021 ◽  
Author(s):  
V. S. Martins ◽  
C. D. Silva

Automatic text classification represents a major improvement in legal workflows, mainly in the migration from physical to electronic lawsuits. A systematic review of studies on text classification in the legal domain, covering January 2017 to February 2020, was conducted. The search strategy identified 20 studies, which were analyzed and compared. The review addresses the following research questions: which language models are state of the art and how they are applied to text classification on English and Brazilian Portuguese legal datasets; whether language models trained on Brazilian Portuguese are available; and whether datasets from the Brazilian legal domain are available. It concludes that automatic text classification is being applied in Brazil, although the use of language models lags behind studies on English-language datasets. It also highlights the importance of in-domain pre-training of language models for improving results, and notes that two studies make Brazilian Portuguese language models available and one introduces a dataset from the Brazilian legal domain.


2020 ◽  
Vol 30 (02) ◽  
pp. 2050008
Author(s):  
Akihiro Katsuta ◽  
Kazuhide Yamamoto

In recent years, simple Japanese has been attracting attention as a means of conveying information to foreigners. Automatic text simplification aims to reduce the complexity of vocabulary and expressions in a sentence while retaining its original meaning. This paper focuses on lexical simplification, with the aim of compressing the vocabulary. Because constructing or expanding a simplification corpus is very costly, we build a simplification model with Unsupervised Statistical Machine Translation, which does not require a parallel corpus. Based on a predetermined vocabulary, a pseudo-corpus for simplification is constructed from a web corpus, and the simplification model is trained on this pseudo-corpus. Only a vocabulary and a plain-text corpus are needed to train the model. Moreover, we propose cleaning the phrase table with WordNet, which improves performance on the BLEU and SARI metrics. Suppressing distant paraphrases with WordNet makes it easier to select the correct paraphrase candidate.
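
The two steps mentioned in this abstract, building a pseudo-corpus from a predetermined vocabulary and pruning distant paraphrases from a phrase table, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names `split_by_vocabulary` and `filter_phrase_table` are hypothetical, tokenization is plain whitespace splitting on English toy data (Japanese text would need a morphological analyzer), and the `related` dictionary stands in for a WordNet lookup.

```python
from typing import Dict, Iterable, List, Set, Tuple


def split_by_vocabulary(
    sentences: Iterable[str], simple_vocab: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split a plain-text corpus into a 'simple' and a 'complex' side.

    Assumption: a sentence counts as simple when every token is in the
    predetermined vocabulary; all other non-empty sentences are complex.
    """
    simple_side: List[str] = []
    complex_side: List[str] = []
    for sentence in sentences:
        tokens = sentence.lower().split()  # whitespace tokenization (assumption)
        if not tokens:
            continue  # skip empty lines
        if all(token in simple_vocab for token in tokens):
            simple_side.append(sentence)
        else:
            complex_side.append(sentence)
    return simple_side, complex_side


def filter_phrase_table(
    phrase_pairs: Iterable[Tuple[str, str]], related: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Keep only (source, target) pairs that the lexical resource links.

    Assumption: `related` maps a word to the words considered close to it,
    standing in for a WordNet-based relatedness check.
    """
    return [
        (source, target)
        for source, target in phrase_pairs
        if target in related.get(source, set())
    ]
```

As a usage illustration, calling `split_by_vocabulary(["The cat sat", "The feline reposed"], {"the", "cat", "sat"})` places the first sentence on the simple side and the second on the complex side. The two sides are not aligned sentence by sentence, which is consistent with the abstract's point that no parallel corpus is required.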

