Towards Brazilian Portuguese automatic text simplification systems

Over the past decades, information has been made available to a broad audience thanks to the availability of texts on the Web. However, understanding the wealth of information contained in texts can pose difficulties for a number of people, including those with poor literacy, cognitive or linguistic impairment, or limited knowledge of the language of the text. Text simplification was initially conceived as a technology to simplify sentences so that they would be easier to process by natural-language processing components such as parsers. Nowadays, however, automatic text simplification is conceived as a technology to transform a text into an equivalent that is easier for a target user to read and understand. Text simplification concerns both the modification of the vocabulary of the text (lexical simplification) and the modification of the structure of the sentences (syntactic simplification). In this chapter, after briefly introducing the topic of text readability, we give an overview of past and recent methods to address these two problems. We also describe simplification applications and full systems, and outline language resources and evaluation approaches.

When text readability meets automatic text simplification

ITL - International Journal of Applied Linguistics ◽

10.1075/itl.165.2.00int ◽

2014 ◽

Vol 165 (2) ◽

pp. 89-96 ◽

Cited By ~ 1

Author(s):

Thomas François ◽

Delphine Bernhard

Keyword(s):

Text Simplification ◽

Text Readability ◽

Automatic Text

The Interface Between Readability and Automatic Text Simplification

10.18653/v1/w18-7001 ◽

2018 ◽

Author(s):

Thomas François

Keyword(s):

Text Simplification ◽

Automatic Text

Improving Machine Translation of English Relative Clauses with Automatic Text Simplification

10.18653/v1/w18-7006 ◽

2018 ◽

Cited By ~ 1

Author(s):

Sanja Štajner ◽

Maja Popović

Keyword(s):

Machine Translation ◽

Relative Clauses ◽

Text Simplification ◽

Automatic Text ◽

English Relative Clauses

Automatic Text Simplification for Social Good: Progress and Challenges

10.18653/v1/2021.findings-acl.233 ◽

2021 ◽

Author(s):

Sanja Štajner

Keyword(s):

Social Good ◽

Text Simplification ◽

Automatic Text

Context-Aware Automatic Text Simplification of Health Materials in Low-Resource Domains

10.18653/v1/2020.louhi-1.13 ◽

2020 ◽

Author(s):

Tarek Sakakini ◽

Jong Yoon Lee ◽

Aditya Duri ◽

Renato F.L. Azevedo ◽

Victor Sadauskas ◽

...

Keyword(s):

Context Aware ◽

Low Resource ◽

Text Simplification ◽

Automatic Text

Automatic Text Simplification in Spanish: A Comparative Evaluation of Complementing Modules

Computational Linguistics and Intelligent Text Processing - Lecture Notes in Computer Science ◽

10.1007/978-3-642-37256-8_40 ◽

2013 ◽

pp. 488-500 ◽

Cited By ~ 5

Author(s):

Biljana Drndarević ◽

Sanja Štajner ◽

Stefan Bott ◽

Susana Bautista ◽

Horacio Saggion

Keyword(s):

Comparative Evaluation ◽

Text Simplification ◽

Automatic Text

Text Classification in Law Area: a Systematic Review

10.5753/kdmile.2021.17458 ◽

2021 ◽

Author(s):

V. S. Martins ◽

C. D. Silva

Keyword(s):

Systematic Review ◽

Text Classification ◽

English Language ◽

Search Strategy ◽

Language Model ◽

Brazilian Portuguese ◽

Language Models ◽

Automatic Text Classification ◽

Research Questions ◽

Automatic Text

Automatic text classification represents a great improvement in the workflow of the legal domain, mainly in the migration from physical to electronic lawsuits. A systematic review of studies on text classification in the legal domain from January 2017 to February 2020 was conducted. The search strategy identified 20 studies, which were analyzed and compared. The review is guided by the following research questions: what the state-of-the-art language models are, how they are applied to text classification on English and Brazilian Portuguese legal datasets, whether language models trained on Brazilian Portuguese are available, and whether datasets from the Brazilian legal domain are available. It concludes that automatic text classification is applied in Brazil, although the use of language models lags behind that in studies on English-language datasets. It also highlights the importance of in-domain pre-training of language models for improving results, and finds two studies that make Brazilian Portuguese language models available and one that introduces a dataset from the Brazilian legal domain.

Lexical Simplification by Unsupervised Machine Translation

International Journal of Asian Language Processing ◽

10.1142/s2717554520500083 ◽

2020 ◽

Vol 30 (02) ◽

pp. 2050008

Author(s):

Akihiro Katsuta ◽

Kazuhide Yamamoto

Keyword(s):

Information Transmission ◽

Unsupervised Learning ◽

Machine Translation ◽

Statistical Machine Translation ◽

Text Corpus ◽

Parallel Corpus ◽

Plain Text ◽

Original Meaning ◽

Text Simplification ◽

Automatic Text

In recent years, simple Japanese has been attracting attention as a means of conveying information to foreigners. Automatic text simplification aims to reduce the complexity of vocabulary and expressions in a sentence while retaining its original meaning. This paper focuses on lexical simplification, with the aim of compressing the vocabulary. Since constructing or expanding a simplification corpus is very costly, we build a simplification model with Unsupervised Statistical Machine Translation, which does not require a parallel corpus. Based on a predetermined vocabulary, a pseudo-corpus for simplification is constructed from a web corpus, and the simplification model is learned from this pseudo-corpus. Only a vocabulary and a plain text corpus are needed to train the model. Moreover, we propose cleaning the phrase table with WordNet, which improves performance on the BLEU and SARI metrics. Suppressing distant paraphrases with WordNet makes it easier to select the correct paraphrase candidate.
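
The abstract does not spell out how the pseudo-corpus is built from the predetermined vocabulary. As a rough illustration only, one plausible reading is that sentences of a plain text corpus are split into a "simple" side (every token is in the vocabulary) and a "complex" side (everything else), giving two non-parallel corpora for unsupervised training. The function name `split_by_vocabulary`, the whitespace tokenization (Japanese text would need a word segmenter first), and the toy data are all assumptions made for this sketch, not details taken from the paper.

```python
from typing import Iterable, List, Set, Tuple


def split_by_vocabulary(
    sentences: Iterable[str], simple_vocab: Set[str]
) -> Tuple[List[str], List[str]]:
    """Partition sentences into (simple, complex) non-parallel corpora.

    Hypothetical helper: a sentence counts as "simple" only if every
    token belongs to the predetermined vocabulary.
    """
    simple: List[str] = []
    complex_: List[str] = []
    for sentence in sentences:
        # Assumption: text is already segmented into space-separated tokens.
        tokens = sentence.lower().split()
        if not tokens:
            continue  # skip empty lines
        if all(tok in simple_vocab for tok in tokens):
            simple.append(sentence)
        else:
            complex_.append(sentence)
    return simple, complex_


# Toy data, invented for illustration.
vocab = {"the", "cat", "sat", "on", "mat", "a", "dog", "ran"}
corpus = [
    "The cat sat on the mat",
    "A dog ran",
    "The feline reclined on the mat",
    "",
]
simple_side, complex_side = split_by_vocabulary(corpus, vocab)
```

Under this reading, the two sides would then serve as the monolingual "languages" of the unsupervised translation model; the WordNet-based phrase-table cleaning described in the abstract is a separate step that this sketch does not cover.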
