NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish

Vicent Ahuir; Lluís-F. Hurtado; José Ángel González; Encarna Segarra

doi:10.3390/app11219872

NASca and NASes: Two Monolingual Pre-Trained Models for Abstractive Summarization in Catalan and Spanish

Applied Sciences ◽

10.3390/app11219872 ◽

2021 ◽

Vol 11 (21) ◽

pp. 9872

Author(s):

Vicent Ahuir ◽

Lluís-F. Hurtado ◽

José Ángel González ◽

Encarna Segarra

Keyword(s):

English Language ◽

Text Summarization ◽

Spanish Language ◽

Evaluation Metrics ◽

Minority Languages ◽

Abstractive Summarization ◽

Textual Content ◽

Usual Evaluation ◽

Newspaper Articles

Most of the models proposed in the literature for abstractive summarization are generally suitable for the English language but not for other languages. Multilingual models were introduced to address that language constraint, but despite their applicability being broader than that of the monolingual models, their performance is typically lower, especially for minority languages like Catalan. In this paper, we present a monolingual model for abstractive summarization of textual content in the Catalan language. The model is a Transformer encoder-decoder which is pretrained and fine-tuned specifically for the Catalan language using a corpus of newspaper articles. In the pretraining phase, we introduced several self-supervised tasks to specialize the model on the summarization task and to increase the abstractivity of the generated summaries. To study the performance of our proposal in languages with higher resources than Catalan, we replicate the model and the experimentation for the Spanish language. The usual evaluation metrics, not only the most used ROUGE measure but also other more semantic ones such as BertScore, do not allow the abstractivity of the generated summaries to be evaluated correctly. In this work, we also present a new metric, called content reordering, to evaluate one of the most common characteristics of abstractive summaries, the rearrangement of the original content. We carried out exhaustive experiments to compare the performance of the monolingual models proposed in this work with two of the most widely used multilingual models in text summarization, mBART and mT5. The experimental results support the quality of our monolingual models, especially considering that the multilingual models were pretrained with many more resources than those used in our models. Likewise, it is shown that the pretraining tasks helped to increase the degree of abstractivity of the generated summaries. To our knowledge, this is the first work that explores a monolingual approach for abstractive summarization both in Catalan and Spanish.


A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

Information ◽

10.3390/info11020078 ◽

2020 ◽

Vol 11 (2) ◽

pp. 78 ◽

Cited By ~ 2

Author(s):

Tulu Tilahun Hailu ◽

Junqing Yu ◽

Tessfu Geteye Fantaye

Keyword(s):

Text Summarization ◽

Evaluation Framework ◽

Word Embedding ◽

Evaluation Metrics ◽

Original Text ◽

Automatic Evaluation ◽

Source Text ◽

Automatic Text Summarization ◽

Automatic Text

Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on the overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation metric scores of summarization systems. Humans are biased by circumstances when producing a summary; even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word embedding based automatic text summarization and evaluation framework, which can successfully determine the salient top-n sentences of a source text as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods with regard to both text summarization systems and automatic evaluation metrics when tested on a publicly available dataset.
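The reliance of ROUGE on n-gram overlap can be illustrated with a minimal sketch. It is not the official ROUGE implementation: it assumes whitespace tokenization and lowercasing, computes only ROUGE-N recall, and the function names `ngrams` and `rouge_n_recall` are hypothetical names chosen for this illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Fraction of reference n-grams that also occur in the candidate.

    Assumption for this sketch: tokens are lowercased, whitespace-separated
    words; real ROUGE toolkits apply their own tokenization and stemming.
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram,
    # so repeated n-grams are not over-credited.
    overlap = sum((ref & cand).values())
    return overlap / total


reference = "the cat sat on the mat"
paraphrase = "a feline rested on the rug"
print(rouge_n_recall(reference, paraphrase, n=1))
print(rouge_n_recall(reference, paraphrase, n=2))
```

In this example the paraphrase conveys roughly the same meaning as the reference, yet it shares only the tokens "on" and "the", so just 2 of the 6 reference unigrams are matched. This is the kind of mismatch that makes pure overlap metrics a poor fit for abstractive summaries.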


Guillermo del Toro

Latino Studies ◽

10.1093/obo/9780199913701-0146 ◽

2020 ◽

Author(s):

Dolores Tierney

Keyword(s):

United States ◽

New Zealand ◽

Science Fiction ◽

English Language ◽

The United States ◽

Harry Potter ◽

Spanish Language ◽

Alfonso Cuarón ◽

The 1960s

Guillermo del Toro (b. 1964) is an Oscar-winning Mexican director, screenwriter, producer, novelist, film scholar, curator, and nonfiction writer who works internationally on English-language and Spanish-language projects in Mexico, New Zealand, Spain, and the United States and across a number of different media, including film, television, animation, and novels. Although he has worked in multiple genres, including horror (Mimic (1997), Blade II (2002), Crimson Peak (2015)), action/fantasy (Hellboy (2004), Hellboy II: The Golden Army (2008)), science fiction (Pacific Rim (2013)), and hybrids of these and other genres (The Shape of Water (2017)), he is best known for the gothic sensibility of many of his projects (Cronos (1993), The Devil’s Backbone (2001), Pan’s Labyrinth (2006), Crimson Peak (2015)). Relatedly, Del Toro’s Cronos and his subsequent films, including those he has produced, have contributed greatly to the rehabilitation of the horror and fantasy genres from the cultural disreputability they suffered from the 1960s to the early 1990s and have also facilitated more horror production in Mexico going forward. In addition to the gothic quality of his work, Del Toro’s auteur status is often traced through the recurring imagery, themes, and monsters that appear across his oeuvre and through the recurring preoccupations with the contiguity of real and fantasy worlds and with ghosts as manifestations of the (historical and political) past.
Although Del Toro has made and been involved in the production of some notable franchise films in recent years, directing Blade II, Hellboy, and Hellboy II: The Golden Army, and receiving a screenwriting credit for The Hobbit: An Unexpected Journey (2012), The Hobbit: The Desolation of Smaug (2013), and The Hobbit: The Battle of the Five Armies (2014), he has also turned down several opportunities to work on franchise films in the Narnia and Harry Potter series (passing on directing Harry Potter and the Prisoner of Azkaban but suggesting his compatriot Alfonso Cuarón for the job instead) and left the production of The Hobbit films after work on the scripts. He also received a writing credit on Troy Nixey’s Don’t Be Afraid of the Dark (2010).


A Systematic Survey on Multi-document Text Summarization

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/111062021 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3148-3153

Keyword(s):

Deep Learning ◽

Text Summarization ◽

Evaluation Metrics ◽

Automatic Process ◽

Document Summarization ◽

Text Document ◽

Automatic Text Summarization ◽

As Graph ◽

Abstractive Summarization ◽

Automatic Text

Automatic text summarization is a technique for generating a short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single-document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating a relevant, informative, and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based, and deep learning-based summarization techniques, are discussed along with the evaluation metrics, which can provide insight for future researchers.


Food procurement in English-language Canadian public schools: Opportunities and challenges

Canadian Food Studies / La Revue canadienne des études sur l'alimentation ◽

10.15353/cfs-rcea.v6i1.265 ◽

2019 ◽

Vol 6 (1) ◽

pp. 75-99

Author(s):

Shawna Holmes

Keyword(s):

Public Schools ◽

Food Environment ◽

Nutritional Quality ◽

English Language ◽

Nutrient Content ◽

Food Environments ◽

School Food ◽

Provincial Level ◽

School Food Environment

This paper examines the changes to procurement for school food environments in Canada as a response to changes to nutrition regulations at the provincial level. Interviews with those working in school food environments across Canada revealed how changes to the nutrition requirements of foods and beverages sold in schools presented opportunities to not only improve the nutrient content of the items made available in school food environments, but also to include local producers and/or school gardens in procuring for the school food environment. At the same time, some schools struggle to procure nutritionally compliant foods due to increased costs associated with transporting produce to rural, remote, or northern communities as well as logistic difficulties like spoilage. Although the nutrition regulations have facilitated improvements to food environments in some schools, others require more support to improve the overall nutritional quality of the foods and beverages available to students at school.


Productive Vocabulary Knowledge of ESL Learners

Asian Journal of Interdisciplinary Research ◽

10.34256/ajir1814 ◽

2018 ◽

Vol 1 (1) ◽

pp. 32-41 ◽

Cited By ~ 1

Author(s):

Abdulmalik Usman ◽

Dahiru Musa Abdullahi

Keyword(s):

Teaching And Learning ◽

English Language ◽

Vocabulary Knowledge ◽

Writing Quality ◽

Lexical Frequency ◽

Frequency Profile ◽

ESL Learners ◽

The Relationship

The paper seeks to investigate the level of productive vocabulary knowledge of ESL learners, their writing quality, and the relationship between vocabulary knowledge and writing quality. 150 final-year students of English language in a university in Nigeria were randomly selected as respondents. The respondents were asked to write an essay of 300 words within one hour. The essays were typed into the Vocab Profiler of Cobb (2002) to analyze the Lexical Frequency Profile of the respondents. The essays were also assessed by independent examiners using a standard rubric. The findings reveal that the level of productive vocabulary knowledge of the respondents is limited. The writing quality of the majority of the respondents is fair, and there is a significant correlation between vocabulary and the writing quality of the subjects. The researchers posit that productive vocabulary is the predictor of writing quality and recommend various techniques through which the teaching and learning of vocabulary can be improved.


“The Past, Present, and Future of Accounting History”: A Comment on the State of Accounting History

Accounting Historians Journal ◽

10.2308/aahj-2020-002 ◽

2020 ◽

Vol 47 (1) ◽

pp. 89-95 ◽

Cited By ~ 1

Author(s):

Garry D. Carnegie

Keyword(s):

English Language ◽

Research Field ◽

The State ◽

International Accounting ◽

Scholarly Research ◽

Recent Contribution ◽

Global Society ◽

Accounting History ◽

The Past

This response to the recent contribution by Matthews (2019) entitled “The Past, Present, and Future of Accounting History” specifically deals with the issues associated with concentrating on counting publication numbers in examining the state of a scholarly research field at the start of the 2020s. It outlines several pitfalls with the narrowly focused publications count analysis, in selected English language journals only, as provided by Matthews. The commentary is based on three key arguments: (1) accounting history research and publication is far more than a “numbers game”; (2) trends in the quality of the research undertaken and published are paramount; and (3) international publication and accumulated knowledge in accounting history are indeed more than a collection of English language publications. The author seeks to contribute to discussion and debate between accounting historians and other researchers for the benefit and development of the international accounting history community and global society.


Medical and surgical interventions to improve the quality of life for endometriosis patients: a systematic review

Gynecological Surgery ◽

10.1186/s10397-021-01096-5 ◽

2021 ◽

Vol 18 (1) ◽

Author(s):

Maurizio Nicola D’Alterio ◽

Stefania Saponara ◽

Mirian Agus ◽

Antonio Simone Laganà ◽

Marco Noventa ◽

...

Keyword(s):

Quality Of Life ◽

English Language ◽

Short Form ◽

Work And Family ◽

Surgical Interventions ◽

Medline Search ◽

Accessible Information ◽

Before And After ◽

Surgical Treatments

Endometriosis impairs the quality of life (QoL) of many women, including their social relationships, daily activity, productivity at work, and family planning. The aim of this review was to determine the instruments used to examine QoL in previous clinical studies of endometriosis and to evaluate the effect of medical and surgical interventions for endometriosis on QoL. We conducted a systematic search and review of studies published between January 2010 and December 2020 using MEDLINE. Search terms included “endometriosis” and “quality of life.” We only selected studies that used a standardized questionnaire to evaluate QoL before and after medical or surgical interventions. Only articles in the English language were examined. The initial search identified 720 results. After excluding duplicates and applying inclusion criteria, 37 studies were selected for analysis. We found that the two scales most frequently used to measure QoL were the Short Form-36 health survey questionnaire (SF-36) and the Endometriosis Health Profile-30 (EHP-30). Many medical and surgical treatments demonstrated comparable benefits in pain control and QoL improvement. There is no clear answer as to what is the best treatment for improving QoL because each therapy must be personalized for the patient and depends on the woman’s goals. In conclusion, women must be informed about endometriosis and given easily accessible information to improve treatment adherence and their QoL.


Producing ‘good enough’ automated transcripts securely: Extending Bokhove and Downey (2018) to address security concerns

Methodological Innovations ◽

10.1177/2059799120987766 ◽

2021 ◽

Vol 14 (1) ◽

pp. 205979912098776

Author(s):

Joseph Da Silva

Keyword(s):

Cyber Security ◽

English Language ◽

Negative Impact ◽

Early Career ◽

Third Party ◽

Doctoral Research ◽

Computing Services ◽

Cloud Computing Services ◽

Audio Data

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically in order to facilitate analysis. Many novice qualitative researchers’ experiences of manual transcription are that it is tedious and time-consuming, although it is generally accepted within much of the literature that quality of analysis is improved through researchers performing this task themselves. This is despite the potential for the exhausting nature of bulk transcription to conversely have a negative impact upon quality. Other researchers have explored the use of automated methods to ease the task of transcription, more recently using cloud-computing services, but such services present challenges to ensuring confidentiality and privacy of data. In the field of cyber-security, these are particularly concerning; however, any researcher dealing with confidential participant speech should also be uneasy with third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find themselves with no option other than manual transcription. This article presents a secure and effective alternative, building on prior work published in this journal, to present a method that significantly reduced, by more than half, interview transcription time for the researcher yet maintained security of audio data. It presents a comparison between this method and a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method presented requires an investment in specific equipment which currently only supports the English language.


SAT0371 ARE ENGLISH-LANGUAGE VIDEOS ON YOUTUBE A USEFUL SOURCE OF INFORMATION FOR SPONDYLOARTHRITIS?

Annals of the Rheumatic Diseases ◽

10.1136/annrheumdis-2020-eular.3109 ◽

2020 ◽

Vol 79 (Suppl 1) ◽

pp. 1133.1-1133

Author(s):

S. Elangovan ◽

Y. H. Kwan ◽

W. Fong

Keyword(s):

English Language ◽

Healthcare Professionals ◽

Treatment Options ◽

Misleading Information ◽

Information Group ◽

YouTube Videos ◽

Group 2 ◽

Patient Opinion ◽

Group 1

Background: Spondyloarthritis (SpA) is a family of chronic inflammatory disorders. Social media, such as YouTube, is a popular online platform that patients often visit for information. However, the validity of the content uploaded onto YouTube is not known. Objectives: This study aimed to evaluate the content, reliability and quality of the most viewed English-language YouTube videos on SpA. Methods: Keywords “spondyloarthritis”, “spondyloarthropathy” and “ankylosing spondylitis” were searched on YouTube on October 7th, 2019. The top 270 videos were screened. Videos were excluded if they were irrelevant, in a non-English language or if they had no audio. Total number of views, duration on YouTube (days), video length, upload date, number of likes, dislikes, subscribers and comments were recorded for videos. A modified 5-point DISCERN tool [1] and the 5-point Global Quality Scale (GQS) score [2] were used to assess the reliability and quality of the videos, with higher scores indicating greater reliability and quality respectively. Results: Two hundred of 270 videos were included in the final analysis [61.5% from healthcare professionals, 37.0% from patients, 1.5% from news channels]. Of the 200 videos, 15 were uploaded within the last year and 112 in the last five years. 120 (60%) were categorized as useful information (Group 1), 6 (3%) as misleading information (Group 2), 52 (26%) as useful patient opinion (Group 3) and 22 (11%) as misleading patient opinion (Group 4). Useful videos were mainly from healthcare professionals or patients (86%). Useful videos (Group 1 and 3) had higher median (IQR) number of subscribers [2700 (14700) vs 211 (457), p < 0.01], reliability scores [3 (1) vs 2 (1), p < 0.01] and GQS scores [3 (1) vs. 2 (1), p < 0.001] compared to misleading videos (Group 2 and 4), respectively. Videos uploaded by healthcare professionals tended to have more useful information [94% (116 of 123) vs. 66% (49 of 74), p < 0.001] and had higher median (IQR) reliability scores [3 (1) vs 2 (1), p < 0.001] and GQS scores [3 (2) vs 2 (1), p < 0.001] compared to patient uploaded videos respectively. The 5 (out of 123) videos from healthcare professionals that had misleading information were misleading because of outdated information on diagnosis (3 videos) and treatment (5 videos) of SpA. Of the 22 videos that had misleading patient opinion, 9 (41%) wrongly described the clinical features for SpA and 14 (64%) portrayed the current evidence-based treatment options as ineffective and described alternative treatment plans (i.e. diet restrictions, complementary and alternative medicine). Conclusion: The majority of English-language YouTube videos have useful information on the topic of SpA; however, 31% of patient opinions have inaccurate information on the clinical features and treatment options, and viewers need to be cognisant of this “fake news”. References: [1] Charnock D, Shepperd S, Needham G, Gann R (1999) DISCERN: an instrument for judging the quality of written consumer health information on treatment choices. J Epidemiol Community Health 53(2): 105-111. [2] Bernard A, Langille M, Hughes S, Rose C, Leddin D, Veldhuyzen van Zanten S (2007) A systematic review of patient inflammatory bowel disease information resources on the World Wide Web. Am J Gastroenterol 102(9): 2070-2077. Disclosure of Interests: Sakktivel Elangovan: None declared, Yu Heng Kwan: None declared, Warren Fong Consultant of: Abbvie, Janssen, Novartis, Speakers bureau: Abbvie, Janssen, Novartis


¿Comprenderán Mis Amigos y La Familia? Analyzing Spanish Translations of Admission Materials for Latina/o Students Applying to 4-Year Institutions in the United States

Journal of Hispanic Higher Education ◽

10.1177/1538192718775478 ◽

2018 ◽

Vol 19 (2) ◽

pp. 195-209 ◽

Cited By ~ 4

Author(s):

Zachary W. Taylor

Keyword(s):

English Language ◽

The United States ◽

Reading Level ◽

Spanish Language ◽

First Year ◽

Linguistic Capital ◽

Undergraduate Admissions ◽

Research And Practice ◽

Grade Reading Level ◽

Language Content

This study examines first-year undergraduate admissions materials from 325 bachelor-degree granting U.S. institutions, closely analyzing the English-language readability and Spanish-language readability and translation of these materials. Via Yosso’s linguistic capital, the results reveal 4.9% of first-year undergraduate admissions materials had been translated into Spanish, 4% of institutional admissions websites embed translation widgets, and the average readability of English-language content is above the 13th-grade reading level. Implications for research and practice are discussed.
