A Language Model for Misogyny Detection in Latin American Spanish Driven by Multisource Feature Extraction and Transformers

2021, Vol 11 (21), pp. 10467
Author(s): Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Yuridia Montelongo-Padilla, Ivan Lopez-Arevalo, Oscar S. Sordia

Creating effective mechanisms to detect online misogyny automatically poses significant scientific and technological challenges. Misogyny is hard for computer models to recognize because it is a subtle form of violence: it is not always explicitly aggressive, and it can even hide behind seemingly flattering words, jokes, parodies, and other expressions. It is currently difficult even to obtain an exact figure for the rate of misogynistic comments online because, unlike other types of violence such as physical violence, these events are not recorded by any statistical system. This research contributes to the development of models for the automatic detection of misogynistic texts in Latin American Spanish and to the design of data augmentation methodologies, since deep learning models require a considerable amount of data.

2017, Vol 41 (5), pp. 422-428
Author(s): Alicia Nijdam-Jones, Diego Rivera, Barry Rosenfeld, Juan Carlos Arango-Lasprilla

2020
Author(s): Dean Sumner, Jiazhen He, Amol Thakkar, Ola Engkvist, Esben Jannik Bjerrum

SMILES randomization, a form of data augmentation, has previously been shown to improve the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method, which we call "Levenshtein augmentation", that considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. We tested the performance of Levenshtein augmentation using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train the baseline models, Levenshtein augmentation improved performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as attentional gain: an enhancement in the underlying network's ability to recognize patterns corresponding to molecular motifs.
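The abstract describes building reactant/product training pairs from SMILES strings according to their sub-sequence similarity. The following is a minimal Python sketch of one possible reading of that idea, not the authors' published procedure: it computes the Levenshtein (edit) distance between candidate SMILES strings and keeps the closest reactant/product combinations. The helper name `levenshtein_pairs`, its parameter `k`, and the assumption that randomized SMILES variants are already supplied (in practice they would come from a cheminformatics toolkit) are all hypothetical.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def levenshtein_pairs(reactant_variants, product_variants, k=1):
    """Hypothetical helper: rank every reactant/product SMILES combination by
    edit distance and keep the k closest as training pairs.

    Assumes the variants are already-randomized SMILES strings for the same
    reactant and the same product; generating them is out of scope here.
    """
    scored = sorted(
        ((levenshtein(r, p), r, p)
         for r, p in product(reactant_variants, product_variants)),
        key=lambda t: t[0],  # stable sort: ties keep generation order
    )
    return [(r, p) for _, r, p in scored[:k]]


# Two SMILES spellings each of ethanol (reactant) and acetaldehyde (product).
print(levenshtein_pairs(["CCO", "OCC"], ["CC=O", "O=CC"], k=2))
```

For this toy input, each selected pair differs by a single inserted `=`, so the chosen reactant and product strings share most of their character order. A similarity threshold instead of a fixed `k` would be an equally plausible design choice.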


2020
Author(s): Clara Vila-Castelar, Kathryn V. Papp, Rebecca E. Amariglio, Valeria L. Torres, Ana Baena, ...

Author(s): Samuel Leach, Yunhe Xue, Rahul Sridhar, Stephanie Paal, Zhangyang Wang, ...

2015, Vol 38 (2), pp. 276-300
Author(s): Pedro Mogorrón Huerta

Research on fixed expressions has traditionally emphasized that these sequences are fixed, in contrast to constructions with free components. In a 2010 study, we showed that a considerable number of verbal fixed expressions in common Peninsular Spanish allow changes to some of their components without altering their meaning or losing their fixed status. In this paper, we analyze verbal fixed expressions in the Latin American variety of Spanish. This analysis lets us observe how Latin American Spanish verbal fixed expressions vary (paradigmatically, lexically, morphologically, and grammatically), following the same patterns and syntactic structures as in common Peninsular Spanish. These variations appear both in diatopic variants of common Peninsular Spanish verbal fixed expressions and in new diatopic verbal fixed expressions. The large number of verbal fixed expressions in Latin American Spanish, a number that will only grow in the near future, reinforces the case for building very comprehensive databases.

