Hate Speech Classification in Social Media Using Emotional Analysis

Combating the challenges of social media hate speech in a polarized society

Data Technologies and Applications ◽

10.1108/dta-01-2019-0007 ◽

2019 ◽

Vol 53 (4) ◽

pp. 501-527

Author(s):

Collins Udanor ◽

Chinatu C. Anyanwu

Keyword(s):

Social Media ◽

Free Speech ◽

Hate Speech ◽

Local Context ◽

Content Type ◽

The Social ◽

Significant Difference ◽

Word Clouds ◽

Social Media Platforms ◽

Speech Classification

Purpose Hate speech in recent times has become a troubling development. It has different meanings to different people in different cultures. The anonymity and ubiquity of the social media provides a breeding ground for hate speech and makes combating it seems like a lost battle. However, what may constitute a hate speech in a cultural or religious neutral society may not be perceived as such in a polarized multi-cultural and multi-religious society like Nigeria. Defining hate speech, therefore, may be contextual. Hate speech in Nigeria may be perceived along ethnic, religious and political boundaries. The purpose of this paper is to check for the presence of hate speech in social media platforms like Twitter, and to what degree is hate speech permissible, if available? It also intends to find out what monitoring mechanisms the social media platforms like Facebook and Twitter have put in place to combat hate speech. Lexalytics is a term coined by the authors from the words lexical analytics for the purpose of opinion mining unstructured texts like tweets. Design/methodology/approach This research developed a Python software called polarized opinions sentiment analyzer (POSA), adopting an ego social network analytics technique in which an individual’s behavior is mined and described. POSA uses a customized Python N-Gram dictionary of local context-based terms that may be considered as hate terms. It then applied the Twitter API to stream tweets from popular and trending Nigerian Twitter handles in politics, ethnicity, religion, social activism, racism, etc., and filtered the tweets against the custom dictionary using unsupervised classification of the texts as either positive or negative sentiments. The outcome is visualized using tables, pie charts and word clouds. A similar implementation was also carried out using R-Studio codes and both results are compared and a t-test was applied to determine if there was a significant difference in the results. The research methodology can be classified as both qualitative and quantitative. Qualitative in terms of data classification, and quantitative in terms of being able to identify the results as either negative or positive from the computation of text to vector. Findings The findings from two sets of experiments on POSA and R are as follows: in the first experiment, the POSA software found that the Twitter handles analyzed contained between 33 and 55 percent hate contents, while the R results show hate contents ranging from 38 to 62 percent. Performing a t-test on both positive and negative scores for both POSA and R-studio, results reveal p-values of 0.389 and 0.289, respectively, on an α value of 0.05, implying that there is no significant difference in the results from POSA and R. During the second experiment performed on 11 local handles with 1,207 tweets, the authors deduce as follows: that the percentage of hate contents classified by POSA is 40 percent, while the percentage of hate contents classified by R is 51 percent. That the accuracy of hate speech classification predicted by POSA is 87 percent, while free speech is 86 percent. And the accuracy of hate speech classification predicted by R is 65 percent, while free speech is 74 percent. This study reveals that neither Twitter nor Facebook has an automated monitoring system for hate speech, and no benchmark is set to decide the level of hate contents allowed in a text. The monitoring is rather done by humans whose assessment is usually subjective and sometimes inconsistent. Research limitations/implications This study establishes the fact that hate speech is on the increase on social media. It also shows that hate mongers can actually be pinned down, with the contents of their messages. The POSA system can be used as a plug-in by Twitter to detect and stop hate speech on its platform. The study was limited to public Twitter handles only. N-grams are effective features for word-sense disambiguation, but when using N-grams, the feature vector could take on enormous proportions and in turn increasing sparsity of the feature vectors. Practical implications The findings of this study show that if urgent measures are not taken to combat hate speech there could be dare consequences, especially in highly polarized societies that are always heated up along religious and ethnic sentiments. On daily basis tempers are flaring in the social media over comments made by participants. This study has also demonstrated that it is possible to implement a technology that can track and terminate hate speech in a micro-blog like Twitter. This can also be extended to other social media platforms. Social implications This study will help to promote a more positive society, ensuring the social media is positively utilized to the benefit of mankind. Originality/value The findings can be used by social media companies to monitor user behaviors, and pin hate crimes to specific persons. Governments and law enforcement bodies can also use the POSA application to track down hate peddlers.

Download Full-text

Hate Speech Classification in Indonesian Language Tweets by Using Convolutional Neural Network

Journal of ICT Research and Applications ◽

10.5614/itbj.ict.res.appl.2021.14.3.2 ◽

2021 ◽

Vol 14 (3) ◽

pp. 225-239

Author(s):

Dewa Ayu Nadia Taradhita ◽

I Ketut Gede Darma Putra

Keyword(s):

Neural Network ◽

Social Media ◽

Convolutional Neural Network ◽

Hate Speech ◽

Rapid Development ◽

Term Weighting ◽

Weighting Method ◽

Testing Stage ◽

Testing Accuracy ◽

Speech Classification

The rapid development of social media, added with the freedom of social media users to express their opinions, has influenced the spread of hate speech aimed at certain groups. Online based hate speech can be identified by the used of derogatory words in social media posts. Various studies on hate speech classification have been done, however, very few researches have been conducted on hate speech classification in the Indonesian language. This paper proposes a convolutional neural network method for classifying hate speech in tweets in the Indonesian language. Datasets for both the training and testing stages were collected from Twitter. The collected tweets were categorized into hate speech and non-hate speech. We used TF-IDF as the term weighting method for feature extraction. The most optimal training accuracy and validation accuracy gained were 90.85% and 88.34% at 45 epochs. For the testing stage, experiments were conducted with different amounts of testing data. The highest testing accuracy was 82.5%, achieved by the dataset with 50 tweets in each category.

Download Full-text

Study of Undersampling Method: Instance Hardness Threshold with Various Estimators for Hate Speech Classification

IJITEE (International Journal of Information Technology and Electrical Engineering) ◽

10.22146/ijitee.42152 ◽

2018 ◽

Vol 2 (2) ◽

Cited By ~ 3

Author(s):

Naufal Azmi Verdikha ◽

Teguh Bharata Adji ◽

Adhistya Erna Permanasari

Keyword(s):

Social Media ◽

Hate Speech ◽

Imbalanced Data ◽

Poor Performance ◽

Training Data ◽

Weighting Method ◽

Imbalanced Data Classification ◽

Data Problem ◽

Speech Classification ◽

Instance Hardness

A text classification system is needed to address the problem of hate speech in social media. However, texts of hate speech are very hard to find in social media. This will make the distribution of training data to be unbalanced (imbalanced data). Classification with imbalanced data will make a poor performance. There are several methods to solve the problem of classification with imbalanced data. One of them is undersampling with Instance Hardness Threshold (IHT) method. IHT method balances the dataset by eliminating data that are frequently misclassified. To find those data, IHT requires an estimator, which is a classifier. This research aims to compare estimators of IHT method to solve imbalanced data problem in hate speech classification using TF-IDF weighting method. This research uses the class ratio of dataset after undersampling, time of the undersampling process, and Index of Balanced Accuracy (IBA) evaluation to determine the best IHT method. The results of this research show that IHT method using the Logistic Regression (IHT(LR)) has the fastest undersampling process (1.91 s), perfectly balance dataset with the class ratio is 1:1, and has the best of IBA evaluation in all estimation process. This result makes IHT(LR) be the best method to solve the imbalanced data problem in hate speech classification.

Download Full-text

Lifelong Learning of Hate Speech Classification on Social Media

10.18653/v1/2021.naacl-main.183 ◽

2021 ◽

Author(s):

Jing Qian ◽

Hong Wang ◽

Mai ElSherief ◽

Xifeng Yan

Keyword(s):

Social Media ◽

Lifelong Learning ◽

Hate Speech ◽

Speech Classification

Download Full-text

Un-Compromised Credibility: Social Media based Multi-Class Hate Speech Classification for Text

IEEE Access ◽

10.1109/access.2021.3101977 ◽

2021 ◽

pp. 1-1

Author(s):

Khubaib Ahmed Qureshi ◽

Muhammad Sabih

Keyword(s):

Social Media ◽

Hate Speech ◽

Speech Classification

Download Full-text

Ensemble-based Semi-Supervised Learning for Hate Speech Detection

The International FLAIRS Conference Proceedings ◽

10.32473/flairs.v34i1.128427 ◽

2021 ◽

Vol 34 (1) ◽

Author(s):

Safa Alsafari

Keyword(s):

Social Media ◽

Supervised Learning ◽

Hate Speech ◽

Classification Performance ◽

Media Content ◽

Learning Approach ◽

Classification Methods ◽

Speech Detection ◽

Speech Classification

Large and accurately labeled textual corpora are vital to developing efficient hate speech classifiers. This paper introduces an ensemble-based semi-supervised learning approach to leverage the availability of abundant social media content. Starting with a reliable hate speech dataset, we train and test diverse classifiers that are then used to label a corpus of one million tweets. Next, we investigate several strategies to select the most confident labels from the obtained pseudo labels. We assess these strategies by re-training all the classifiers with the seed dataset augmented with the trusted pseudo-labeled data. Finally, we demonstrate that our approach improves classification performance over supervised hate speech classification methods.

Download Full-text

Pembelajaran Integrasi tentang Etika Penggunaan Media Sosial dalam Materi Ajar Pendidikan Agama Islam di Sekolah Menangah Atas

Tarbiyatuna : Kajian Pendidikan Islam ◽

10.29062/tarbiyatuna.v3i1.201 ◽

2019 ◽

Vol 3 (1) ◽

pp. 72

Author(s):

Irfan Afandi

Keyword(s):

Social Media ◽

Industrial Revolution ◽

Hate Speech ◽

Islamic Education ◽

Learning Models ◽

Negative Effects ◽

Education Material ◽

Use Of Social Media ◽

The Waves ◽

Critical Reason

The humanitarian problem in the development of the industrial revolution 4.0 is very complex and is at the stage of worrying. No human being separated from the effect of the waves. High school is active users (user) of the results of the industrial revolution the 4.0. The problem that arises in the use of social media including the demise of expertise, the dissemination of hate speech and fabricated news. Teaching Islamic education material should be able to respond to this by providing normative information in the Qur'an and Hadith so that students can escape from its negative effects. One of the solutions offered was to integrate these materials with integratsi learning models in the themes that have been arranged in the school's learning policy. Integrating this material must through the phases between the awarding phase of learning, information or materials to grow a critical reason, generate hypotheses and generalities.

Download Full-text

Progressive domain adaptation for detecting hate speech on social media with small training set and its application to COVID-19 concerned posts

Social Network Analysis and Mining ◽

10.1007/s13278-021-00780-w ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Md Abul Bashar ◽

Richi Nayak ◽

Khanh Luong ◽

Thirunavukarasu Balasubramaniam

Keyword(s):

Social Media ◽

Hate Speech ◽

Domain Adaptation ◽

Training Set

Download Full-text

Minimal Effects, Maximum Panic: Social Media and Democracy in Latin America

Social Media + Society ◽

10.1177/2056305120984452 ◽

2020 ◽

Vol 6 (4) ◽

pp. 205630512098445

Author(s):

Eugenia Mitchelstein ◽

Mora Matassi ◽

Pablo J. Boczkowski

Keyword(s):

Social Media ◽

Latin America ◽

Social Engagement ◽

Hate Speech ◽

De Novo ◽

Electronic Government ◽

Qualitative Assessment ◽

Common Denominator ◽

Negative Effects ◽

Positive Effects

In face of public discourses about the negative effects that social media might have on democracy in Latin America, this article provides a qualitative assessment of existing scholarship about the uses, actors, and effects of platforms for democratic life. Our findings suggest that, first, campaigning, collective action, and electronic government are the main political uses of platforms. Second, politicians and office holders, social movements, news producers, and citizens are the main actors who utilize them for political purposes. Third, there are two main positive effects of these platforms for the democratic process—enabling social engagement and information diffusion—and two main negative ones—the presence of disinformation, and the spread of extremism and hate speech. A common denominator across positive and negative effects is that platforms appear to have minimal effects that amplify pre-existing patterns rather than create them de novo.

Download Full-text

A Review of Light Gradient Boosting Machine Method for Hate Speech Classification on Twitter

2020 2nd International Conference on Electrical, Control and Instrumentation Engineering (ICECIE) ◽

10.1109/icecie50279.2020.9309565 ◽

2020 ◽

Author(s):

Muhammad Hafizh Abdurrahman ◽

Budhi Irawan ◽

Casi Setianingsih

Keyword(s):

Hate Speech ◽

Gradient Boosting ◽

Machine Method ◽

Light Gradient ◽

Gradient Boosting Machine ◽

Speech Classification

Download Full-text