AraCust: a Saudi Telecom Tweets corpus for sentiment analysis

PeerJ Computer Science ◽

10.7717/peerj-cs.510 ◽

2021 ◽

Vol 7 ◽

pp. e510

Author(s):

Latifah Almuqren ◽

Alexandra Cristea

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

Exploratory Data Analysis ◽

Simple Experiment ◽

Gold Standard Corpus ◽

Golden Standard ◽

Exploratory Data ◽

Dialectal Arabic ◽

Arabic Sentiment Analysis ◽

The Right

Compared with other languages, Arabic lacks large corpora for Natural Language Processing (Assiri, Emam & Al-Dossari, 2018; Gamal et al., 2019). A number of scholars have relied on translation from one language to another to construct their corpora (Rushdi-Saleh et al., 2011). This paper presents how we constructed, cleaned, pre-processed, and annotated our 20,000-tweet Gold Standard Corpus (GSC) AraCust, the first Telecom GSC for Arabic Sentiment Analysis (ASA) for Dialectal Arabic (DA). AraCust contains Saudi dialect tweets, processed from a self-collected Arabic tweets dataset, and has been annotated for sentiment analysis, i.e., manually labelled (k=0.60). In addition, we illustrate AraCust's power by performing an exploratory data analysis of the features that arise from the nature of our corpus, to assist with choosing the right ASA methods for it. To evaluate our Gold Standard Corpus AraCust, we first ran a simple experiment, using a supervised classifier, to offer benchmark results for future work. We then applied the same supervised classifier to ASTD, a publicly available Arabic dataset created from Twitter (Nabil, Aly & Atiya, 2015). The results show that AraCust outperforms ASTD, with 91% accuracy and an 89% F1avg score. The AraCust corpus will be released via GitHub, together with code useful for its exploration, as a part of this submission.
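The agreement figure quoted in the abstract (k=0.60) can be illustrated with a small calculation. The sketch below assumes that k denotes Cohen's kappa between two annotators, which the abstract does not state; the function name `cohen_kappa` and the ten labels are hypothetical and are not taken from AraCust.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label twice.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels for ten tweets (illustrative only, not AraCust data).
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
annotator_2 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos", "pos", "neg"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

With these toy labels the annotators agree on 8 of 10 tweets and chance agreement is 0.5, so kappa is (0.8 - 0.5) / (1 - 0.5), or about 0.6.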

Sentiment Analysis for Exploratory Data Analysis

The Programming Historian ◽

10.46430/phen0079 ◽

2018 ◽

Cited By ~ 1

Author(s):

Zoë Wilkinson Saldaña

Keyword(s):

Natural Language Processing ◽

Data Analysis ◽

Natural Language ◽

Sentiment Analysis ◽

Language Processing ◽

Exploratory Data Analysis ◽

Exploratory Data

In this lesson you will learn to conduct 'sentiment analysis' on texts and to interpret the results. This is a form of exploratory data analysis based on natural language processing. You will learn to install all appropriate software and to build a reusable program that can be applied to your own texts.

Arabic Sentiment Analysis (ASA) Using Deep Learning Approach

Journal of Engineering ◽

10.31026/j.eng.2020.06.07 ◽

2020 ◽

Vol 26 (6) ◽

pp. 85-93

Author(s):

Abdulhakeem Qusay Al-Bayati ◽

Ahmed S. Al-Araji ◽

Saman Hameed Ameen

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Web Sites ◽

Short Term Memory ◽

Morphological Structure ◽

Arabic Language ◽

Feature Representation ◽

Main Task ◽

Arabic Sentiment Analysis

Sentiment analysis is one of the major fields in natural language processing; its main task is to extract sentiments, opinions, attitudes, and emotions from subjective text. Because of its importance in decision making and in people's trust in reviews on web sites, much academic research addresses sentiment analysis problems. Deep Learning (DL) is a powerful Machine Learning (ML) technique whose ability to learn feature representations and to differentiate data has led to state-of-the-art prediction results. In recent years, DL has been widely used in sentiment analysis; however, implementations for the Arabic language remain scarce, and most previous research addresses other languages such as English. The proposed model tackles Arabic Sentiment Analysis (ASA) using a DL approach. ASA is a challenging field because Arabic has a richer morphological structure than other languages. In this work, a Long Short-Term Memory (LSTM) deep neural network is used to train the model, combined with a word embedding as the first hidden layer for feature extraction. The results show that an accuracy of about 82% is achievable using the DL method.

Tracking the right clues with exploratory data analysis

IEEE Spectrum ◽

10.1109/6.694359 ◽

1998 ◽

Vol 35 (7) ◽

pp. 58-65 ◽

Cited By ~ 3

Author(s):

J.V. James

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Exploratory Data ◽

The Right

A powerful comparison of deep learning frameworks for Arabic sentiment analysis

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i1.pp745-752 ◽

2021 ◽

Vol 11 (1) ◽

pp. 745

Author(s):

Youssra Zahidi ◽

Yacine El Younoussi ◽

Yassine Al-Amrani

Keyword(s):

Deep Learning ◽

Comparative Analysis ◽

Sentiment Analysis ◽

Programming Languages ◽

Language Processing ◽

Large Set ◽

Language Familiarity ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis ◽

Python Programming

Deep learning (DL) is a subdomain of machine learning (ML) built on algorithms inspired by brain function, known as artificial neural networks (ANNs). Recently, DL approaches have achieved major successes across various Arabic natural language processing (ANLP) tasks, especially in the domain of Arabic sentiment analysis (ASA). When working on ASA, researchers can use various DL libraries in their projects, but they often do so without justifying their choice, or they choose a group of libraries based on their familiarity with a particular programming language. This work focuses on the Java and Python programming languages because they have a large set of deep learning libraries that are very useful in the ASA domain. The paper presents a comparative analysis of different valuable Python and Java libraries to identify the most relevant and robust DL libraries for ASA. Through this comparative analysis, we find that the TensorFlow, Theano, and Keras Python frameworks are very popular and widely used in this research domain.

Different valuable tools for Arabic sentiment analysis: a comparative evaluation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i1.pp753-762 ◽

2021 ◽

Vol 11 (1) ◽

pp. 753

Author(s):

Youssra Zahidi ◽

Yacine El Younoussi ◽

Yassine Al-Amrani

Keyword(s):

Sentiment Analysis ◽

Programming Languages ◽

Language Processing ◽

Comparative Evaluation ◽

Research Work ◽

Arabic Language ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis ◽

Python Programming ◽

Research Domain

Arabic natural language processing (ANLP) is a subfield of artificial intelligence (AI) that aims to build applications for the Arabic language, such as Arabic sentiment analysis (ASA), which classifies the feelings and emotions expressed in a text to determine the writer's attitude (neutral, negative or positive). To work on ASA, researchers can use various tools in their research projects, but they often do so without explaining the reason for their choice, or they choose a set of libraries according to their knowledge of a specific programming language. Because of the abundance of their libraries in the ANLP field, especially for ASA, we rely on the Java and Python programming languages in our research work. This paper makes an in-depth comparative evaluation of different valuable Python and Java libraries to determine the most useful ones for ASA. Based on a large variety of influential works in the ASA domain, we conclude that the NLTK, Gensim and TextBlob libraries are the most useful for ASA tasks in Python. For Java, we conclude that the Weka and CoreNLP tools are the most widely used and achieve strong results in this research domain.

ASA: A framework for Arabic sentiment analysis

Journal of Information Science ◽

10.1177/0165551519849516 ◽

2019 ◽

Vol 46 (4) ◽

pp. 544-559 ◽

Cited By ~ 4

Author(s):

Ahmed Oussous ◽

Fatima-Zahra Benjelloun ◽

Ayoub Ait Lahcen ◽

Samir Belfkih

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Short Term Memory ◽

Research Area ◽

Support Vector ◽

Learning Models ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis

Sentiment analysis (SA), also known as opinion mining, is an increasingly important research area. Generally, it helps to automatically determine whether a text expresses a positive, negative or neutral sentiment. It makes it possible to mine the huge and growing body of shared opinions found on social networks, review sites and blogs. SA is used in many fields and for various languages such as English and Arabic. However, since Arabic is a highly inflectional and derivational language, it raises many challenges, and SA of Arabic text must handle this complex morphology. To better handle these challenges, we decided to provide the research community and Arabic users with a new efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve the performance of ASA by exploiting deep learning while varying the preprocessing techniques. To that end, we implement and evaluate two deep learning models, namely convolutional neural network (CNN) and long short-term memory (LSTM) models. The framework offers various preprocessing techniques for ASA (including stemming, normalisation, tokenization and stop-word removal). As a result of this work, we first provide a new rich and publicly available Arabic corpus called the Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework demonstrates improvement in ASA. The experimental results show that deep learning models perform better for ASA than classical approaches (support vector machines, naive Bayes classifiers and maximum entropy). They also show the key role of morphological features in Arabic Natural Language Processing (NLP).
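Three of the preprocessing steps named in the abstract (normalisation, tokenization and stop-word removal) can be sketched in a few lines. This is a minimal illustration of common practice, not the framework's own implementation: the helper names, the chosen normalisation rules (stripping diacritics and tatweel, unifying alef variants) and the three-word stop list are all assumptions, and stemming is left out.

```python
import re

# Arabic diacritics (tashkeel) occupy U+064B..U+0652; tatweel is U+0640.
DIACRITICS = re.compile("[\u064B-\u0652]")
TATWEEL = "\u0640"
# Alef with madda, hamza above and hamza below, all mapped to bare alef.
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
BARE_ALEF = "\u0627"

# Tiny illustrative stop-word list ("fi", "min", "ala"); real lists are longer.
STOP_WORDS = {"\u0641\u064A", "\u0645\u0646", "\u0639\u0644\u0649"}

def normalise(text):
    """Strip diacritics and tatweel, then unify alef variants."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return ALEF_VARIANTS.sub(BARE_ALEF, text)

def preprocess(text):
    """Normalise, split on whitespace, and drop stop words."""
    tokens = normalise(text).split()
    return [tok for tok in tokens if tok not in STOP_WORDS]

# Hypothetical input: "al-khidma fi al-bayt" ("the service at home").
sample = "\u0627\u0644\u062E\u062F\u0645\u0629 \u0641\u064A \u0627\u0644\u0628\u064A\u062A"
tokens = preprocess(sample)
```

Whitespace splitting is the simplest possible tokenizer; it leaves punctuation and clitics attached to words, which a fuller pipeline would need to handle.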

Arabic Sentiment Analysis on Chewing Khat Leaves using Machine Learning and Ensemble Methods

Engineering, Technology & Applied Science Research ◽

10.48084/etasr.4026 ◽

2021 ◽

Vol 11 (2) ◽

pp. 6845-6848

Author(s):

W. M. S. Yafooz ◽

E. A. Hizam ◽

W. A. Alromema

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Social Issues ◽

Ensemble Methods ◽

Support Vector ◽

Khat Chewing ◽

User Comments ◽

Arabic Sentiment Analysis ◽

Processing Techniques

Sentiment analysis plays an important role in obtaining speakers' opinions or feelings towards events, products, topics, or services, helping businesses to improve their products. Moreover, governments and organizations investigate and solve current social issues by analyzing perspectives and feelings. This study evaluated the habit of chewing Khat (qat) leaves in Yemeni society. Chewing Khat plant leaves is a common habit in Yemen and East Africa. This paper proposes a model to detect information about the Khat chewing habit, how people explore it, and the preference for Khat leaves among Arabic people. A dataset consisting of user comments on 18 YouTube videos was prepared using several natural language processing techniques. Several experiments were conducted using six machine learning classifiers and four ensemble methods. Support Vector Machine and Linear Regression had almost 80% accuracy, whereas XGBoost was the most accurate ensemble method, reaching 77%.

Automatic Bayesian Density Analysis

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33015207 ◽

2019 ◽

Vol 33 ◽

pp. 5207-5215 ◽

Cited By ~ 4

Author(s):

Antonio Vergari ◽

Alejandro Molina ◽

Robert Peharz ◽

Zoubin Ghahramani ◽

Kristian Kersting ◽

...

Keyword(s):

Data Analysis ◽

Exploratory Data Analysis ◽

Real World Data ◽

Domain Experts ◽

Dependency Structure ◽

Making Sense ◽

Anomalous Data ◽

Density Analysis ◽

Exploratory Data ◽

The Right

Making sense of a dataset in an automatic and unsupervised fashion is a challenging problem in statistics and AI. Classical approaches for exploratory data analysis are usually not flexible enough to deal with the uncertainty inherent to real-world data: they are often restricted to fixed latent interaction models and homogeneous likelihoods; they are sensitive to missing, corrupt and anomalous data; moreover, their expressiveness generally comes at the price of intractable inference. As a result, supervision from statisticians is usually needed to find the right model for the data. However, since domain experts are not necessarily also experts in statistics, we propose Automatic Bayesian Density Analysis (ABDA) to make exploratory data analysis accessible at large. Specifically, ABDA allows for automatic and efficient missing value estimation, statistical data type and likelihood discovery, anomaly detection and dependency structure mining, on top of providing accurate density estimation. Extensive empirical evidence shows that ABDA is a suitable tool for automatic exploratory analysis of mixed continuous and discrete tabular data.

A Method of Deep Learning Tackles Sentiment Analysis Problem in Arabic Texts

Iraqi Journal of Computer Communication Control and System Engineering ◽

10.33103/uot.ijccce.20.4.2 ◽

2020 ◽

pp. 9-20

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Short Term Memory ◽

Human Life ◽

Morphological Structure ◽

Arabic Language ◽

Written Text ◽

Hidden Layer ◽

Arabic Sentiment Analysis

Sentiment Analysis (SA) is a field of Natural Language Processing (NLP) whose goal is to extract the emotion, sentiment or, more generally, opinion expressed in a human-written text. Opinions and emotions play a central role in human life. Therefore, much academic research in this field addresses languages such as English. However, work on Arabic Sentiment Analysis (ASA) remains scarce. ASA is a challenging field because Arabic has a rich morphological structure and poses many other challenges beyond those of other languages. For that reason, the proposed model tackles ASA using a Deep Learning approach. In this work, a word embedding method serves as the first hidden layer, extracting features from the input dataset, and a Long Short-Term Memory (LSTM) deep neural network is used for training. A Softmax layer is then applied to turn the numeric outputs of the LSTM layer into probabilities, classifying each output as positive or negative. Two datasets are used, and the model is trained separately on each. The first is the ASTD dataset, a dialectal Arabic collection of tweets; the results on this dataset are compared with another academic work that used the same data. This work outperforms the previous one by about 14.95% in accuracy and about 15.14% in F-score. The second is the HTL dataset, a Modern Standard Arabic collection of reviewers' opinions on hotels from several countries. This dataset is larger than the first, which shows the effect of dataset size on the model's results: accuracy increased by about 11% and F-score by about 10.8% compared with the first dataset.
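The Softmax step described in the abstract, which turns the numeric outputs of the LSTM layer into class probabilities, can be shown in isolation. The sketch below is plain Python rather than the authors' model; the input scores are hypothetical stand-ins for the LSTM outputs, and the label order in `LABELS` is an assumption.

```python
import math

LABELS = ["negative", "positive"]  # assumed class order, for illustration

def softmax(scores):
    """Map raw scores to probabilities that sum to 1."""
    # Subtracting the maximum keeps exp() from overflowing on large scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores):
    """Return the most probable label together with all probabilities."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs

# Hypothetical two-class scores standing in for the LSTM layer's outputs.
label, probs = classify([-1.0, 2.0])
print(label, [round(p, 3) for p in probs])
```

Softmax preserves the ranking of the scores, so the predicted label is the one with the largest raw score; the probabilities add a measure of how decisive that choice is.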

How Translation Alters Sentiment

Journal of Artificial Intelligence Research ◽

10.1613/jair.4787 ◽

2016 ◽

Vol 55 ◽

pp. 95-130 ◽

Cited By ~ 37

Author(s):

Saif M. Mohammad ◽

Mohammad Salameh ◽

Svetlana Kiritchenko

Keyword(s):

Sentiment Analysis ◽

System A ◽

Sentiment Lexicon ◽

English Translations ◽

Dialectal Arabic ◽

Arabic And English ◽

Analysis System ◽

Arabic Sentiment Analysis ◽

Translation Systems ◽

Language Text

Sentiment analysis research has predominantly focused on English texts. Thus, there exist many sentiment resources for English, but fewer for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system on the text, and (b) translate resources such as sentiment labeled corpora and sentiment lexicons from English into the focus language, and use them as additional resources in the focus-language sentiment analysis system. In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus language text. We show that sentiment analysis of English translations of Arabic texts produces competitive results, w.r.t. Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation is different from the sentiment of the source word or text. This is especially relevant for building better automatic translation systems. In the process, we create a state-of-the-art Arabic sentiment analysis system, a new dialectal Arabic sentiment lexicon, and the first Arabic-English parallel corpus that is independently annotated for sentiment by Arabic and English speakers.
