A New Technique for Sentiment Analysis System Based on Deep Learning Using Chi-Square Feature Selection Methods

Performance Assessment of Multiple Classifiers Based on Ensemble Feature Selection Scheme for Sentiment Analysis

Applied Computational Intelligence and Soft Computing ◽

10.1155/2018/8909357 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 4

Author(s):

Monalisa Ghosh ◽

Goutam Sanyal

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Gini Index ◽

Feature Vector ◽

Information Gain ◽

Feature Subset ◽

Selection Methods ◽

Prominent Feature ◽

Chi Square

Sentiment classification, or sentiment analysis, is an open research domain. In recent years, a great deal of research has been carried out in this field using a variety of methodologies. Feature generation and selection are consequential for text mining, as a high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the limitations of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature sets on four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on three standard datasets: the IMDb movie review dataset and the electronics and kitchen product review datasets. Initially, unigram and bigram features are extracted by applying the n-gram method. In addition, we generate a composite feature vector, CompUniBi (unigram + bigram), which is passed to the feature selection methods Information Gain (IG), Gini Index (GI), and Chi-square (CHI) to obtain an optimal feature subset by assigning a score to each feature. These methods rank the features by score, so a prominent feature vector (CompIG, CompGI, and CompCHI) can be generated easily for classification. Finally, the machine learning classifiers SVM, MNB, KNN, and ME use the prominent feature vector to classify each review document as either positive or negative. Performance is measured by precision, recall, and F-measure. Experimental results show that the composite feature vector achieves better performance than the unigram feature, which is encouraging and comparable to related research. The best results, in terms of highest accuracy, were obtained from the combination of Information Gain with SVM.
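The scoring step described in this abstract can be illustrated with a minimal sketch, assuming whitespace tokenization, binary labels (1 = positive, 0 = negative), and the standard 2x2 chi-square statistic over document frequencies. The function names (`comp_uni_bi`, `chi_square_scores`, `top_k`) are hypothetical and are not taken from the paper, whose exact preprocessing and scoring details are not given here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as space-joined strings) in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def comp_uni_bi(text):
    """Composite feature set: unigrams plus bigrams (assumed whitespace tokens)."""
    tokens = text.lower().split()
    return set(ngrams(tokens, 1) + ngrams(tokens, 2))


def chi_square_scores(docs, labels):
    """Score each feature with the 2x2 chi-square statistic.

    a: positive docs containing the feature, b: negative docs containing it,
    c: positive docs without it,            d: negative docs without it.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(comp_uni_bi(text))

    scores = {}
    for feature in set(pos_df) | set(neg_df):
        a, b = pos_df[feature], neg_df[feature]
        c, d = n_pos - a, n_neg - b
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A feature present in every document carries no information.
        scores[feature] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Rank features by descending score (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


docs = ["good movie", "good plot", "bad movie", "bad plot"]
labels = [1, 1, 0, 0]
scores = chi_square_scores(docs, labels)
selected = top_k(scores, 2)
```

In this toy corpus, "good" and "bad" separate the classes perfectly and rank first, while "movie" and "plot" occur equally in both classes and score zero. The selected features would then form the reduced vector passed to a classifier.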


Sentiment Analysis of Movie Reviews: A Study of Machine Learning Algorithms with Various Feature Selection Methods

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v5i9.113121 ◽

2017 ◽

Vol 5 (9) ◽

Cited By ~ 1

Author(s):

Rajwinder Kaur

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Selection Methods


Feature Selection Methods in Sentiment Analysis

Proceedings of the 3rd International Conference on Networking, Information Systems & Security ◽

10.1145/3386723.3387840 ◽

2020 ◽

Author(s):

Nurilhami Izzatie Khairi ◽

Azlinah Mohamed ◽

Nor Nadiah Yusof

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Selection Methods


Informative Gene Selection and Direct Classification of Tumor Based on Chi-Square Test of Pairwise Gene Interactions

BioMed Research International ◽

10.1155/2014/589290 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 3

Author(s):

Hongyan Zhang ◽

Lanzhi Li ◽

Chao Luo ◽

Congwei Sun ◽

Yuan Chen ◽

...

Keyword(s):

Feature Selection ◽

Gene Selection ◽

Gene Interactions ◽

Selection Methods ◽

Chi Square ◽

Generalization Performance ◽

Independent Test ◽

Chi Square Test ◽

Leave One Out ◽

Tumor Gene

In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ2-IRG-DC). First, we obtained the weighted integrated rank of gene importance from chi-square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ2-DC) within the training set to obtain informative genes. Finally, we determined the accuracy of independent test data by utilizing the genes obtained above with χ2-DC. Furthermore, we analyzed the robustness of χ2-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods, and the accuracy of different classifiers. An independent test of ten multiclass tumor gene-expression datasets showed that χ2-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected by χ2-IRG-DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also had good performance in χ2-DC.


Ordinal-based and frequency-based integration of feature selection methods for sentiment analysis

Expert Systems with Applications ◽

10.1016/j.eswa.2017.01.009 ◽

2017 ◽

Vol 75 ◽

pp. 80-93 ◽

Cited By ~ 23

Author(s):

Alireza Yousefpour ◽

Roliana Ibrahim ◽

Haza Nuzly Abdel Hamed

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Selection Methods


Big Data Analytics and Deep Learning Based Sentiment Analysis System for Sales Prediction

2019 IEEE Pune Section International Conference (PuneCon) ◽

10.1109/punecon46936.2019.9105719 ◽

2019 ◽

Author(s):

Aamod Khatiwada ◽

Pradeep Kadariya ◽

Sandip Agrahari ◽

Rabin Dhakal

Keyword(s):

Big Data ◽

Deep Learning ◽

Sentiment Analysis ◽

Data Analytics ◽

Big Data Analytics ◽

Sales Prediction ◽

Analysis System


Integrated Feature Selection Methods Using Metaheuristic Algorithms for Sentiment Analysis

Intelligent Information and Database Systems - Lecture Notes in Computer Science ◽

10.1007/978-3-662-49381-6_13 ◽

2016 ◽

pp. 129-140 ◽

Cited By ~ 1

Author(s):

Alireza Yousefpour ◽

Roliana Ibrahim ◽

Haza Nuzly Abdul Hamed ◽

Takeru Yokoi

Keyword(s):

Feature Selection ◽

Sentiment Analysis ◽

Metaheuristic Algorithms ◽

Selection Methods


To use or not to use: Feature selection for sentiment analysis of highly imbalanced data

Natural Language Engineering ◽

10.1017/s1351324917000298 ◽

2017 ◽

Vol 24 (1) ◽

pp. 3-37 ◽

Cited By ~ 5

Author(s):

SANDRA KÜBLER ◽

CAN LIU ◽

ZEESHAN ALI SAYYED

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Sentiment Analysis ◽

Information Gain ◽

Binary Classification ◽

Small Subset ◽

Large Set ◽

Learning Approaches ◽

Selection Methods ◽

Data Set

We investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is common to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 percentage points.
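The abstract does not describe the sampling procedure in detail, so the following is only a generic sketch of random over-sampling with replacement, in which every minority class is padded with duplicated examples until it matches the largest class. The function name `random_oversample` and the toy ratings are hypothetical.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


# Toy skewed data: five reviews rated 4, one review rated 1.
reviews = ["r1", "r2", "r3", "r4", "r5", "r6"]
ratings = [4, 4, 4, 4, 4, 1]
out_x, out_y = random_oversample(reviews, ratings)
```

Over-sampling should be applied to the training split only. Duplicating examples before splitting would leak copies of the same review into the test data.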


Support Vector Machine Berbasis Feature Selection Untuk Sentiment Analysis Kepuasan Pelanggan Terhadap Pelayanan Warung dan Restoran Kuliner Kota Tegal

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855867 ◽

2018 ◽

Vol 5 (5) ◽

pp. 537 ◽

Cited By ~ 1

Author(s):

Oman Somantri ◽

Dyah Apriliani

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Sentiment Analysis ◽

Information Gain ◽

Support Vector ◽

Chi Square ◽

Proposed Model ◽

Chi Squared ◽

The Difference ◽

Increase In Accuracy

Customers want decision support when choosing a restaurant or food stall that matches their preferences, for example in Tegal City. Sentiment analysis is used to address this problem by applying a Support Vector Machine (SVM) model. The purpose of this research is to optimize the resulting model by applying feature selection with the Information Gain (IG) and Chi Square algorithms to the best model produced by SVM for classifying customer satisfaction with food stalls and culinary restaurants in Tegal City, so that the model's accuracy increases. The results show that the best accuracy was produced by the SVM-IG model at 72.45%, an increase of about 3.08% over the initial 69.36%. The average improvement in accuracy after optimizing SVM with feature selection is 2.51%. According to these results, feature selection with Information Gain (SVM-IG) gives better accuracy than plain SVM and Chi Squared (SVM-CS), so the proposed model improves on the accuracy achieved by SVM.


Thermal load forecasting in district heating networks using deep learning and advanced feature selection methods

Energy ◽

10.1016/j.energy.2018.05.111 ◽

2018 ◽

Vol 157 ◽

pp. 141-149 ◽

Cited By ~ 43

Author(s):

Gowri Suryanarayana ◽

Jesus Lago ◽

Davy Geysen ◽

Piotr Aleksiejuk ◽

Christian Johansson

Keyword(s):

Feature Selection ◽

Deep Learning ◽

Thermal Load ◽

Load Forecasting ◽

District Heating ◽

Selection Methods ◽

Heating Networks
