A New Technique for Sentiment Analysis System Based on Deep Learning Using Chi-Square Feature Selection Methods

Author(s):  
Mohammed HUSSEİN ◽  
Fatih ÖZYURT
2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Monalisa Ghosh ◽  
Goutam Sanyal

Sentiment classification or sentiment analysis has been acknowledged as an open research domain. In recent years, an enormous research work is being performed in these fields by applying various numbers of methodologies. Feature generation and selection are consequent for text mining as the high-dimensional feature set can affect the performance of sentiment analysis. This paper investigates the inability or incompetency of the widely used feature selection methods (IG, Chi-square, and Gini Index) with unigram and bigram feature set on four machine learning classification algorithms (MNB, SVM, KNN, and ME). The proposed methods are evaluated on the basis of three standard datasets, namely, IMDb movie review and electronics and kitchen product review dataset. Initially, unigram and bigram features are extracted by applying n-gram method. In addition, we generate a composite features vector CompUniBi (unigram + bigram), which is sent to the feature selection methods Information Gain (IG), Gini Index (GI), and Chi-square (CHI) to get an optimal feature subset by assigning a score to each of the features. These methods offer a ranking to the features depending on their score; thus a prominent feature vector (CompIG, CompGI, and CompCHI) can be generated easily for classification. Finally, the machine learning classifiers SVM, MNB, KNN, and ME used prominent feature vector for classifying the review document into either positive or negative. The performance of the algorithm is measured by evaluation methods such as precision, recall, and F-measure. Experimental results show that the composite feature vector achieved a better performance than unigram feature, which is encouraging as well as comparable to the related research. The best results were obtained from the combination of Information Gain with SVM in terms of highest accuracy.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Hongyan Zhang ◽  
Lanzhi Li ◽  
Chao Luo ◽  
Congwei Sun ◽  
Yuan Chen ◽  
...  

In efforts to discover disease mechanisms and improve clinical diagnosis of tumors, it is useful to mine profiles for informative genes with definite biological meanings and to build robust classifiers with high precision. In this study, we developed a new method for tumor-gene selection, the Chi-square test-based integrated rank gene and direct classifier (χ2-IRG-DC). First, we obtained the weighted integrated rank of gene importance from chi-square tests of single and pairwise gene interactions. Then, we sequentially introduced the ranked genes and removed redundant genes by using leave-one-out cross-validation of the chi-square test-based Direct Classifier (χ2-DC) within the training set to obtain informative genes. Finally, we determined the accuracy of independent test data by utilizing the genes obtained above withχ2-DC. Furthermore, we analyzed the robustness ofχ2-IRG-DC by comparing the generalization performance of different models, the efficiency of different feature-selection methods, and the accuracy of different classifiers. An independent test of ten multiclass tumor gene-expression datasets showed thatχ2-IRG-DC could efficiently control overfitting and had higher generalization performance. The informative genes selected byχ2-IRG-DC could dramatically improve the independent test precision of other classifiers; meanwhile, the informative genes selected by other feature selection methods also had good performance inχ2-DC.


2017 ◽  
Vol 75 ◽  
pp. 80-93 ◽  
Author(s):  
Alireza Yousefpour ◽  
Roliana Ibrahim ◽  
Haza Nuzly Abdel Hamed

2017 ◽  
Vol 24 (1) ◽  
pp. 3-37 ◽  
Author(s):  
SANDRA KÜBLER ◽  
CAN LIU ◽  
ZEESHAN ALI SAYYED

AbstractWe investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is a common approach to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 per cent points.


2018 ◽  
Vol 5 (5) ◽  
pp. 537 ◽  
Author(s):  
Oman Somantri ◽  
Dyah Apriliani

<p class="Judul2"><strong>Abstrak</strong></p><p class="Judul2"> </p><p class="Abstrak">Setiap pelanggan pasti menginginkan sebuah pendukung keputusan dalam menentukan pilihan ketika akan mengunjungi sebuah tempat makan atau kuliner yang sesuai dengan keinginan salah satu contohnya yaitu di Kota Tegal. <em>Sentiment analysis</em> digunakan untuk memberikan sebuah solusi terkait dengan permasalahan tersebut, dengan menereapkan model algoritma S<em>upport Vector Machine</em> (SVM). Tujuan dari penelitian ini adalah mengoptimalisasi model yang dihasilkan dengan diterapkannya <em>feature selection</em> menggunakan algoritma <em>Informatioan Gain</em> (IG) dan <em>Chi Square</em> pada hasil model terbaik yang dihasilkan oleh SVM pada klasifikasi tingkat kepuasan pelanggan terhadap warung dan restoran kuliner di Kota Tegal sehingga terjadi peningkatan akurasi dari model yang dihasilkan. Hasil penelitian menunjukan bahwa tingkat akurasi terbaik dihasilkan oleh model SVM-IG dengan tingkat akurasi terbaik sebesar 72,45% mengalami peningkatan sekitar 3,08% yang awalnya 69.36%. Selisih rata-rata yang dihasilkan setelah dilakukannya optimasi SVM dengan <em>feature selection</em> adalah 2,51% kenaikan tingkat akurasinya. Berdasarkan hasil penelitian bahwa <em>feature selection</em> dengan menggunakan <em>Information Gain (IG)</em> (SVM-IG) memiliki tingkat akurasi lebih baik apabila dibandingkan SVM dan <em>Chi Squared</em> (SVM-CS) sehingga dengan demikian model yang diusulkan dapat meningkatkan tingkat akurasi yang dihasilkan oleh SVM menjadi lebih baik.</p><p class="Abstrak"><strong><em><br /></em></strong></p><p class="Abstrak"><strong><em>Abstract</em></strong></p><p class="Judul2"> </p><p class="Judul2"><em>The Customer needs to get a decision support in determining a choice when they’re visit a culinary restaurant accordance to their wishes especially at Tegal City. Sentiment analysis is used to provide a solution related to this problem by applying the Support Vector Machine (SVM) algorithm model. The purpose of this research is to optimize the generated model by applying feature selection using Informatioan Gain (IG) and Chi Square algorithm on the best model produced by SVM on the classification of customer satisfaction level based on culinary restaurants at Tegal City so that there is an increasing accuracy from the model. The results showed that the best accuracy level produced by the SVM-IG model with the best accuracy of 72.45% experienced an increase of about 3.08% which was initially 69.36%. The difference average produced after SVM optimization with feature selection is 2.51% increase in accuracy. Based on the results of the research, the feature selection using Information Gain (SVM-IG) has a better accuracy rate than SVM and Chi Squared (SVM-CS) so that the proposed model can improve the accuracy of SVM better.</em></p>


Energy ◽  
2018 ◽  
Vol 157 ◽  
pp. 141-149 ◽  
Author(s):  
Gowri Suryanarayana ◽  
Jesus Lago ◽  
Davy Geysen ◽  
Piotr Aleksiejuk ◽  
Christian Johansson

Sign in / Sign up

Export Citation Format

Share Document