Building Naïve Bayes Classifiers with High-Dimensional and Small-Sized Data Sets

SISFORMA ◽  
2018 ◽  
Vol 5 (1) ◽  
pp. 22
Author(s):  
Eka Angga Laksana ◽  
Ase Suryana ◽  
Heri Heryono

Sentiment analysis, as part of the text mining research domain, has gained recognition due to its successful application in social media analysis. Sentiment analysis methods can intelligently classify texts as negative or positive; the classified texts summarize the overall user response and describe opinion polarity about a particular topic. Based on this idea, this research took the opinions of e-learning users as the object to be measured through sentiment analysis, and the results can be used to evaluate the e-learning activity. The research was carried out at Widyatama University, which has run e-learning activities for several years. Previously, the e-learning system was evaluated qualitatively by giving users a questionnaire and gathering the feedback; however, a questionnaire alone does not capture the conclusion about the whole body of opinion. Hence, a method is needed to identify opinion polarity among e-learning members. The e-learning opinion data sets were gathered from questionnaires filled in by e-learning members, with both students and lecturers as participants. The participants reviewed their learning outcomes after taking part in e-learning activities. Their opinions were needed to describe the current situation of the e-learning activity, so that the conclusions could be used to make improvements and to describe achievements of the e-learning system. The data sets were trained with a Naïve Bayes classifier to group each user response as negative or positive. The classification results were also evaluated with evaluation metrics commonly used in data mining to show classifier performance, such as accuracy, precision, and recall.
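
For illustration, a minimal sketch of the pipeline this abstract describes, assuming a scikit-learn implementation; the opinion texts below are hypothetical and the authors' actual data and code are not reproduced here:

```python
# Sketch: train a Naive Bayes classifier on labelled opinion texts and
# report accuracy, precision, and recall (hypothetical toy reviews).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

texts = [
    "the e-learning platform helped me understand the material",
    "video lectures were clear and easy to follow",
    "uploading assignments failed repeatedly",
    "the discussion forum was confusing and slow",
]
labels = [1, 1, 0, 0]  # 1 = positive opinion, 0 = negative opinion

# Bag-of-words features, as is typical for Naive Bayes text classification.
X = CountVectorizer().fit_transform(texts)

X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=42, stratify=labels
)

clf = MultinomialNB().fit(X_train, y_train)
pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, pred))
print("precision:", precision_score(y_test, pred, zero_division=0))
print("recall   :", recall_score(y_test, pred, zero_division=0))
```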


Author(s):  
Ankit Srivastava ◽  
Vijendra Singh ◽  
Gurdeep Singh Drall

Over the past few years, the novel appeal and increasing popularity of social networks as a medium for users to express their opinions and views have created an accumulation of a massive amount of data, commonly termed Big Data. Accordingly, one area in which the application of new data mining techniques has significant potential to achieve more precise classification of hidden knowledge in Big Data is sentiment analysis (also known as opinion mining). A hybrid approach using Naïve Bayes and Random Forest for mining Twitter datasets is presented here as an extension of previous work. Briefly, relevant data sets are collected from Twitter using the Twitter API; the hybrid methodology is then illustrated and evaluated against an approach using only the Naïve Bayes classifier. Results show better accuracy and efficiency in sentiment classification for the hybrid approach.
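
The abstract does not specify how the two classifiers are combined; a minimal sketch of one plausible combination, a soft-voting ensemble over TF-IDF features built with scikit-learn, follows. The voting scheme and the tweet texts are assumptions for illustration only:

```python
# Sketch: combine Naive Bayes and Random Forest by averaging their
# predicted class probabilities (soft voting) over TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier, VotingClassifier

tweets = [
    "loving the new update, works great",
    "best service I have used all year",
    "this app keeps crashing, terrible",
    "worst customer support ever",
]
labels = [1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative sentiment

X = TfidfVectorizer().fit_transform(tweets)

hybrid = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",  # average class probabilities from both models
)
hybrid.fit(X, labels)
print(hybrid.predict(X))
```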


2010 ◽  
Vol 25 (4) ◽  
pp. 421-449 ◽  
Author(s):  
Marcin J. Mizianty ◽  
Lukasz A. Kurgan ◽  
Marek R. Ogiela

Current classification problems that concern data sets of large and increasing size require scalable classification algorithms. In this study, we concentrate on several scalable, linear complexity classifiers that include one of the top 10 voted data mining methods, Naïve Bayes (NB), and several recently proposed semi-NB classifiers. These algorithms perform front-end discretization of the continuous features since by design they work only with nominal or discrete features. We address the lack of studies that investigate the benefits and drawbacks of discretization in the context of the subsequent classification. Our comprehensive empirical study considers 12 discretizers (two unsupervised and 10 supervised), seven classifiers (two classical NB and five semi-NB), and 16 data sets. We investigate the scalability of the discretizers and show that the fastest supervised discretizers, fast class-attribute interdependency maximization (FCAIM), class-attribute interdependency maximization (CAIM), and information entropy maximization (IEM), provide discretization schemes with the highest overall quality. We show that discretization improves the classification accuracy when compared against the two classical methods, NB and Flexible Naïve Bayes (FNB), executed on the raw data. The choice of the discretization algorithm impacts the significance of the improvements. The MODL, FCAIM, and CAIM methods provide statistically significant improvements, while the IEM, class-attribute contingency coefficient (CACC), and Khiops discretizers provide moderate improvements. The most accurate classification models are generated by the averaged one-dependence estimators (AODEsr) classifier, followed by AODE and HNB (Hidden Naïve Bayes). AODEsr, run on data discretized with MODL, FCAIM, and CAIM, provides statistically significantly better accuracies than both classical NB methods. The worst results are obtained with the NB, FNB, and LBR (Lazy Bayes Rule) classifiers. We show that although the time to build the discretization scheme can be longer than the time to train the classifier, the completion of the entire process (to discretize data, compute the classifier, and predict test instances) is often faster than the NB-based classification of the continuous instances. This is because the time to classify test instances is an important factor that is positively influenced by discretization. The biggest positive influence, both on the accuracy and the classification time, is associated with the MODL, FCAIM, and CAIM algorithms.
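
A minimal sketch of the front-end discretization idea, using scikit-learn's unsupervised KBinsDiscretizer as a stand-in; the supervised discretizers studied in the paper (CAIM, FCAIM, IEM, MODL, and others) are not part of scikit-learn and would need to be plugged in from separate implementations:

```python
# Sketch: compare classical NB on raw continuous features with a
# discretize-then-classify pipeline, mirroring the setup described above.
from sklearn.datasets import load_iris
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# Classical NB on the raw continuous features (Gaussian assumption).
raw_nb = GaussianNB()
print("raw features :", cross_val_score(raw_nb, X, y, cv=5).mean())

# Discretize first, then run NB on the resulting nominal features.
disc_nb = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
    CategoricalNB(min_categories=5),  # bins 0..4 become categories
)
print("discretized  :", cross_val_score(disc_nb, X, y, cv=5).mean())
```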


Author(s):  
Joko Suntoro ◽  
Febrian Wahyu Christanto ◽  
Henny Indriyawati

One of the most important tasks in software engineering is software defect prediction, defined as the process of predicting errors, failures, and faults in a software system. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are among the software metric datasets that researchers use to predict software defects. The NASA MDP datasets contain imbalanced classes and high-dimensional data, which lowers the classification evaluation results. In this research, the imbalanced classes are handled with the AdaCost method and the high-dimensional data are handled with the Average Weight Information Gain (AWEIG) method, while the Naïve Bayes algorithm is used as the classifier. The proposed method is named AWEIG + AdaCost Bayesian. In this experiment, the AWEIG + AdaCost Bayesian algorithm is compared to the Naïve Bayes algorithm. The results show that the AWEIG + AdaCost Bayesian algorithm yields a better mean Area Under the Curve (AUC) than the Naïve Bayes algorithm alone, with mean AUC values of 0.752 and 0.696, respectively.
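
A minimal sketch under stated assumptions, not the AWEIG + AdaCost Bayesian implementation itself: mutual-information feature selection stands in for AWEIG's information-gain weighting, and per-class sample weights stand in for AdaCost's cost-sensitive boosting. The synthetic data only loosely mimics the imbalanced, high-dimensional character of the NASA MDP datasets:

```python
# Sketch: feature selection + cost-sensitive instance weights + Naive Bayes,
# evaluated with AUC as in the study described above.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Imbalanced, high-dimensional synthetic data (class 1 = "defective").
X, y = make_classification(
    n_samples=1000, n_features=40, n_informative=8,
    weights=[0.9, 0.1], random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Reduce dimensionality with an information-gain style criterion.
selector = SelectKBest(mutual_info_classif, k=10).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Up-weight the rare defective class when fitting Naive Bayes.
class_weight = {0: 1.0, 1: (y_tr == 0).sum() / (y_tr == 1).sum()}
sample_weight = np.vectorize(class_weight.get)(y_tr)

nb = GaussianNB().fit(X_tr_sel, y_tr, sample_weight=sample_weight)
auc = roc_auc_score(y_te, nb.predict_proba(X_te_sel)[:, 1])
print("AUC:", round(auc, 3))
```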


2012 ◽  
Vol 21 (01) ◽  
pp. 1250007 ◽  
Author(s):  
LIANGXIAO JIANG ◽  
DIANHONG WANG ◽  
ZHIHUA CAI

Many approaches have been proposed to improve naive Bayes by weakening its conditional independence assumption. In this paper, we work on the instance-weighting approach and propose an improved naive Bayes algorithm based on discriminative instance weighting, which we call Discriminatively Weighted Naive Bayes. In each iteration, different training instances are discriminatively assigned different weights according to the estimated conditional probability loss. Experimental results on a large number of UCI data sets validate its effectiveness in terms of classification accuracy and AUC. Besides, the experimental results on running time show that Discriminatively Weighted Naive Bayes performs almost as efficiently as the state-of-the-art Discriminative Frequency Estimate learning method, and significantly more efficiently than Boosted Naive Bayes. Finally, we apply the idea of discriminatively weighted learning to some state-of-the-art naive Bayes text classifiers, such as multinomial naive Bayes, complement naive Bayes, and the one-versus-all-but-one model, and achieve remarkable improvements.
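
A minimal sketch of the instance-weighting idea, not the authors' exact algorithm: in each iteration every training instance's weight is increased according to its current conditional probability loss, 1 - P(true class | x), and naive Bayes is refit with the updated weights. The update rule, the Gaussian NB variant, and the iteration count are illustrative assumptions:

```python
# Sketch: iteratively reweight training instances by their conditional
# probability loss and refit Naive Bayes with the accumulated weights.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)

weights = np.ones(len(y))            # start from uniform instance weights
for _ in range(10):                  # a fixed number of weighting iterations
    nb = GaussianNB().fit(X, y, sample_weight=weights)
    proba_true = nb.predict_proba(X)[np.arange(len(y)), y]
    weights = weights + (1.0 - proba_true)   # larger loss -> larger weight

print("training accuracy:", accuracy_score(y, nb.predict(X)))
```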

