Structure Extension of Tree-Augmented Naive Bayes

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 721 ◽  
Author(s):  
YuGuang Long ◽  
LiMin Wang ◽  
MingHui Sun

Due to the simplicity and competitive classification performance of naive Bayes (NB), researchers have proposed many approaches to improve NB by weakening its attribute independence assumption. Theoretical analysis based on Kullback–Leibler divergence shows that the difference between NB and its variants lies in the different orders of conditional mutual information represented by the augmenting edges in the tree-shaped network structure. In this paper, we propose to relax the independence assumption by further generalizing tree-augmented naive Bayes (TAN) from a 1-dependence Bayesian network classifier (BNC) to arbitrary k-dependence. Sub-models of TAN, each built to represent a specific conditional dependence relationship, may "best match" the conditional probability distribution over the training data. Extensive experimental results reveal that the proposed algorithm achieves a bias-variance trade-off and substantially better generalization performance than state-of-the-art classifiers such as logistic regression.
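
Since this abstract frames TAN and its k-dependence generalization in terms of the conditional mutual information carried by the augmenting edges, a minimal sketch of that quantity may help. The toy binary attributes, the helper name, and the greedy parent ranking below are assumptions for illustration, not the paper's algorithm (TAN proper builds a maximum-weight spanning tree over these scores).

```python
# Minimal sketch (not the authors' code): estimate I(Xi; Xj | C), the quantity
# that weights TAN's augmenting edges, from empirical counts of discrete data.
import numpy as np
from collections import Counter

def conditional_mutual_information(xi, xj, c):
    """Empirical I(Xi; Xj | C) for discrete variables."""
    n = len(c)
    joint = Counter(zip(xi, xj, c))
    n_ic = Counter(zip(xi, c))
    n_jc = Counter(zip(xj, c))
    n_c = Counter(c)
    cmi = 0.0
    for (a, b, k), n_abk in joint.items():
        # P(a,b,k) * log[ P(a,b|k) / (P(a|k) P(b|k)) ]
        cmi += (n_abk / n) * np.log(n_abk * n_c[k] / (n_ic[(a, k)] * n_jc[(b, k)]))
    return cmi

# Toy data: three binary attributes and a binary class (purely illustrative).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 3))
y = rng.integers(0, 2, size=200)

# Rank candidate parents per attribute; TAN keeps one augmenting edge per
# attribute via a spanning tree, while k-dependence variants keep up to k.
for i in range(X.shape[1]):
    scores = sorted(((conditional_mutual_information(X[:, i], X[:, j], y), j)
                     for j in range(X.shape[1]) if j != i), reverse=True)
    print(f"X{i}: candidate parents by CMI -> {[j for _, j in scores]}")
```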

Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2378
Author(s):  
Shengfeng Gan ◽  
Shiqi Shao ◽  
Long Chen ◽  
Liangjun Yu ◽  
Liangxiao Jiang

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated, which reduces its classification performance. Of the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators and is therefore an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. To train HMNB, we propose a simple but effective learning algorithm that avoids a high-complexity structure-learning process. The same idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA); the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
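
To make the structure-extension idea above concrete, here is a rough sketch, under stated assumptions, of the hidden-parent construction that HMNB adapts from HNB: each feature's conditional probability becomes a mixture of one-dependence estimates P(xi | xj, c) over the other features. The uniform mixture weights, binary presence features, and toy data are simplifications chosen for brevity; the paper weights and qualifies the parents differently, so treat this as the structural idea only, not the published learning algorithm.

```python
import numpy as np

def laplace_cp(xi, xj, y, a, b, k, alpha=1.0):
    """Smoothed P(Xi=a | Xj=b, C=k) for binary (word-presence) features."""
    num = np.sum((xi == a) & (xj == b) & (y == k)) + alpha
    den = np.sum((xj == b) & (y == k)) + 2 * alpha
    return num / den

def hidden_parent_log_likelihood(X, y, x_new, k, alpha=1.0):
    """log P(x_new | C=k), each feature conditioned on its hidden parent."""
    d = X.shape[1]
    log_like = 0.0
    for i in range(d):
        # The hidden parent of feature i synthesizes all other features j
        # (here with uniform weights purely for brevity).
        mix = np.mean([laplace_cp(X[:, i], X[:, j], y, x_new[i], x_new[j], k, alpha)
                       for j in range(d) if j != i])
        log_like += np.log(mix)
    return log_like

# Toy usage with binary presence features (illustrative only).
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(100, 5))
y = rng.integers(0, 2, size=100)
x_new = rng.integers(0, 2, size=5)
priors = {k: np.mean(y == k) for k in (0, 1)}
scores = {k: np.log(priors[k]) + hidden_parent_log_likelihood(X, y, x_new, k)
          for k in (0, 1)}
print("predicted class:", max(scores, key=scores.get))
```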


2013 ◽  
Vol 303-306 ◽  
pp. 1609-1612
Author(s):  
Huai Lin Dong ◽  
Xiao Dan Zhu ◽  
Qing Feng Wu ◽  
Juan Juan Huang

The Naïve Bayes classification algorithm based on validity (NBCABV) optimizes the training data by using a validity measure to eliminate noisy training samples and thus improve classification, but it ignores the associations among attributes. Taking these associations into account, an improved method, the classification algorithm for Naïve Bayes based on validity and correlation (CANBBVC), is proposed to remove more noisy samples using both validity and correlation, resulting in better classification performance. Experimental results show that this model achieves higher classification accuracy than the one based on validity alone.


2012 ◽  
Vol 5s1 ◽  
pp. BII.S8945 ◽  
Author(s):  
Irena Spasić ◽  
Pete Burnap ◽  
Mark Greenwood ◽  
Michael Arribas-Ayllon

The authors present a system developed for the 2011 i2b2 Challenge on Sentiment Classification, whose aim was to automatically classify sentences in suicide notes using a scheme of 15 topics, mostly emotions. The system combines machine learning with a rule-based methodology. The features used to represent a problem were based on lexico-semantic properties of individual words, in addition to regular expressions used to represent patterns of word usage across different topics. A naïve Bayes classifier was trained using the features extracted from the training data, which consisted of 600 manually annotated suicide notes. Classification was then performed using the naïve Bayes classifier together with a set of pattern-matching rules. The classification performance was evaluated against a manually prepared gold standard consisting of 300 suicide notes, in which 1,091 out of a total of 2,037 sentences were associated with a total of 1,272 annotations. The competing systems were ranked using the micro-averaged F-measure as the primary evaluation metric. The system achieved an F-measure of 53% (with 55% precision and 52% recall), significantly better than the average performance of 48.75% achieved by the 26 participating teams.
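
As a toy illustration of the hybrid design described above, the sketch below backs a naive Bayes text classifier with hand-written regular-expression rules. The sentences, the two placeholder topic labels, the rule precedence, and the patterns are all invented for illustration; they are not the i2b2 annotation scheme or the authors' actual rules.

```python
# Hybrid sketch: regex rules take precedence, naive Bayes handles the rest.
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_sentences = ["i am so sorry for everything",
                   "thank you for all your kindness",
                   "forgive me please",
                   "i am grateful to you"]
train_labels = ["guilt", "thankfulness", "guilt", "thankfulness"]

vectorizer = CountVectorizer()
nb = MultinomialNB().fit(vectorizer.fit_transform(train_sentences), train_labels)

# Hand-written pattern-matching rules (placeholders, not the authors' rules).
rules = [(re.compile(r"\bforgive\b|\bsorry\b"), "guilt"),
         (re.compile(r"\bthank(s| you)\b|\bgrateful\b"), "thankfulness")]

def classify(sentence):
    for pattern, topic in rules:
        if pattern.search(sentence):
            return topic
    return nb.predict(vectorizer.transform([sentence]))[0]

print(classify("please forgive me"))           # a rule fires -> guilt
print(classify("you were always kind to me"))  # falls back to naive Bayes
```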


2020 ◽  
Vol 4 (4) ◽  
pp. 635-641
Author(s):  
Nurul Chamidah ◽  
Mayanda Mega Santoni ◽  
Nurhafifah Matondang

Oversampling is a technique for balancing the number of records per class by generating additional records for the class with few records until its size matches that of the class with many records. In this study, oversampling is applied to a hypertension dataset in which the hypertensive class has far fewer records than the non-hypertensive class. The study aims to evaluate the effect of oversampling on the classification of this dataset, consisting of hypertensive and non-hypertensive classes, using Naïve Bayes, Decision Tree, and Artificial Neural Network (ANN) classifiers, and to find the best model among the three algorithms. The evaluation proceeds by imputing missing values, oversampling, and transforming the data into the same range, and then using Naïve Bayes, Decision Tree, and ANN to build classification models. With 80% of the data used as training data to build the models and 20% as validation data to test them, classification performance (accuracy, precision, and recall) increased on the oversampled data compared with the data without oversampling. The best performance was achieved by the ANN, with an accuracy of 0.91, precision of 0.86, and recall of 0.99.
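
A compact sketch of the preprocessing-and-comparison pipeline the abstract describes is given below: impute missing values, oversample the minority (hypertensive) class, scale everything to a common range, then train Naïve Bayes, a decision tree, and a small neural network on an 80/20 split. The column names and synthetic records are assumptions standing in for the study's hypertension dataset, so the printed scores will not match the reported 0.91/0.86/0.99.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

rng = np.random.default_rng(0)
df = pd.DataFrame({"age": rng.normal(50, 12, 300),        # placeholder columns
                   "systolic": rng.normal(130, 20, 300),
                   "bmi": rng.normal(27, 5, 300)})
df.loc[rng.choice(300, 20, replace=False), "bmi"] = np.nan  # some missing values
y = (rng.random(300) < 0.2).astype(int)                     # imbalanced target

X = SimpleImputer(strategy="mean").fit_transform(df)        # imputation
X = MinMaxScaler().fit_transform(X)                         # same range
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Random oversampling: duplicate minority-class rows until classes are balanced.
minority = X_tr[y_tr == 1]
extra = resample(minority, replace=True,
                 n_samples=(y_tr == 0).sum() - (y_tr == 1).sum(), random_state=0)
X_bal = np.vstack([X_tr, extra])
y_bal = np.concatenate([y_tr, np.ones(len(extra), dtype=int)])

for model in (GaussianNB(), DecisionTreeClassifier(random_state=0),
              MLPClassifier(max_iter=1000, random_state=0)):
    pred = model.fit(X_bal, y_bal).predict(X_te)
    print(type(model).__name__,
          round(accuracy_score(y_te, pred), 2),
          round(precision_score(y_te, pred, zero_division=0), 2),
          round(recall_score(y_te, pred, zero_division=0), 2))
```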


Entropy ◽  
2019 ◽  
Vol 21 (5) ◽  
pp. 489 ◽  
Author(s):  
Limin Wang ◽  
Yang Liu ◽  
Musa Mammadov ◽  
Minghui Sun ◽  
Sikai Qi

Over recent decades, the rapid growth in data has made ever more urgent the quest for highly scalable Bayesian networks with better classification performance and expressivity (that is, the capacity to describe the dependence relationships between attributes in different situations). To reduce the search space of possible attribute orders, the k-dependence Bayesian classifier (KDB) simply applies mutual information to sort the attributes. This sorting strategy is very efficient, but it neglects the conditional dependencies between attributes and is therefore sub-optimal. In this paper, we propose a novel sorting strategy and extend KDB from a single restricted network to unrestricted ensemble networks, i.e., the unrestricted Bayesian classifier (UKDB), in terms of Markov blanket analysis and target learning. Target learning is a framework that takes each unlabeled testing instance P as a target and builds a specific Bayesian network classifier BNC_P to complement the classifier BNC_T learned from the training data T. UKDB accordingly comprises UKDB_P and UKDB_T, which flexibly describe the change in dependence relationships for different testing instances and the robust dependence relationships implicated in the training data, respectively. Both use UKDB as the base classifier and apply the same learning strategy while modeling different parts of the data space, so they are complementary in nature. Extensive experimental results on the Wisconsin breast cancer database (as a case study) and 10 other datasets, involving classifiers with different structural complexities such as naive Bayes (0-dependence), tree-augmented naive Bayes (1-dependence), and KDB (arbitrary k-dependence), demonstrate the effectiveness and robustness of the proposed approach.
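
The sorting strategy the abstract attributes to KDB can be sketched briefly: rank attributes by mutual information with the class, then let each attribute take parents from the higher-ranked ones. The toy data are an assumption; for brevity the sketch takes the k immediately preceding attributes as parents, whereas KDB selects the k best by conditional mutual information, and UKDB's Markov-blanket and target-learning refinements are not shown.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 5))   # 5 discrete attributes (toy data)
y = rng.integers(0, 2, size=500)

# Sort attributes by mutual information I(Xi; C) with the class.
mi = [mutual_info_score(X[:, i], y) for i in range(X.shape[1])]
order = np.argsort(mi)[::-1]            # most informative attribute first

k = 2
for rank, attr in enumerate(order):
    # Simplification: take the k immediately preceding attributes as parents.
    parents = order[max(0, rank - k):rank]
    print(f"X{attr}: parents = {[f'X{p}' for p in parents]} + class")
```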


2014 ◽  
Vol 2014 ◽  
pp. 1-16 ◽  
Author(s):  
Qingchao Liu ◽  
Jian Lu ◽  
Shuyan Chen ◽  
Kangjia Zhao

This study presents the applicability of a Naïve Bayes classifier ensemble for traffic incident detection. The standard Naive Bayes (NB) classifier has been applied to traffic incident detection and has achieved good results. However, the detection result of a practically implemented NB depends on the choice of an optimal threshold, which is determined mathematically by using Bayesian concepts in the incident-detection process. To avoid the burden of choosing the optimal threshold and tuning the parameters, and furthermore to improve the limited classification performance of NB and enhance detection performance, we propose an NB classifier ensemble for incident detection. In addition, we propose combining Naïve Bayes and a decision tree (NBTree) to detect incidents. In this paper, we discuss extensive experiments performed to evaluate the performance of three algorithms: standard NB, the NB ensemble, and NBTree. The experimental results indicate that, under five combination rules, the NB classifier ensemble performs significantly better than standard NB and slightly better than NBTree on some indicators. More importantly, the performance of the NB classifier ensemble is very stable.
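
A generic sketch of an NB classifier ensemble with common combination rules (sum, product, max, majority vote) follows. The bootstrap construction of the members and the synthetic two-class data are assumptions; the paper's exact ensemble members and its five rules may differ.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 6))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=400) > 0).astype(int)
X_test = rng.normal(size=(5, 6))

members = []
for seed in range(7):                    # 7 bootstrap-trained NB members
    Xb, yb = resample(X, y, random_state=seed)
    members.append(GaussianNB().fit(Xb, yb))

probas = np.stack([m.predict_proba(X_test) for m in members])  # (7, 5, 2)

rules = {
    "sum":      probas.sum(axis=0).argmax(axis=1),
    "product":  probas.prod(axis=0).argmax(axis=1),
    "max":      probas.max(axis=0).argmax(axis=1),
    "majority": (np.stack([m.predict(X_test) for m in members]).mean(axis=0)
                 > 0.5).astype(int),
}
for name, pred in rules.items():
    print(name, pred)
```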


2020 ◽  
Vol 4 (1) ◽  
pp. 76-85
Author(s):  
Dwi Yuni Utami ◽  
Elah Nurlelah ◽  
Noer Hikmah

Liver disease is an inflammatory disease of the liver that can prevent the liver from functioning as usual and can even cause death. According to WHO (World Health Organization) data, almost 1.2 million people per year die from liver disease, especially in Southeast Asia and Africa. The usual problem is the difficulty of recognizing liver disease early on, even once the disease has spread. This study aims to compare and evaluate the Naive Bayes algorithm on its own against the Naive Bayes algorithm combined with a Genetic Algorithm (GA) and Bagging, to find out which has the higher accuracy in predicting liver disease, using a dataset taken from the UCI (University of California, Irvine) Machine Learning Repository. Evaluation with both the confusion matrix and the ROC curve shows that the Naive Bayes algorithm optimized with the Genetic Algorithm and Bagging has a higher accuracy value than the Naive Bayes algorithm alone. The accuracy of the Naive Bayes model is 66.66%, and the accuracy of the Naive Bayes model with attribute selection using the Genetic Algorithm and Bagging is 72.02%, a difference of 5.36%.

Keywords: Liver Disease, Naïve Bayes, Genetic Algorithms, Bagging.
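
The comparison described above can be sketched as follows: plain Gaussian naive Bayes versus naive Bayes wrapped in bagging after a feature-selection step. A simple univariate selector stands in for the paper's genetic-algorithm search, and synthetic data stand in for the UCI liver records, so the printed numbers are unrelated to the reported 66.66% and 72.02%.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

baseline = GaussianNB().fit(X_tr, y_tr)
print("plain NB:", accuracy_score(y_te, baseline.predict(X_te)))

# Univariate selection as a stand-in for the genetic-algorithm attribute search.
selector = SelectKBest(mutual_info_classif, k=5).fit(X_tr, y_tr)
bagged = BaggingClassifier(GaussianNB(), n_estimators=25,
                           random_state=0).fit(selector.transform(X_tr), y_tr)
print("selected + bagged NB:",
      accuracy_score(y_te, bagged.predict(selector.transform(X_te))))
```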


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that share the same features or characteristics into several classes. Automatic document classification uses the frequencies of words appearing in the training data as features. A large number of documents causes the number of words appearing as features to increase; therefore, summaries are used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which is considered to have a good reputation in classification. This research tests the effect of summarization as feature selection on document classification, with summaries that reduce each text to 50% of its length. The results show that the summaries did not affect the classification accuracy of documents classified with SVM, but they did improve the accuracy of the Simple Logistic classifier. The classification tests also show that Naïve Bayes Multinomial (NBM) achieves better accuracy than SVM.
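
A brief sketch of the comparison in this abstract: documents represented by word frequencies, classified with a multiclass SVM and with multinomial naive Bayes. The four toy documents and labels are assumptions, and the 50% summarization step is only indicated by a comment rather than implemented.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["stock markets rallied on strong earnings",
        "the team won the championship final",
        "new vaccine trial shows promising results",
        "central bank raises interest rates again"]
labels = ["economy", "sport", "health", "economy"]
# In the study, each document would first be summarized to ~50% of its length
# before vectorization; plug a summarizer in here if one is available.

svm = make_pipeline(CountVectorizer(), LinearSVC()).fit(docs, labels)
nbm = make_pipeline(CountVectorizer(), MultinomialNB()).fit(docs, labels)

test = ["interest rates and earnings move the markets"]
print("SVM:", svm.predict(test)[0], "| NBM:", nbm.predict(test)[0])
```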


2020 ◽  
Vol 17 (1) ◽  
pp. 37-42
Author(s):  
Yuris Alkhalifi ◽  
Ainun Zumarniansyah ◽  
Rian Ardianto ◽  
Nila Hardi ◽  
Annisa Elfina Augustia

Non-Cash Food Assistance, or Bantuan Pangan Non-Tunai (BPNT), is food assistance from the government given to Beneficiary Families (KPM) every month through an electronic account mechanism that can only be used to buy food at an e-Warong KUBE PKH (the Electronic Shop of the Mutual Assistance Joint Business Group, Family Hope Program) or from food traders working with Bank Himbara. In its distribution, BPNT still presents problems for village officials, especially those of Desa Wanasari, in deciding who deserves to receive the assistance (poor) and who does not (not poor). One way to support such decisions is through data mining. In this study, two algorithms are compared: the Naive Bayes Classifier and Decision Tree C4.5. The sample consists of 200 head-of-household records, split by a validation scheme into 90% training data and 10% test data; the proposed models are built in the RapidMiner application and then evaluated using a confusion matrix to determine which of the two methods has the higher accuracy. The results of this classification show that the accuracy of the Naive Bayes Classifier is 98.89% and the accuracy of Decision Tree C4.5 is 95.00%. The conclusion is that, in this study, the algorithm with the highest accuracy is the Naive Bayes Classifier, with a difference in accuracy of 3.89%.
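
A schematic version of the experiment described above: split the records 90%/10%, train a naive Bayes classifier and a decision tree (C4.5 is approximated here by scikit-learn's CART with the entropy criterion), and compare them through the confusion matrix and accuracy. The synthetic features are placeholders for the 200 Desa Wanasari household records, so the output will not reproduce the 98.89% and 95.00% figures.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)

for model in (GaussianNB(),
              DecisionTreeClassifier(criterion="entropy", random_state=0)):
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(type(model).__name__)
    print(confusion_matrix(y_te, pred))
    print("accuracy:", accuracy_score(y_te, pred))
```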


Repositor ◽  
2020 ◽  
Vol 2 (5) ◽  
pp. 675
Author(s):  
Muhammad Athaillah ◽  
Yufiz Azhar ◽  
Yuda Munarko

Hoax news classification is one application of text categorization. Hoax news must be classified because it can influence readers' actions and thinking patterns. The classification process in this research uses several stages, namely preprocessing, feature extraction, feature selection, and classification. This research compares the Naïve Bayes algorithm and the Multinomial Naïve Bayes algorithm to determine which of the two is more effective for classifying hoax news. The data used in this research consist of 100 hoax news articles from turnbackhoax.id and 100 non-hoax news articles from kompas.com and detik.com. The training data comprise 140 articles and the test data 60 articles. In the comparison, the Naïve Bayes algorithm achieved an F1-score of 0.93 and Multinomial Naïve Bayes an F1-score of 0.92.
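
As a small illustration of the comparison reported above, the sketch below scores the same bag-of-words features with a Bernoulli-event naive Bayes model (standing in for the plain Naïve Bayes variant) and with multinomial naive Bayes, evaluated by F1-score. The toy corpus and labels are assumptions; the actual study used 140 training and 60 test articles.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.metrics import f1_score

train = ["miracle cure shared thousands of times proves false",
         "government confirms new public holiday schedule",
         "celebrity death rumour spreads on social media",
         "ministry releases official inflation figures"]
y_train = [1, 0, 1, 0]                      # 1 = hoax, 0 = non-hoax
test = ["rumour about free money proves false", "official statistics released"]
y_test = [1, 0]

vec = CountVectorizer().fit(train)
for Model in (BernoulliNB, MultinomialNB):
    pred = Model().fit(vec.transform(train), y_train).predict(vec.transform(test))
    print(Model.__name__, "F1 =", f1_score(y_test, pred))
```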

