A Hybrid Feature Selection Method based on IGSBFS and Naive Bayes for the Diagnosis of Erythemato-Squamous Diseases

Naïve Bayes is a data mining method commonly used in text-based document classification. Its advantage is a simple algorithm with low computational complexity. Its weakness is that the feature independence it assumes does not always hold, which affects the accuracy of the calculation. Naïve Bayes therefore needs to be optimized by assigning weights to its features using Gain Ratio. However, weighting the features causes problems when calculating the probability of each document, because many features in a document do not represent the tested class. Weighted Naïve Bayes alone is therefore still not optimal. This paper proposes optimizing the Naïve Bayes method with Gain Ratio weighting combined with a feature selection method for text classification. The results show that Naïve Bayes optimized with feature selection and weighting achieves an accuracy of 94%.
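
As a rough illustration of the weighting step mentioned in the abstract, the gain ratio of a discrete feature can be computed as its information gain divided by its split information. The sketch below is not the authors' implementation; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Gain ratio of one discrete feature: information gain / split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature_values)
    # A feature with a single value carries no information; avoid dividing by zero.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): whether a term occurs in a document, and the document's class.
has_term = [1, 1, 0, 0]
noise = [1, 0, 1, 0]
classes = ["spam", "spam", "ham", "ham"]

weights = {"has_term": gain_ratio(has_term, classes), "noise": gain_ratio(noise, classes)}
```

One common way to use such weights is as exponents on the per-feature likelihoods of Naïve Bayes; the abstract does not say whether the paper applies them this way.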


Comparison of Naïve Bayes Algorithm and Decision Tree C4.5 for Hospital Readmission Diabetes Patients using HbA1c Measurement

Knowledge Engineering and Data Science ◽

10.17977/um018v2i22019p58-71 ◽

2019 ◽

Vol 2 (2) ◽

pp. 58 ◽

Cited By ~ 1

Author(s):

Utomo Pujianto ◽

Asa Luki Setiawan ◽

Harits Ar Rosyid ◽

Ali M. Mohammad Salah

Keyword(s):

Feature Selection ◽

Decision Tree ◽

Naive Bayes ◽

Feature Selection Method ◽

Selection Method ◽

Naïve Bayes ◽

The Body ◽

Classification Model ◽

Diabetic Patients ◽

Patient Readmissions

Diabetes is a metabolic disorder in which the pancreas does not produce enough insulin or the body cannot effectively use the insulin it produces. The HbA1c examination, which measures a patient's average glucose level over the last 2-3 months, has become an important step in determining the condition of diabetic patients. Knowledge of the patient's condition can help medical staff predict the possibility of readmission, that is, a patient needing hospitalization again. The ability to predict patient readmissions ultimately helps the hospital to calculate and manage the quality of patient care. This study compares the performance of the Naïve Bayes method and the C4.5 Decision Tree in predicting readmissions of diabetic patients, especially patients who have undergone an HbA1c examination. As part of this study we also compare the performance of classification models across a number of scenarios that combine two preprocessing methods, Synthetic Minority Over-Sampling Technique (SMOTE) and the Wrapper feature selection method, with both classification techniques. The scenario combining C4.5 with SMOTE and feature selection produces the best performance in classifying readmissions of diabetic patients, with an accuracy of 82.74%, a precision of 87.1%, and a recall of 82.7%.
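
To make the oversampling step more concrete, here is a minimal sketch of the idea behind SMOTE: a synthetic minority sample is placed at a random position on the segment between a minority sample and one of its nearest minority neighbours. This is a simplified illustration, not the study's pipeline; the function name, parameters, and toy values are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest neighbours of the base point, excluding the point itself.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical values, two numeric features per patient).
minority_samples = [[7.0, 1.0], [8.0, 2.0], [9.0, 1.5]]
new_points = smote_sketch(minority_samples, n_new=4)
```

Because every synthetic point lies between two existing minority samples, the new points stay inside the region the minority class already occupies.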


Hybrid Feature Selection Method Based on a Naïve Bayes Algorithm that Enhances the Learning Speed while Maintaining a Similar Error Rate in Cyber ISR

KSII Transactions on Internet and Information Systems ◽

10.3837/tiis.2018.12.005 ◽

2018 ◽

Vol 12 (12) ◽

Keyword(s):

Feature Selection ◽

Error Rate ◽

Naive Bayes ◽

Feature Selection Method ◽

Selection Method ◽

Naïve Bayes ◽

Learning Speed ◽

Bayes Algorithm


A Novel Feature Selection Technique for Text Classification Using Naïve Bayes

International Scholarly Research Notices ◽

10.1155/2014/717092 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 20

Author(s):

Subhajit Dey Sarkar ◽

Saptarsi Goswami ◽

Aman Agarwal ◽

Javed Aktar

Keyword(s):

Feature Selection ◽

Text Classification ◽

Text Categorization ◽

Naive Bayes ◽

Feature Selection Method ◽

Search Space ◽

Selection Method ◽

Naïve Bayes ◽

Training Data ◽

Feature Selection Technique

With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available. Naïve Bayes remains one of the oldest and most popular classifiers. It is simple to implement and requires a smaller amount of training data. However, the literature review finds that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier unusable in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and feature clustering is then applied to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
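
The two-step idea can be sketched as follows. This is only an approximation under stated assumptions: absolute Pearson correlation with the label stands in for the univariate score, and a greedy redundancy check stands in for the clustering step, neither of which is confirmed by the abstract. All names and the toy term counts are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def two_step_select(columns, labels, top_m, redundancy=0.9):
    """columns: dict of feature name -> list of values; labels: list of 0/1."""
    # Step 1: univariate filter, keep the top_m features by |correlation with label|.
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_m]
    # Step 2: greedy grouping; drop a feature if it is nearly a copy of one already kept.
    kept = []
    for name in ranked:
        if all(abs(pearson(columns[name], columns[k])) < redundancy for k in kept):
            kept.append(name)
    return kept


labels = [1, 1, 1, 0, 0, 0]
columns = {
    "free": [3, 2, 3, 0, 0, 1],
    "free_dup": [6, 4, 6, 0, 0, 2],  # exact multiple of "free", so redundant
    "meeting": [0, 1, 0, 3, 1, 2],
    "the": [5, 5, 4, 5, 4, 5],       # uninformative about the label
}
selected = two_step_select(columns, labels, top_m=3)
```

The first step discards the uninformative term, and the second keeps only one of the two duplicated terms, which is the kind of relative independence Naïve Bayes benefits from.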


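Deteksi Kanker Berdasarkan Data Microarray Menggunakan Metode Naïve Bayes dan Hybrid Feature Selection [Cancer Detection Based on Microarray Data Using the Naïve Bayes Method and Hybrid Feature Selection]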

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v4i3.2096 ◽

2020 ◽

Vol 4 (3) ◽

pp. 486

Author(s):

Bintang Peryoga ◽

Adiwijaya Adiwijaya ◽

Widi Astuti

Keyword(s):

Feature Selection ◽

Dimension Reduction ◽

Naive Bayes ◽

Information Gain ◽

Feature Selection Method ◽

Naïve Bayes ◽

Small Sample ◽

Filter Method ◽

Cancer Genes ◽

Cancer Data

Cancer is a deadly disease that was responsible for 9.6 million deaths in 2018 according to WHO data, so early detection is needed so that it can be treated immediately and cancer deaths can be reduced. Microarray technology can monitor and analyze the expression of cancer genes, but microarray data have high dimensionality and small sample sizes, so dimension reduction is needed for an optimal classification process. Dimension reduction lowers the number of features used for classification by selecting the most influential ones. The hybrid method is a dimension reduction approach that combines a Filter method with a Wrapper method to gain the advantages of both. In this study, the researchers combined Naïve Bayes with Hybrid Feature Selection (Information Gain - Genetic Algorithm) on microarray data for Lung Cancer, Ovarian Cancer, Breast Cancer, Colon Tumors, and Prostate Tumors. The data were obtained from the Kent-Ridge Biomedical Dataset. The results showed that, of the 5 datasets used, 4 obtained an accuracy between 87% and 100%, while the prostate tumor data obtained the lowest accuracy, 61.14%. Feature selection and classification on the 5 cancer datasets used fewer than 63 features to obtain these accuracies.
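
The filter-plus-wrapper combination can be sketched as below: a filter keeps the genes with the highest information gain, and a small genetic algorithm then searches for a good subset of those genes. This is a toy sketch, not the researchers' code. The information-gain scores are made up, and `toy_fitness` is a hypothetical stand-in for what would normally be the cross-validated accuracy of Naïve Bayes on the selected genes.

```python
import random


def ga_select(features, fitness, pop_size=8, generations=20, mutation=0.1, seed=0):
    """Search for a feature subset (bitmask over `features`) that maximises `fitness`."""
    rng = random.Random(seed)
    n = len(features)
    # Start from the full filtered set plus random masks.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation else bit for bit in child]
            children.append(child)
        population = parents + children
    best = max(population, key=fitness)
    return [f for f, bit in zip(features, best) if bit], fitness(best)


# Hypothetical information-gain scores for six genes (input to the filter step).
info_gain = {"g1": 0.42, "g2": 0.05, "g3": 0.31, "g4": 0.02, "g5": 0.27, "g6": 0.18}
filtered = sorted(info_gain, key=info_gain.get, reverse=True)[:4]  # filter: keep top 4

# Hypothetical per-gene contribution used by the stand-in fitness function.
useful = {"g1": 0.30, "g3": 0.25, "g5": -0.05, "g6": 0.02}


def toy_fitness(mask):
    # Reward useful genes and charge a small cost per selected gene.
    return sum(useful[g] for g, bit in zip(filtered, mask) if bit) - 0.03 * sum(mask)


subset, score = ga_select(filtered, toy_fitness)
```

Because the best half of each generation is carried over unchanged, the returned subset is never worse than using all of the filtered genes.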


TEXT CLASSIFICATION USING NAIVE BAYES UPDATEABLE ALGORITHM IN SBMPTN TEST QUESTIONS

Telematika ◽

10.31315/telematika.v13i2.1728 ◽

2017 ◽

Vol 13 (2) ◽

pp. 123 ◽

Cited By ~ 1

Author(s):

Ristu Saptono ◽

Meianto Eko Sulistyo ◽

Nur Shobriana Trihabsari

Keyword(s):

Text Classification ◽

Naive Bayes ◽

Feature Selection Method ◽

Classification Performance ◽

Selection Method ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Existing Data

Document classification is of growing interest in text mining research. Classification can be based on topic, language, and so on. This study was conducted to determine how Naive Bayes Updateable performs in classifying SBMPTN exam questions by theme. Naive Bayes Updateable is an incremental version of the Naive Bayes classifier, an algorithm often used in text classification; it can learn from new data introduced to the system even after the classifier has been built from the existing data. The Naive Bayes classifier assigns each exam question to a field-of-study theme by analyzing the keywords that appear in the question. One feature selection method, DF-Thresholding, is implemented to improve classification performance. Evaluation of the Naive Bayes classifier gives an accuracy of 84.61%.
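
A minimal sketch of the two ideas in this abstract, under stated assumptions: document-frequency (DF) thresholding keeps only terms that appear in enough documents, and an incremental multinomial Naive Bayes keeps running counts so that new labelled questions can be added without retraining. The class, the function names, and the toy questions are hypothetical and are not the tool used in the study.

```python
import math
from collections import Counter, defaultdict


def df_threshold(docs, min_df):
    """Keep terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count >= min_df}


class IncrementalNB:
    """Multinomial Naive Bayes whose counts can be updated one document at a time."""

    def __init__(self, vocabulary):
        self.vocabulary = set(vocabulary)  # fixed after DF thresholding in this sketch
        self.class_docs = Counter()
        self.term_counts = defaultdict(Counter)

    def update(self, tokens, label):
        self.class_docs[label] += 1
        self.term_counts[label].update(t for t in tokens if t in self.vocabulary)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for t in tokens:
                if t in self.vocabulary:
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train = [
    (["integral", "limit", "function"], "math"),
    (["function", "derivative", "limit"], "math"),
    (["cell", "enzyme", "function"], "biology"),
    (["cell", "membrane", "enzyme"], "biology"),
]
vocab = df_threshold([tokens for tokens, _ in train], min_df=2)
model = IncrementalNB(vocab)
for tokens, label in train:
    model.update(tokens, label)
first = model.predict(["limit", "function"])

# A new labelled question arrives later; only the counts change.
model.update(["genetics", "cell", "enzyme"], "biology")
second = model.predict(["cell", "enzyme"])
```

In this sketch the vocabulary stays fixed once the threshold has been applied, so terms that first appear in later updates are ignored.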


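Metode Seleksi Fitur Untuk Klasifikasi Sentimen Menggunakan Algoritma Naive Bayes: Sebuah Literature Review [Feature Selection Methods for Sentiment Classification Using the Naive Bayes Algorithm: A Literature Review]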

JURNAL MEDIA INFORMATIKA BUDIDARMA ◽

10.30865/mib.v5i3.2983 ◽

2021 ◽

Vol 5 (3) ◽

pp. 799

Author(s):

Fitria Septianingrum ◽

Agung Susilo Yuda Irawan

Keyword(s):

Feature Selection ◽

Literature Review ◽

Sentiment Analysis ◽

Industrial Revolution ◽

Naive Bayes ◽

Feature Selection Method ◽

Naïve Bayes ◽

Digital Data ◽

The Internet ◽

Bayes Algorithm

In the current era of Industrial Revolution 4.0, the internet is a necessity of daily life. The high intensity of internet use causes information to spread widely and quickly. This rapid distribution of information goes hand in hand with the growth of digital data, so the public opinions contained in that data become important. Digital data can be processed with sentiment analysis to obtain useful information about issues developing in the community or to find out public opinion on a company's product. Because many sentiment analysis studies apply the Naive Bayes algorithm, the researchers were interested in studying the use of feature selection with this algorithm. This research was therefore conducted to determine which feature selection method is the most optimal when combined with the Naive Bayes algorithm, using the Systematic Literature Review (SLR) research method. The study concluded that the most optimal feature selection method to combine with the Naive Bayes algorithm is Particle Swarm Optimization (PSO), with an average accuracy of 89.08%.


Analysis of Sentiment of Moving a National Capital with Feature Selection Naive Bayes Algorithm and Support Vector Machine

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i3.1942 ◽

2020 ◽

Vol 4 (3) ◽

pp. 504-512

Author(s):

Faried Zamachsari ◽

Gabriel Vangeran Saragih ◽

Susafa'ati ◽

Windu Gata

Keyword(s):

Social Media ◽

Support Vector Machine ◽

Feature Selection ◽

Public Opinion ◽

Naive Bayes ◽

Naïve Bayes ◽

Capital City ◽

Support Vector ◽

National Capital ◽

Bayes Algorithm

The decision to move Indonesia's capital city to East Kalimantan received mixed responses on social media. The still-high poverty rate and the country's difficult finances are factors in disapproval of the relocation. Twitter, one of the popular social media platforms, is used by the public to express these opinions. The goals of this study are to identify the tendency of community responses to the move of the national capital and to determine how to perform sentiment analysis of public opinion on the move using Feature Selection with the Naive Bayes algorithm and Support Vector Machine so as to obtain the highest accuracy. The sentiment analysis data are Indonesian-language public opinions taken from Twitter by crawling tweets. The search terms used are #IbuKotaBaru and #PindahIbuKota. The research stages consisted of collecting data from Twitter, polarity labeling, and preprocessing, which comprises case transformation, cleansing, tokenizing, filtering, and stemming. Feature selection is applied to increase accuracy, and the data are then divided into testing and training sets according to predetermined ratios. The next step is a comparison between the Support Vector Machine and Naive Bayes methods to determine which is more accurate. In the data collected, 24.26% of sentiment was positive and 75.74% negative toward the move to a new capital city. Using RapidMiner software, the best accuracy for Naive Bayes with Feature Selection is 88.24% at a ratio of 9:1, while the best accuracy for Support Vector Machine with Feature Selection is 78.77% at a ratio of 5:5.
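
The preprocessing stages listed in the abstract can be sketched roughly as follows. This is an illustration only: the study used RapidMiner operators, the stopword list here is a tiny hypothetical sample, and stemming is left out because it would need an Indonesian stemmer that this sketch does not include.

```python
import re

# Tiny illustrative stopword list (hypothetical; a real list would be much longer).
STOPWORDS = {"yang", "dan", "di", "ke", "itu"}


def preprocess(tweet):
    text = tweet.lower()                                # transform case
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)   # cleansing: URLs, mentions, hashtags
    tokens = re.findall(r"[a-z]+", text)                # tokenizing
    return [t for t in tokens if t not in STOPWORDS]    # filtering


tokens = preprocess("Pindah ibu kota itu MAHAL dan berisiko! #PindahIbuKota https://t.co/abc")
```

The resulting token lists are what a feature selection step and a classifier such as Naive Bayes or a Support Vector Machine would then consume.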


Improvement of feature selection method in spam filtering

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02812 ◽

2009 ◽

Vol 29 (10) ◽

pp. 2812-2815

Author(s):

Yang-zhu LU ◽

Xin-you ZHANG ◽

Yu QI

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Spam Filtering
