Kernel naive Bayes discrimination for high‐dimensional pattern recognition

2019 ◽  
Vol 61 (4) ◽  
pp. 401-428
Author(s):  
Inge Koch ◽  
Kanta Naito ◽  
Hiroaki Tanaka
Author(s):  
Tobias Sombra ◽  
Rose Santini ◽  
Emerson Morais ◽  
Walmir Couto ◽  
Alex Zissou ◽  
...  

Quantitative evaluation of a dataset can play an important role in pattern recognition for technical-scientific research on behavior and dynamics in social networks. One example is the adaptive feature weighting approach used with the naive Bayes text algorithm. This work presents an exploratory data analysis with a quantitative approach to pattern recognition on the Mendeley research network, with the aim of identifying the logic behind the popularity of document access. To better analyze the results, the work was divided into four categories, each with three subcategories, that is, five, three, and two output classes. The category names arose from the data collection, which also yielded open-access documents and split proceedings and journals into two further categories. On the test examples, the naive Bayes algorithm in Mendeley showed its lowest error rate under the popularity criterion for the subcategory with two output classes.
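As a rough illustration of the kind of naive Bayes text classification the abstract refers to, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the function names, the toy documents, and the two labels "popular" and "unpopular" are hypothetical, and the adaptive feature weighting mentioned above is not included.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and words per class from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities for unseen class/word pairs
            p = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data with two output classes (not from the study)
docs = [
    ["open", "access", "journal"],
    ["open", "access", "survey"],
    ["proceedings", "abstract"],
    ["proceedings", "poster"],
]
labels = ["popular", "popular", "unpopular", "unpopular"]
model = train_naive_bayes(docs, labels)
```

Calling `predict(model, ["open", "journal"])` on this toy data selects the class whose smoothed word probabilities best explain the tokens. The "naive" assumption is visible in the inner loop, where each token contributes an independent log-probability term.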


Author(s):  
Joko Suntoro ◽  
Febrian Wahyu Christanto ◽  
Henny Indriyawati

Software defect prediction is one of the most important parts of software engineering. It is defined as the process of predicting errors, failures, and system errors in software. Researchers use machine learning methods to predict software defects, including estimation, association, classification, clustering, and dataset analysis. The NASA Metrics Data Program (NASA MDP) datasets are among the software metrics datasets that researchers use to predict software defects. The NASA MDP datasets contain imbalanced classes and high-dimensional data, both of which lower classification performance. In this research, class imbalance is handled with the AdaCost method and high dimensionality with the Average Weight Information Gain (AWEIG) method, while the Naïve Bayes algorithm is used as the classifier. The proposed method is named AWEIG + AdaCost Bayesian. In the experiment, the AWEIG + AdaCost Bayesian algorithm is compared with the plain Naïve Bayes algorithm. The results show that AWEIG + AdaCost Bayesian achieves a better mean Area Under the Curve (AUC) than Naïve Bayes alone, with mean AUC values of 0.752 and 0.696, respectively.
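To make the evaluation metric concrete, here is a minimal sketch of how an AUC value can be computed from classifier scores, using the pairwise-ranking definition (the fraction of defective/non-defective pairs in which the defective module receives the higher score). The function name `auc_score` and the toy labels and scores are hypothetical and are not taken from the NASA MDP experiments; AWEIG and AdaCost themselves are not implemented here.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive (defective) class, 0 for the negative class.
    scores: classifier scores, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy data: four non-defective modules, two defective
toy_labels = [0, 0, 0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]
toy_auc = auc_score(toy_labels, toy_scores)  # 7 of 8 pairs ordered correctly: 0.875
```

Because the value depends only on how positives are ranked relative to negatives, it does not require choosing a decision threshold. The double loop is quadratic in the number of examples, so a rank-based formulation would be preferable for large datasets.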

