MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier
2020, Vol 16 (2), pp. 1-23
Keyword(s): Big Data
This study presents an efficient way to handle both discrete and continuous attributes in Big Data with a parallel Naïve Bayes classifier implemented in Hadoop's MapReduce framework. Two approaches were taken: (i) discretizing continuous values with a binning method; and (ii) estimating probabilities of discrete values with a multinomial distribution and probabilities of continuous values with a Gaussian distribution. The two models were compared in terms of run time and classification accuracy across varying data sizes, data block sizes, and map memory sizes.
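The following is a minimal, single-process sketch of how the two approaches can fit the MapReduce pattern. It is not the authors' implementation and does not use the Hadoop API. The functions `mapper`, `reducer`, `run_job`, `predict`, and `equal_width_bin` are hypothetical names. Several assumptions are also made here. First, the map phase emits per-class counts for discrete values and (count, sum, sum of squares) sufficient statistics for continuous values. Second, a categorical count estimate with Laplace smoothing stands in for the multinomial model. Third, equal-width bins stand in for the unspecified binning method.

```python
import math
from collections import defaultdict

# Hypothetical record format: (label, discrete_features, continuous_features).

def equal_width_bin(x, lo, hi, n_bins):
    """Approach (i), assumed equal-width binning: map a continuous value to a bin index."""
    if x <= lo:
        return 0
    if x >= hi:
        return n_bins - 1
    return int((x - lo) / (hi - lo) * n_bins)

def mapper(record):
    """Map phase: emit (key, value) pairs for one training record."""
    label, discrete, continuous = record
    yield ("class", label), 1
    for i, v in enumerate(discrete):
        yield ("disc", label, i, v), 1           # count of value v for attribute i in class
    for i, x in enumerate(continuous):
        yield ("cont", label, i), (1, x, x * x)  # Gaussian sufficient statistics

def reducer(key, values):
    """Reduce phase: sum counts, or sum (count, sum, sum of squares) tuples."""
    if key[0] == "cont":
        n = s = ss = 0.0
        for cnt, x, xx in values:
            n += cnt
            s += x
            ss += xx
        return (n, s, ss)
    return sum(values)

def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for r in records:
        for k, v in mapper(r):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}

def predict(model, discrete, continuous, alpha=1.0, min_var=1e-9):
    """Approach (ii): count-based estimates for discrete, Gaussian for continuous."""
    labels = [k[1] for k in model if k[0] == "class"]
    total = sum(model[("class", c)] for c in labels)
    # Distinct values seen for each discrete attribute (any class), used for smoothing.
    vocab = defaultdict(set)
    for k in model:
        if k[0] == "disc":
            vocab[k[2]].add(k[3])
    best, best_score = None, -math.inf
    for c in labels:
        n_c = model[("class", c)]
        score = math.log(n_c / total)  # log prior
        for i, v in enumerate(discrete):
            k_i = len(vocab[i] | {v})
            count = model.get(("disc", c, i, v), 0)
            score += math.log((count + alpha) / (n_c + alpha * k_i))
        for i, x in enumerate(continuous):
            n, s, ss = model[("cont", c, i)]
            mean = s / n
            var = max(ss / n - mean * mean, min_var)
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = c, score
    return best

records = [
    ("yes", ("sunny",), (30.0,)),
    ("yes", ("sunny",), (32.0,)),
    ("no", ("rainy",), (10.0,)),
    ("no", ("rainy",), (12.0,)),
]
model = run_job(records)
print(predict(model, ("sunny",), (31.0,)))
```

The reducer only sums its inputs, and summation is associative. The same function could therefore also serve as a combiner to shrink shuffle traffic. Computing variance as E[x²] − mean² is simple but can lose precision on large values. A production version might use a numerically stabler merge of partial statistics.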