MapReduce Implementation of a Multinomial and Mixed Naive Bayes Classifier

Sikha Bagui; Keerthi Devulapalli; Sharon John

doi:10.4018/ijiit.2020040101

International Journal of Intelligent Information Technologies

10.4018/ijiit.2020040101

2020

Vol 16 (2)

pp. 1-23

Cited By ~ 1

Author(s):

Sikha Bagui

Keerthi Devulapalli

Sharon John

Keyword(s):

Big Data

Gaussian Distribution

Classification Accuracy

Naive Bayes

Multinomial Distribution

Naïve Bayes

Probability Estimation

Bayes Classifier

Discrete Values

Block Sizes

This study presents an efficient way to deal with discrete as well as continuous values in Big Data in a parallel Naïve Bayes implementation on Hadoop's MapReduce environment. Two approaches were taken: (i) discretizing continuous values using a binning method; and (ii) using a multinomial distribution for probability estimation of discrete values and a Gaussian distribution for probability estimation of continuous values. The models were analyzed and compared for performance with respect to run time and classification accuracy for varying data sizes, data block sizes, and map memory sizes.
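The two approaches in this abstract can be illustrated with a small in-memory sketch. It assumes a hypothetical record layout of `(label, [discrete values], [continuous values])`; `mapper` and `reducer` are plain Python functions that only imitate the map and reduce phases (they are not Hadoop's API). The Laplace constant `alpha`, the variance floor, and the equal-width rule in `equal_width_bin` are illustrative assumptions, not details taken from the paper. Discrete features use a smoothed count-based (categorical/multinomial) estimate and continuous features use a Gaussian built from per-class count, sum, and sum of squares.

```python
import math

# Hypothetical record layout (an assumption, not the paper's format):
# (class_label, [discrete feature values], [continuous feature values])


def equal_width_bin(x, lo, hi, n_bins):
    """Approach (i): map a continuous value to one of n_bins equal-width bins."""
    if hi == lo:
        return "bin0"
    idx = int((x - lo) / (hi - lo) * n_bins)
    return f"bin{min(max(idx, 0), n_bins - 1)}"


def mapper(record):
    """Emit (key, value) pairs for one training record, as a map task would."""
    label, discrete, continuous = record
    yield ("class", label), 1
    for i, v in enumerate(discrete):
        yield ("disc", label, i, v), 1
    for i, x in enumerate(continuous):
        # Gaussian sufficient statistics: count, sum, sum of squares.
        yield ("cont", label, i), (1, x, x * x)


def reducer(pairs):
    """Sum the values per key, standing in for the shuffle and reduce phases."""
    totals = {}
    for key, value in pairs:
        if isinstance(value, tuple):
            prev = totals.get(key, (0, 0.0, 0.0))
            totals[key] = tuple(a + b for a, b in zip(prev, value))
        else:
            totals[key] = totals.get(key, 0) + value
    return totals


def predict(model, discrete, continuous, alpha=1.0):
    """Approach (ii): count-based estimate for discrete, Gaussian for continuous."""
    n_total = sum(n for k, n in model.items() if k[0] == "class")
    best, best_score = None, -math.inf
    for key, n_c in model.items():
        if key[0] != "class":
            continue
        label = key[1]
        score = math.log(n_c / n_total)  # log prior
        for i, v in enumerate(discrete):
            count = model.get(("disc", label, i, v), 0)
            n_values = len({k[3] for k in model if k[0] == "disc" and k[2] == i})
            # Laplace-smoothed relative frequency of value v given the class.
            score += math.log((count + alpha) / (n_c + alpha * n_values))
        for i, x in enumerate(continuous):
            n, s, sq = model[("cont", label, i)]
            mean = s / n
            var = max(sq / n - mean * mean, 1e-9)  # guard against zero variance
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        if score > best_score:
            best, best_score = label, score
    return best


train = [
    ("yes", ["red"], [1.0]), ("yes", ["red"], [1.2]), ("yes", ["blue"], [0.8]),
    ("no", ["blue"], [5.0]), ("no", ["blue"], [5.5]), ("no", ["red"], [4.5]),
]
model = reducer(kv for record in train for kv in mapper(record))
print(predict(model, ["red"], [1.1]))      # yes
print(predict(model, ["blue"], [5.2]))     # no
print(equal_width_bin(7.5, 0.0, 10.0, 4))  # bin3
```

Emitting sums rather than raw values keeps the reduce step associative, which is what would let a combiner pre-aggregate on each mapper in a real MapReduce job.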

CNB-MRF: Adapting Correlative Naive Bayes Classifier and MapReduce Framework for Big Data Classification

International Review on Computers and Software (IRECOS)

10.15866/irecos.v11i11.10116

2016

Vol 11 (11)

pp. 1007

Cited By ~ 3

Author(s):

Chitrakant Banchhor

N. Srinivasu

Keyword(s):

Big Data

Naive Bayes

Data Classification

Naïve Bayes

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

MapReduce Framework

Big Data Classification

Enhancing the Classification Accuracy in Sentiment Analysis using Joint Sentiment Topic Detection with Naive Bayes Classifier

Asian Journal of Research in Social Sciences and Humanities

10.5958/2249-7315.2016.01280.6

2016

Vol 6 (12)

pp. 105

Author(s):

PCD Kalaivaani

R. Thangarajan

Keyword(s):

Sentiment Analysis

Classification Accuracy

Naive Bayes

Naïve Bayes

Topic Detection

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

Knowledge Mining from Large-Volume Health Data Files (Big Data) Using Computational Analysis Algorithms: Health Analytics

10.12681/eadd/50564

2021

Author(s):

Ιωάννης Μήνου

Keyword(s):

Support Vector Machine

Big Data

Random Forest

Cross Validation

Naive Bayes

Naïve Bayes

Support Vector

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

The greatest challenge facing modern computing systems is undoubtedly the efficient storage and retrieval of very large volumes of data. This need has emerged in recent years because of the explosion of data on the internet, and it is becoming increasingly important because of the very wide range of information that can be extracted from such data. Healthcare and medical data form a field that is evolving continuously and rapidly. Exploiting Big Data in healthcare yields valuable information, since it offers extensive possibilities for the effective storage, processing, SQL querying, and analysis of medical data. The purpose of this thesis is to study knowledge mining techniques for large-volume data in the health domain. The research also studies statistical and computational algorithms for analyzing large volumes of health data that produce new knowledge and extract statistically significant information for health professionals. Finally, the thesis investigates what Health Informatics scientists and health professionals know about Big Data. The thesis first presents a literature review of the concept of Big Data, covering its definition, its characteristics, and its advantages and disadvantages in healthcare. It then covers the implementation and storage mechanisms of Big Data, systems for analyzing and processing large data volumes, programming languages for Big Data, and knowledge mining of healthcare data. It also covers the use of Big Data in Europe and worldwide. Finally, it presents the basic principles of the GDPR and how they relate to Big Data in healthcare.
Two empirical studies were also conducted. The first aimed to record the views of Health Informatics scientists on Big Data technology. Data were collected with a questionnaire, and the statistical analysis showed a positive response from the sample toward Big Data technology. The second study aimed to record the views of health professionals on Big Data technology, again using a questionnaire. Its statistical analysis did not yield conclusive answers, because respondents showed a positive attitude toward Big Data while also stating that they know little about the technology. The final part of the thesis develops prediction methods for diagnosing patients with cardiovascular disease. The prediction methods used are Logistic Regression, the Naive Bayes Classifier, Decision Trees, the k-Nearest Neighbors algorithm, SVM (Support Vector Machine), and Random Forest. The development covered all stages of data preprocessing, and specific metrics were used to measure classifier performance. Finally, classifier performance was improved using cross-validation, and the class imbalance problem was addressed using the SMOTE method.
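The class-imbalance step mentioned above can be illustrated with a minimal sketch of the standard SMOTE idea (Synthetic Minority Over-sampling Technique): pick a minority sample, pick one of its k nearest minority neighbours, and create a new point at a random position on the segment between them. This is not the thesis's code; the function name `smote`, the Euclidean distance, and the `seed` parameter are assumptions for illustration. In practice a maintained library implementation would normally be used.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority-class points by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (itself excluded).
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position on the segment base -> nb
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nb)))
    return synthetic


minority = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0)]
for point in smote(minority, n_new=3, k=2):
    print(point)
```

Because each synthetic point lies between two real minority samples, oversampling this way stays inside the region the minority class already occupies instead of duplicating exact copies.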

Holoentropy based Correlative Naive Bayes classifier and MapReduce model for classifying the big data

Evolutionary Intelligence

10.1007/s12065-019-00276-9

2019

Author(s):

Chitrakant Banchhor

N. Srinivasu

Keyword(s):

Big Data

Naive Bayes

Naïve Bayes

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

MapReduce Model

Using Social Media Big Data to Analyze Victory in a Regional Head Election (Pilkada)

Majalah Ilmiah Teknologi Elektro

10.24843/mite.2019.v18i01.p15

2019

Vol 18 (1)

pp. 101

Author(s):

Dewa Ayu Putri Wulandari

Made Sudarma

Nyoman Paramaita

Keyword(s):

Big Data

Naive Bayes

Naïve Bayes

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

N Gram

The 2018 election of the Governor and Vice Governor of Bali proceeds through several stages, from determining the prospective candidates to counting the votes. The public can take part directly in the voting stage, scheduled for 27 June 2018 (KPU, 2018), which gives rise to many comments and opinions: not only positive and neutral comments but also negative ones. This research aims to analyze public comments that carry good (positive) sentiment, no sentiment at all (neutral), or bad (negative) sentiment. The data are preprocessed with N-gram tokenization; the N-grams used here are trigrams, so each token consists of three words. Stemming uses the Nazief-Adriani algorithm, and classification uses the Naïve Bayes Classifier (NBC). In tests on the gubernatorial-candidate data, the highest accuracy was obtained when classifying the KBS-Ace data collected from Twitter, with 89% accuracy, 91% precision, and 94% recall; the lowest accuracy was obtained when classifying the KBS-Ace data from Facebook. Keywords: Sentiment Analysis, 2018 Bali Gubernatorial Candidates, Naive Bayes Classifier
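The three-word tokenization described in this abstract can be sketched as follows. Lower-casing and whitespace splitting are assumptions, and the Nazief-Adriani stemming step is deliberately omitted; `word_ngrams` is a hypothetical helper name, not the authors' code.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Return overlapping word n-grams; n=3 gives the three-word tokens described."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


comment = "the debate last night was very lively"
grams = word_ngrams(comment)
print(grams)
# Term counts like these could then serve as features for a Naive Bayes classifier.
print(Counter(grams).most_common(2))
```

A comment with fewer than three words yields no trigrams at all, which is one practical reason some pipelines combine trigrams with shorter N-grams.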

Texture Image Categorization in Wavelet Domain via Naive Bayes Classifier Based on Laplace and Generalized Gaussian Distribution

2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI)

10.1109/iri.2019.00034

2019

Author(s):

Muhammad Azam

Nizar Bouguila

Keyword(s):

Gaussian Distribution

Naive Bayes

Naïve Bayes

Wavelet Domain

Naive Bayes Classifier

Image Categorization

Bayes Classifier

Texture Image

Naïve Bayes Classifier

Generalized Gaussian Distribution

Integrating Cuckoo search-Grey wolf optimization and Correlative Naive Bayes classifier with Map Reduce model for big data classification

Data & Knowledge Engineering

10.1016/j.datak.2019.101788

2020

Vol 127

pp. 101788

Author(s):

Chitrakant Banchhor

N. Srinivasu

Keyword(s):

Big Data

Naive Bayes

Data Classification

Cuckoo Search

Naïve Bayes

Map Reduce

Grey Wolf

Bayes Classifier

Grey Wolf Optimization

Big Data Classification

Analysis of Bayesian optimization algorithms for big data classification based on Map Reduce framework

Journal of Big Data

10.1186/s40537-021-00464-4

2021

Vol 8 (1)

Author(s):

Chitrakant Banchhor

N. Srinivasu

Keyword(s):

Big Data

Naive Bayes

Optimization Algorithms

Cuckoo Search

Naïve Bayes

Bayesian Optimization

Naive Bayes Classifier

Grey Wolf

Bayes Classifier

Naïve Bayes Classifier

The process of big data handling refers to the efficient management of the storage and processing of a very large volume of data. Data in structured and unstructured formats require specific approaches for overall handling. The classifiers analyzed in this paper are the correlative naïve Bayes classifier (CNB), Cuckoo Grey wolf CNB (CGCNB), Fuzzy CNB (FCNB), and Holoentropy CNB (HCNB). All four classifiers are based on Bayes' theorem. The CNB extends the standard naïve Bayes classifier by applying correlation among the attributes, so that the attributes are treated as dependent rather than independent. Integrating the cuckoo search and grey wolf optimization algorithms with the CNB classifier yields a significant performance improvement; the resulting classifier is called the cuckoo grey wolf correlative naïve Bayes classifier (CGCNB). The FCNB and HCNB classifiers are also compared with CNB and CGCNB in terms of accuracy, sensitivity, specificity, memory, and execution time.

Applying Data Mining to Analyze the Personality of Social Media Users with a Naive Bayes Classifier: A Case Study of Instagram

Jurnal Informatika

10.24198/jin.v1i1.8552

2017

Vol 1 (1)

pp. 11

Author(s):

Harits Muhammad

R Sudrajat

Rudi Rosadi

Keyword(s):

Data Mining

Big Data

Naive Bayes

Factor Model

Five Factor Model

Naïve Bayes

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

Big Data, Data Mining, Five Factor Model, Instagram, Naïve Bayes Classifier

Integrating Data Mining Techniques for Naïve Bayes Classification: Applications to Medical Datasets

Computation

10.3390/computation9090099

2021

Vol 9 (9)

pp. 99

Author(s):

Pannapa Changpetch

Apasiri Pitpeng

Sasiprapa Hiriote

Chumpol Yuangyai

Keyword(s):

Data Mining

Association Rules

Classification Accuracy

Naive Bayes

Classification Tree

Naïve Bayes

Naive Bayes Classifier

Bayes Classifier

Naïve Bayes Classifier

Association Rules Analysis

In this study, we designed a framework that combines three techniques, namely a classification tree, association rules analysis (ASA), and the naïve Bayes classifier, to improve the performance of the latter. A classification tree was used to discretize quantitative predictors into categories, and ASA was used to generate interactions in a fully realized way, since discretized variables and interactions are key to improving the classification accuracy of the naïve Bayes classifier. We applied our methodology to three medical datasets to demonstrate the efficacy of the proposed method. The results showed that our methodology outperformed the existing techniques on all three datasets. Although our focus here was on medical datasets, the proposed methodology is equally applicable to datasets in many other areas.
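The tree-based discretization step can be illustrated with a deliberately simplified sketch: a depth-1 tree (a single split) that picks the threshold with the highest information gain for one quantitative predictor. The authors' trees may be deeper and use a different split criterion; `best_cut`, `discretize`, and the toy age/smoker data are hypothetical. The last lines only show how two discretized variables could be encoded as one interaction attribute. Choosing *which* interactions to build is the job of the association rules analysis, which this sketch does not perform.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Threshold maximising information gain for a single binary split,
    i.e. the cut a depth-1 classification tree would pick for this predictor."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        cond = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - cond > best_gain:
            best_gain = base - cond
            best_t = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_t


def discretize(value, cut):
    return "high" if value > cut else "low"


ages = [25, 30, 35, 50, 55, 60]
labels = ["no", "no", "no", "yes", "yes", "yes"]
cut = best_cut(ages, labels)
print(cut)  # 42.5

# Hypothetical interaction feature: two discretized predictors joined into one
# categorical variable that a naive Bayes model can treat as a single attribute.
smoker = ["y", "n", "n", "y", "y", "n"]
interaction = [f"{discretize(a, cut)}&{s}" for a, s in zip(ages, smoker)]
print(interaction)
```

Encoding an interaction as a single combined category lets naive Bayes capture a dependence between two predictors without abandoning its independence assumption for the remaining attributes.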
