Comparison of Classification with a Machine Learning Approach for Identifying Hoax Tweets on Twitter Social Media

Shanto Moyrano Tambunan; Yessica Nataliani; Elizabeth Sri Lestari

doi:10.26418/jp.v7i2.47232

Jurnal Edukasi dan Penelitian Informatika (JEPIN) ◽

2021 ◽

Vol 7 (2) ◽

pp. 112

Author(s):

Shanto Moyrano Tambunan ◽

Yessica Nataliani ◽

Elizabeth Sri Lestari

Keyword(s):

Random Forest ◽

Gradient Descent ◽

Naive Bayes ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent

Technological development is not free of negative impacts, one of which is hoaxes. Twitter is one of the most actively used social media platforms for exchanging information, communicating, and entertainment, so its users can easily spread news or hoaxes. This study aims to identify tweets that contain hoax or valid information using machine learning. The algorithms used are Stochastic Gradient Descent, Naïve Bayes, Random Forest, and Rocchio. The four algorithms are compared to find the one that best identifies and verifies, automatically, whether tweets on Twitter contain hoaxes or valid information. A total of 898 tweets were collected with the keywords Corona, Mutasi Corona (Corona mutation), PSBB (large-scale social restrictions), Dana Bansos (social assistance funds), Dana Otsus (special autonomy funds), Utang Pemerintah (government debt), and Sekolah Tatap Muka (in-person schooling). The data were grouped into hoax and valid classes and then turned into a dataset through preprocessing, up to term weighting with TF-IDF. The test results show that Stochastic Gradient Descent is the best algorithm, with an average accuracy of 84.92%. Further testing computed precision, recall, and F1. The best precision, 82.95%, was obtained by Naïve Bayes, while the best recall and F1, 85.05% and 82.42% respectively, were obtained by Stochastic Gradient Descent.
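The pipeline in this abstract pairs TF-IDF term weighting with a classifier such as Rocchio, and can be sketched in plain Python. This is a minimal sketch and not the authors' implementation. It rests on three assumptions. First, tweets are already preprocessed into token lists. Second, TF-IDF is computed as raw count × log(N/df), which is one common variant. Third, Rocchio is implemented as a nearest-centroid classifier under cosine similarity. The example tweets and the function names (`tfidf_fit`, `vectorize`, `rocchio_fit`, `rocchio_predict`) are hypothetical.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn idf = log(N / df) from tokenized training docs (one common TF-IDF variant)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def vectorize(tokens, idf):
    """TF-IDF vector (term -> weight) for one token list; terms unseen in training are dropped."""
    tf = Counter(tokens)
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def _norm(vec):
    return math.sqrt(sum(w * w for w in vec.values()))


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    na, nb = _norm(a), _norm(b)
    if not na or not nb:
        return 0.0
    return sum(w * b.get(t, 0.0) for t, w in a.items()) / (na * nb)


def rocchio_fit(vectors, labels):
    """One centroid per class: the mean of that class's length-normalized vectors."""
    sums, counts = {}, Counter(labels)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, Counter())
        norm = _norm(vec)
        for t, w in vec.items():
            acc[t] += w / norm if norm else 0.0
    return {label: {t: w / counts[label] for t, w in acc.items()}
            for label, acc in sums.items()}


def rocchio_predict(centroids, vec):
    """Assign the class whose centroid is most cosine-similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical, already-preprocessed tweets (not from the paper's dataset).
train = [
    ["vaccine", "contains", "microchip"],
    ["microchip", "5g", "corona"],
    ["psbb", "extended", "government"],
    ["government", "announces", "aid"],
]
labels = ["hoax", "hoax", "valid", "valid"]

idf = tfidf_fit(train)
centroids = rocchio_fit([vectorize(doc, idf) for doc in train], labels)
print(rocchio_predict(centroids, vectorize(["microchip", "vaccine"], idf)))  # hoax
```

The idf is learned only from the training tweets, so a held-out tweet is weighted with training statistics, as it would be inside a cross-validation fold.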

Comparison of Classification Techniques in Data Mining for Bank Direct Marketing

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.201855958 ◽

2018 ◽

Vol 5 (5) ◽

pp. 567 ◽

Cited By ~ 2

Author(s):

Irvi Oktanisa ◽

Ahmad Afif Supianto

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Random Forest ◽

Gradient Descent ◽

Naive Bayes ◽

Direct Marketing ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector

Classification is a data mining technique that groups data according to how closely they relate to labeled sample data. In this paper, we compare 9 classification techniques for classifying customer responses in the Bank Direct Marketing dataset, in order to determine which classification model classifies the target in this dataset most effectively. The techniques used are Support Vector Machine, AdaBoost, Naïve Bayes, Constant, KNN, Tree, Random Forest, Stochastic Gradient Descent, and CN2 Rule. The classification process begins with data preprocessing to remove missing values and to select features. The evaluation stage uses 10-fold cross-validation. The results show that the best accuracy was obtained by the Tree, Constant, Naive Bayes, and Stochastic Gradient Descent models, followed by Random Forest, K-Nearest Neighbor, CN2 Rule, AdaBoost, and Support Vector Machine. Of the four best-performing models, Stochastic Gradient Descent was selected as the best for this case. It reached an accuracy of 0.972, and its visualization showed the classification of the target in the Bank Direct Marketing dataset more clearly.
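The 10-fold cross-validation used in the evaluation stage can be sketched as follows. This is a minimal, stdlib-only sketch that assumes plain (unstratified) shuffled folds. The abstract does not say which variant or tool was used. The helpers `k_fold_indices` and `cross_val_accuracy` and the majority-class baseline are hypothetical names used for illustration.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled, unstratified k-fold CV."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    # The first n_samples % k folds get one extra sample, so sizes differ by at most 1.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = order[start:start + size]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit, predict, X, y, k=10, seed=0):
    """Mean test accuracy over k folds; fit(X, y) -> model, predict(model, x) -> label."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical majority-class baseline: always predicts the most common training label.
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(20))
y = ["no"] * 18 + ["yes"] * 2
print(cross_val_accuracy(fit_majority, predict_majority, X, y, k=10))  # 0.9
```

On this 90/10 toy split, a model that ignores its input already scores 0.9. That is why a baseline like this is a useful reference point when comparing accuracies on imbalanced data.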

Comparison of Arabica Coffee Quality Prediction Using the SGD, Random Forest, and Naive Bayes Algorithms

EDUMATIC Jurnal Pendidikan Informatika ◽

10.29408/edumatic.v4i2.2202 ◽

2020 ◽

Vol 4 (2) ◽

pp. 1-9

Author(s):

Veronica Sari ◽

Feranandah Firdausi ◽

Yufis Azhar ◽

...

Keyword(s):

Random Forest ◽

Gradient Descent ◽

Cross Validation ◽

Naive Bayes ◽

Area Under The Curve ◽

Naïve Bayes ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Quality Institute ◽

Fold Cross Validation

Classification is one of the techniques in data mining and is useful for grouping data based on how closely they relate to sample data. The dataset used in this study is the coffee dataset from the Coffee Quality Institute, obtained from the GitHub platform. The attributes in the dataset are Aroma, Aftertaste, Flavor, Acidity, Balance, Body, Uniformity, Sweetness, Clean Cup, and Cupper Points. Three classification methods are used in this study: Stochastic Gradient Descent, Random Forest, and Naive Bayes. The aim of this study is to find out which algorithm predicts coffee quality in the dataset most effectively. The prediction results are then tested using K-Fold Cross Validation and the Area Under the Curve (AUC) method. The results show that Stochastic Gradient Descent obtained the best accuracy of the three methods, at 98%, which increased to 99% after testing with K-Fold Cross Validation and the AUC method.
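The AUC evaluation mentioned in this abstract can be computed directly from the metric's probabilistic definition. AUC is the chance that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below is minimal and assumes three things: a binary task, a classifier that outputs one real-valued score per sample, and ties counted as one half. The function name `roc_auc` is hypothetical. The pairwise loop costs O(P·N), which is fine for small datasets; a sort-based rank formula scales better.

```python
def roc_auc(labels, scores):
    """Binary ROC AUC: P(score of a random positive > score of a random negative).

    labels are 0/1 and scores are real numbers (higher = more likely positive).
    Tied pairs count as one half, as in the Mann-Whitney U statistic.
    """
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores: one positive (0.4) ranks below one negative (0.6).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

Unlike accuracy, AUC does not depend on a decision threshold; it only measures how well the scores rank positives above negatives.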

Hybrid models for suspended sediment prediction: optimized random forest and multi-layer perceptron through genetic algorithm and stochastic gradient descent methods

Neural Computing and Applications ◽

10.1007/s00521-021-06550-1 ◽

2021 ◽

Author(s):

Saeed Samadianfard ◽

Katayoun Kargar ◽

Sadra Shadkani ◽

Sajjad Hashemi ◽

Akram Abbaspour ◽

...

Keyword(s):

Genetic Algorithm ◽

Random Forest ◽

Suspended Sediment ◽

Gradient Descent ◽

Hybrid Models ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Descent Methods ◽

Multi Layer Perceptron ◽

Gradient Descent Methods

ANALYSIS AND PREDICTION OF FAKE JOB VACANCY ADVERTISEMENTS USING NATURAL LANGUAGE PROGRAMING AND MACHINE LEARNING METHODS

Jurnal Informatika ◽

10.30873/ji.v21i1.2865 ◽

2021 ◽

Vol 21 (1) ◽

pp. 14-22

Author(s):

Hary Sabita ◽

Fitria Fitria ◽

Riko Herwanto

Keyword(s):

Machine Learning ◽

Gradient Descent ◽

Naive Bayes ◽

Group Discussion ◽

Naïve Bayes ◽

Stochastic Gradient Descent ◽

Baseline Model ◽

Bayes Model ◽

The US ◽

Better Than

This research was conducted using data provided by Kaggle. The data contain features that describe job vacancies. This study used the postings located in the US, which make up 60% of all the data. Each posted job vacancy is categorized as real or fake. The research followed five stages: defining the problem, collecting data, cleaning data (exploration and pre-processing), and modeling. Evaluation and validation use Naïve Bayes as the baseline model and Stochastic Gradient Descent (SGD) as the final model. The Naïve Bayes model obtains an accuracy of 0.971 and an F1-score of 0.743, while Stochastic Gradient Descent obtains an accuracy of 0.977 and an F1-score of 0.81. These final results indicate that SGD performs slightly better than Naïve Bayes.
Keywords: NLP, Machine Learning, Naïve Bayes, SGD, Fake Jobs
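The F1-scores (0.743 and 0.81) sit well below the accuracies (0.971 and 0.977). One common reason is that the positive class is rare, and the metric definitions on an imbalanced toy example make this visible. This is a minimal sketch. The 100 postings below are hypothetical, not the Kaggle data. Treating "fake" as the positive class (label 1) is an assumption, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f1) with respect to one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


# Hypothetical: 100 postings, 5 fake (label 1). The model flags 4, of which 3 are fake.
y_true = [1] * 5 + [0] * 95
y_pred = [1, 1, 1, 0, 0] + [1] + [0] * 94
acc, prec, rec, f1 = binary_metrics(y_true, y_pred)
print(f"accuracy={acc:.2f} precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
# accuracy=0.97 precision=0.75 recall=0.60 f1=0.67
```

The 94 correctly ignored real postings push accuracy to 0.97. F1 only looks at the fake class, so missing 2 of 5 fakes pulls it down to about 0.67.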

Optimizing Spam Detection in Twitter by Using Naïve Bayes, Logistic Regression and Stochastic Gradient Descent with Whale Optimization Algorithm and Genetic Algorithm

Journal of Xi'an University of Architecture & Technology ◽

10.37896/jxat12.03/225 ◽

2020 ◽

Vol XII (III) ◽

Keyword(s):

Genetic Algorithm ◽

Logistic Regression ◽

Optimization Algorithm ◽

Gradient Descent ◽

Naive Bayes ◽

Stochastic Gradient ◽

Whale Optimization Algorithm ◽

Stochastic Gradient Descent ◽

Spam Detection ◽

Whale Optimization

Linear Support Vector Machine (SVM) with Stochastic Gradient Descent (SGD) training and multinomial Naïve Bayes (NB) in News Classification

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i4.360363 ◽

2019 ◽

Vol 7 (4) ◽

pp. 360-363

Author(s):

Feroz Ahmed ◽

Shabina Ghafir

Keyword(s):

Support Vector Machine ◽

Gradient Descent ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Linear Support Vector Machine

Stochastic gradient descent training for L1-regularized log-linear models with cumulative penalty

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - ACL-IJCNLP '09 ◽

10.3115/1687878.1687946 ◽

2009 ◽

Cited By ~ 45

Author(s):

Yoshimasa Tsuruoka ◽

Jun'ichi Tsujii ◽

Sophia Ananiadou

Keyword(s):

Gradient Descent ◽

Linear Models ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Log Linear

Drivetrain System Identification in a Multi-Task Learning Strategy using Partial Asynchronous Elastic Averaging Stochastic Gradient Descent

2020 IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM) ◽

10.1109/aim43001.2020.9158977 ◽

2020 ◽

Author(s):

Tom Staessens ◽

Guillaume Crevecoeur

Keyword(s):

System Identification ◽

Gradient Descent ◽

Learning Strategy ◽

Stochastic Gradient ◽

Stochastic Gradient Descent ◽

Task Learning

Application of GA Feature Selection on Naive Bayes, Random Forest and SVM for Credit Card Fraud Detection

2020 International Conference on Decision Aid Sciences and Application (DASA) ◽

10.1109/dasa51403.2020.9317228 ◽

2020 ◽

Author(s):

Yakub K. Saheed ◽

Moshood A. Hambali ◽

Micheal O. Arowolo ◽

Yinusa A. Olasupo

Keyword(s):

Feature Selection ◽

Random Forest ◽

Credit Card ◽

Naive Bayes ◽

Fraud Detection ◽

Naïve Bayes ◽

Credit Card Fraud

Performance of SMOTE in a random forest and naive Bayes classifier for imbalanced Hepatitis-B vaccination status

Journal of Physics Conference Series ◽

10.1088/1742-6596/1863/1/012073 ◽

2021 ◽

Vol 1863 (1) ◽

pp. 012073

Author(s):

V M Putri ◽

M Masjkur ◽

C Suhaeni

Keyword(s):

Random Forest ◽

Hepatitis B ◽

Naive Bayes ◽

Naïve Bayes ◽

Vaccination Status ◽

Hepatitis B Vaccination ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier
