PENERAPAN ANALISIS SENTIMEN PADA PENGGUNA TWITTER MENGGUNAKAN METODE K-NEAREST NEIGHBOR (Application of Sentiment Analysis to Twitter Users Using the K-Nearest Neighbor Method)

2018 ◽  
Vol 3 (1) ◽  
pp. 1 ◽  
Author(s):  
Akhmad Deviyanto ◽  
Muhammad Didik Rohmad Wahyudi

Abstract: This research implements the K-Nearest Neighbor (KNN) algorithm for sentiment analysis of Twitter users' posts about the 2017 Jakarta gubernatorial election (Pilkada DKI 2017). The dataset consists of 2,000 Indonesian-language tweets collected during January 2017 using a Python package called Twitterscraper. The system uses KNN with TF-IDF term weighting and the cosine similarity measure to classify sentiment into two classes: positive and negative. In testing, the highest accuracy was 67.2% at k=5, the highest precision was 56.94% at k=5, and the highest recall was 78.24% at k=15.

Keywords: K-Nearest Neighbor, Twitterscraper, TF-IDF, Cosine Similarity
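The combination described in this abstract (TF-IDF weighting, cosine similarity, and a majority vote among the k nearest training documents) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the toy English documents, the function names, and the IDF variant `ln(N / df) + 1` are all assumptions, since the abstract does not specify them.

```python
import math
from collections import Counter

def fit_tfidf(docs):
    """Weight each tokenized document with TF-IDF; return the vectors and the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Assumption: idf = ln(N / df) + 1, one common variant; the abstract gives no formula.
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(tokens, vectors, labels, idf, k):
    """Label a tokenized query by majority vote among its k most similar training documents."""
    # Terms never seen in training have no IDF and are ignored.
    query = {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}
    ranked = sorted(zip(vectors, labels), key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data standing in for labeled tweets.
train_docs = [
    "good leader great program".split(),
    "great debate good vision".split(),
    "bad policy poor debate".split(),
    "poor program bad promise".split(),
]
train_labels = ["positive", "positive", "negative", "negative"]

vectors, idf = fit_tfidf(train_docs)
prediction = knn_predict("good great vision".split(), vectors, train_labels, idf, k=3)
```

With an odd k and two classes the vote cannot tie, which is one reason odd values such as 5 and 15 are convenient choices.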

2021 ◽  
Vol 6 (1) ◽  
pp. 96
Author(s):  
Ikhsan Romli ◽  
Shanti Prameswari R ◽  
Antika Zahrotul Kamalia

Sentiment analysis is a form of data processing that identifies the topics people discuss and their sentiments toward those topics; the topic in this study is large-scale social restrictions (PSBB). The study classifies Indonesian-language tweets about PSBB as negative or positive using the K-Nearest Neighbor algorithm and compares the accuracy obtained with three distance calculations: cosine similarity, Euclidean distance, and Manhattan distance. K-Nearest Neighbor reached an accuracy of 82% at k = 3 with cosine similarity, 81% at k = 11 with Euclidean distance, and 80% at k = 5, 7, 9, 11, and 13 with Manhattan distance. In this study, K-Nearest Neighbor with cosine similarity therefore achieved the highest accuracy.
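The three measures compared in this study differ in what they treat as "close". The sketch below implements each on dense vectors; the example vectors are hypothetical and chosen so that one is a scaled copy of the other, which cosine similarity treats as identical in direction while the two distances do not.

```python
import math

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; larger means closer. Ignores vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def euclidean_distance(a, b):
    """Straight-line distance; smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan_distance(a, b):
    """Sum of absolute coordinate differences; smaller means closer."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical term-weight vectors: `long_doc` repeats `short_doc` twice.
short_doc = [1.0, 0.0, 2.0]
long_doc = [2.0, 0.0, 4.0]

cos = cosine_similarity(short_doc, long_doc)   # direction is identical
euc = euclidean_distance(short_doc, long_doc)  # sqrt(1 + 4)
man = manhattan_distance(short_doc, long_doc)  # 1 + 0 + 2
```

Note that when ranking neighbors, cosine similarity is sorted in descending order while the two distances are sorted in ascending order.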


2020 ◽  
Vol 8 (4) ◽  
pp. 367
Author(s):  
Muhammad Arief Budiman ◽  
Gst. Ayu Vida Mastrika Giri

The music industry is growing rapidly, and artists continue to release millions of works. Technology has followed this development, for example through mobile applications that offer music subscription services, such as Spotify, Joox, GrooveShark, and others. Users increasingly turn to application-based services for streaming music, whether free or paid. This paper proposes a music recommendation system that recommends songs based on their similarity to artists the user likes or has listened to. The research uses the Collaborative Filtering method with Cosine Similarity and the K-Nearest Neighbor algorithm. The result is a system that can recommend songs based on artists who are related to one another.
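One way to read "Collaborative Filtering with Cosine Similarity and K-Nearest Neighbor" is an item-based scheme: each artist is represented by the listening counts of all users, and the k artists whose vectors are most similar to a liked artist are recommended. The sketch below assumes that reading; the play counts, artist names, and function names are hypothetical, and the abstract does not say whether the paper's system is item-based or user-based.

```python
import math

# Hypothetical listening data: user -> {artist: play count}.
plays = {
    "user_a": {"Artist A": 5, "Artist B": 4},
    "user_b": {"Artist A": 3, "Artist B": 5, "Artist C": 1},
    "user_c": {"Artist C": 4, "Artist D": 5},
    "user_d": {"Artist A": 1, "Artist C": 5, "Artist D": 4},
}

def artist_vectors(plays):
    """Invert the data so each artist maps to {user: play count}."""
    vectors = {}
    for user, counts in plays.items():
        for artist, count in counts.items():
            vectors.setdefault(artist, {})[user] = count
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {user: count} vectors."""
    dot = sum(w * b.get(u, 0.0) for u, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similar_artists(artist, plays, k):
    """Return the k artists whose listener profiles are closest to `artist`."""
    vectors = artist_vectors(plays)
    target = vectors[artist]
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != artist]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

neighbors = similar_artists("Artist A", plays, k=2)
```

Songs by the returned artists would then be the recommendation candidates for a user who likes "Artist A".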


JOUTICA ◽  
2021 ◽  
Vol 6 (2) ◽  
pp. 506
Author(s):  
Mustain Mustain Mustain

This research was motivated by the difficulty of organizing conventional questionnaire data. A system was therefore built to group questionnaire data automatically, together with the sentiment it contains. The dataset is questionnaire data from Muhammadiyah Lamongan hospital. The research handles only questionnaires in text form. The paper responses were recapitulated and entered into a database together with their work-unit category and sentiment. The dataset was then pre-processed: negation handling, case folding, tokenizing, filtering, and stemming. Comments from the questionnaires used as test data are pre-processed, and their similarity to the documents is then computed using the K-Nearest Neighbor method and the Vector Space Model. The amount of data handled affects system performance, particularly accuracy and speed during classification. The system outputs a ranking of the documents most similar to the dataset, ordered by cosine similarity value. Classification testing by category class yielded an accuracy of 91%, testing by sentiment class yielded 94%, and the combination of the two yielded 86%.
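The pre-processing steps named in this abstract (negation handling, case folding, tokenizing, filtering, stemming) can be sketched as a single function. Everything specific here is an assumption for illustration: the tiny stopword and negation lists, the choice to attach a negation word to the token that follows it, and the order of the steps. Stemming is left as a pluggable hook that defaults to the identity function, because a real Indonesian stemmer would be an external dependency.

```python
import re

# Hypothetical, deliberately tiny word lists; a real system would use fuller ones.
STOPWORDS = {"yang", "dan", "di", "sangat"}
NEGATIONS = {"tidak", "kurang", "bukan"}

def preprocess(text, stem=lambda word: word):
    """Return the cleaned token list for one questionnaire comment."""
    # Case folding and tokenizing: lowercase, keep alphabetic runs only.
    tokens = re.findall(r"[a-z]+", text.lower())

    # Negation handling (assumed strategy): join a negation word to the next token
    # so that "tidak ramah" becomes the single feature "tidak_ramah".
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    # Filtering (stopword removal), then stemming via the supplied hook.
    return [stem(token) for token in merged if token not in STOPWORDS]

tokens = preprocess("Pelayanan perawat TIDAK ramah dan lambat")
```

Merging the negation before stopword removal keeps "tidak ramah" (not friendly) from collapsing into the positive feature "ramah".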


Author(s):  
Danny Sebastian

E-marketplaces have gained popularity in Indonesian society, resulting in an increase in the number of products offered. Consequently, customers need more effort to search for products. In this study, we classified products from several e-marketplaces. The classification used the TF-IDF method for term weighting, cosine similarity to measure the similarity between products, and the k-nearest neighbor algorithm. In the first test, on 150 product records, the k-nearest neighbor method with k=5 correctly classified 146 records and assigned 4 to the wrong class. This value of k=5 gave the best result for this case, with an accuracy of 97.33%. In the second test, on 150 mixed-brand product records, the k-nearest neighbor method correctly classified 145 records and assigned 5 to the wrong class, for an accuracy of 96.67%.
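The two reported accuracies follow directly from the counts in the abstract, taking accuracy as correctly classified items divided by total items. The short check below reproduces them; the function name is illustrative only.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage: correctly classified items over all items."""
    return 100.0 * correct / total

first_test = round(accuracy_percent(146, 150), 2)   # 146 correct, 4 wrong
second_test = round(accuracy_percent(145, 150), 2)  # 145 correct, 5 wrong
```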


2021 ◽  
Vol 1 (1) ◽  
pp. 1-12
Author(s):  
Aytuğ Onan ◽  

With the advancement of information and communication technology, social networking and microblogging sites have become a vital source of information. Individuals can express their opinions, grievances, feelings, and attitudes about a variety of topics. Through microblogging platforms, they can express their opinions on current events and products. Sentiment analysis is a significant area of research in natural language processing because it aims to define the orientation of the sentiment contained in source materials. Twitter is one of the most popular microblogging sites on the internet, with millions of users daily publishing over one hundred million text messages (referred to as tweets). Choosing an appropriate term representation scheme for short text messages is critical. Term weighting schemes are critical representation schemes for text documents in the vector space model. We present a comprehensive analysis of Turkish sentiment analysis using nine supervised and unsupervised term weighting schemes in this paper. The predictive efficiency of term weighting schemes is investigated using four supervised learning algorithms (Naive Bayes, support vector machines, the k-nearest neighbor algorithm, and logistic regression) and three ensemble learning methods (AdaBoost, Bagging, and Random Subspace). The empirical evidence suggests that supervised term weighting models can outperform unsupervised term weighting models.


Kilat ◽  
2019 ◽  
Vol 8 (1) ◽  
Author(s):  
Riki Ruli A. Siregar ◽  
Zuhdiyyah Ulfah Siregar ◽  
Rakhmat Arianto

Analyzing and classifying comment data by reading and sorting negative comments one by one and classifying them manually in Ms. Excel is not effective when large quantities of data must be processed. Therefore, this study applies sentiment analysis to comment data using the K-Nearest Neighbor (KNN) method. The data are comments written by participants in training at Udiklat Jakarta. The comments are pre-processed, the words are weighted using Term Frequency-Inverse Document Frequency, and the similarity between training data and test data is calculated with cosine similarity. Sentiment analysis determines whether a comment is positive or negative. The comments are then classified into four categories, namely: instructors, materials, facilities and infrastructure. The result of this study is a system that can classify comment data automatically with an accuracy of 94.23%.


2018 ◽  
Vol 6 (2) ◽  
pp. 106
Author(s):  
Difari Afreyna Fauziah ◽  
Achmad Maududie ◽  
Ifrina Nuritha

Political news content classification using the K-Nearest Neighbor algorithm is a process that classifies political news into three more specific subcategories: regional elections (pilkada), the mass organizations law (UU ORMAS), and cabinet reshuffle. K-Nearest Neighbor is a classification approach that finds the training data most similar to, or at the shortest distance from, the testing data. The algorithm was chosen because it is simple: it takes the majority category among the K nearest items, where K is set in advance. The values used in this research are K=3, K=5, K=7, and K=9. The news content classification system begins with a preprocessing stage. Political news entered into the system passes through four preprocessing steps: case folding, tokenizing, stopword removal, and stemming. The next stage is term weighting, the process of assigning a value to each term extracted during preprocessing. The algorithm used for weighting in this research is TFIDF. Once the term weights are obtained, the distance between documents is computed using cosine similarity. The training data are then sorted by the computed distance values, and the K items with the closest values are taken from the sorted result. The aim of this research is a system that can implement the KNN algorithm on documents with high similarity. Three tests were performed, using three different dataset variations with the four values of K.
The best accuracy was obtained when the system used K=9, which gave a precision of 100%, a recall of 100%, and an f-measure of 100%. Keywords: classification, K-Nearest Neighbor algorithm, TFIDF, cosine similarity, confusion matrix.
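Precision, recall, and f-measure for a three-class problem like this one are usually computed per class from the confusion-matrix counts. The sketch below shows that calculation under the standard definitions; the label lists are hypothetical and are not the paper's data, and the abstract does not state whether its 100% figures are per-class or averaged.

```python
def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f_measure)} from paired label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f_measure = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f_measure)
    return scores

# Hypothetical gold labels and predictions for six news items.
y_true = ["pilkada", "pilkada", "uu_ormas", "uu_ormas", "reshuffle", "reshuffle"]
y_pred = ["pilkada", "uu_ormas", "uu_ormas", "uu_ormas", "reshuffle", "reshuffle"]

scores = per_class_scores(y_true, y_pred)
```

All three measures reach 100% for a class only when it has no false positives and no false negatives, as with "reshuffle" in this toy example.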

