Lyric Text Mining Of Dangdut: Visualizing The Selected Words And Word Pairs Of The Legendary Rhoma Irama’s Dangdut Song In The 1970s Era

Tresna Maulana Fahrudin; Ali Ridho Barakbah

doi:10.29080/systemic.v4i2.432

Systemic Information System and Informatics Journal ◽

10.29080/systemic.v4i2.432 ◽

2018 ◽

Vol 4 (2) ◽

pp. 9-17

Author(s):

Tresna Maulana Fahrudin ◽

Ali Ridho Barakbah

Keyword(s):

Text Mining ◽

Data Extraction ◽

Way Of Life ◽

Network Graph ◽

Human Right ◽

Inverse Document Frequency ◽

Document Frequency ◽

Song Lyrics ◽

The Relationship ◽

Word Frequencies

Dangdut is a new genre of music introduced by Rhoma Irama, a popular Indonesian musician who has been the legendary dangdut singer from the 1970s until now. Rhoma Irama's lyrics express themes of humanity, the way of life, love, law and human rights, tradition, social equality, and Islamic messages. Interestingly, however, the lyrics Rhoma Irama wrote in the 1970s were mostly love songs. To verify this, the songs need to be examined through several approaches that explore the selected words and the relationships between word pairs. If each of Rhoma Irama's lyrics is analyzed with text mining, the extracted lyric text can reveal interesting knowledge patterns. We collected lyrics from the web as the dataset and then performed data extraction to store the components of each lyric, including the part and line of the song. We visualized the most frequent words using bar charts, word clouds, term frequency-inverse document frequency (TF-IDF), and network graphs. As a result, we identified several word pairs that Rhoma Irama often used in writing his songs, including heart-love (19 lines), heart-longing (13 lines), heart-beloved (12 lines), love-beloved (12 lines), and love-longing (11 lines).
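The abstract reports word pairs by the number of lyric lines in which both words occur. A minimal sketch of that counting step, assuming each lyric is already split into lines and the words of interest are given as a fixed vocabulary; the English lines, the vocabulary, and the function name `count_pair_lines` are hypothetical, and the paper's own extraction pipeline may differ:

```python
import re
from collections import Counter
from itertools import combinations


def count_pair_lines(lines, vocabulary):
    """For every unordered pair of vocabulary words, count the lines containing both."""
    pair_counts = Counter()
    for line in lines:
        # Tokenize into lowercase words, dropping punctuation
        words = set(re.findall(r"[a-z']+", line.lower()))
        # Keep only words of interest; sorting gives each pair one canonical order
        present = sorted(words & vocabulary)
        # A pair is counted at most once per line, however often its words repeat
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


# Hypothetical English lines, not actual Rhoma Irama lyrics
lyrics = [
    "My heart is full of longing",
    "Love lives in my heart",
    "My beloved, my heart, my love",
    "Longing for my beloved",
]
vocab = {"heart", "love", "longing", "beloved"}
counts = count_pair_lines(lyrics, vocab)
print(counts.most_common(3))
```

Because the words in each line are sorted before pairing, ("heart", "love") and ("love", "heart") fall under one key. Counts like these could serve as edge weights in a word-pair network graph of the kind the abstract mentions.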

Revealing Associations between Diagnosis Patterns and Acupoint Prescriptions Using Medical Data Extracted from Case Reports

Journal of Clinical Medicine ◽

10.3390/jcm8101663 ◽

2019 ◽

Vol 8 (10) ◽

pp. 1663 ◽

Cited By ~ 5

Author(s):

Cheol-Han Kim ◽

Da-Eun Yoon ◽

Ye-Seul Lee ◽

Won-Mo Jung ◽

Joo-Hee Kim ◽

...

Keyword(s):

Network Analysis ◽

Case Reports ◽

Medical Data ◽

Medical Doctors ◽

Inverse Document Frequency ◽

Qualitative And Quantitative ◽

Bladder Disease ◽

Document Frequency ◽

The Relationship ◽

Disease Specific

Objective: The optimal acupoints for a particular disease can be determined by analysis of diagnosis patterns. The objective of this study was to reveal the association between such patterns and the acupoints prescribed in clinical practice using medical data extracted from case reports. Methods: This study evaluated online virtual diagnoses made by currently practicing Korean medical doctors (N = 80). The doctors were presented with 10 case reports published in Korean medical journals and were asked to diagnose the patients and prescribe acupoints accordingly. A network analysis and the term frequency-inverse document frequency (tf-idf) method were used to analyse and quantify the relationship between diagnosis patterns and prescribed acupoints. Results: The network analysis showed that ST36, LI4, LR3, and SP6 were the most frequently used acupoints across all diagnoses. The tf-idf values showed the acupoints used for specific diseases, such as BL40 for bladder disease and LU9 for lung disease. Conclusions: The associations between diagnosis patterns and prescribed acupoints were identified using an online virtual diagnosis modality. Network and text mining analyses revealed commonly applied and disease-specific acupoints in both qualitative and quantitative terms.

The power of visual analytics and language processing to explore the underlying trend of highly popular song lyrics

Engineering and Applied Science Letters ◽

10.30538/psrp-easl2021.0072 ◽

2021 ◽

Vol 4 (3) ◽

pp. 19-29

Author(s):

Tanish Maheshwari ◽

Tarpara Nisarg Bhaveshbhai ◽

Mitali Halder ◽

...

Keyword(s):

Natural Language Processing ◽

Language Processing ◽

Visual Analytics ◽

High Rate ◽

Popular Song ◽

Inverse Document Frequency ◽

Document Frequency ◽

Song Lyrics ◽

Underlying Trend ◽

Very High

The number of songs is increasing at a very high rate around the globe. Of the songs released every year, only the top few make it onto the Billboard hit charts. The lyrics play an important role in making songs big hits, combined with various other factors such as loudness, liveness, speechiness, and popularity. Artists face the problem of finding the most desired topics on which to write song lyrics. This problem is further amplified when selecting the most unique, catchy words which, if added, could create more powerful lyrics. We propose finding a bag of unique evergreen words using the term frequency-inverse document frequency (TF-IDF) technique from natural language processing. Words from this bag of unique evergreen words could be added to song lyrics to create more powerful lyrics in the future.
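To illustrate how TF-IDF separates distinctive words from words that appear everywhere, here is a from-scratch sketch using the plain weighting tf(t, d) = count(t, d) / |d| and idf(t) = log(N / df(t)). This is one common variant, assumed here; libraries such as scikit-learn use a smoothed idf and vector normalization by default. The song strings and the helper names are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)  # assumes no empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_words(weights, k=3):
    """Highest-weighted terms of one document, ties broken alphabetically."""
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


# Hypothetical toy "lyrics", one string per song
songs = [
    "love love baby tonight",
    "love dance tonight",
    "love rain forever",
]
weights = tf_idf(songs)
for i, w in enumerate(weights):
    print(i, top_words(w, 2))
```

Because "love" occurs in every toy song, its idf is log(1) = 0, so it never ranks as a distinctive word even though it is the most frequent one; that is the property a search for "unique" words relies on.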

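Analisis Sentimen Opini Pemindahan Ibu Kota Pada Twitter Dengan Metode Support Vector Machine [Sentiment Analysis of Opinions on the Relocation of the Capital City on Twitter Using the Support Vector Machine Method]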

Jurnal Ilmu Komputer ◽

10.24843/jik.2021.v14.i01.p06 ◽

2021 ◽

Vol 14 (1) ◽

pp. 49

Author(s):

Tezza Fazar Tri Hidayat ◽

Garno Garno ◽

Azhari Ali Ridha

Keyword(s):

Support Vector Machine ◽

Text Mining ◽

Support Vector ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

The relocation of Indonesia's capital city to Kalimantan was officially announced by President Joko Widodo on 26 August 2019. This is a new chapter in Indonesian history because such a move has never happened before, and it has prompted many opinions and responses from the public. Sentiment analysis is the task of analyzing a person's opinion about a topic. Twitter is a social media platform on which users express their opinions and gather them around a topic. Support Vector Machine (SVM) is a text mining method that includes classification, and Term Frequency-Inverse Document Frequency (TF-IDF) is a term-weighting method. SVM and TF-IDF can be used to analyze the sentiment of public opinion on the relocation of Indonesia's capital. The aim of this study is to classify public opinion on this topic from thousands of collected and filtered tweets. Tweets from 22-29 March 2020 were processed into 992 tweets, consisting of 221 positively labelled and 771 negatively labelled data. The SVM method achieved an accuracy of 77.72%, and combining it with TF-IDF increased the accuracy to 78.33%.
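The abstract combines an SVM classifier with TF-IDF features. The sketch below is not the authors' implementation: it is a dependency-free illustration assuming a smoothed idf variant (log(N / df) + 1), a linear SVM trained by plain full-batch subgradient descent on the regularized hinge loss, and hypothetical English tweets. Function names such as `train_linear_svm` are illustrative only.

```python
import math
from collections import Counter


def fit_idf(documents):
    """Vocabulary and smoothed idf weights, idf = log(N / df) + 1 (one common variant)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    vocab = sorted(df)
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    return vocab, idf


def transform(doc, vocab, idf):
    """Dense TF-IDF vector; words outside the training vocabulary are ignored."""
    tokens = doc.lower().split()  # assumes a non-empty document
    counts = Counter(tokens)
    return [counts[t] / len(tokens) * idf[t] for t in vocab]


def train_linear_svm(X, y, lam=0.01, lr=0.5, epochs=100):
    """Linear SVM via full-batch subgradient descent on the regularized hinge loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j, xj in enumerate(xi):
                    grad_w[j] -= yi * xj / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(model, x):
    """Return +1 (positive) or -1 (negative) from the sign of w.x + b."""
    w, b = model
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical labelled tweets: +1 = positive, -1 = negative
tweets = ["great plan", "great move", "bad plan", "bad move"]
labels = [1, 1, -1, -1]
vocab, idf = fit_idf(tweets)
X = [transform(t, vocab, idf) for t in tweets]
model = train_linear_svm(X, labels)
print(predict(model, transform("great idea", vocab, idf)))
```

In practice a library implementation (for example scikit-learn's `LinearSVC` fed by `TfidfVectorizer`) would replace the hand-written loop; the loop is spelled out here only to make the hinge-loss update visible.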

Improve the Accuracy of Support Vector Machine Using Chi Square Statistic and Term Frequency Inverse Document Frequency on Movie Review Sentiment Analysis

Scientific Journal of Informatics ◽

10.15294/sji.v6i1.14244 ◽

2019 ◽

Vol 6 (1) ◽

pp. 138-149

Author(s):

Ukhti Ikhsani Larasati ◽

Much Aziz Muslim ◽

Riza Arifudin ◽

Alamsyah Alamsyah

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Text Mining ◽

Sentiment Analysis ◽

Feature Weighting ◽

Support Vector ◽

Chi Square ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency

Data processing can be done with text mining techniques. Processing large text data requires a machine to explore opinions, both positive and negative. Sentiment analysis is a process that applies text mining methods to determine whether the content of a text dataset is positive or negative. Support vector machine (SVM) is one of the classification algorithms that can be used for sentiment analysis. However, SVM works less well on large data. In addition, one constraint in the text mining process is the number of attributes used: many attributes reduce the performance of the classifier and lead to low accuracy. The purpose of this research is to increase SVM accuracy by implementing feature selection and feature weighting. Feature selection removes a large number of irrelevant attributes; in this study, features are selected by keeping the top K = 500 values. The selected attributes are then weighted to calculate the weight of each one. The feature selection method used is the chi-square statistic, and the feature weighting uses Term Frequency Inverse Document Frequency (TF-IDF). Experiments in Matlab R2017b with 10-fold cross-validation show that integrating SVM with the chi-square statistic and TF-IDF increases accuracy by 11.5 percentage points: SVM without chi-square and TF-IDF achieved an accuracy of 68.7%, whereas SVM with chi-square and TF-IDF achieved 80.2%.
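The chi-square feature-selection step can be sketched with the standard 2x2 contingency statistic, chi2 = N(AD - CB)^2 / ((A + C)(B + D)(A + B)(C + D)), computed for each term against the positive class. The reviews, labels, and the value k = 2 below are hypothetical (the study keeps K = 500), and the subsequent TF-IDF weighting of the surviving terms is omitted.

```python
def chi_square_scores(documents, labels, positive="pos"):
    """Chi-square statistic of each term against the positive class (2x2 table)."""
    token_sets = [set(doc.lower().split()) for doc in documents]
    vocab = set().union(*token_sets)
    n = len(documents)
    scores = {}
    for term in vocab:
        # A: positive docs with the term, B: other docs with the term,
        # C: positive docs without the term, D: other docs without the term
        a = sum(1 for toks, lab in zip(token_sets, labels) if term in toks and lab == positive)
        b = sum(1 for toks, lab in zip(token_sets, labels) if term in toks and lab != positive)
        c = sum(1 for toks, lab in zip(token_sets, labels) if term not in toks and lab == positive)
        d = n - a - b - c
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical movie reviews with sentiment labels
reviews = ["great acting", "great story", "boring story", "boring acting"]
labels = ["pos", "pos", "neg", "neg"]
scores = chi_square_scores(reviews, labels)
print(select_top_k(scores, 2))
```

Terms that appear equally in both classes ("acting", "story") score zero and are dropped, while class-specific terms ("great", "boring") survive, which is the reduction in irrelevant attributes the abstract describes.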

Klasifikasi Artikel Ilmiah Dengan Berbagai Skenario Preprocessing [Classification of Scientific Articles with Various Preprocessing Scenarios]

Sains, Aplikasi, Komputasi dan Teknologi Informasi ◽

10.30872/jsakti.v2i2.2681 ◽

2020 ◽

Vol 2 (2) ◽

pp. 70

Author(s):

Hidayatul Ma'rifah ◽

Aji Prasetya Wibawa ◽

Muhammad Iqbal Akbar

Keyword(s):

Text Mining ◽

Vector Space ◽

Cross Validation ◽

Confusion Matrix ◽

Vector Space Model ◽

Nearest Neighbour ◽

Inverse Document Frequency ◽

Space Model ◽

Document Frequency ◽

Fold Cross Validation

This study aims to find the combination and order of text mining preprocessing steps that give the best classification of the subject areas of Indonesian-language journal articles based on their titles and abstracts. The preprocessing steps applied are case folding, stemming, stopword removal, VSM (Vector Space Model) transformation, and SMOTE. The observation of each scenario, however, focuses on stemming and on two stopword removal techniques, dictionary-based and document-frequency-based, applied after transformation into a VSM with TF-IDF (Term Frequency-Inverse Document Frequency) weighting. Classification uses the k-NN (k-Nearest Neighbour) algorithm, which determines the class of a test instance from its nearest neighbours; in this study, the metric used to find the nearest neighbours is cosine similarity. Classification is tested with 10-fold cross-validation, producing a confusion matrix as the final result. The best classification performance reached an accuracy of 72.91% and a precision of 73.36%.
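The k-NN step with cosine similarity can be sketched as follows. To keep the example short, documents are represented by raw term counts rather than the TF-IDF weights used in the study; `cosine` itself accepts any {term: weight} mapping, so TF-IDF weights could be substituted. The training titles, subject labels, and k = 3 are hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_predict(train_docs, train_labels, query, k=3):
    """Majority label among the k training documents most similar to the query."""
    query_vec = Counter(query.lower().split())
    sims = [
        (cosine(query_vec, Counter(doc.lower().split())), label)
        for doc, label in zip(train_docs, train_labels)
    ]
    # Stable sort: equally similar documents keep their original order
    sims.sort(key=lambda s: s[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]


# Hypothetical article titles and subject-area labels
train_docs = [
    "neural network deep learning",
    "deep learning model training",
    "network routing protocol",
    "routing packet protocol",
]
train_labels = ["AI", "AI", "Networking", "Networking"]
print(knn_predict(train_docs, train_labels, "deep neural model"))
```

Choosing an odd k with two classes avoids tied votes; with more classes, a tie-breaking rule (for example, preferring the label of the single nearest neighbour) would need to be decided explicitly.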

A Correlation Analysis of Construction Site Fall Accidents Based on Text Mining

Frontiers in Built Environment ◽

10.3389/fbuil.2021.690071 ◽

2021 ◽

Vol 7 ◽

Author(s):

Xixi Luo ◽

Quanlong Liu ◽

Zunxiang Qiu

Keyword(s):

Text Mining ◽

Construction Projects ◽

Strong Association ◽

Construction Site ◽

Causal Factors ◽

Safe Production ◽

Safety Technology ◽

Document Frequency ◽

The Relationship ◽

Fall Accidents

Construction site fall accidents are a high-frequency accident type in the construction industry and have received extensive attention in accident causal factor analysis and risk management research, but evaluating the relationships between accident causal factors recorded in unstructured texts remains an area in urgent need of further study. In this paper, a text mining-based method was used to analyze and process 557 investigation reports of construction site fall accidents in China from 2013 to 2019. First, the accident reports were preprocessed to identify six types and 28 causal factors of fall accidents; subsequently, the 28 causal factors were classified into critical, subcritical and general causal factors according to their document frequency. Then, the Apriori algorithm was used to analyze correlations in construction site fall accidents. Finally, strong association rules were obtained among the accident causal factors and between the causal factors and the types of construction site fall accidents. The results showed that 1) insufficient safety technology training and untimely elimination of hidden dangers in safe production were the most frequent accident causal factors in fall accident reports; 2) there were strong and weak correlations of different degrees among the causal factors of construction site fall accidents, and the more important a factor was, the stronger its correlations; 3) there were strong latent regularities between the causal factors and the types of fall accidents, and certain combinations of causal factors were most likely to lead to the corresponding accident types. This study elucidated the inherent risk factors for fall accidents scientifically and logically, providing a theoretical basis for preventing fall accidents in construction projects.
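The Apriori stage can be illustrated with a compact level-wise implementation: it grows frequent itemsets one item at a time, prunes candidates that have an infrequent subset, and then derives rules X -> Y whose confidence = support(X and Y) / support(X) passes a threshold. The factor names, reports, and thresholds below are hypothetical; the abstract does not give the study's support and confidence settings.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each size-k candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent itemsets that have exactly k items
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Rules (antecedent, consequent, support, confidence) from frequent itemsets."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so this lookup succeeds
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, support, confidence))
    return rules


# Hypothetical accident reports, each a set of coded factors / accident types
reports = [
    {"untrained", "edge_fall"},
    {"untrained", "edge_fall"},
    {"untrained", "hazard"},
    {"hazard"},
]
freq = apriori(reports, min_support=0.5)
rules = association_rules(freq, min_confidence=0.8)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "->", sorted(consequent), sup, conf)
```

Here "edge_fall -> untrained" has confidence 1.0, while the reverse rule has only about 0.67 and is filtered out, showing that association rules are directional even when the underlying itemset is the same.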

The Analysis of Proximity Between Subjects Based on Primary Contents Using Cosine Similarity on Lective

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v2i4.271 ◽

2017 ◽

pp. 299-308

Author(s):

Muhammad Andi Al-rizki ◽

Galih Wasis Wicaksono ◽

Yufis Azhar

Keyword(s):

Cosine Similarity ◽

High Similarity ◽

Inverse Document Frequency ◽

Term Frequency ◽

Research Results ◽

Document Frequency ◽

Precision And Accuracy ◽

The Relationship ◽

Similarity Method

In education, recognizing the relationship between one subject and another is imperative. Knowing how courses relate makes it easy to map the continuity between subjects, and it also makes it possible to detect and reduce content duplicated across several subjects. These conveniences benefit lecturers, students and departments: lecturers can more easily analyze and discuss subjects in the same domain, students can conveniently choose a group of subjects they are interested in, and departments can easily create specialization groups based on subject similarity and combine courses with high similarity. In this research, given a well-structured database, the relationship between subjects was calculated from the proximity of the subjects' primary contents. The features used were terms, whose values were determined by calculating the TF-IDF (Term Frequency Inverse Document Frequency) of each term. The proximity between subjects was computed with the cosine similarity method. Finally, the results were evaluated using precision, recall and accuracy. The precision and accuracy values are both 90.91%, and the recall value is 100%.
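The evaluation step can be made concrete with a small sketch: subject pairs whose cosine similarity reaches a threshold are predicted "related", and the predictions are scored against ground-truth judgements with precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = (TP + TN) / total. The similarity values, the expert labels, and the 0.5 threshold are hypothetical, not data from the study.

```python
def evaluate(predicted, actual):
    """Precision, recall and accuracy of binary predictions against ground truth."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return precision, recall, accuracy


# Hypothetical cosine similarities for five subject pairs, and expert judgements
similarities = [0.82, 0.67, 0.55, 0.31, 0.12]
expert_says_related = [True, True, False, True, False]
THRESHOLD = 0.5  # hypothetical cut-off for calling two subjects "related"
predicted = [s >= THRESHOLD for s in similarities]
precision, recall, accuracy = evaluate(predicted, expert_says_related)
print(precision, recall, accuracy)
```

Raising the threshold typically trades recall for precision, so reported precision and recall values only make sense together with the cut-off that produced them.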

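ANALISA TESTIMONIAL DENGAN MENGGUNAKAN ALGORITMA TEXT MINING DAN TERM FREQUENCY- INVERSE DOCUMENT FREQUENCE (TF-IDF) PADA TOKO ALLMEEART [Testimonial Analysis Using a Text Mining Algorithm and Term Frequency-Inverse Document Frequency (TF-IDF) at the Allmeeart Store]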

KOMIK (Konferensi Nasional Teknologi Informasi dan Komputer) ◽

10.30865/komik.v3i1.1697 ◽

2019 ◽

Vol 3 (1) ◽

Author(s):

Meylita Putri Simatupang ◽

Dito Putro Utomo

Keyword(s):

Text Mining ◽

Inverse Document Frequency ◽

Term Frequency ◽

Mining Algorithm ◽

Document Frequency ◽

Positive Experiences

E-commerce, often referred to as online shopping, is the latest trend in how people shop. Before the rise of today's e-commerce companies, people met their needs by relying on distros and other shops near where they lived, or by going to a physical shopping venue, but they have now switched to shopping online. The advantages of online shops, such as relatively low prices, no need to travel to a store, and guaranteed goods, have left retail shops increasingly deserted. Testimonials are one technique used to convince customers to shop on an e-commerce platform. A testimonial is a buyer's response to their shopping experience in an e-commerce application, from the payment process until the goods are received; the more positive experiences the testimonials convey, the more convinced customers who have not yet shopped there will be. Testimonials are not always positive, however: some buyers post negative ones. The problem for customers is that no percentage or count of buyers with positive versus negative shopping experiences is available, because testimonials are generally presented only as a list.

Keywords: Testimonial Analysis, Text Mining Algorithm, Term Frequency-Inverse Document Frequency (TF-IDF)

APLIKASI INFORMATION RETRIEVAL UNTUK PENCARIAN DOKUMEN LAPORAN PENELITIAN [An Information Retrieval Application for Searching Research Report Documents]

Jurnal Informatika Polinema ◽

10.33795/jip.v1i3.109 ◽

2017 ◽

Vol 1 (3) ◽

pp. 23

Author(s):

Indri Tri Hapsari ◽

Banni Satria Andoko ◽

Cahya Rahmad

Keyword(s):

Information Retrieval ◽

Text Mining ◽

Inverse Document Frequency ◽

Document Frequency

Information retrieval is a search system for finding information again. This study aims to design and implement a search system for research report documents that makes it easier for users to retrieve the documents they want. Text mining is used to preprocess the text in the documents into keywords, and term frequency-inverse document frequency (TF-IDF) is computed as the method for weighting each word in a document according to the keywords the user enters. TF-IDF depends on how often a word occurs in a document and on how many documents contain that word, so once implemented the system can retrieve information from the stored research report documents quickly and efficiently, and the search results can be ranked by their information weight. The results show that word weighting with TF-IDF can retrieve documents relevant to the user's input query.
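A minimal sketch of TF-IDF-ranked retrieval, assuming a simple scoring rule: a document's score is the sum, over the query terms, of the raw term count times log(N / df). The documents, the query, and the function name `rank_documents` are hypothetical, and the abstract does not state the system's exact scoring formula.

```python
import math
from collections import Counter


def rank_documents(documents, query):
    """Return (doc_index, score) pairs with positive TF-IDF scores, best first."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    query_terms = set(query.lower().split())
    results = []
    for idx, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        # Query terms absent from the whole collection (df == 0) are skipped
        score = sum(counts[t] * math.log(n_docs / df[t]) for t in query_terms if df[t])
        results.append((score, idx))
    # Highest score first; ties keep document order
    results.sort(key=lambda r: (-r[0], r[1]))
    return [(idx, score) for score, idx in results if score > 0]


# Hypothetical research-report titles
docs = [
    "expert system for disease diagnosis",
    "information retrieval system using tf idf",
    "retrieval weighting for document retrieval",
]
ranked = rank_documents(docs, "retrieval weighting")
print(ranked)
```

The rare term "weighting" (df = 1) contributes more per occurrence than "retrieval" (df = 2), so documents matching uncommon query words rise to the top of the ranking.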

Ethical Consumers’ Awareness of Vegan Materials: Focused on Fake Fur and Fake Leather

Sustainability ◽

10.3390/su13010436 ◽

2021 ◽

Vol 13 (1) ◽

pp. 436

Author(s):

Yeong-Hyeon Choi ◽

Kyu-Hye Lee

Keyword(s):

Animal Rights ◽

Keyword Search ◽

Animal Abuse ◽

Ethical Awareness ◽

Inverse Document Frequency ◽

Animal Protection ◽

Ethical Consumers ◽

Search Volume ◽

Document Frequency ◽

The Relationship

With an increase in ethical awareness, people have begun to criticize the unethical issues associated with the use of animal materials. This study focused on the transition of global consumers' awareness toward vegan materials and on the relationship between interest in ethical subjects such as animals, the environment, and vegan materials. For this purpose, consumers' posts about fur/fake fur and leather/fake leather uploaded on Google and Twitter from 2008 to 2019 were used, and Term Frequency-Inverse Document Frequency (tf-idf) values were extracted using Python 3.7. Furthermore, the worldwide Google keyword search volume of each word was analyzed using Smart PLS 3.0 to investigate global consumers' awareness. First, over time, consumers began relating animal materials such as fur and leather to topics such as animal rights, animal abuse, and animal protection. Second, as interest in "animal welfare" increased, interest in "fake fur" also rose, and as interest in "cruelty free" increased, interest in "fake fur", "vegan fur", and "vegan leather" also increased. Third, as consumers' interest in the "environment" increased, interest in vegan materials such as "fake fur" and "fake leather" decreased. However, as interest in "eco" increased, interest in "vegan leather" also increased.
