Dialogue Act Classification, Instance-Based Learning, and Higher Order Dialogue Structure

2010 ◽  
Vol 1 (2) ◽  
pp. 1-24 ◽  
Author(s):  
Barbara Di Eugenio ◽  
Zhuli Xie ◽  
Riccardo Serafin

In this paper, we explore instance-based learning methods for dialogue act classification on two corpora, MapTask and CallHome Spanish. We start with Latent Semantic Analysis (LSA), and extend it as Feature Latent Semantic Analysis (FLSA). FLSA adds richer linguistic features to LSA, which only uses words. In particular, we explore the extended dialogue context, both linearly (the previous dialogue act) and hierarchically (conversational games). We show how the k-Nearest Neighbor algorithm obtains its best results when applied to the reduced semantic spaces generated by FLSA. Empirically, our results are better than previously published results on these two corpora; linguistically, we confirm and extend previous observations that the hierarchical dialogue structure encoded via the notion of Game is of primary importance for dialogue act recognition.
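The idea of adding dialogue context to a word-only representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it encodes the previous dialogue act and the game as extra pseudo-terms next to the words and classifies with cosine-similarity k-Nearest Neighbor. The SVD-based dimensionality reduction that LSA and FLSA perform is deliberately omitted, and the function names, tag format, utterances, and labels are all hypothetical toy choices.

```python
from collections import Counter
from math import sqrt

def featurize(words, prev_act=None, game=None):
    """Bag of words plus optional context features stored as pseudo-terms."""
    vec = Counter(w.lower() for w in words)
    if prev_act is not None:
        vec["<PREV=" + prev_act + ">"] += 1   # linear context (assumed encoding)
    if game is not None:
        vec["<GAME=" + game + ">"] += 1       # hierarchical context (assumed encoding)
    return vec

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_label(query, training, k=3):
    """Majority label among the k training vectors most similar to the query."""
    ranked = sorted(training, key=lambda ex: cosine(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Invented toy utterances: (words, previous act, game, dialogue act label).
raw = [
    (["go", "left"], "acknowledge", "instruct", "instruct"),
    (["turn", "right", "there"], "acknowledge", "instruct", "instruct"),
    (["okay"], "instruct", "instruct", "acknowledge"),
    (["right"], "instruct", "instruct", "acknowledge"),
    (["yes"], "query_yn", "query_yn", "reply_y"),
    (["right"], "query_yn", "query_yn", "reply_y"),
]
train_ctx = [(featurize(w, p, g), label) for w, p, g, label in raw]

# The word "right" alone is ambiguous; the context features disambiguate it.
query = featurize(["right"], prev_act="query_yn", game="query_yn")
predicted = knn_label(query, train_ctx, k=3)
print(predicted)
```

In this toy setting the context pseudo-terms pull the query toward utterances from the same game, which mirrors the abstract's observation that game structure matters for dialogue act recognition.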

2021 ◽  
Vol 27 (1) ◽  
pp. 88-112
Author(s):  
Ekaterina I. DYUDIKOVA ◽  
Natal'ya N. KUNITSYNA

Subject. The digital economy emerged as a new generation of financial instruments, such as cryptocurrencies, was invented and proliferated, offering a way to counteract global challenges. Opponents of legitimizing digital assets and integrating them into the payment infrastructure see no material advantages in them and do not support drastic transformations of the existing financial system. However, even though digital payments are very risky, the use of cryptocurrency keeps growing. The article presents the outcome of text mining of feedback left by users of electronic banking and digital cryptocurrency systems, from which we determined to what extent they are satisfied with various systems. Objectives. The study is intended to provide the theoretical and methodological rationale for, and to practically test, a model that determines key themes in unstructured big data and automatically evaluates users' satisfaction with various payment systems. Methods. We used formal logic, the systems approach, comparative analysis, text mining, and latent semantic analysis. Results. We analyzed reviews uploaded to www.banki.ru and www.otzovik.ru through parsing, stop word elimination, stemming, and probabilistic topic modeling based on latent semantic analysis. We assessed to what extent users are satisfied with various systems by examining their reviews through sentiment analysis, the k-nearest neighbor algorithm, and automated scoring of unrated reviews. Conclusions and Relevance. Text mining of unstructured big data shows that digital platforms, notwithstanding their infancy and high risks, already satisfy social needs better than electronic banking systems, which makes it reasonable to integrate them into the payment system to unlock their potential.


2021 ◽  
pp. 1-16
Author(s):  
Sunil Kumar Jha ◽  
Ninoslav Marina ◽  
Jinwei Wang ◽  
Zulfiqar Ahmad

Machine learning approaches make a valuable contribution to improving the competency of automated decision systems. Past studies have developed several machine learning approaches for diagnosis prediction of individual diseases. The present study aims to develop a hybrid machine learning approach for diagnosis prediction of multiple diseases based on a combination of efficient feature generation, selection, and classification methods. Specifically, the combination of latent semantic analysis, ranker search, and fuzzy-rough k-nearest neighbor is proposed and validated in the diagnosis prediction of primary tumor, post-operative, breast cancer, lymphography, audiology, fertility, immunotherapy, and COVID-19, among others. The performance of the proposed approach is compared with single and other hybrid machine learning approaches in terms of accuracy, analysis time, precision, recall, F-measure, area under the ROC curve, and the Kappa coefficient. The proposed hybrid approach performs better than single and other hybrid approaches in the diagnosis prediction of each of the selected diseases. Specifically, the suggested approach achieved a maximum recognition accuracy of 99.12% for primary tumor, 96.45% for breast cancer Wisconsin, 94.44% for cryotherapy, and 93.81% for audiology, with significant improvement in classification accuracy and other evaluation metrics for the rest of the selected diseases. It also handles missing values in the dataset effectively.
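Several of the evaluation metrics named above can be computed directly from a confusion matrix. The sketch below is illustrative only: it covers the binary case, the function name `binary_metrics` and the counts are hypothetical, and area under the ROC curve is left out because it needs classifier scores rather than counts.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, F-measure, and Cohen's kappa from counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    # Chance agreement: probability that prediction and truth agree at random,
    # given the marginal rates of predicted and actual positives/negatives.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance != 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "kappa": kappa,
    }

# Invented counts for 100 hypothetical patients.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Kappa is useful alongside accuracy because it discounts the agreement a classifier would reach by chance, which matters when the classes are imbalanced.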


2020 ◽  
Vol 7 (1) ◽  
pp. 140
Author(s):  
Dian Chusnul Hidayati ◽  
Said Al Faraby ◽  
Adiwijaya Adiwijaya

Hadith is the second source of Islamic law after the Al-Quran, making it important to study. However, there are some difficulties in learning hadith, such as determining whether a hadith belongs to the topic of suggestions, prohibitions, or information. This obstructs the hadith learning process, especially for Muslims. Therefore, it is necessary to classify hadiths into the topics of suggestions, prohibitions, information, and combinations of the three, which is known as multi-label classification. The classification can be done with K-Nearest Neighbor (KNN), one of the best methods in the Vector Space Model and a simple but quite effective one. However, KNN copes poorly with high-dimensional vectors, which results in long classification times. For that reason, this research classifies Sahih Bukhari's hadiths into the topics of suggestions, prohibitions, and information using the Latent Semantic Analysis - K-Nearest Neighbor (LSA-KNN) method. The Binary Relevance method is also employed to process the multi-label data. The results show that LSA-KNN achieves a performance of 90.28% with a computation time of 19 minutes 21 seconds, while KNN achieves 90.23% with a computation time of 37 minutes 06 seconds, which means that LSA-KNN performs better than KNN.
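Binary Relevance turns a multi-label problem into one independent yes/no problem per label. The following sketch shows that decomposition with a plain k-Nearest Neighbor vote per label. It is an assumption-laden illustration rather than the authors' code: the two-dimensional vectors merely stand in for LSA-reduced document vectors, and the data, function names, and Euclidean distance are hypothetical choices.

```python
from math import sqrt

LABELS = ("suggestion", "prohibition", "information")

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_binary(query, vectors, flags, k=3):
    """True if more than half of the k nearest neighbors carry the label."""
    nearest = sorted(range(len(vectors)),
                     key=lambda i: euclidean(query, vectors[i]))[:k]
    positives = sum(1 for i in nearest if flags[i])
    return positives * 2 > k

def binary_relevance_predict(query, vectors, label_sets, k=3):
    """Run one independent binary k-NN per label and collect the positives."""
    predicted = set()
    for label in LABELS:
        flags = [label in labels for labels in label_sets]
        if knn_binary(query, vectors, flags, k):
            predicted.add(label)
    return predicted

# Invented toy vectors (stand-ins for reduced document vectors) and label sets.
vectors = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8), (0.5, 0.5), (0.6, 0.5)]
label_sets = [
    {"suggestion"},
    {"suggestion", "information"},
    {"prohibition"},
    {"prohibition"},
    {"suggestion", "prohibition"},
    {"suggestion", "information"},
]

result = binary_relevance_predict((0.7, 0.4), vectors, label_sets, k=3)
print(sorted(result))
```

Because each label is decided separately, a document can receive zero, one, or several topics, at the cost of ignoring correlations between labels.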


2021 ◽  
Vol 11 (2) ◽  
pp. 848
Author(s):  
Athita Onuean ◽  
Hanmin Jung ◽  
Krisana Chinnasarn

An air quality monitoring network (AQMN) plays an important role in air pollution management. However, a city setting up an initial network often lacks necessary information such as historical pollution and geographical data, which makes it challenging to establish an effective network. Meanwhile, in cities that already have a network, it may not adequately cover air pollution issues spatially, or rapid urbanization may call for additional stations. To resolve these two cases, we propose four methods for finding stations and constructing a network using Euclidean distance and the k-nearest neighbor algorithm: Euclidean Distance (ED), Fixed Surrounding Sphere (FSS), Euclidean Distance + Fixed Surrounding Sphere (ED + FSS), and Euclidean Distance + Adjustable Surrounding Sphere (ED + ASS). We introduce and apply a coverage percentage and a weighted coverage degree for evaluating the results of the proposed methods. Our experimental results show that ED + ASS is better than the other methods at finding stations that enhance spatial coverage. When setting up an initial network, coverage percentages improved by up to 22%, 37%, and 56% compared with the existing network, and adding a station to the existing network improved coverage by up to 34%, 130%, and 39%, in Sejong, Bonn, and Bangkok, respectively. Our method yields acceptable results and will be implemented as a guide for establishing a new network; it can also serve as a tool for improving the spatial coverage of an existing network in future expansions of air monitoring.


2015 ◽  
Vol 1 (4) ◽  
pp. 270
Author(s):  
Muhammad Syukri Mustafa ◽  
I. Wayan Simpen

This research aims to predict whether new students are likely to complete their studies on time, using data mining to explore historical data with the K-Nearest Neighbor (KNN) algorithm. The resulting application uses a variety of attributes, including national examination (Ujian Nasional, UN) scores, school or region of origin, gender, parents' occupation and income, number of siblings, and others. By applying KNN, a prediction is made based on the proximity of the existing historical data to the new data: whether the student is likely to complete the study on time or not. Testing the KNN algorithm with sample data of alumni from graduation years 2004 to 2010 as old cases and alumni from graduation year 2011 as new cases yielded an accuracy of 83.36%.
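The evaluation protocol described here, older cohorts as stored cases and the newest cohort as unseen cases, can be sketched as follows. Everything in the block is hypothetical: the three numeric attributes, the toy values, the labels, and in particular the min-max scaling step, which is a common preprocessing assumption for distance-based methods and is not stated in the abstract.

```python
from collections import Counter
from math import sqrt

def min_max_scaler(rows):
    """Return a function that scales each attribute to [0, 1] using `rows`."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    def scale(row):
        return tuple((v - lo) / (hi - lo) if hi > lo else 0.0
                     for v, lo, hi in zip(row, lows, highs))
    return scale

def knn_predict(query, rows, labels, k=3):
    """Majority label among the k stored cases closest to the query."""
    def distance(i):
        return sqrt(sum((a - b) ** 2 for a, b in zip(query, rows[i])))
    order = sorted(range(len(rows)), key=distance)
    return Counter(labels[i] for i in order[:k]).most_common(1)[0][0]

def accuracy(old_rows, old_labels, new_rows, new_labels, k=3):
    """Fraction of new cases whose predicted label matches the true label."""
    scale = min_max_scaler(old_rows)           # fit scaling on old cases only
    scaled_old = [scale(r) for r in old_rows]
    hits = sum(knn_predict(scale(r), scaled_old, old_labels, k) == y
               for r, y in zip(new_rows, new_labels))
    return hits / len(new_rows)

# Invented records: (exam score, parental income, number of siblings).
old_rows = [(8.5, 6.0, 1), (8.0, 5.0, 2), (7.8, 4.5, 2),
            (5.5, 2.0, 5), (6.0, 1.5, 4), (5.8, 2.5, 6)]
old_labels = ["on_time", "on_time", "on_time", "late", "late", "late"]
new_rows = [(8.2, 5.5, 1), (5.6, 2.2, 5)]
new_labels = ["on_time", "late"]

score = accuracy(old_rows, old_labels, new_rows, new_labels, k=3)
print(f"accuracy: {score:.2%}")
```

Scaling is fitted on the old cases only so that no information from the held-out cohort leaks into the stored cases. Categorical attributes such as gender or school of origin would need their own encoding, which this sketch does not attempt.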


2018 ◽  
Author(s):  
I Wayan Agus Surya Darma

Balinese character recognition is a technique for recognizing the features or patterns of Balinese characters. The features of a character are obtained through a feature extraction process. This research uses handwritten Balinese characters, and its feature extraction process generates semantic and direction features. Recognition uses the K-Nearest Neighbor algorithm to recognize 81 handwritten Balinese characters: the features of the test character images are compared with reference features. With K=3 and reference=10, the recognition system achieved a success rate of 97.53%.

