Clustering topic groups of documents using K-Means algorithm: Australian Embassy Jakarta media releases 2006-2016

2019 ◽  
Vol 15 (2) ◽  
pp. 226
Author(s):  
Wishnu Hardi ◽  
Wisnu Ananta Kusuma ◽  
Sulistyo Basuki

Introduction. The Australian Embassy in Jakarta stores a wide array of media release documents. Analyzing the significant patterns in this document collection is important because it yields new insights into, and knowledge of, the collection's main topic groups. Methodology. The K-Means algorithm was used as a non-hierarchical clustering method that partitions data objects into clusters. The method works by minimizing data variation within each cluster and maximizing data variation between clusters. Data analysis. Of the documents issued between 2006 and 2016, 839 were examined to determine term frequencies and to generate clusters. The clustering result was evaluated by an expert nominated to validate it. Results and discussion. The results showed 57 meaningful terms grouped into 3 clusters. “People to people links”, “economic cooperation”, and “human development” were chosen to represent the topics of the Australian Embassy Jakarta media releases from 2006 to 2016. Conclusions. Text mining can be used to cluster topic groups of documents. It provides a more systematic clustering process because the text analysis is conducted through a number of stages with specifically set parameters.
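The K-Means procedure summarized in this abstract (assign each object to its nearest centroid, then recompute the centroids, which reduces within-cluster variation) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the toy documents, the vocabulary, the function names, the seed, and the choice of k = 2 are all assumptions made for the example, and no stop-word removal or stemming is attempted.

```python
import random
from collections import Counter


def term_frequency_vectors(documents, vocabulary):
    """Count how often each vocabulary term occurs in each document."""
    vectors = []
    for text in documents:
        counts = Counter(text.lower().split())
        vectors.append([float(counts[term]) for term in vocabulary])
    return vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iterations=100, seed=0):
    """Plain K-Means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k points drawn at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus, invented for illustration only.
toy_documents = [
    "trade trade investment",
    "investment trade economy",
    "scholarship education students",
    "education scholarship scholarship",
]
toy_vocabulary = ["trade", "investment", "economy",
                  "scholarship", "education", "students"]

vectors = term_frequency_vectors(toy_documents, toy_vocabulary)
labels, centroids = k_means(vectors, k=2)
print(labels)
```

On this toy input the two trade-related documents end up in one cluster and the two education-related documents in the other; which cluster receives which numeric label depends on the random initialization.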

2009 ◽  
pp. 1164-1181
Author(s):  
Richard S. Segall ◽  
Qingyu Zhang

This chapter presents background on text mining, together with comparisons and summaries of seven selected text mining software packages. The packages selected for discussion and comparison in this chapter are: Compare Suite by AKS-Labs, SAS Text Miner, Megaputer Text Analyst, Visual Text by Text Analysis International, Inc. (TextAI), Megaputer PolyAnalyst, WordStat by Provalis Research, and SPSS Clementine. The chapter not only discusses the unique features of these packages but also compares the features each offers in the following key steps of analyzing unstructured qualitative data: data preparation, data analysis, and result reporting. A brief discussion of Web mining and its software is also presented, along with conclusions and future trends.


Author(s):  
Ana Belén Ramos-Guajardo

Abstract. A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.


2011 ◽  
Vol 7 (4) ◽  
pp. 43-63 ◽  
Author(s):  
Shuliang Wang ◽  
Wenyan Gan ◽  
Deyi Li ◽  
Deren Li

In this paper, a data field is proposed to group data objects for hierarchical clustering by simulating their mutual interactions and opposite movements. Inspired by fields in physical space, a data field that simulates a nuclear field is presented to describe the interaction between objects in data space. In the data field, the self-organizing process of equipotential lines over many data objects reveals their hierarchical clustering characteristics. During the clustering process, a random sample is first generated to optimize the impact factor. The masses of the data objects are then estimated in order to select the core data objects, which have nonzero mass. Taking the core data objects as the initial clusters, the clusters are iteratively merged, hierarchy by hierarchy, with good performance. The results of a case study show that the data field is capable of hierarchical clustering on objects of varying size, shape, or granularity without user-specified parameters, while also considering the object features inside the clusters and removing outliers from noisy data. The comparisons illustrate that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.
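To make the idea of a data field concrete, the sketch below computes a scalar potential at a location as the mass-weighted sum of short-range contributions from every data object. The abstract does not state the potential function, so the Gaussian-like form exp(-(d/sigma)^2), with sigma playing the role of the impact factor, is an assumption here; the equal masses, the hand-picked sigma, and the sample points are likewise hypothetical. The paper's own steps (optimizing the impact factor on a random sample, estimating masses, merging clusters) are not reproduced.

```python
import math


def potential(location, objects, masses, sigma):
    """Potential at `location` produced by all data objects.

    Assumed form: sum_i m_i * exp(-(||location - x_i|| / sigma) ** 2),
    where sigma is the impact factor controlling interaction range.
    """
    total = 0.0
    for obj, mass in zip(objects, masses):
        distance = math.dist(location, obj)
        total += mass * math.exp(-((distance / sigma) ** 2))
    return total


# Hypothetical 2-D data: three nearby objects and one distant outlier.
sample_objects = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0)]
sample_masses = [0.25, 0.25, 0.25, 0.25]  # assumption: equal masses
sample_sigma = 0.5                        # assumption: hand-picked value

dense_value = potential((0.1, 0.1), sample_objects, sample_masses, sample_sigma)
sparse_value = potential((5.0, 5.0), sample_objects, sample_masses, sample_sigma)
print(dense_value > sparse_value)
```

Locations inside the dense group receive a higher potential than the isolated object, which is the property that lets equipotential lines outline clusters and expose outliers.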


Author(s):  
Safitri Hariani ◽  
Saiful Anwar Matondang

Sociolinguistics provides knowledge about code switching (code mixing). Text analysis of a novel to understand speakers' use of code mixing can contribute to the development of the sociology of language. A qualitative descriptive method was used to analyze the findings and classify the code mixing in Andrea Hirata's novel Sang Pemimpi. The data analysis focuses on sentences and paragraphs that show the use of code mixing. The results show that one type of code mixing is found in Sang Pemimpi, namely outer code mixing (Indonesian and English), which appears in the form of word, phrase, and clause insertions. The use and types of code mixing in the novel involve the mixing of various linguistic units, such as morphemes, words, modifiers, phrases, clauses, and sentences, chiefly those drawn from the two grammatical systems that participate in a sentence. It can be concluded that the narrator uses code mixing from Indonesian and Malay, and outer code mixing from English (punchbag, sprinter, fan, session, speaker, shock, slide) and Arabic.


2017 ◽  
Vol 44 (2) ◽  
pp. 97
Author(s):  
Subandi

One of the tasks of human development is to discover one's identity. There are many ways a person can find that identity; one of them is identification with an idol figure. This study aims to understand the process of identity formation through identification with shadow puppet figures in the context of Javanese culture. A qualitative phenomenological approach was used to understand the process retrospectively. Three main participants were involved in the study. Data were collected through in-depth interviews with both the main participants and their significant others. A focus group discussion (FGD) was also conducted to explore participants' understanding of their identity formation process. Data were analyzed using the phenomenological method. The study found three main themes: interest in shadow puppet figures, matching one's self-image with puppet characters, and the use of feeling as a bridge between the self and the puppet. The study concluded that shadow puppets can serve as a means of finding identity among their fans.

