Clustering topic groups of documents using K-Means algorithm: Australian Embassy Jakarta media releases 2006-2016

Wishnu Hardi; Wisnu Ananta Kusuma; Sulistyo Basuki

doi:10.22146/bip.36451

Berkala Ilmu Perpustakaan dan Informasi ◽

10.22146/bip.36451 ◽

2019 ◽

Vol 15 (2) ◽

pp. 226

Author(s):

Wishnu Hardi ◽

Wisnu Ananta Kusuma ◽

Sulistyo Basuki

Keyword(s):

Data Analysis ◽

Text Mining ◽

Human Development ◽

Hierarchical Clustering ◽

Text Analysis ◽

Economic Cooperation ◽

Clustering Method ◽

Media Release ◽

Data Objects ◽

Data Variation

Introduction. The Australian Embassy in Jakarta stores a wide array of media release documents. Analyzing the significant patterns in this document collection is imperative, as it yields new insights into and knowledge of the collection's main topic groups. Methodology. The K-Means algorithm was used as a non-hierarchical clustering method that partitions data objects into clusters. The method works by minimizing data variation within each cluster and maximizing data variation between clusters. Data Analysis. Of the documents issued between 2006 and 2016, 839 were examined to determine term frequencies and to generate clusters. The clusters were evaluated by an appointed expert, who validated the result. Results and discussions. The result showed that there were 57 meaningful terms grouped into 3 clusters. “People to people links”, “economic cooperation”, and “human development” were chosen to represent the topics of the Australian Embassy Jakarta media releases from 2006 to 2016. Conclusions. Text mining can be used to cluster topic groups of documents. It provides a more systematic clustering process, as the text analysis is conducted through a number of stages with specifically set parameters.
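The partitioning step described in this abstract can be illustrated with a minimal sketch of K-Means (Lloyd's algorithm) in plain Python. This is not the authors' implementation: the function names, the farthest-first initialisation, and the toy frequency vectors are all assumptions made for illustration, and the paper's actual preprocessing and parameters are not given here.

```python
import math


def kmeans(points, k, max_iter=100):
    """Partition points into k clusters with Lloyd's algorithm.

    Initialisation is deterministic farthest-first (an assumption made
    for reproducibility; random initialisation is also common).
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from all centroids chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances to assigned centroids (the quantity K-Means minimises)."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical toy frequency vectors, one row per data object.
toy_points = [(5, 0, 0), (4, 1, 0), (0, 5, 1), (1, 4, 0), (0, 0, 5), (0, 1, 4)]
toy_labels, toy_centroids = kmeans(toy_points, k=3)
print(toy_labels)
print(within_cluster_ss(toy_points, toy_labels, toy_centroids))
```

A lower within-cluster sum of squares means tighter clusters. For a fixed data set, reducing it also increases the separation between cluster centroids, which is the "minimize within, maximize between" behaviour the abstract describes.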

A Survey of Selected Software Technologies for Text Mining

Software Applications ◽

10.4018/978-1-60566-060-8.ch068 ◽

2009 ◽

pp. 1164-1181

Author(s):

Richard S. Segall ◽

Qingyu Zhang

Keyword(s):

Data Analysis ◽

Text Mining ◽

Text Analysis ◽

Web Mining ◽

Qualitative Data ◽

Future Trends ◽

Data Preparation ◽

Software Packages ◽

Visual Text ◽

Key Steps

This chapter presents background on text mining, together with comparisons and summaries of seven selected text mining software packages. The packages selected for discussion and comparison in this chapter are: Compare Suite by AKS-Labs, SAS Text Miner, Megaputer Text Analyst, Visual Text by Text Analysis International, Inc. (TextAI), Megaputer PolyAnalyst, WordStat by Provalis Research, and SPSS Clementine. This chapter not only discusses the unique features of these packages but also compares the features each offers in the following key steps in analyzing unstructured qualitative data: data preparation, data analysis, and result reporting. A brief discussion of Web mining and its software is also presented, as well as conclusions and future trends.

A Survey of Selected Software Technologies for Text Mining

Handbook of Research on Text and Web Mining Technologies ◽

10.4018/978-1-59904-990-8.ch044 ◽

2010 ◽

pp. 766-784 ◽

Cited By ~ 2

Author(s):

Richard S. Segall

Keyword(s):

Data Analysis ◽

Text Mining ◽

Text Analysis ◽

Web Mining ◽

Qualitative Data ◽

Future Trends ◽

Data Preparation ◽

Software Packages ◽

Visual Text ◽

Key Steps

Molecular Modelling in MRI Contrast Agents Interacting with Water Molecules: Hierarchical Clustering Method for Molecular Dynamics Data Analysis

Biophysical Journal ◽

10.1016/j.bpj.2014.11.1717 ◽

2015 ◽

Vol 108 (2) ◽

pp. 316a

Author(s):

Luca Guzzardi ◽

Dennis Cazar ◽

Vanessa del Hierro ◽

Fernando J. Torres ◽

Miguel A. Mendez

Keyword(s):

Molecular Dynamics ◽

Data Analysis ◽

Contrast Agents ◽

Hierarchical Clustering ◽

Molecular Modelling ◽

Mri Contrast Agents ◽

Water Molecules ◽

Clustering Method ◽

Mri Contrast

A hierarchical clustering method for random intervals based on a similarity measure

Computational Statistics ◽

10.1007/s00180-021-01121-3 ◽

2021 ◽

Author(s):

Ana Belén Ramos-Guajardo

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Real Life ◽

Stopping Criterion ◽

Clustering Method ◽

Bootstrap Test ◽

Empirical Performance ◽

Random Intervals ◽

Expected Values

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.
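To make the idea of clustering from a p-value matrix concrete, here is a rough sketch of an agglomerative procedure that treats each p-value as a similarity. It is not the author's algorithm. The function name `cluster_by_pvalues`, the complete-linkage rule (cluster similarity is the smallest pairwise p-value), the stopping rule (stop when no pair of clusters reaches the significance level `alpha`), and the toy matrix are all assumptions. The bootstrap test that would produce the p-values is not reproduced.

```python
def cluster_by_pvalues(pvals, alpha=0.05):
    """Agglomerative clustering on a symmetric matrix of pairwise p-values.

    Assumed rules (not taken from the paper): complete linkage, and
    merging stops once the best candidate merge falls below alpha.
    """
    clusters = [[i] for i in range(len(pvals))]

    def similarity(a, b):
        # Complete linkage: two clusters are only as similar as their
        # least similar pair of members.
        return min(pvals[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best_sim, (a, b) = max(
            (similarity(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_sim < alpha:
            break  # the remaining clusters differ significantly from each other
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical p-value matrix for four random intervals.
toy_pvals = [
    [1.00, 0.80, 0.02, 0.01],
    [0.80, 1.00, 0.03, 0.02],
    [0.02, 0.03, 1.00, 0.60],
    [0.01, 0.02, 0.60, 1.00],
]
print(cluster_by_pvalues(toy_pvals))
```

Under these assumptions, every pair inside a returned cluster has a p-value of at least `alpha`. Any two of the returned clusters contain at least one pair whose p-value is below it.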

A hierarchical clustering method for the description of structures in galaxy clusters

10.1063/1.48337 ◽

1995 ◽

Author(s):

Daniel Gerbal ◽

Arturo Serna

Keyword(s):

Galaxy Clusters ◽

Hierarchical Clustering ◽

Clustering Method

DIVFRP: An automatic divisive hierarchical clustering method based on the furthest reference points

Pattern Recognition Letters ◽

10.1016/j.patrec.2008.07.002 ◽

2008 ◽

Vol 29 (16) ◽

pp. 2067-2077 ◽

Cited By ~ 7

Author(s):

Caiming Zhong ◽

Duoqian Miao ◽

Ruizhi Wang ◽

Xinmin Zhou

Keyword(s):

Hierarchical Clustering ◽

Reference Points ◽

Clustering Method ◽

Divisive Hierarchical Clustering

Data Field for Hierarchical Clustering

International Journal of Data Warehousing and Mining ◽

10.4018/jdwm.2011100103 ◽

2011 ◽

Vol 7 (4) ◽

pp. 43-63 ◽

Cited By ~ 55

Author(s):

Shuliang Wang ◽

Wenyan Gan ◽

Deyi Li ◽

Deren Li

Keyword(s):

Hierarchical Clustering ◽

Physical Space ◽

Core Data ◽

Data Object ◽

Group Data ◽

Data Field ◽

Data Objects ◽

The Masses ◽

Space Data ◽

The Impact

In this paper, a data field is proposed to group data objects by simulating their mutual interactions and opposite movements for hierarchical clustering. Inspired by fields in physical space, a data field that simulates a nuclear field is presented to describe the interaction between objects in data space. In the data field, the self-organized process of equipotential lines on many data objects reveals their hierarchical clustering characteristics. During the clustering process, a random sample is first generated to optimize the impact factor. The masses of the data objects are then estimated in order to select the core data objects with nonzero masses. Taking the core data objects as the initial clusters, the clusters are iteratively merged hierarchy by hierarchy with good performance. The results of a case study show that the data field is capable of hierarchical clustering on objects of varying size, shape, or granularity without user-specified parameters, while also considering the object features inside the clusters and removing outliers from noisy data. The comparisons illustrate that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.

Code Mixing in the Novel Sang Pemimpi (Campur Kode di dalam Novel Sang Pemimpi)

PARAFRASE Jurnal Kajian Kebahasaan & Kesastraan ◽

10.30996/parafrase.v21i1.3501 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Safitri Hariani ◽

Saiful Anwar Matondang

Keyword(s):

Data Analysis ◽

Text Analysis ◽

The Novel ◽

Code Mixing ◽

Qualitative Descriptive ◽

Development Of Language ◽

Language Sociology ◽

Linguistic Units ◽

Descriptive Method ◽

Bahasa Indonesia

Sociolinguistics provides knowledge about code switching (code mixing). Text analysis of a novel to understand speakers' use of code mixing can contribute to the development of the sociology of language. This study uses a qualitative descriptive method to analyze the findings and classify the code mixing in Andrea Hirata's novel Sang Pemimpi. Data analysis focuses on sentences and paragraphs that show the use of code mixing. The results show that one type of code mixing is found in the novel, namely outer code mixing (Indonesian and English), which appears in the form of words, phrases, and clause insertions. The use and types of code mixing in Sang Pemimpi involve the mixing of various linguistic units, such as morphemes, words, modifiers, phrases, clauses, and sentences, chiefly those that come from two grammatical systems participating in a single sentence. It can be concluded that the narrator uses code mixing from Indonesian and Malay, and outer code mixing from English (punchbag, sprinter, fan, session, speaker, shock, slide) and Arabic.

Multi-stage Hierarchical Clustering Method Based on Hypergraph

Intelligent Computing Methodologies - Lecture Notes in Computer Science ◽

10.1007/978-3-030-60796-8_37 ◽

2020 ◽

pp. 432-443

Author(s):

Yue Xi ◽

Yonggang Lu

Keyword(s):

Hierarchical Clustering ◽

Clustering Method ◽

Multi Stage

The Role of Identification with Wayang Characters in the Formation of Self-Identity (Peran Identifikasi Tokoh Wayang dalam Pembentukan Identitas Diri)

Jurnal Psikologi ◽

10.22146/jpsi.22793 ◽

2017 ◽

Vol 44 (2) ◽

pp. 97

Author(s):

Subandi

Keyword(s):

Data Analysis ◽

Identity Formation ◽

Human Development ◽

Group Discussion ◽

Phenomenological Approach ◽

Phenomenological Method ◽

Self Image ◽

Depth Interview ◽

Interview Methods ◽

Qualitative Phenomenological

One of the tasks of human development is to discover one's identity. There are many ways a person can find that identity. One of them is through identification with an idol figure. This study aims to understand the process of identity formation through identification with shadow puppet figures in the context of Javanese culture. A qualitative phenomenological approach was used to understand the process retrospectively. Three main participants were involved in the study. Data were collected through in-depth interviews with both the main participants and their significant others. A focus group discussion (FGD) was also conducted to explore participants' understanding of their process of identity formation. Data analysis was done using the phenomenological method. This study found three main themes, namely interest in shadow puppet figures, matching one's self-image with puppet characters, and the use of feeling as a bridge between the self and the puppet. This study concluded that shadow puppets can be used as a means of finding identity among their fans.
