Comparative Advantage Approach for Sparse Text Data Clustering

Author(s):  
Jie Ji ◽  
Tony Y.T. Chan ◽  
Qiangfu Zhao
2020 ◽  
Vol 25 (6) ◽  
pp. 755-769
Author(s):  
Noorullah R. Mohammed ◽  
Moulana Mohammed

Text data clustering organizes a set of text documents into a desired number of coherent and meaningful sub-clusters. Modeling text documents in terms of derived topics is a vital task in text data clustering. Each tweet is treated as a text document, and various topic models are used to model tweets. In existing topic models, the clustering tendency of tweets is initially assessed using Euclidean dissimilarity features. The cosine metric is better suited to text clustering and gives a more informative assessment. This paper therefore develops a novel cosine-based internal and external validity assessment of cluster tendency to improve the computational efficiency of tweet data clustering. In the experiments, tweet clustering results are evaluated using cluster validity indices. The results indicate that cosine-based internal and external validity metrics outperform the others on benchmark and Twitter-based datasets.
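
The contrast between Euclidean and cosine dissimilarity described in the abstract above can be illustrated with a small sketch. This is not the paper's method: the toy tweets, the helper names (`tf_vector`, `cosine_dissimilarity`, `euclidean_dissimilarity`, `silhouette`), and the choice of the silhouette coefficient as the internal validity index are all assumptions made for illustration.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term counts for a lowercased, whitespace-tokenized text."""
    return Counter(text.lower().split())

def cosine_dissimilarity(a, b):
    """1 - cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an empty document is treated as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_dissimilarity(a, b):
    """Euclidean distance between two sparse count vectors (Counters)."""
    terms = set(a) | set(b)
    return math.sqrt(sum((a[t] - b[t]) ** 2 for t in terms))

def silhouette(docs, labels, dissim):
    """Mean silhouette coefficient under an arbitrary dissimilarity function."""
    n = len(docs)
    scores = []
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(dissim(docs[i], docs[j]))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster or only one cluster overall
            continue
        a = sum(own) / len(own)                                   # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())     # separation
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n

# Hypothetical toy tweets: same topics, different lengths.
tweets = [
    "rain rain rain flood",
    "rain flood",
    "goal match goal goal",
    "match goal",
]
labels = [0, 0, 1, 1]
docs = [tf_vector(t) for t in tweets]

cos_score = silhouette(docs, labels, cosine_dissimilarity)
euc_score = silhouette(docs, labels, euclidean_dissimilarity)
print(f"cosine silhouette:    {cos_score:.3f}")
print(f"euclidean silhouette: {euc_score:.3f}")
```

On this toy input the cosine-based score is higher because cosine dissimilarity ignores document length, while Euclidean distance penalizes the longer tweets for repeating the same terms. A single hand-built example like this does not establish the general claim; it only shows the mechanism.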


Author(s):  
Krzysztof Ciesielski ◽  
Mieczysław A. Kłopotek

2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
Yujia Sun ◽  
Jan Platoš

This study focuses on high-dimensional text data clustering, motivated by three limitations of K-means: it handles high-dimensional data poorly, it requires the number of clusters to be specified in advance, and it selects the initial centers randomly. We propose a Stacked-Random Projection dimensionality reduction framework and an enhanced K-means algorithm, DPC-K-means, based on an improved density peaks algorithm. The improved density peaks algorithm determines the number of clusters and the initial cluster centers for K-means. The proposed algorithm is validated on seven text datasets. Experimental results show that it is suitable for clustering text data and corrects these defects of K-means.
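
The pipeline outlined in the abstract above (reduce dimensionality, pick initial centers from density peaks, then run K-means) can be sketched roughly as follows. Several points are assumptions rather than the authors' design: "stacked" is interpreted here as chaining several Gaussian random projections, the plain density peaks heuristic (local density times distance to the nearest denser point) stands in for the paper's improved variant, and the number of clusters `k` is passed in by the caller instead of being determined automatically. All function names and the `cutoff` parameter are hypothetical.

```python
import math
import random

def dist(a, b):
    """Euclidean distance between two dense vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def random_projection(points, out_dim, rng):
    """Project points with a Gaussian random matrix scaled by 1/sqrt(out_dim)."""
    in_dim = len(points[0])
    matrix = [[rng.gauss(0, 1) / math.sqrt(out_dim) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return [[sum(w * x for w, x in zip(row, p)) for row in matrix] for p in points]

def stacked_projection(points, dims, rng):
    """Assumed reading of 'stacked': apply one projection per entry in dims."""
    for d in dims:
        points = random_projection(points, d, rng)
    return points

def density_peak_centers(points, k, cutoff):
    """Pick k initial centers with the plain density peaks heuristic."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # rho: number of neighbours closer than the cutoff distance
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        # distance to the nearest denser point; ties are broken by index
        higher = [d[i][j] for j in range(n) if (rho[j], -j) > (rho[i], -i)]
        delta.append(min(higher) if higher else max(d[i]))
    order = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)
    return [list(points[i]) for i in order[:k]]

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm starting from the supplied centers."""
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical term-count vectors for four short documents.
rng = random.Random(0)
vectors = [[3, 1, 0, 0, 0, 0], [2, 1, 0, 0, 0, 0],
           [0, 0, 0, 0, 3, 1], [0, 0, 0, 0, 1, 2]]
reduced = stacked_projection(vectors, dims=[4, 2], rng=rng)
initial = density_peak_centers(reduced, k=2, cutoff=1.5)
labels, centers = kmeans(reduced, initial)
print(labels)
```

Seeding K-means from density peaks removes the random initialization step, which is the defect the abstract singles out. The `cutoff` value strongly affects the density estimate and would need tuning on real data.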


Author(s):  
Krzysztof Ciesielski ◽  
Sławomir T. Wierzchoń ◽  
Mieczysław A. Kłopotek

2021 ◽  
Vol 174 (15) ◽  
pp. 13-21
Author(s):  
Sergios Gerakidis ◽  
Sofia Megarchioti ◽  
Basilis Mamalis
