New Version of Davies-Bouldin index for clustering validation based on hyper rectangles

Author(s):  
J.C. Rojas Thomas ◽  
M. Mora Cofre ◽  
M. Santos
2020 ◽  
Vol 11 (3) ◽  
pp. 42-67
Author(s):  
Soumeya Zerabi ◽  
Souham Meshoul ◽  
Samia Chikhi Boucherkha

Cluster validation aims both to evaluate the results of clustering algorithms and to predict the number of clusters, and it is usually carried out using several indexes. Traditional internal clustering validation indexes (CVIs) are mainly based on computing pairwise distances, which results in quadratic complexity for the related algorithms. The existing CVIs cannot handle large data sets properly and need to be revisited to take account of ever-increasing data set volumes. Parallel and distributed implementations of these indexes are therefore required. To address this issue, the authors propose two parallel and distributed models for internal CVIs, namely the Silhouette and Dunn indexes, using the MapReduce framework under Hadoop. The proposed models, termed MR_Silhouette and MR_Dunn, have been tested on both evaluating clustering results and identifying the optimal number of clusters. The experimental results are very promising and show that the proposed parallel and distributed models accomplish both tasks.
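To make the quadratic cost of pairwise-distance CVIs concrete, here is a minimal single-machine sketch of one common form of the Dunn index (smallest between-cluster distance divided by largest within-cluster diameter). It is an illustration only and not the authors' MR_Dunn model; the function name `dunn_index` and the toy data are assumptions introduced for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Illustrative Dunn index: min inter-cluster distance / max cluster diameter.

    Hypothetical helper for this example; it visits every unordered pair of
    points, so it performs n * (n - 1) / 2 distance evaluations.
    """
    min_separation = float("inf")  # closest pair lying in different clusters
    max_diameter = 0.0             # farthest pair lying in the same cluster

    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if label_p == label_q:
            max_diameter = max(max_diameter, d)
        else:
            min_separation = min(min_separation, d)

    if min_separation == float("inf") or max_diameter == 0.0:
        raise ValueError(
            "need at least two clusters and one cluster with two distinct points"
        )
    return min_separation / max_diameter


# Toy data (assumed for illustration): two tight clusters, five units apart.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
labels = [0, 0, 1, 1]
print(dunn_index(points, labels))  # 5.0
```

The single loop over all point pairs is the source of the quadratic complexity mentioned in the abstract, which is the cost a distributed formulation would need to spread across workers.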


2018 ◽  
Vol 48 (20) ◽  
pp. 5036-5049
Author(s):  
Michelangelo Misuraca ◽  
Maria Spano ◽  
Simona Balbi

2020 ◽  
Vol 506 ◽  
pp. 346-365
Author(s):  
Jiang Xie ◽  
Zhong-Yang Xiong ◽  
Qi-Zhu Dai ◽  
Xiao-Xia Wang ◽  
Yu-Fang Zhang

Algorithms ◽  
2018 ◽  
Vol 11 (11) ◽  
pp. 177 ◽  
Author(s):  
Xuedong Gao ◽  
Minghan Yang

Clustering is one of the main tasks of machine learning. Internal clustering validation indexes (CVIs) are used to measure the quality of several clustered partitions in order to determine the locally optimal clustering result in an unsupervised manner, and they can act as the objective function of clustering algorithms. In this paper, we first studied several well-known internal CVIs for categorical data clustering and proved that evaluating partitions with different numbers of clusters is ineffective without inter-cluster separation measures or assumptions; the accuracy of the separation measure, along with its coordination with the intra-cluster compactness measures, can notably affect performance. Then, aiming to enhance internal clustering validation, we proposed a new internal CVI, clustering utility based on the averaged information gain of isolating each cluster (CUBAGE), which measures both the compactness and the separation of the partition. The experimental results supported our findings with regard to the existing internal CVIs and showed that the proposed CUBAGE outperforms other internal CVIs with or without a pre-known number of clusters.


2020 ◽  
Vol 4 (2) ◽  
Author(s):  
M. Kerem Un ◽  
Mustafa Guven ◽  
Caglar Cengizler ◽  
Seyda Erdogan ◽  
...  
