Application of grid-based k-means clustering algorithm for optimal image processing

2012 ◽  
Vol 9 (4) ◽  
pp. 1679-1696 ◽  
Author(s):  
Tingna Shi ◽  
Penglong Wang ◽  
Jeenshing Wang ◽  
Shihong Yue

The effectiveness of the K-means clustering algorithm for image segmentation has been proven in many studies, but it is limited by the following problems: 1) the determination of a proper number of clusters (if the number of clusters is determined incorrectly, a good-quality segmented image cannot be guaranteed); 2) the poor typicality of clustering prototypes; and 3) the determination of an optimal number of pixels. The number of pixels plays an important role in any image processing task, but so far there is no general and efficient method to determine the optimal number of pixels. In this paper, a grid-based K-means algorithm is proposed for image segmentation. The advantages of the proposed algorithm over the existing K-means algorithm have been validated on several benchmark datasets. In addition, we further analyze the basic characteristics of the algorithm and propose a general index based on maximizing the grey differences between the greys of the investigated objects and the greys of the background. Without any additional condition, the proposed index is robust in identifying an optimal number of pixels. Our experiments have validated the effectiveness of the proposed index: the resulting images are consistent with the visual perception of the datasets.

Author(s):  
Ahmed Fahim ◽  

The k-means algorithm is the best-known algorithm for data clustering in data mining. Its most important advantages are its simplicity, its fast convergence to local minima, and its linear time complexity. The most important open problems with this algorithm are the selection of initial centers and the need to determine the exact number of clusters in advance. This paper proposes a solution to both problems together by adding a preprocessing step that obtains the expected number of clusters in the data and better initial centers. Much research addresses each of these problems separately, but no research addresses both together. The preprocessing step requires O(n log n) time, where n is the size of the dataset. This step aims to obtain an initial partitioning of the data without determining the number of clusters in advance, and then computes the means of the initial clusters. After that, we apply k-means to the original data, using the information from the preprocessing step, to obtain the final clusters. We use many benchmark datasets to test the proposed method. The experimental results show the efficiency of the proposed method.


2020 ◽  
Vol 2020 ◽  
pp. 1-7
Author(s):  
Abdelilah Et-taleby ◽  
Mohammed Boussetta ◽  
Mohamed Benslimane

Clustering, or grouping, is among the most important image processing methods; it aims to split an image into different groups. Many clustering algorithms have been proposed in the literature, among which the K-means algorithm is considered one of the simplest and most widely used for classifying an image into several regions. In this context, the main objective of this work is to precisely detect and locate the damaged area in photovoltaic (PV) fields by clustering a thermal image with the K-means algorithm. The clustering quality depends on the number of clusters chosen; hence, the elbow method, the average silhouette method, and the NbClust R package are used to find the optimal number K. The simulations carried out show that the K-means algorithm allows faults in PV panels to be detected precisely. The best result is obtained with three clusters, as suggested by the elbow method.
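
As a rough illustration of the elbow method mentioned in this abstract, the sketch below clusters the pixel intensities of a synthetic "thermal image" with a plain Lloyd-style K-means and records the within-cluster sum of squares for K = 1 to 6. Everything here is an assumption made for illustration: the synthetic image, the helper name `kmeans_1d`, and the evenly spaced initial centers are not taken from the paper, which also relies on the NbClust R package rather than on code like this.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=50):
    """Plain Lloyd iteration on 1-D intensities (hypothetical helper)."""
    values = np.asarray(values, dtype=float).ravel()
    # Assumed initialization: k centers evenly spaced strictly inside [min, max].
    centers = np.linspace(values.min(), values.max(), k + 2)[1:-1]
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = values[labels == j]
            if members.size:  # an empty cluster keeps its previous center
                new_centers[j] = members.mean()
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Recompute labels so they always match the final centers.
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    inertia = float(np.sum((values - centers[labels]) ** 2))
    return labels, centers, inertia

# Synthetic stand-in for a thermal image: background, a warm area, a hot spot.
rng = np.random.default_rng(0)
image = rng.normal(30.0, 1.0, size=(20, 20))
image[5:10, 5:10] = rng.normal(60.0, 1.0, size=(5, 5))
image[12:15, 12:15] = rng.normal(90.0, 1.0, size=(3, 3))

# Elbow curve: within-cluster sum of squares for each candidate K.
inertias = [kmeans_1d(image, k)[2] for k in range(1, 7)]
for k, value in zip(range(1, 7), inertias):
    print(f"K={k}: inertia={value:.1f}")
```

The elbow is read off where the drop in inertia flattens; reshaping the labels returned by `kmeans_1d` to `image.shape` gives the segmented image.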


2021 ◽  
Author(s):  
Congming Shi ◽  
Bingtao Wei ◽  
Shoulin Wei ◽  
Wen Wang ◽  
Hai Liu ◽  
...  

Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually unpredictable. Although the Elbow method is one of the most commonly used methods for identifying the optimal cluster number, it depends on manually identifying the elbow point on the visualization curve. Thus, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed that yields a statistical metric for estimating an optimal cluster number when clustering a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosines of the intersection angles between elbow points. Third, the arccosine of these calculated cosines is used to compute the intersection angles between elbow points. Finally, the index of the minimal computed intersection angle is used as the estimated potential optimal cluster number. The experimental results, based on simulated datasets and a well-known public dataset (the Iris dataset), demonstrated that the optimal cluster number estimated by the newly proposed method is better than that obtained by the widely used Silhouette method.
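
The four steps described in this abstract can be sketched as follows. This is one possible reading, not the authors' implementation: it assumes the points are (K, normalized distortion) with K left unscaled on the horizontal axis, that the angle at each interior point is measured between the segments to its two neighbours, and that the function name `elbow_by_min_angle` and the sample distortion values are hypothetical.

```python
import numpy as np

def elbow_by_min_angle(ks, distortions):
    """Estimate the cluster number as the K with the sharpest bend (assumed reading)."""
    ks = np.asarray(ks, dtype=float)
    d = np.asarray(distortions, dtype=float)
    span = d.max() - d.min()
    if span == 0:
        raise ValueError("distortion curve is flat; no elbow to find")
    # Step 1: normalize the distortions to the range 0 to 10.
    d_norm = 10.0 * (d - d.min()) / span
    points = np.column_stack([ks, d_norm])
    angles = []
    for i in range(1, len(points) - 1):
        a = points[i - 1] - points[i]
        b = points[i + 1] - points[i]
        # Step 2: cosine of the angle at point i between its two neighbours.
        cos_theta = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
        # Step 3: recover the angle itself with the arccosine.
        angles.append(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
    # Step 4: the interior point with the minimal angle gives the estimate.
    best = int(np.argmin(angles)) + 1
    return int(ks[best]), np.array(angles)

# Hypothetical distortion curve for K = 1..6 (illustrative values only).
sample_ks = [1, 2, 3, 4, 5, 6]
sample_distortions = [100.0, 40.0, 10.0, 8.0, 7.0, 6.5]
estimated_k, interior_angles = elbow_by_min_angle(sample_ks, sample_distortions)
print(estimated_k, np.degrees(interior_angles).round(1))
```

Because the first and last K have only one neighbour, they can never be selected under this reading, so the candidate range should extend at least one step beyond the plausible cluster numbers.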


2021 ◽  
Vol 6 (1) ◽  
pp. 41
Author(s):  
I Kadek Dwi Gandika Supartha ◽  
Adi Panca Saputra Iskandar

This study clusters data on STMIK STIKOM Indonesia alumni using the Fuzzy C-Means and Fuzzy Subtractive methods. Cluster validity is tested with the Modified Partition Coefficient (MPC) and Classification Entropy (CE) indices. Clustering is carried out with the aim of finding hidden patterns or information in a fairly large data set, considering that the alumni data at STMIK STIKOM Indonesia have so far not undergone a data mining process. According to the cluster validity measurements with the MPC and CE indices, the Fuzzy C-Means clustering algorithm has a higher level of validity than the Fuzzy Subtractive clustering algorithm, so Fuzzy C-Means can be said to cluster the alumni data better than the Fuzzy Subtractive method. The number of clusters with the best fitness value, that is, the optimal number of clusters according to the CE and MPC validity indices, is 5. The cluster with the best characteristics is the 1st cluster, which has 514 members (36.82% of the total alumni), with an average GPA of 3.3617, an average study period of 7.8102 semesters, and an average TA work period of 4.9596 months.
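
For readers unfamiliar with the two validity indices named here, the sketch below computes them from a fuzzy membership matrix using their commonly cited definitions. It assumes an N-by-c matrix `u` whose rows sum to 1 and a natural logarithm for CE; the function names are hypothetical and the membership values are made up, not the alumni data from the study.

```python
import numpy as np

def modified_partition_coefficient(u):
    """MPC = 1 - c/(c-1) * (1 - PC), with PC the mean of squared memberships."""
    u = np.asarray(u, dtype=float)
    n, c = u.shape
    pc = np.sum(u ** 2) / n
    return 1.0 - (c / (c - 1.0)) * (1.0 - pc)

def classification_entropy(u, eps=1e-12):
    """CE = -(1/N) * sum of u * ln(u); memberships of 0 contribute 0."""
    u = np.asarray(u, dtype=float)
    n = u.shape[0]
    # Clipping avoids log(0); the natural logarithm is an assumed choice of base.
    return -np.sum(u * np.log(np.clip(u, eps, 1.0))) / n

# Made-up memberships of four samples in two clusters (rows sum to 1).
u_example = np.array([
    [0.9, 0.1],
    [0.8, 0.2],
    [0.2, 0.8],
    [0.1, 0.9],
])
print(modified_partition_coefficient(u_example))
print(classification_entropy(u_example))
```

Under these definitions a higher MPC and a lower CE both indicate a crisper partition, which is how the two clustering methods and candidate cluster counts can be ranked against each other.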

