Nonuniform Sparse Data Clustering Cascade Algorithm Based on Dynamic Cumulative Entropy

2016 ◽  
Vol 2016 ◽  
pp. 1-10
Author(s):  
Ning Li ◽  
Yunxia Gu ◽  
Zhongliang Deng

Scarce prior knowledge and randomly chosen initial cluster centers directly affect the accuracy of iterative clustering algorithms. In this paper we propose a new algorithm that computes initial cluster centers and the best number of clusters for k-means with little prior knowledge, and optimizes the clustering result. It constructs a Euclidean distance control factor based on the aggregation-density sparseness degree to select initial cluster centers for nonuniform sparse data, and obtains initial data clusters from a multidimensional diffusion density distribution. A multiobjective clustering approach based on dynamic cumulative entropy is then adopted to optimize the initial data clusters and the number of clusters. Experimental results show that the proposed algorithm performs well in obtaining initial cluster centers for k-means and effectively improves clustering accuracy on nonuniform sparse data by about 5%.

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Ziqi Jia ◽  
Ling Song

The k-prototypes algorithm is a hybrid clustering algorithm that can process both categorical and numerical data. In this study, the method of initial cluster center selection is improved and a new hybrid dissimilarity coefficient is proposed. Based on this coefficient, a weighted k-prototypes clustering algorithm (WKPCA) is constructed. WKPCA not only improves the selection of initial cluster centers but also introduces a new way to compute the dissimilarity between data objects and cluster centers. Real datasets from the UCI repository were used to test WKPCA. Experimental results show that WKPCA is more efficient and robust than other k-prototypes variants.
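The mixed-type dissimilarity that k-prototypes builds on can be sketched as below. The trade-off parameter `gamma` and the per-attribute weights `w` are placeholders standing in for WKPCA's own coefficients, which the abstract does not specify:

```python
import numpy as np

def hybrid_dissimilarity(x_num, x_cat, c_num, c_cat, gamma=1.0, w=None):
    # Numeric part: (optionally weighted) squared Euclidean distance.
    x_num, c_num = np.asarray(x_num, float), np.asarray(c_num, float)
    if w is None:
        w = np.ones_like(x_num)
    num_part = float(np.sum(w * (x_num - c_num) ** 2))
    # Categorical part: simple matching (count of mismatched attributes),
    # scaled by gamma to balance it against the numeric part.
    cat_part = sum(a != b for a, b in zip(x_cat, c_cat))
    return num_part + gamma * cat_part

# A data object vs. a cluster prototype, with one numeric attribute
# differing by 2 and one categorical mismatch:
d = hybrid_dissimilarity([1.0, 2.0], ["red"], [1.0, 4.0], ["blue"], gamma=0.5)
print(d)  # 4.0 + 0.5 * 1 = 4.5
```

Each cluster's prototype pairs a numeric mean with a categorical mode, so this one function scores both attribute types in a single assignment step.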


2010 ◽  
Vol 29-32 ◽  
pp. 802-808
Author(s):  
Min Min

After analyzing common problems in fuzzy clustering algorithms, we put forward a combined fuzzy clustering algorithm that automatically generates a reasonable number of clusters and the initial cluster centers. The algorithm has been tested on real evaluation data from teaching designs. The results show that combined fuzzy clustering based on the F-statistic is more effective.


2019 ◽  
Vol 13 (4) ◽  
pp. 403-409
Author(s):  
Hui Qi ◽  
Jinqing Li ◽  
Xiaoqiang Di ◽  
Weiwu Ren ◽  
Fengrong Zhang

Background: The K-means algorithm is implemented in two steps: initialization and subsequent iterations. Initialization selects the initial cluster centers, while the subsequent iterations repeatedly update the cluster centers until they no longer change or the number of iterations reaches its maximum. K-means is so sensitive to the cluster centers chosen during initialization that a different choice will change the algorithm's performance. Improving the initialization process has therefore become an important means of improving K-means. Methods: This paper uses a new strategy to select the initial cluster centers. It first computes the minimum and maximum of the data along a chosen index (for lower-dimensional data, such as two-dimensional data, a feature with large variance or the distance to the origin can be selected; for higher-dimensional data, PCA can be used to select the principal component with the largest variance), and then divides this range into equally sized sub-ranges. Next, the sub-ranges are adjusted based on the data distribution so that each sub-range contains as much data as possible. Finally, the mean of the data in each sub-range is calculated and used as an initial cluster center. Results: Theoretical analysis shows that although the time complexity of the initialization is linear, the algorithm has the characteristics of superlinear initialization methods. The algorithm is applied to two-dimensional GPS data analysis and high-dimensional network attack detection. Experimental results show that it achieves high clustering quality and speed. Conclusion: This paper reduces the number of subsequent K-means iterations without compromising clustering quality, which makes the method suitable for large-scale data clustering. The algorithm applies not only to low-dimensional data but also to high-dimensional data.
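The sub-range initialization described in Methods can be sketched as follows for one-dimensional index values. The distribution-based adjustment of the sub-ranges is simplified here to skipping empty sub-ranges, so this is an illustrative approximation rather than the paper's exact procedure:

```python
import numpy as np

def subrange_init(x, k):
    # Split [min, max] into k equal sub-ranges and use the mean of the
    # points falling in each as an initial center. Empty sub-ranges are
    # simply skipped (the paper instead adjusts the ranges to the data
    # distribution so each one contains as much data as possible).
    edges = np.linspace(x.min(), x.max(), k + 1)
    centers = []
    for i in range(k):
        in_range = (x >= edges[i]) & (x < edges[i + 1])
        if i == k - 1:                      # make the last edge inclusive
            in_range = x >= edges[i]
        if in_range.any():
            centers.append(x[in_range].mean())
    return np.array(centers)

# One-dimensional index values with three dense regions:
data = np.array([1.0, 1.2, 1.1, 5.0, 5.2, 9.0, 9.3, 9.1])
centers = subrange_init(data, 3)  # one center per dense region
```

One linear pass over the data suffices, which matches the linear time complexity claimed for the initialization.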


2014 ◽  
Vol 998-999 ◽  
pp. 873-877
Author(s):  
Zhen Bo Wang ◽  
Bao Zhi Qiu

To reduce the impact of irrelevant attributes on clustering results and increase the importance of relevant attributes, this paper proposes a fuzzy C-means clustering algorithm based on the coefficient of variation (CV-FCM). In the algorithm, the coefficient of variation is used to assign a different weight to each attribute in the data set, with the magnitude of the weight expressing the attribute's importance to the clusters. In addition, because fuzzy C-means is susceptible to the initial cluster center values, a method for selecting initial cluster centers based on maximum distance is introduced on top of the coefficient-of-variation weighting. Experiments on real data sets show that this algorithm selects cluster centers effectively, with clustering results superior to those of general fuzzy C-means algorithms.
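The two ingredients, coefficient-of-variation weighting and maximum-distance seeding, can be sketched as below. The normalisation of the weights and the choice of the first seed are assumptions of this sketch, since the abstract gives no formulas:

```python
import numpy as np

def cv_weights(X):
    # Coefficient of variation per attribute: std / |mean|, normalised
    # here so the weights sum to 1 (assumed normalisation). Near-constant
    # attributes get small weights, high-variation attributes large ones.
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    return cv / cv.sum()

def max_distance_init(X, k, w):
    # Greedy maximum-distance seeding in the weighted metric: start at
    # the point farthest from the weighted centroid, then repeatedly add
    # the point farthest from its nearest already-chosen center.
    Xw = X * np.sqrt(w)   # Euclidean distance on Xw = weighted distance on X
    first = np.argmax(np.linalg.norm(Xw - Xw.mean(axis=0), axis=1))
    centers = [Xw[first]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(Xw - c, axis=1) for c in centers], axis=0)
        centers.append(Xw[np.argmax(d)])
    return np.array(centers) / np.sqrt(w)   # map back to original scale
```

Scaling each axis by the square root of its weight lets ordinary Euclidean distance realise the weighted metric, so the same seeding code works for any weighting scheme.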


Aiming at the problems of distorted center selection and slow iterative convergence in traditional clustering algorithms, a novel clustering scheme based on an improved k-means algorithm is proposed. Based on an analysis of all user behavior sets contained in the initial sample, a weight calculation method for abnormal behaviors and an eigenvalue extraction method for the abnormal behavior set are proposed, and a set of abnormal behaviors is constructed for each user from the behavior data generated by abnormal users. Then, building on the traditional k-means algorithm, an improved algorithm is proposed: by computing the compactness of all data points and selecting the initial cluster centers among points of high and low compactness, clustering performance is enhanced. Finally, the eigenvalues of the abnormal behavior set are used as the input of the algorithm to output the clustering results for abnormal behavior. Experimental results show that this algorithm clusters better than the traditional algorithm and can effectively improve the clustering of abnormal behavior.
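The compactness-based seeding can be illustrated as below. The definition of compactness (inverse mean distance to the m nearest neighbours) and the candidate-pool size are assumptions of this sketch, since the abstract does not define them:

```python
import numpy as np

def compactness(X, m=3):
    # Per-point compactness: inverse of the mean distance to the m
    # nearest neighbours (assumed definition).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    nearest = np.sort(D, axis=1)[:, 1:m + 1]   # drop the self-distance 0
    return 1.0 / nearest.mean(axis=1)

def compactness_init(X, k, m=3):
    # Seed k-means from the densest candidates, greedily spreading the
    # chosen centers apart so two seeds never share one dense region.
    pool = np.argsort(-compactness(X, m))[:2 * k]   # densest 2k candidates
    centers = [X[pool[0]]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(X[pool] - c, axis=1) for c in centers],
                   axis=0)
        centers.append(X[pool[np.argmax(d)]])
    return np.array(centers)
```

Restricting seeds to a dense candidate pool keeps isolated outliers out of the initialization, while the max-min spreading step stops all seeds from collapsing into one region.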


2021 ◽  
Vol 2137 (1) ◽  
pp. 012071
Author(s):  
Shuxin Liu ◽  
Xiangdong Liu

Abstract Cluster analysis is an unsupervised learning process, and its most classic algorithm, K-means, has the advantages of a simple principle and easy implementation. K-means nevertheless handles the number of clusters k, the initial cluster centers, and outlier points arbitrarily. This paper discusses improvements to the traditional K-means algorithm and puts forward an improved algorithm that incorporates density clustering. It first describes the basic principles and workflow of the K-means and DBSCAN algorithms. It then summarizes improvement methods for the three aspects above, together with their advantages and disadvantages, and proposes a new density-based improved K-means algorithm. Finally, it surveys the development direction and trends of density-based K-means clustering.
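One way a density criterion can repair K-means initialization, in the spirit of the DBSCAN-based improvement discussed here, is to seed only from core points. `eps` and `min_pts` play DBSCAN's usual roles; the greedy spreading step is an assumption of this sketch, not the paper's method:

```python
import numpy as np

def density_seeds(X, eps, min_pts, k):
    # Count eps-neighbours of every point (DBSCAN's core-point test);
    # only core points (count >= min_pts) may become seeds, so outliers
    # are excluded from initialization by construction.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    counts = (D <= eps).sum(axis=1)        # neighbour count, self included
    order = np.argsort(-counts)            # densest points first
    seeds = [order[0]]
    for i in order[1:]:
        if len(seeds) == k:
            break
        if counts[i] < min_pts:
            break                          # remaining points are non-core
        if all(D[i, j] > eps for j in seeds):
            seeds.append(i)                # far enough from chosen seeds
    return X[seeds]
```

Because seeds are drawn densest-first and kept at least `eps` apart, each seed lands in a distinct dense region, which addresses both the initial-center and outlier weaknesses the abstract names.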


Author(s):  
Gowri Jeyaraman ◽  
Janakiraman Subbiah

Edge detection is an important and classical problem in the medical field and computer vision. The Caliber Fuzzy C-means (CFCM) clustering algorithm for edge detection depends on the selection of the initial cluster center values. It attempts to organize a collection of pixels into clusters such that a pixel within a cluster is more similar to the other pixels of that cluster than to pixels outside it. Using the CFCM technique, the BSDS image is first clustered, and the clustered image is then given as input to the basic Canny edge detection algorithm. The new parameters, which require fewer operations, make CFCM effective. According to the experiments, the CFCM clustering function typically divides an image into four clusters. The proposed method is a robust modification of the fuzzy c-means and Canny algorithms. Its convergence is very fast compared to the other edge detection algorithms, and it produces enhanced edges and better results than traditional image edge detection techniques.


Kybernetes ◽  
2016 ◽  
Vol 45 (8) ◽  
pp. 1273-1291 ◽  
Author(s):  
Runhai Jiao ◽  
Shaolong Liu ◽  
Wu Wen ◽  
Biying Lin

Purpose: The large volume of big data makes traditional clustering algorithms, which are usually designed for an entire data set, impractical. This paper focuses on incremental clustering, which divides data into a series of data chunks so that only a small amount of data needs to be clustered at a time. Few studies of incremental clustering address optimizing cluster center initialization for each data chunk or selecting multiple passing points for each cluster. Design/methodology/approach: By optimizing the initial cluster centers, the quality of the clustering result is improved for each data chunk, and hence the quality of the final result is enhanced. Moreover, by selecting multiple passing points, more accurate information is passed down to improve the final clustering result. A method addressing both problems is proposed and applied in an algorithm based on the streaming kernel fuzzy c-means (stKFCM) algorithm. Findings: Experimental results show that the proposed algorithm is more accurate and performs better than the stKFCM algorithm. Originality/value: This paper addresses the problem of improving incremental clustering performance through optimized cluster center initialization and multiple passing points, analyzes the performance of the proposed scheme, and proves its effectiveness.
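The chunked scheme with passing points might look like the following sketch, with plain (hard) k-means standing in for the paper's kernel fuzzy c-means machinery. The seeding of the very first chunk and the choice of the `n_pass` nearest points per cluster as passing points are assumptions of this sketch:

```python
import numpy as np

def lloyd(X, centers, iters=15):
    # Plain Lloyd iterations from the given initial centers.
    centers = centers.copy()
    for _ in range(iters):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(1)
        for j in range(len(centers)):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def incremental_cluster(chunks, k, n_pass=3):
    # Each chunk is clustered warm-started from the previous chunk's
    # centers (the "optimized initialization"), and the n_pass points
    # nearest each center are carried into the next chunk (the
    # "multiple passing points").
    carry, centers = None, None
    for X in chunks:
        data = X if carry is None else np.vstack([carry, X])
        init = data[:k] if centers is None else centers
        centers, labels = lloyd(data, init)
        reps = []
        for j in range(k):
            members = data[labels == j]
            d = np.linalg.norm(members - centers[j], axis=1)
            reps.append(members[np.argsort(d)[:n_pass]])
        carry = np.vstack(reps)
    return centers
```

Only `k * n_pass` points survive each chunk, so memory stays bounded no matter how many chunks arrive, while several representatives per cluster preserve more of each cluster's shape than a single centroid would.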


2011 ◽  
Vol 403-408 ◽  
pp. 1977-1980
Author(s):  
Yin Sheng Zhang ◽  
Hui Lin Shan ◽  
Jia Qiang Li ◽  
Jie Zhou

The traditional K-means clustering algorithm prematurely falls into local optima because of its sensitivity to the selection of the initial cluster centers. A hierarchical clustering algorithm can be used to generate the initial cluster centers for K-means. Through pretreatment and feature extraction and selection, the geometric features of the input data achieve a good distribution. For the learning of the fuzzy neural network, the algorithm's source code was written in Java. The experimental results show that the new algorithm effectively improves clustering quality.
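Hierarchical seeding of K-means, as described above, can be sketched with a simple centroid-linkage agglomeration. The linkage choice and the O(n^3) implementation are illustrative simplifications, not the paper's exact procedure:

```python
import numpy as np

def hierarchical_init(X, k):
    # Centroid-linkage agglomeration: every point starts as its own
    # cluster; repeatedly merge the two clusters with the closest
    # centroids until k remain. The k centroids then seed K-means.
    clusters = [[i] for i in range(len(X))]
    cents = [X[i].astype(float) for i in range(len(X))]
    while len(clusters) > k:
        best, pair = np.inf, None
        for a in range(len(cents)):
            for b in range(a + 1, len(cents)):
                d = np.linalg.norm(cents[a] - cents[b])
                if d < best:
                    best, pair = d, (a, b)
        a, b = pair
        clusters[a] += clusters.pop(b)          # merge b into a
        cents.pop(b)
        cents[a] = X[clusters[a]].mean(axis=0)  # recompute merged centroid
    return np.array(cents)
```

Because the merges are deterministic, the resulting seeds do not depend on a random draw, which is exactly what removes K-means' sensitivity to initialization.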


2014 ◽  
Vol 701-702 ◽  
pp. 88-93 ◽  
Author(s):  
Gang Tao ◽  
Yong Gang Yan ◽  
Jiao Zou ◽  
Jun Liu

In order to solve the problem of continuous attribute discretization, a new improved SOM clustering algorithm is proposed. The algorithm uses a SOM to obtain an initial clustering and an upper limit on the number of clusters, then treats the initial cluster centers as samples and applies the BIRCH hierarchical clustering algorithm for secondary clustering, which resolves the problem of inflated clusters and identifies the set of discrete breakpoints. Finally, for each attribute, the nearest neighbors of each cluster center among the samples of that attribute's breakpoint set are found and used as a basis for discretization trimming. The experimental results show that the proposed algorithm outperforms the conventional discrete SOM clustering algorithm in terms of both the breakpoint set (silhouette coefficient improved by 75%) and discretization accuracy (incompatibility degree closer to 0).

