EM clustering algorithm modification using multivariate hierarchical histogram in the case of undefined cluster number

Author(s):  
Anna Denisova ◽  
Valdislav Sergeyev
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

AbstractWith the widely application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. Clustering ability of BCALoD is verified by randomly generated datasets and city light satellite images.


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Xin Liu ◽  
Hong-Kun Chen ◽  
Bing-Qing Huang ◽  
Yu-Bo Tao

Integrating wind generation, photovoltaic power, and battery storage to form hybrid power systems has been recognized to be promising in renewable energy development. However, considering the system complexity and uncertainty of renewable energies, such as wind and solar types, it is difficult to obtain practical solutions for these systems. In this paper, optimal sizing for a wind/PV/battery system is realized by trade-offs between technical and economic factors. Firstly, the fuzzy c-means clustering algorithm was modified with self-adapted parameters to extract useful information from historical data. Furthermore, the Markov model is combined to determine the chronological system states of natural resources and load. Finally, a power balance strategy is introduced to guide the optimization process with the genetic algorithm to establish the optimal configuration with minimized cost while guaranteeing reliability and environmental factors. A case of island hybrid power system is analyzed, and the simulation results are compared with the general FCM method and chronological method to validate the effectiveness of the mentioned method.


2013 ◽  
Vol 760-762 ◽  
pp. 2220-2223
Author(s):  
Lang Guo

In view of the defects of K-means algorithm in intrusion detection: the need of preassign cluster number and sensitive initial center and easy to fall into local optimum, this paper puts forward a fuzzy clustering algorithm. The fuzzy rules are utilized to express the invasion features, and standardized matrix is adopted to further process so as to reflect the approximation degree or correlation degree between the invasion indicator data and establish a similarity matrix. The simulation results of KDD CUP1999 data set show that the algorithm has better intrusion detection effect and can effectively detect the network intrusion data.


Author(s):  
Yubin Guo ◽  
Yuhang Wu ◽  
Xiaopeng Zhang ◽  
Aofeng Bo ◽  
Ximing Li

2019 ◽  
Vol 19 (04) ◽  
pp. 1950019 ◽  
Author(s):  
Maissa Hamouda ◽  
Karim Saheb Ettabaa ◽  
Med Salim Bouhlel

Convolutional neural networks (CNN) can learn deep feature representation for hyperspectral imagery (HSI) interpretation and attain excellent accuracy of classification if we have many training samples. Due to its superiority in feature representation, several works focus on it, among which a reliable classification approach based on CNN, used filters generated from cluster framework, like k Means algorithm, yielded good results. However, the kernels number to be manually assigned. To solve this problem, a HSI classification framework based on CNN, where the convolutional filters to be adaptatively learned from the data, by grouping without knowing the cluster number, has recently proposed. This framework, based on the two algorithms CNN and kMeans, showed high accuracy results. So, in the same context, we propose an architecture based on the depth convolution al neural networks principle, where kernels are adaptatively learned, using CkMeans network, to generate filters without knowing the number of clusters, for hyperspectral classification. With adaptive kernels, the proposed framework automatic kernels selection by CkMeans algorithm (AKSCCk) achieves a better classification accuracy compared to the previous frameworks. The experimental results show the effectiveness and feasibility of AKSCCk approach.


2005 ◽  
Vol 17 (12) ◽  
pp. 2672-2698 ◽  
Author(s):  
Qing Song

We focus on the scenario of robust information clustering (RIC) based on the minimax optimization of mutual information (MI). The minimization of MI leads to the standard mass-constrained deterministic annealing clustering, which is an empirical risk-minimization algorithm. The maximization of MI works out an upper bound of the empirical risk via the identification of outliers (noisy data points). Furthermore, we estimate the real risk VC-bound and determine an optimal cluster number of the RIC based on the structural risk-minimization principle. One of the main advantages of the minimax optimization of MI is that it is a nonparametric approach, which identifies the outliers through the robust density estimate and forms a simple data clustering algorithm based on the square error of the Euclidean distance.


2021 ◽  
Author(s):  
Congming Shi ◽  
Bingtao Wei ◽  
Shoulin Wei ◽  
Wen Wang ◽  
Hai Liu ◽  
...  

Abstract Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, clusters are usually unpredictable. Although the Elbow method is one of the most commonly used methods to discriminate the optimal cluster number, the discriminant of the number of clusters depends on the manual identification of the elbow points on the visualization curve. Thus, experienced analysts cannot clearly identify the elbow point from the plotted curve when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to yield a statistical metric that estimates an optimal cluster number when clustering on a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of intersection angles between elbow points. Third, this calculated cosine of intersection angles and the arccosine theorem are used to compute the intersection angles between elbow points. Finally, the index of the above computed minimal intersection angles between elbow points is used as the estimated potential optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the estimated optimal cluster number obtained by our newly proposed method is better than the widely used Silhouette method.


2018 ◽  
Vol 2 (1) ◽  
pp. 36-44
Author(s):  
Sitti Sufiah Atirah Rosly ◽  
Balkiah Moktar ◽  
Muhamad Hasbullah Mohd Razali

Air quality is one of the most popular environmental problems in this globalization era. Air pollution is the poisonous air that comes from car emissions, smog, open burning, chemicals from factories and other particles and gases. This harmful air can give adverse effects to human health and the environment. In order to provide information which areas are better for the residents in Malaysia, cluster analysis is used to determine the areas that can be clustering together based on their a ir quality through several air quality substances. Monthly data from 37 monitoring stations in Peninsular Malaysia from the year 2013 to 2015 were used in this study. K - Means (KM) clustering algorithm, Expectation Maximization (EM) clustering algorithm and Density Based (DB) clustering algorithm have been chosen as the techniques to analyze the cluster analysis by utilizing the Waikato Environment for Knowledge Analysis (WEKA) tools. Results show that K - means clustering algorithm is the best method among ot her algorithms due to its simplicity and time taken to build the model. The output of K - means clustering algorithm shows that it can cluster the area into two clusters, namely as cluster 0 and cluster 1. Clusters 0 consist of 16 monitoring stations and clu ster 1 consists of 36 monitoring stations in Peninsular Malaysia.


2017 ◽  
Author(s):  
João C. Marques ◽  
Michael B. Orger

AbstractHow to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data varies widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on search of density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions. We propose an alternative approach, clusterdv, which estimates density dips between points, and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental data sets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data.Author summarIt is common that natural phenomena produce groupings, or clusters, in data, that can reveal the underlying processes. However, the form of these clusters can vary arbitrarily, making it challenging to find a single algorithm that identifies their structure correctly, without prior knowledge of the number of groupings or their distribution. We describe a simple clustering algorithm that is fully automatic and is able to correctly identify the number and shape of groupings in data of many types. We expect this algorithm to be useful in finding unknown natural phenomena present in data from a wide range of scientific fields.


2021 ◽  
Author(s):  
BAICHENG LV ◽  
WENHUA WU ◽  
ZHIQIANG HU

Abstract With the widely application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. Different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. Clustering ability of BCALoD is verified by randomly generated datasets and city light satellite images.


Sign in / Sign up

Export Citation Format

Share Document