EM clustering algorithm modification using multivariate hierarchical histogram in the case of undefined cluster number

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

Abstract: With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
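The abstract does not define local density, so the following is only a minimal sketch of one common convention from density-based clustering: the local density of a point is the number of other points closer than a cutoff distance. The function name `local_density` and the parameter `cutoff` are hypothetical, and this is not claimed to be the BCALoD definition.

```python
import math


def local_density(points, cutoff):
    """Count, for each point, how many other points lie within `cutoff`.

    Assumption: a hard-threshold neighbour count is used as the density
    estimate; the paper may use a different definition.
    """
    densities = []
    for i, p in enumerate(points):
        # Compare against every other point using Euclidean distance.
        count = sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) < cutoff
        )
        densities.append(count)
    return densities


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
    print(local_density(sample, cutoff=0.5))
```

In this toy sample the three nearby points each see two neighbours, while the isolated point sees none, which is the kind of contrast a density-based method can use to separate small clusters from noise.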

Optimal Sizing for Wind/PV/Battery System Using Fuzzy c-Means Clustering with Self-Adapted Cluster Number

International Journal of Rotating Machinery ◽

10.1155/2017/5142825 ◽

2017 ◽

Vol 2017 ◽

pp. 1-9 ◽

Cited By ~ 4

Author(s):

Xin Liu ◽

Hong-Kun Chen ◽

Bing-Qing Huang ◽

Yu-Bo Tao

Keyword(s):

Power Systems ◽

Clustering Algorithm ◽

Hybrid Power Systems ◽

Cluster Number ◽

Battery System ◽

Optimal Sizing ◽

Hybrid Power ◽

Trade Offs ◽

Fuzzy C Means Clustering ◽

System States

Integrating wind generation, photovoltaic power, and battery storage to form hybrid power systems has been recognized as promising in renewable energy development. However, considering the system complexity and the uncertainty of renewable energies such as wind and solar, it is difficult to obtain practical solutions for these systems. In this paper, optimal sizing for a wind/PV/battery system is realized by trade-offs between technical and economic factors. First, the fuzzy c-means clustering algorithm is modified with self-adapted parameters to extract useful information from historical data. Furthermore, a Markov model is incorporated to determine the chronological system states of natural resources and load. Finally, a power balance strategy is introduced to guide the optimization process with the genetic algorithm to establish the optimal configuration with minimized cost while guaranteeing reliability and environmental factors. An island hybrid power system case is analyzed, and the simulation results are compared with the general FCM method and the chronological method to validate the effectiveness of the proposed method.
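For readers unfamiliar with fuzzy c-means, the sketch below shows the standard membership update that the general FCM method relies on: each point receives a degree of membership in every cluster, based on its relative distances to the cluster centers. It does not reproduce the paper's self-adapted modification, which the abstract does not specify. The function name `fcm_memberships` is hypothetical, and `m` is the usual fuzzifier (assumed to be 2 by default here).

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Standard FCM membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), with d_i the Euclidean
    distance from the point to center i. Requires m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on one or more centers belongs fully to them.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]


if __name__ == "__main__":
    # A point at distance 1 and 2 from two centers.
    print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))
```

The memberships always sum to one, so a point twice as far from one center as from the other is shared unevenly between them rather than assigned to a single cluster.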

The Application of Fuzzy Clustering Number Algorithm in Network Intrusion Detection

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.760-762.2220 ◽

2013 ◽

Vol 760-762 ◽

pp. 2220-2223

Author(s):

Lang Guo

Keyword(s):

Intrusion Detection ◽

Fuzzy Clustering ◽

Clustering Algorithm ◽

Local Optimum ◽

Cluster Number ◽

Data Set ◽

Network Intrusion ◽

Correlation Degree ◽

Indicator Data ◽

Detection Effect

In view of the defects of the K-means algorithm in intrusion detection, namely the need to preassign the cluster number, sensitivity to the initial centers, and a tendency to fall into local optima, this paper puts forward a fuzzy clustering algorithm. Fuzzy rules are used to express the invasion features, and a standardized matrix is adopted for further processing so as to reflect the approximation degree or correlation degree between the invasion indicator data and to establish a similarity matrix. Simulation results on the KDD Cup 1999 data set show that the algorithm has a better intrusion detection effect and can effectively detect network intrusion data.

The FRCK clustering algorithm for determining cluster number and removing outliers automatically

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2021.118097 ◽

2021 ◽

Vol 24 (5) ◽

pp. 485

Author(s):

Yubin Guo ◽

Yuhang Wu ◽

Xiaopeng Zhang ◽

Aofeng Bo ◽

Ximing Li

Keyword(s):

Clustering Algorithm ◽

Cluster Number

Framework for Automatic Selection of Kernels based on Convolutional Neural Networks and CkMeans Clustering Algorithm

International Journal of Image and Graphics ◽

10.1142/s0219467819500190 ◽

2019 ◽

Vol 19 (04) ◽

pp. 1950019 ◽

Cited By ~ 3

Author(s):

Maissa Hamouda ◽

Karim Saheb Ettabaa ◽

Med Salim Bouhlel

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Clustering Algorithm ◽

Feature Representation ◽

Cluster Number ◽

Classification Framework ◽

Deep Feature ◽

Training Samples ◽

Reliable Classification ◽

Hyperspectral Classification

Convolutional neural networks (CNN) can learn deep feature representations for hyperspectral imagery (HSI) interpretation and attain excellent classification accuracy when many training samples are available. Owing to this superiority in feature representation, several works focus on CNNs; among them, a reliable classification approach based on CNN that uses filters generated by a clustering framework, such as the k-means algorithm, yielded good results. However, the number of kernels has to be assigned manually. To solve this problem, an HSI classification framework based on CNN has recently been proposed in which the convolutional filters are adaptively learned from the data by grouping without knowing the cluster number. This framework, based on the two algorithms CNN and k-means, showed high accuracy. In the same context, we propose an architecture based on the deep convolutional neural network principle, in which kernels are adaptively learned using a CkMeans network to generate filters without knowing the number of clusters, for hyperspectral classification. With adaptive kernels, the proposed framework, automatic kernels selection by CkMeans algorithm (AKSCCk), achieves better classification accuracy than the previous frameworks. The experimental results show the effectiveness and feasibility of the AKSCCk approach.

A Robust Information Clustering Algorithm

Neural Computation ◽

10.1162/089976605774320548 ◽

2005 ◽

Vol 17 (12) ◽

pp. 2672-2698 ◽

Cited By ~ 21

Author(s):

Qing Song

Keyword(s):

Clustering Algorithm ◽

Risk Minimization ◽

Minimax Optimization ◽

Cluster Number ◽

Minimization Principle ◽

Nonparametric Approach ◽

Empirical Risk ◽

Structural Risk Minimization Principle ◽

Real Risk ◽

Data Points

We focus on the scenario of robust information clustering (RIC) based on the minimax optimization of mutual information (MI). The minimization of MI leads to the standard mass-constrained deterministic annealing clustering, which is an empirical risk-minimization algorithm. The maximization of MI works out an upper bound of the empirical risk via the identification of outliers (noisy data points). Furthermore, we estimate the real risk VC-bound and determine an optimal cluster number of the RIC based on the structural risk-minimization principle. One of the main advantages of the minimax optimization of MI is that it is a nonparametric approach, which identifies the outliers through the robust density estimate and forms a simple data clustering algorithm based on the square error of the Euclidean distance.

A Quantitative Discriminant Method of Elbow Point for the Optimal Number of Clusters in Clustering Algorithm

10.21203/rs.3.rs-58011/v3 ◽

2021 ◽

Author(s):

Congming Shi ◽

Bingtao Wei ◽

Shoulin Wei ◽

Wen Wang ◽

Hai Liu ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Optimal Number ◽

Machine Learning Method ◽

Cluster Number ◽

Number Of Clusters ◽

Public Dataset ◽

Optimal Cluster ◽

Better Than ◽

Optimal Number Of Clusters

Abstract: Clustering, a traditional machine learning method, plays a significant role in data analysis. Most clustering algorithms depend on a predetermined exact number of clusters, whereas, in practice, the number of clusters is usually unpredictable. Although the Elbow method is one of the most commonly used methods to discriminate the optimal cluster number, the discrimination of the number of clusters depends on the manual identification of the elbow points on the visualization curve. Thus, even experienced analysts cannot clearly identify the elbow point when the plotted curve is fairly smooth. To solve this problem, a new elbow point discriminant method is proposed to yield a statistical metric that estimates an optimal cluster number when clustering on a dataset. First, the average degree of distortion obtained by the Elbow method is normalized to the range of 0 to 10. Second, the normalized results are used to calculate the cosine of intersection angles between elbow points. Third, this calculated cosine of intersection angles and the arccosine theorem are used to compute the intersection angles between elbow points. Finally, the index of the minimal computed intersection angle between elbow points is used as the estimated potential optimal cluster number. The experimental results based on simulated datasets and a well-known public dataset (Iris Dataset) demonstrated that the estimated optimal cluster number obtained by the newly proposed method is better than that of the widely used Silhouette method.
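The four steps in this abstract can be sketched roughly as follows. This is one possible reading, not the authors' implementation: it assumes the "intersection angle" at a candidate k is the angle between the segments joining the point (k, distortion) to its left and right neighbours on the curve. The function names `normalize` and `elbow_by_min_angle` are hypothetical.

```python
import math


def normalize(values, lo=0.0, hi=10.0):
    """Rescale values linearly to the range [lo, hi] (step 1)."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


def elbow_by_min_angle(ks, distortions):
    """Return (k, angle in radians) with the sharpest bend in the curve.

    Assumption: the angle at each interior point is measured between the
    vectors pointing to its two neighbouring points.
    """
    y = normalize(distortions)
    best_k, best_angle = None, math.inf
    for i in range(1, len(ks) - 1):
        # Vectors from point i to its left and right neighbours.
        ax, ay = ks[i - 1] - ks[i], y[i - 1] - y[i]
        bx, by = ks[i + 1] - ks[i], y[i + 1] - y[i]
        # Step 2: cosine of the angle between the two vectors.
        cos_theta = (ax * bx + ay * by) / (
            math.hypot(ax, ay) * math.hypot(bx, by)
        )
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard rounding error
        # Step 3: recover the angle with the arccosine.
        angle = math.acos(cos_theta)
        # Step 4: keep the index of the minimal angle.
        if angle < best_angle:
            best_k, best_angle = ks[i], angle
    return best_k, best_angle


if __name__ == "__main__":
    ks = [1, 2, 3, 4, 5, 6]
    distortions = [100.0, 40.0, 10.0, 8.0, 7.0, 6.0]  # made-up values
    print(elbow_by_min_angle(ks, distortions))
```

Normalizing to 0 to 10 keeps the distortion axis on a scale comparable to the small integer range of k, so the angle is not dominated by the raw magnitude of the distortion values.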

Comparison of Clustering Algorithms on Air Quality Substances in Peninsular Malaysia

Journal of Computing Research and Innovation ◽

10.24191/jcrinn.v2i1.28 ◽

2018 ◽

Vol 2 (1) ◽

pp. 36-44

Author(s):

Sitti Sufiah Atirah Rosly ◽

Balkiah Moktar ◽

Muhamad Hasbullah Mohd Razali

Keyword(s):

Cluster Analysis ◽

Air Quality ◽

Expectation Maximization ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Peninsular Malaysia ◽

Monthly Data ◽

Knowledge Analysis ◽

Em Clustering ◽

Monitoring Stations

Air quality is one of the most prominent environmental problems in this globalization era. Air pollution is poisonous air that comes from car emissions, smog, open burning, chemicals from factories, and other particles and gases. This harmful air can have adverse effects on human health and the environment. In order to provide information on which areas are better for residents in Malaysia, cluster analysis is used to determine the areas that can be clustered together based on their air quality through several air quality substances. Monthly data from 37 monitoring stations in Peninsular Malaysia from the year 2013 to 2015 were used in this study. The K-Means (KM) clustering algorithm, the Expectation Maximization (EM) clustering algorithm, and the Density Based (DB) clustering algorithm were chosen as the techniques for the cluster analysis, using the Waikato Environment for Knowledge Analysis (WEKA) tools. Results show that the K-means clustering algorithm is the best method among the algorithms due to its simplicity and the time taken to build the model. The output of the K-means clustering algorithm shows that it can cluster the areas into two clusters, namely cluster 0 and cluster 1. Cluster 0 consists of 16 monitoring stations and cluster 1 consists of 36 monitoring stations in Peninsular Malaysia.

Clusterdv, a simple density-based clustering method that is robust, general and automatic

10.1101/224840 ◽

2017 ◽

Author(s):

João C. Marques ◽

Michael B. Orger

Keyword(s):

Clustering Algorithm ◽

Underlying Structure ◽

Data Sets ◽

Natural Phenomena ◽

Cluster Number ◽

Data Set ◽

Density Peaks ◽

Wide Range ◽

Cluster Shape ◽

Fully Automatic

Abstract: How to partition a data set into a set of distinct clusters is a ubiquitous and challenging problem. The fact that data varies widely in features such as cluster shape, cluster number, density distribution, background noise, outliers and degree of overlap, makes it difficult to find a single algorithm that can be broadly applied. One recent method, clusterdp, based on search of density peaks, can be applied successfully to cluster many kinds of data, but it is not fully automatic, and fails on some simple data distributions. We propose an alternative approach, clusterdv, which estimates density dips between points, and allows robust determination of cluster number and distribution across a wide range of data, without any manual parameter adjustment. We show that this method is able to solve a range of synthetic and experimental data sets, where the underlying structure is known, and identifies consistent and meaningful clusters in new behavioral data.

Author summary: It is common that natural phenomena produce groupings, or clusters, in data, that can reveal the underlying processes. However, the form of these clusters can vary arbitrarily, making it challenging to find a single algorithm that identifies their structure correctly, without prior knowledge of the number of groupings or their distribution. We describe a simple clustering algorithm that is fully automatic and is able to correctly identify the number and shape of groupings in data of many types. We expect this algorithm to be useful in finding unknown natural phenomena present in data from a wide range of scientific fields.

A novel bidirectional clustering algorithm based on local density

10.21203/rs.3.rs-141525/v1 ◽

2021 ◽

Author(s):

BAICHENG LV ◽

WENHUA WU ◽

ZHIQIANG HU

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

Abstract: With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
