Big Data Clustering Using Improvised Fuzzy C-Means Clustering

2020 ◽  
Vol 34 (6) ◽  
pp. 701-708
Author(s):  
Venkat Rayala ◽  
Satyanarayan Reddy Kalli

Clustering has emerged as a powerful mechanism for analyzing the massive data generated by modern applications; its main aim is to group objects into clusters so that objects in the same cluster belong to the same category. However, clustering big data poses various challenges. Deep learning has been a powerful paradigm for big data analysis, but it requires a huge number of training samples, which is time-consuming and expensive. This can be avoided through a fuzzy approach. In this research work, we design and develop an Improvised Fuzzy C-Means (IFCM) method, which combines an encoder-decoder Convolutional Neural Network (CNN) model with the Fuzzy C-Means (FCM) technique to enhance the clustering mechanism. The encoder-decoder CNN is used for feature learning and faster computation. In the FCM part, we introduce a function that measures the distance between the cluster center and each instance, which helps achieve better clustering; we then introduce an Optimized Encoder Decoder (OED) CNN model to improve performance and speed up computation. To evaluate the proposed mechanism, three distinct datasets are used, namely Modified National Institute of Standards and Technology (MNIST), Fashion-MNIST, and United States Postal Service (USPS), and evaluation is carried out using the performance metrics Accuracy, Adjusted Rand Index (ARI), and Normalized Mutual Information (NMI). A comparative analysis on each dataset shows that IFCM outperforms the existing model.
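
The abstract does not state the modified distance function, so the block below only recalls the standard FCM formulation that such variants typically build on. The symbols (data points $x_i$, cluster centers $v_j$, memberships $u_{ij}$, fuzzifier $m > 1$, $c$ clusters, $N$ points) are conventional notation assumed here, not taken from the paper.

```latex
% Standard FCM objective, minimized subject to memberships summing to one per point
J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{m} \, \lVert x_i - v_j \rVert^{2},
\qquad \sum_{j=1}^{c} u_{ij} = 1 \quad \text{for each } i

% Alternating updates for memberships and cluster centers
u_{ij} = \frac{1}{\displaystyle \sum_{k=1}^{c}
         \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{\frac{2}{m-1}}},
\qquad
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{m}}
```

A variant such as the one described would presumably replace the Euclidean norm with its own distance function and compute it on the features learned by the encoder-decoder CNN rather than on the raw inputs.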

2013 ◽  
Vol 765-767 ◽  
pp. 670-673
Author(s):  
Li Bo Hou

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, the iterative process of FCM requires a large amount of computation, especially when the feature vectors are high-dimensional; applying the clustering algorithm directly is then not only inefficient but may also lead to the "curse of dimensionality." To address this problem, this paper analyzes the behavior of the fuzzy C-means clustering algorithm on high-dimensional features, where determining the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of fuzzy C-means clustering in high-dimensional feature analysis, it combines FCM with the landmark isometric mapping (L-ISOMAP) algorithm and proposes an improved algorithm, FCM-LI. The samples are first analyzed in a preliminary step; the clustering results and the correlation of the sample data are then used, together with L-ISOMAP, to reduce the dimensionality, and further analysis on this basis yields the final results. Experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.


Author(s):  
Joshua Devadason ◽  
Rehan Akbar

Big data is a valuable asset for organisations, as analysing it helps them understand their customers, changes within their business environment, market conditions, and future trends. Big data is multifaceted (comprising different, versatile data types) and mostly exists in unstructured formats. Extracting value from this data is challenging, and the usability and productivity of this multifaceted unstructured data are greatly compromised. A number of factors and associated reasons affect the usability of unstructured big data. The present research work investigates these factors and the reasons behind the usability issues of multifaceted unstructured big data. Identifying these factors contributes to developing solutions that improve the usability of highly unstructured big data. A detailed study of the existing literature, followed by a survey questionnaire, was conducted to identify the factors and their reasons. Descriptive statistics were used to analyse and interpret the data and results.


Author(s):  
Sanat Kumar Sahu ◽  
A. K. Shrivas

The purpose of this article is to evaluate the most important features of Chronic Kidney Disease (CKD). The study is based on three clustering techniques: K-means, Fuzzy C-means, and hierarchical clustering. The authors used evolutionary techniques such as genetic algorithms (GA) to improve the performance of the clustering model. The performance of the three clustering techniques was assessed using the measures purity, entropy, and Adjusted Rand Index (ARI). The best purity was obtained by K-means clustering, at 96.50%, whereas Fuzzy C-means clustering reached 93.50% and hierarchical clustering was lowest at 92.25%. After applying a genetic algorithm as the feature selection technique, the best purity was obtained by hierarchical clustering, at 97.50%, compared with 96.75% for K-means clustering and 94.00% for Fuzzy C-means clustering.
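
Purity, the headline measure in this abstract, is usually defined as the fraction of points that fall in the majority class of their assigned cluster. The following is a minimal sketch of that common definition; the function name `purity` and the example labels are illustrative assumptions, not code or data from the article.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of points that belong to the majority class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one point is required")
    # Count the class labels separately inside each cluster.
    per_cluster = {}
    for label, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # Sum the size of the majority class over all clusters.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical labels: one "b" point is placed in the cluster dominated by "a".
score = purity(["a", "a", "b", "b"], [0, 0, 0, 1])
```

Purity rewards many small clusters (one cluster per point gives a purity of 1), which is one reason it is normally reported alongside measures such as entropy and ARI, as the article does.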


2020 ◽  
Vol 39 (2) ◽  
pp. 1619-1626
Author(s):  
Yongsheng Zong ◽  
Guoyan Huang

Clustering algorithms based on unsupervised learning achieve a low intrusion detection rate, while those based on supervised learning suffer from insufficient training samples. To address this, a semi-supervised kernel fuzzy C-means clustering algorithm based on artificial fish swarm optimization (AFSA-KFCM) is proposed. First, a kernel function replaces the distance function in the traditional semi-supervised fuzzy C-means clustering algorithm to define a new objective function, thus improving the probabilistic constraints of the fuzzy C-means algorithm. Then, the artificial fish swarm algorithm, which has strong global optimization ability, is used to reduce KFCM's sensitivity to the initial cluster centers and its tendency to fall into local extrema, thus improving both the convergence speed and the classification performance. Tests on the Wine and IRIS public datasets show that the AFSA-KFCM clustering algorithm is superior to the traditional algorithm in clustering accuracy and time efficiency. Experimental results on the KDDCUP99 data further show that the algorithm can achieve a desirable detection rate and false detection rate in intrusion detection.
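
To make the kernel substitution concrete, the sketch below shows one common choice: a Gaussian kernel, for which the squared distance in feature space reduces to 2(1 - K(x, v)). The abstract does not name the kernel, and the semi-supervised terms and the fish swarm search are omitted, so the function names and the `sigma` parameter are assumptions for illustration only.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel between a point x and a center v."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance_sq(x, v, sigma=1.0):
    """Squared feature-space distance: K(x,x) + K(v,v) - 2K(x,v) = 2(1 - K(x,v))."""
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))


def kernel_memberships(x, centers, m=2.0, sigma=1.0):
    """Fuzzy memberships of x, using the kernel-induced distance."""
    d = [kernel_distance_sq(x, v, sigma) for v in centers]
    # A point sitting exactly on one or more centers is shared among them.
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(centers))]
    inv = [dj ** (-1.0 / (m - 1.0)) for dj in d]
    total = sum(inv)
    return [w / total for w in inv]
```

Because the Gaussian kernel is bounded, the induced distance never exceeds 2, so distant outliers pull on the centers less strongly than they do under the Euclidean distance.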


2014 ◽  
Vol 986-987 ◽  
pp. 206-210
Author(s):  
Rui Dong ◽  
Min Xiang Huang

FCM is currently used in many power load classification tasks, but it has some shortcomings. This paper presents an algorithm based on subtractive clustering and improved fuzzy C-means clustering (SUB-FCM) to address them. The algorithm uses subtractive clustering to initialize the cluster center matrix, which removes the random initialization of FCM, improves the global search ability, and helps avoid falling into local optima. Experimental analysis shows that the algorithm also accelerates convergence and yields better clustering results. It can be applied effectively to power load classification.
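
The initialization step can be illustrated with a simplified form of subtractive clustering: each point receives a density "potential", the densest point becomes a center, and the potential around it is suppressed before the next pick. This sketch selects a fixed number of centers instead of applying the usual accept/reject thresholds, and the names `subtractive_centers`, `ra`, and `rb_factor` are assumptions rather than the paper's notation.

```python
import math


def subtractive_centers(points, n_centers, ra=1.0, rb_factor=1.5):
    """Pick initial cluster centers by (simplified) subtractive clustering."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alpha = 4.0 / ra ** 2                 # neighbourhood radius for the potential
    beta = 4.0 / (rb_factor * ra) ** 2    # wider radius for the suppression step
    # Potential of each point: high where many neighbours are close.
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]
    centers = []
    for _ in range(min(n_centers, len(points))):
        best = max(range(len(points)), key=lambda i: potential[i])
        best_potential = potential[best]
        centers.append(points[best])
        # Suppress potential near the chosen center so the next pick lands elsewhere.
        for i, p in enumerate(points):
            potential[i] -= best_potential * math.exp(-beta * sq_dist(p, points[best]))
    return centers


# Hypothetical data: a group of three points and a denser group of four.
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
        (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
initial_centers = subtractive_centers(data, n_centers=2)
```

The returned points would then seed the FCM center matrix in place of random values, which makes repeated runs on the same data start from the same centers.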


2020 ◽  
Vol 10 (3) ◽  
pp. 579-585
Author(s):  
Hui Zhang ◽  
Hongjie Zhang

Accurate segmentation of brain tissue has important guiding significance and practical application value for the diagnosis of brain diseases. Brain magnetic resonance imaging (MRI) has the characteristics of high dimensionality and large sample size. Such datasets create considerable computational complexity in image processing. To efficiently process large sample data, this article integrates the proposed block clustering strategy with the classic fuzzy C-means clustering (FCM) algorithm and proposes a block-based integrated FCM clustering algorithm (BI-FCM). The algorithm first performs block processing on each image and then clusters each subimage using the FCM algorithm. The cluster centers for all subimages are again clustered using FCM to obtain the final cluster center. Finally, the distance from each pixel to the final cluster center is obtained, and the corresponding division is performed according to the distance. The dataset used in this experiment is the Simulated Brain Database (SBD). The results show that the BI-FCM algorithm addresses the large sample processing problem well, and the theory is simple and effective.
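
The block-then-merge procedure described above can be sketched on scalar pixel intensities as follows. This is a toy illustration under stated assumptions: blocks are taken as groups of image rows, FCM starts from evenly spaced centers, and the names `fcm_1d` and `block_fcm` are hypothetical; the paper's actual block layout and FCM settings are not given in the abstract.

```python
def fcm_1d(values, c, m=2.0, iters=50):
    """Plain fuzzy C-means on scalar values; returns the sorted cluster centers."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the centers evenly over the value range.
    centers = [lo + (hi - lo) * (j + 0.5) / c for j in range(c)]
    for _ in range(iters):
        num = [0.0] * c
        den = [0.0] * c
        for x in values:
            # Squared distances, floored to avoid division by zero.
            d = [max((x - v) ** 2, 1e-12) for v in centers]
            inv = [dj ** (-1.0 / (m - 1.0)) for dj in d]
            total = sum(inv)
            for j in range(c):
                w = (inv[j] / total) ** m   # membership raised to the fuzzifier
                num[j] += w * x
                den[j] += w
        centers = [num[j] / den[j] if den[j] > 0 else centers[j] for j in range(c)]
    return sorted(centers)


def block_fcm(image, block_rows, c, m=2.0):
    """Cluster each block of rows, then cluster the block centers again."""
    block_centers = []
    for start in range(0, len(image), block_rows):
        pixels = [p for row in image[start:start + block_rows] for p in row]
        block_centers.extend(fcm_1d(pixels, c, m))
    final_centers = fcm_1d(block_centers, c, m)
    # Assign every pixel to the nearest final center.
    labels = [[min(range(c), key=lambda j: abs(p - final_centers[j])) for p in row]
              for row in image]
    return final_centers, labels


# Hypothetical 4x4 image with a dark left half and a bright right half.
image = [[10, 12, 200, 198],
         [11, 9, 202, 201],
         [10, 13, 199, 200],
         [12, 11, 197, 203]]
final_centers, labels = block_fcm(image, block_rows=2, c=2)
```

Each FCM run sees only one block, or the small set of block centers, so no single run has to hold the membership matrix for the whole image.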

