Big Data Clustering Using Improvised Fuzzy C-Means Clustering

Venkat Rayala; Satyanarayan Reddy Kalli

doi:10.18280/ria.340604

Big Data Clustering Using Improvised Fuzzy C-Means Clustering

Revue d'Intelligence Artificielle ◽

10.18280/ria.340604 ◽

2020 ◽

Vol 34 (6) ◽

pp. 701-708

Author(s):

Venkat Rayala ◽

Satyanarayan Reddy Kalli

Keyword(s):

Big Data ◽

Comparative Analysis ◽

Research Work ◽

Cluster Center ◽

Adjusted Rand Index ◽

Postal Service ◽

Data Types ◽

Normalized Mutual Information ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

Clustering has emerged as a powerful mechanism for analyzing the massive data generated by modern applications; its main aim is to categorize data into clusters, where objects are grouped into a particular category. However, clustering big data poses various challenges. Deep learning has been a powerful paradigm for big data analysis, but it requires a huge number of samples to train the model, which is time-consuming and expensive. This can be avoided through a fuzzy approach. In this research work, we design and develop an Improvised Fuzzy C-Means (IFCM) method, which combines an encoder-decoder Convolutional Neural Network (CNN) model with the Fuzzy C-means (FCM) technique to enhance the clustering mechanism. The encoder-decoder CNN is used for feature learning and faster computation. Into the general FCM we introduce a function that measures the distance between the cluster center and each instance, which helps achieve better clustering; we then introduce an Optimized Encoder Decoder (OED) CNN model to improve performance and speed up computation. To evaluate the proposed mechanism, three distinct datasets are used, namely Modified National Institute of Standards and Technology (MNIST), Fashion-MNIST and United States Postal Service (USPS), and evaluation is carried out with the performance metrics Accuracy, Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). A comparative analysis on each dataset shows that IFCM outperforms the existing models.
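
For reference, the standard FCM iteration that the abstract builds on can be sketched as below. This is a minimal NumPy sketch of plain FCM, not the authors' IFCM: it omits the encoder-decoder CNN and the modified distance function, and the function name `fuzzy_c_means` and its parameters (`m`, `max_iter`, `tol`, `seed`) are illustrative assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain FCM (hypothetical helper). Returns (centers, memberships)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the samples.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # guard against division by zero
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)).
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

A hard assignment, if one is needed for metrics such as accuracy, ARI or NMI, can be taken as `U.argmax(axis=1)`.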

Improved Fuzzy FCM-LI Algorithm

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.765-767.670 ◽

2013 ◽

Vol 765-767 ◽

pp. 670-673

Author(s):

Li Bo Hou

Keyword(s):

Real Time ◽

Clustering Algorithm ◽

Feature Analysis ◽

Cluster Center ◽

High Dimensional ◽

Fuzzy C Means ◽

Sample Data ◽

Fuzzy C Means Clustering ◽

Fcm Clustering ◽

Np Hard Problem

The fuzzy C-means (FCM) clustering algorithm is one of the most widely applied algorithms in unsupervised pattern recognition. However, the iterative process of FCM requires a large amount of computation, especially when the feature vectors are high-dimensional; using the clustering algorithm to partition such data directly is not only inefficient but may also lead to "the curse of dimensionality." To address this problem, this paper analyzes the fuzzy C-means clustering algorithm in high-dimensional feature analysis, where finding the cluster centers is an NP-hard problem. To improve the effectiveness and real-time performance of FCM in high-dimensional feature analysis, an improved algorithm, FCM-LI, is proposed that combines FCM with the landmark isometric mapping (L-ISOMAP) algorithm. The samples first undergo a preliminary analysis; then, using the clustering results and the correlation of the sample data, L-ISOMAP is applied to reduce the dimension, and further analysis on this basis yields the final results. Experimental results demonstrate the effectiveness and real-time performance of the FCM-LI algorithm in high-dimensional feature analysis.

Factors Affecting The Usability Of Unstructured Big Data

Journal of Independent Studies and Research - Computing ◽

10.31645/09 ◽

2020 ◽

Author(s):

Joshua Devadason ◽

Rehan Akbar

Keyword(s):

Big Data ◽

Research Work ◽

Business Environment ◽

Market Analysis ◽

Unstructured Data ◽

Future Trends ◽

Survey Questionnaire ◽

Data Types ◽

Factors Affecting ◽

Number Of Factors

Big data is a valuable asset for organisations, as its analysis helps them understand their customers, changes within their business environment, market analysis and future trends. Big data is multifaceted (different data types and versatile) and mostly exists in unstructured formats. Extracting value from this data is challenging, and the usability and productivity of this multifaceted unstructured data are greatly compromised. A number of factors and associated reasons affect the usability of unstructured big data. The present research work investigates these factors and the reasons behind the usability issues of multifaceted unstructured big data. Identifying these factors contributes to developing solutions that reduce the lack of usability of highly unstructured big data. A detailed study of the existing literature, followed by a survey questionnaire, was conducted to identify the factors and their reasons. Descriptive statistics were used to analyse and interpret the data and results.

Big-data driven building retrofitting: An integrated Support Vector Machines and Fuzzy C-means clustering method

IOP Conference Series Earth and Environmental Science ◽

10.1088/1755-1315/588/4/042013 ◽

2020 ◽

Vol 588 ◽

pp. 042013

Author(s):

Weizhuo Lu ◽

Kailun Feng

Keyword(s):

Big Data ◽

Support Vector Machines ◽

Data Driven ◽

Support Vector ◽

Clustering Method ◽

Fuzzy C Means ◽

Vector Machines ◽

Fuzzy C Means Clustering

Analysis and Comparison of Clustering Techniques for Chronic Kidney Disease With Genetic Algorithm

Research Anthology on Multi-Industry Uses of Genetic Programming and Algorithms ◽

10.4018/978-1-7998-8048-6.ch046 ◽

2021 ◽

pp. 917-927

Author(s):

Sanat Kumar Sahu ◽

A. K. Shrivas

Keyword(s):

Genetic Algorithm ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Hierarchical Clustering ◽

Adjusted Rand Index ◽

Feature Selection Technique ◽

Fuzzy C Means ◽

Clustering Model ◽

Fuzzy C Means Clustering ◽

Evolutionary Technique

The purpose of this article is to weigh up the most important features of Chronic Kidney Disease (CKD). The study is based on three clustering techniques: K-means, Fuzzy C-means and hierarchical clustering. The authors used an evolutionary technique, the genetic algorithm (GA), to improve the performance of the clustering model. The performance of the three clustering techniques was measured by purity, entropy and Adjusted Rand Index (ARI). The best purity was obtained by K-means clustering, 96.50%, whereas Fuzzy C-means clustering reached 93.50% and hierarchical clustering was the lowest at 92.25%. After applying the genetic algorithm as a feature selection technique, the best purity was obtained by hierarchical clustering, 97.50%, compared with K-means clustering, 96.75%, and Fuzzy C-means clustering, 94.00%.
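
Two of the measures named above, purity and ARI, can both be computed from the contingency table of true classes against predicted clusters. The sketch below uses the usual textbook definitions; the function names are illustrative assumptions, and it does not reproduce the article's data or results.

```python
import numpy as np

def contingency(true_labels, pred_labels):
    """Count matrix: rows are true classes, columns are predicted clusters."""
    _, t = np.unique(np.asarray(true_labels), return_inverse=True)
    _, p = np.unique(np.asarray(pred_labels), return_inverse=True)
    table = np.zeros((t.max() + 1, p.max() + 1), dtype=np.int64)
    np.add.at(table, (t, p), 1)
    return table

def purity(true_labels, pred_labels):
    table = contingency(true_labels, pred_labels)
    # Each cluster is credited with the size of its majority class.
    return table.max(axis=0).sum() / table.sum()

def adjusted_rand_index(true_labels, pred_labels):
    table = contingency(true_labels, pred_labels)

    def pairs(x):
        # Number of unordered pairs that can be drawn from x items.
        return x * (x - 1) / 2.0

    sum_cells = pairs(table).sum()
    sum_rows = pairs(table.sum(axis=1)).sum()
    sum_cols = pairs(table.sum(axis=0)).sum()
    expected = sum_rows * sum_cols / pairs(table.sum())
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster on both sides
    return (sum_cells - expected) / (max_index - expected)
```

Purity rewards over-segmentation (one cluster per sample gives purity 1), whereas ARI corrects for chance agreement, which is one reason the two are typically reported together.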

Analysis and Comparison of Clustering Techniques for Chronic Kidney Disease With Genetic Algorithm

International Journal of Computer Vision and Image Processing ◽

10.4018/ijcvip.2018100102 ◽

2018 ◽

Vol 8 (4) ◽

pp. 16-25

Author(s):

Sanat Kumar Sahu ◽

A. K. Shrivas

Keyword(s):

Genetic Algorithm ◽

Chronic Kidney Disease ◽

Kidney Disease ◽

Hierarchical Clustering ◽

Adjusted Rand Index ◽

Feature Selection Technique ◽

Fuzzy C Means ◽

Clustering Model ◽

Fuzzy C Means Clustering ◽

Evolutionary Technique

Tailoring Fuzzy C-Means Clustering Algorithm for Big Data Using Random Sampling and Particle Swarm Optimization

International Journal of Database Theory and Application ◽

10.14257/ijdta.2015.8.3.16 ◽

2015 ◽

Vol 8 (3) ◽

pp. 191-202 ◽

Cited By ~ 8

Author(s):

Yang Xianfeng ◽

Liu Pengfei

Keyword(s):

Big Data ◽

Particle Swarm Optimization ◽

Random Sampling ◽

Clustering Algorithm ◽

Particle Swarm ◽

Swarm Optimization ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering

Application of artificial fish swarm optimization semi-supervised kernel fuzzy clustering algorithm in network intrusion

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-179935 ◽

2020 ◽

Vol 39 (2) ◽

pp. 1619-1626

Author(s):

Yongsheng Zong ◽

Guoyan Huang

Keyword(s):

Intrusion Detection ◽

Detection Rate ◽

Clustering Algorithm ◽

Probabilistic Constraints ◽

Cluster Center ◽

Swarm Optimization ◽

Fuzzy C Means ◽

Network Intrusion ◽

Fuzzy C Means Clustering ◽

Artificial Fish Swarm

Clustering algorithms based on unsupervised learning have a low intrusion detection rate, while clustering algorithms based on supervised learning suffer from insufficient training samples. A semi-supervised kernel fuzzy C-means clustering algorithm based on artificial fish swarm optimization (AFSA-KFCM) is proposed. First, a kernel function replaces the distance function in the traditional semi-supervised fuzzy C-means clustering algorithm to define a new objective function, thus improving the probabilistic constraints of the fuzzy C-means algorithm. Then, the artificial fish swarm algorithm, which has strong global optimization ability, is used to reduce KFCM's sensitivity to the initial cluster centers and its tendency to fall into local extrema, thus improving the convergence speed and the classification results. Tests on the public Wine and IRIS datasets show that the AFSA-KFCM clustering algorithm is superior to the traditional algorithm in clustering accuracy and time efficiency. Experiments on the KDDCUP99 data show that the algorithm achieves a satisfactory detection rate and false detection rate in intrusion detection.
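
To illustrate how a kernel can stand in for the Euclidean distance in FCM, the sketch below computes the feature-space distance induced by a Gaussian kernel. The abstract does not say which kernel the authors use, so the Gaussian kernel, the bandwidth `sigma` and the function names are assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(x, v, sigma=1.0):
    """K(x, v) = exp(-||x - v||^2 / (2 sigma^2)); the kernel choice is an assumption."""
    diff = np.asarray(x, dtype=float) - np.asarray(v, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2)))

def kernel_distance_sq(x, v, sigma=1.0):
    """Squared distance between x and v in the kernel-induced feature space.

    ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v), which reduces to
    2 (1 - K(x, v)) for a Gaussian kernel because K(x, x) = K(v, v) = 1.
    """
    return 2.0 * (1.0 - gaussian_kernel(x, v, sigma))
```

Unlike the squared Euclidean distance, this quantity is bounded above by 2, so distant outliers have a limited pull on the cluster centers.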

Optimal Fuzzy C-means Clustering Technique for Big Data Analytics with Map Reduce based on Hybrid Optimization Algorithm

Journal of Advanced Research in Dynamical and Control Systems ◽

10.5373/jardcs/v11sp10/20192975 ◽

2019 ◽

Vol 11 (10-SPECIAL ISSUE) ◽

pp. 1298-1310

Author(s):

Dandugala Lakshmi Srinivasulu ◽

K. Suvarna Vani

Keyword(s):

Big Data ◽

Optimization Algorithm ◽

Data Analytics ◽

Big Data Analytics ◽

Hybrid Optimization ◽

Map Reduce ◽

Clustering Technique ◽

Fuzzy C Means ◽

Hybrid Optimization Algorithm ◽

Fuzzy C Means Clustering

An Improved FCM Algorithm Based on Subtractive Clustering for Power Load Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.986-987.206 ◽

2014 ◽

Vol 986-987 ◽

pp. 206-210 ◽

Cited By ~ 1

Author(s):

Rui Dong ◽

Min Xiang Huang

Keyword(s):

Experimental Analysis ◽

Global Search ◽

Cluster Center ◽

Subtractive Clustering ◽

Local Optima ◽

Fcm Algorithm ◽

Fuzzy C Means ◽

Power Load ◽

Fuzzy C Means Clustering ◽

Random Initialization

FCM is currently used in many power load classification tasks, but it has some shortcomings. This paper presents an algorithm based on subtractive clustering and improved Fuzzy C-means clustering (SUB-FCM) to address them. The algorithm uses subtractive clustering to initialize the cluster center matrix, which removes the random initialization of FCM, improves the global search ability and avoids falling into local optima. Experimental analysis shows that the algorithm also accelerates convergence and yields better clustering results. It can be applied effectively to power load classification.
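
The initialization step described above can be sketched with a simplified form of subtractive clustering: each sample receives a density "potential", the sample with the highest potential becomes a center, and the potential around it is then suppressed before the next pick. The radii `ra` and `rb`, the fixed number of centers (instead of a stopping threshold) and the function name are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def subtractive_centers(X, n_centers, ra=1.0, rb=None):
    """Pick n_centers data points as initial cluster centers (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    rb = 1.5 * ra if rb is None else rb  # wider radius for suppression (assumed default)
    # Pairwise squared distances between all samples.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    # Potential of a sample: high when many samples lie within roughly ra of it.
    potential = np.exp(-4.0 * d2 / ra ** 2).sum(axis=1)
    centers = []
    for _ in range(n_centers):
        k = int(np.argmax(potential))
        centers.append(X[k].copy())
        # Reduce the potential near the chosen center so the next pick lands elsewhere.
        potential = potential - potential[k] * np.exp(-4.0 * d2[k] / rb ** 2)
    return np.array(centers)
```

The returned centers would replace the random start of FCM; because they are deterministic for a given dataset, repeated runs start from the same point.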

A Novel Segmentation Method for Brain MRI Using a Block-Based Integrated Fuzzy C-Means Clustering Algorithm

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2020.2970 ◽

2020 ◽

Vol 10 (3) ◽

pp. 579-585

Author(s):

Hui Zhang ◽

Hongjie Zhang

Keyword(s):

Clustering Algorithm ◽

Brain Diseases ◽

Cluster Center ◽

Large Sample Size ◽

Large Sample ◽

Fcm Algorithm ◽

Fuzzy C Means ◽

Fuzzy C Means Clustering ◽

Final Cluster ◽

Block Based

Accurate segmentation of brain tissue has important guiding significance and practical application value for the diagnosis of brain diseases. Brain magnetic resonance imaging (MRI) has the characteristics of high dimensionality and large sample size. Such datasets create considerable computational complexity in image processing. To efficiently process large sample data, this article integrates the proposed block clustering strategy with the classic fuzzy C-means clustering (FCM) algorithm and proposes a block-based integrated FCM clustering algorithm (BI-FCM). The algorithm first performs block processing on each image and then clusters each subimage using the FCM algorithm. The cluster centers for all subimages are again clustered using FCM to obtain the final cluster center. Finally, the distance from each pixel to the final cluster center is obtained, and the corresponding division is performed according to the distance. The dataset used in this experiment is the Simulated Brain Database (SBD). The results show that the BI-FCM algorithm addresses the large sample processing problem well, and the theory is simple and effective.
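
The block-wise pipeline described above (cluster each subimage, cluster the resulting centers again, then assign pixels by distance) can be sketched as follows. This is a rough illustration on scalar intensities under stated assumptions, not the BI-FCM implementation: `fcm_1d`, `block_fcm_segment`, the quantile initialization and the fixed iteration count are all hypothetical choices.

```python
import numpy as np

def fcm_1d(values, c, m=2.0, n_iter=50):
    """Plain FCM on scalar values; returns the c cluster centers, sorted."""
    x = np.asarray(values, dtype=float).ravel()
    # Start from c quantiles so the initial centers are spread over the data.
    centers = np.quantile(x, (np.arange(c) + 0.5) / c)
    for _ in range(n_iter):
        d2 = np.maximum((x[:, None] - centers[None, :]) ** 2, 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)  # memberships, rows sum to 1
        um = u ** m
        centers = (um * x[:, None]).sum(axis=0) / um.sum(axis=0)
    return np.sort(centers)

def block_fcm_segment(image, block, c):
    """Segment a 2-D intensity image into c classes using block-wise FCM."""
    image = np.asarray(image, dtype=float)
    h, w = image.shape
    block_centers = []
    # Stage 1: run FCM independently on every block of the image.
    for r in range(0, h, block):
        for col in range(0, w, block):
            sub = image[r:r + block, col:col + block]
            block_centers.extend(fcm_1d(sub, c))
    # Stage 2: cluster the per-block centers to obtain the final centers.
    final_centers = fcm_1d(block_centers, c)
    # Stage 3: assign each pixel to its nearest final center.
    labels = np.abs(image[..., None] - final_centers).argmin(axis=-1)
    return labels, final_centers
```

The second stage works on only `c` values per block rather than on every pixel, which is where the saving on large images would come from.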
