A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.
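As an illustration of what comparing mixed and incomplete patterns can involve, the sketch below implements the Heterogeneous Euclidean-Overlap Metric (HEOM), a well-known dissimilarity for such data. The abstract does not say which dissimilarity the authors use, so HEOM is an assumption made here for illustration only, and the function name `heom` and the `None`-for-missing convention are hypothetical.

```python
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]  # None marks a missing value


def heom(a: Sequence[Value], b: Sequence[Value],
         ranges: Sequence[Optional[float]]) -> float:
    """HEOM dissimilarity between two mixed, possibly incomplete patterns.

    ranges[i] is the observed range (max - min) of numeric feature i,
    or None when feature i is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            d = 1.0                      # missing value: maximal per-feature distance
        elif r is None:
            d = 0.0 if x == y else 1.0   # categorical: overlap distance
        else:
            d = abs(x - y) / r if r > 0 else 0.0  # numeric: range-normalized difference
        total += d * d
    return total ** 0.5


# One numeric feature, one categorical feature, one numeric feature with a gap.
print(heom([1.0, "red", None], [3.0, "red", 5.0], [4.0, None, 10.0]))
```

A swarm-based clusterer could, under this assumption, use such a function wherever a purely numeric algorithm would use the Euclidean distance.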

Clustering mixed and incomplete data has been a frequent research goal in recent years because such data commonly appear in soft-science problems. However, there is a lack of studies evaluating the performance of clustering algorithms on this kind of data. In this paper we present an experimental study of the performance of seven clustering algorithms, each of which uses one of three techniques: partitional, hierarchical, or metaheuristic. All the methods were run over 15 databases from the UCI Machine Learning Repository that have mixed and incomplete data descriptions. In external cluster validation using the Entropy and V-Measure indices, the algorithms that use the metaheuristic technique showed the best results. Thus, we recommend metaheuristic-based clustering algorithms for clustering data with mixed and incomplete descriptions.
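For readers unfamiliar with the external indices mentioned above, the following is a minimal sketch of the standard V-Measure (the harmonic mean of homogeneity and completeness, both defined through conditional entropy). It is not taken from the study itself; the function names are hypothetical, and the study's exact Entropy index variant is not specified in the abstract.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g])
                for (_, g), c in joint.items())


def v_measure(classes, clusters):
    """V-Measure of a clustering against reference class labels."""
    h_c = _entropy(classes)
    h_k = _entropy(clusters)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = 1.0 if h_c == 0 else 1.0 - _conditional_entropy(classes, clusters) / h_c
    # Completeness: all members of a class should share a cluster.
    completeness = 1.0 if h_k == 0 else 1.0 - _conditional_entropy(clusters, classes) / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))  # cluster ids differ, partition is identical
print(v_measure([0, 0, 1, 1], [0, 0, 0, 0]))  # everything merged into one cluster
```

Because the index depends only on the partition, relabelling the clusters does not change the score.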


2020 ◽  
Vol 13 (2) ◽  
pp. 65-75
Author(s):  
Ridho Ananda ◽  
Atika Ratna Dewi ◽  
Nurlaili Nurlaili

Missing values seriously hinder the clustering process. To overcome this, researchers have proposed several solutions, two of which are imputation and special clustering algorithms. This paper compares the clustering results obtained with both on incomplete data. The k-means algorithm was applied to the imputed data. The algorithms used were distribution free multiple imputation (DFMI), Gabriel eigen (GE), expectation maximization-singular value decomposition (EM-SVD), biplot imputation (BI), four algorithms of modified fuzzy c-means (FCM), k-means soft constraints (KSC), distance estimation strategy fuzzy c-means (DESFCM), and k-means soft constraints imputed-observed (KSC-IO). The data used were the 2018 environmental performance index (EPI) and simulation data. The optimal clustering of the 2018 EPI data was chosen based on the Silhouette index, whose capability had first been tested on the simulation dataset. The results showed that the Silhouette index is capable of validating clustering results on incomplete datasets, and that the optimal clustering of the 2018 EPI dataset was obtained by k-means using BI, where the silhouette index and time complexity were 0.613 and 0.063 respectively. Based on these results, k-means using BI is suggested for clustering analysis of the 2018 EPI dataset.
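The Silhouette index used above for validation can be sketched as follows. This is the textbook definition applied to complete (for example, already imputed) numeric points; it is an illustrative assumption, not the paper's code, and the function name `mean_silhouette` is hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_silhouette(points, labels):
    """Mean silhouette width over all points (singleton clusters score 0)."""
    n = len(points)
    scores = []
    for i in range(n):
        # Collect distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(points[i], points[j]))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / n


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # well-separated grouping, close to 1
print(mean_silhouette(points, [0, 1, 0, 1]))  # poor grouping, negative
```

Values near 1 indicate compact, well-separated clusters, while negative values suggest points assigned to the wrong cluster.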


2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Jinhua Li ◽  
Shiji Song ◽  
Yuli Zhang ◽  
Zhen Zhou

Incomplete data with missing feature values are prevalent in clustering problems. Traditional clustering methods first estimate the missing values by imputation and then apply the classical clustering algorithms for complete data, such as K-median and K-means. However, in practice, it is often hard to obtain accurate estimation of the missing values, which deteriorates the performance of clustering. To enhance the robustness of clustering algorithms, this paper represents the missing values by interval data and introduces the concept of robust cluster objective function. A minimax robust optimization (RO) formulation is presented to provide clustering results, which are insensitive to estimation errors. To solve the proposed RO problem, we propose robust K-median and K-means clustering algorithms with low time and space complexity. Comparisons and analysis of experimental results on both artificially generated and real-world incomplete data sets validate the robustness and effectiveness of the proposed algorithms.
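To make the interval idea concrete, the sketch below assigns a point whose missing features are represented by intervals to the centre with the smallest worst-case squared distance. This only illustrates the minimax principle under simple assumptions (axis-aligned intervals, squared Euclidean distance); it is not the authors' robust K-median or K-means algorithm, and all names are hypothetical.

```python
def worst_case_sq_dist(center, intervals):
    """Largest squared Euclidean distance from `center` to any point in the box.

    intervals[i] = (lo, hi) bounds feature i; an observed value v is (v, v).
    The per-feature maximum is always attained at one of the two endpoints.
    """
    return sum(max((c - lo) ** 2, (c - hi) ** 2)
               for c, (lo, hi) in zip(center, intervals))


def robust_assign(intervals, centers):
    """Index of the centre that minimizes the worst-case squared distance."""
    return min(range(len(centers)),
               key=lambda k: worst_case_sq_dist(centers[k], intervals))


# First feature observed as 1.0; second feature missing, known only to lie in [2, 6].
point = [(1.0, 1.0), (2.0, 6.0)]
centers = [[0.0, 5.0], [1.0, 4.0]]
print([worst_case_sq_dist(c, point) for c in centers])
print(robust_assign(point, centers))
```

Under these assumptions the assignment does not depend on any single imputed value, only on the bounds of the interval.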


Author(s):  
Janusz Sobecki

This paper presents a comparison of several swarm intelligence algorithms applied to the recommendation of student courses. Swarm intelligence algorithms are nowadays successfully used in many areas, especially in optimization problems. To apply a swarm intelligence algorithm in a recommender system, a special representation of the problem space is necessary. Here we compare the grade-prediction efficiency of several evolutionary algorithms: Ant Colony Optimization (ACO), Particle Swarm Optimization (PSO), Intelligent Weed Optimization (IWO), Bee Colony Optimization (BCO) and Bat Algorithm (BA).


2016 ◽  
Vol 26 (4) ◽  
pp. 871-884 ◽  
Author(s):  
Loai Abdallah ◽  
Ilan Shimshoni

Missing values in data are common in real-world applications. There are several methods that deal with this problem. In this paper we present lookahead selective sampling (LSS) algorithms for datasets with missing values. We developed two versions of selective sampling. The first one integrates a distance function that can measure the similarity between pairs of incomplete points within the framework of the LSS algorithm. The second algorithm uses ensemble clustering to represent the data in a cluster matrix without missing values and then runs the LSS algorithm based on the ensemble clustering instance space (LSS-EC). To construct the cluster matrix, we use the k-means and mean shift clustering algorithms, specially modified to deal with incomplete datasets. We tested our algorithms on six standard numerical datasets from different fields. On these datasets we simulated missing values and compared the performance of the LSS and LSS-EC algorithms for incomplete data to two other basic methods. Our experiments show that the suggested selective sampling algorithms outperform the other methods.


Obtaining high-quality groups and processing mixed and incomplete data (DMI) are still open problems in data clustering. A method that improves the results obtained by clustering algorithms, PAntSA, was recently proposed, but it was designed and tested only for numerical data. For this reason, this paper analyzes the influence of applying PAntSA on the performance of DMI restricted clustering algorithms. To this end, the results of different algorithms are compared before and after applying PAntSA. The comparisons provide experimental evidence that the PAntSA algorithm improves the quality of the groups obtained by traditional DMI clustering methods.


2021 ◽  
Author(s):  
Meskat Jahan ◽  
Mahmudul Hasan

In the big data era, clustering is one of the most popular data mining methods. The majority of clustering algorithms have complications such as automatic cluster number determination, poor clustering precision, inconsistent clustering of various datasets, and parameter dependence. A new fuzzy autonomous solution for clustering, named the Meskat-Mahmudul (MM) clustering algorithm, is proposed to overcome the complexity of parameter-free automatic cluster number determination and to improve clustering accuracy. The MM clustering algorithm finds the exact number of clusters based on the Average Silhouette method in multivariate mixed-attribute datasets, including real-time gene expression datasets, and deals with missing values, noise, and outliers. The MM Extended K-Means (MMK) clustering algorithm is an enhancement of the K-Means algorithm that provides automatic cluster discovery and runtime cluster placement. Several validation methods are used to evaluate the clusters and certify optimum cluster partitioning. Several datasets are used to assess the performance of the proposed algorithms against other algorithms in terms of time complexity and clustering efficiency. Finally, the MM and MMK clustering algorithms are found to be superior to conventional algorithms.


2019 ◽  
Vol 8 (2) ◽  
pp. 3323-3327

Software has many attributes, such as functionality, maintainability, serviceability, usability, quality, and performance. Reliability is an imperative characteristic that leads to the eminence of the software, and it is a great concern for software producers as well as users. With this concern in mind, hundreds of software reliability models have been developed in the last four decades. This paper evaluates different swarm intelligence based algorithms as optimization methods in software reliability. A number of swarm intelligence based algorithms have already been used to improve the efficiency of software reliability. Some of them are the ant colony optimizer method (ACO), particle swarm optimizer method (PSO), artificial bee colony optimizer (ABC), bat algorithm, fish swarm algorithm, cuckoo search, and bird flock algorithm. Still, there are many swarm intelligence based algorithms that have not been used in this area. This paper investigates some known swarm intelligence based algorithms and their applications for optimizing software reliability.


Author(s):  
Nibras Othman Abdul Wahid ◽  
Saif Aamer Fadhil ◽  
Noor Abbood Jasim

Unsupervised data clustering is one of the most valuable tools and an informative task in data mining; it seeks to identify homogeneous groups of objects based on similarity and is used in numerous applications. Clustering is one of the key issues in data mining and has attracted much attention. One of the best-known clustering algorithms is K-means, which has been successfully applied to many problems. To improve the quality of K-means, researchers have hybridized it with optimization algorithms. In this paper, a heuristic algorithm, the Lion Optimization Algorithm (LOA), and the Genetic Algorithm (GA) were adapted for K-means data clustering by modifying the basic parameters of the LOA, which is a nature-inspired algorithm. The distinctive lifestyle of lions and their cooperative behavior were the main inspiration for developing this optimization algorithm. The GA is used when the clusters need to be reallocated using the genetic operators, crossover and mutation. The experimental results reflect the ability of this approach in cluster analysis on a number of benchmark datasets from the UCI Machine Learning Repository.

