Fuzzy c-Means Clustering for Data with Clusterwise Tolerance Based on L2- and L1-Regularization

Author(s):  
Yukihiro Hamasuna ◽  
Yasunori Endo ◽  
Sadaaki Miyamoto ◽  

Detecting clusters of various shapes is an important problem in the field of clustering. In general, it is difficult to obtain clusters with different sizes or shapes using a single objective function. For this reason, we have proposed the concept of clusterwise tolerance and constructed clustering algorithms based on it. In the field of data mining, regularization techniques are used to derive significant classifiers. In this paper, we propose another concept of clusterwise tolerance from the viewpoint of regularization. Moreover, we construct clustering algorithms for data with clusterwise tolerance based on L2- and L1-regularization. After that, we describe the fuzzy classification functions of the proposed algorithms. Finally, we show the effectiveness of the proposed algorithms through numerical examples.

Author(s):  
Yukihiro Hamasuna ◽  
Yasunori Endo ◽  
Sadaaki Miyamoto ◽  

This paper presents a new type of clustering algorithm that uses tolerance vectors, called tolerant fuzzy c-means clustering (TFCM). In the proposed algorithms, the new concept of a tolerance vector plays a very important role. In the original concept of tolerance, a tolerance vector is attributed to each datum. This concept is extended to handle data flexibly: a tolerance vector is attributed not only to each datum but also to each cluster. Using the new concept, we can consider the influence of clusters on each datum through the tolerance. First, the new concept of tolerance is introduced into optimization problems based on conventional fuzzy c-means clustering (FCM). Second, the optimization problems with tolerance are solved by using the Karush-Kuhn-Tucker conditions. Third, new clustering algorithms are constructed based on the explicit optimal solutions of the optimization problems. Finally, the effectiveness of the proposed algorithms is verified through numerical examples using fuzzy classification functions.
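
For orientation, the conventional FCM iteration that the abstract takes as its starting point can be sketched as below. This is a minimal sketch of standard FCM only: it does not include tolerance vectors or the KKT-based solutions of the paper. The function name `fcm`, its parameters, and the requirement to pass initial centers explicitly are illustrative assumptions, not taken from the source.

```python
def fcm(points, init_centers, m=2.0, n_iter=100, tol=1e-9):
    """Conventional fuzzy c-means (hypothetical helper, no tolerance vectors).

    points       : list of equal-length numeric tuples
    init_centers : list of initial cluster centers (same dimension)
    m            : fuzzifier, must be > 1
    Returns (memberships, centers); memberships[k][i] is the degree to
    which point k belongs to cluster i.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    u = []
    for _ in range(n_iter):
        # Membership update: u_ki is proportional to d_ki^(-2/(m-1)).
        u = []
        for x in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in centers]
            zeros = [d == 0.0 for d in d2]
            if any(zeros):
                # Point coincides with one or more centers: split the
                # membership evenly among those centers.
                n_zero = sum(zeros)
                row = [1.0 / n_zero if z else 0.0 for z in zeros]
            else:
                inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
                s = sum(inv)
                row = [w / s for w in inv]
            u.append(row)

        # Center update: mean of the points weighted by u_ki^m.
        new_centers = []
        for i in range(len(centers)):
            w = [row[i] ** m for row in u]
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, points)) / total
                 for j in range(dim)]
            )

        # Stop when no center coordinate moves by more than tol.
        shift = max(abs(a - b)
                    for cn, co in zip(new_centers, centers)
                    for a, b in zip(cn, co))
        centers = new_centers
        if shift < tol:
            break
    return u, centers
```

Calling `fcm(points, init_centers)` on a small list of 2-D tuples returns one membership row per point, each row summing to 1, together with the final centers. A tolerant variant would add further optimization variables to this loop; that part is left out here because the abstract does not give the update formulas.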


Author(s):  
Naohiko Kinoshita ◽  
Yasunori Endo ◽  
Ken Onishi ◽  
◽  
...  

The rough clustering algorithm we proposed, based on the optimization of an objective function (RCM), has a problem: the results of conventional rough clustering algorithms do not ensure that solutions are optimal. To solve this problem, we propose rough clustering algorithms based on the optimization of an objective function with fuzzy-set representation. This yields more flexible results than RCM. We verify the effectiveness of the algorithms through numerical examples.


Author(s):  
Naohiko Kinoshita ◽  
◽  
Yasunori Endo ◽  

Clustering is one of the most popular unsupervised classification methods. In this paper, we focus on rough clustering methods based on rough-set representation. Rough k-Means (RKM) is one of the rough clustering methods proposed by Lingras et al. The outputs of many clustering algorithms, including RKM, depend strongly on initial values, so we must evaluate the validity of the outputs. In the case of objective-based clustering algorithms, the objective function is used as the measure. It is difficult, however, to evaluate the output of RKM, which is not objective-based. To solve this problem, we propose new objective-based rough clustering algorithms and verify their usefulness through numerical examples.


Author(s):  
Yuchi Kanzawa ◽  

In this paper, two types of fuzzy co-clustering algorithms are proposed. First, it is shown that the base of the objective function for the conventional fuzzy co-clustering method is very similar to the base for the entropy-regularized fuzzy nonmetric model. Next, it is shown that the non-sense clustering problem in the conventional fuzzy co-clustering algorithms is identical to that in fuzzy nonmetric model algorithms in the case that all dissimilarities among rows and columns are zero. Based on this discussion, a method is proposed that applies the entropy-regularized fuzzy nonmetric model after all dissimilarities among rows and columns are set to some values using a TIBA imputation technique. Furthermore, since relational fuzzy c-means is similar to the fuzzy nonmetric model, in the sense that both methods are designed for homogeneous relational data, a method is proposed that applies entropy-regularized relational fuzzy c-means after imputing all dissimilarities among rows and columns with TIBA. Some numerical examples are presented for the proposed methods.


2008 ◽  
Vol 41 (5) ◽  
pp. 1824-1833 ◽  
Author(s):  
Hamid Mohamadi ◽  
Jafar Habibi ◽  
Mohammad Saniee Abadeh ◽  
Hamid Saadi

2013 ◽  
Vol 405-408 ◽  
pp. 2222-2225
Author(s):  
Qian Li ◽  
Wei Min Bao ◽  
Jing Lin Qian

This paper discusses the conceptual stepped calibration approach (SCA), which has been developed for the Xinanjiang (XAJ) model. Multi-layer and multi-objective functions, which can make optimization work simpler and more effective, are introduced in this procedure. In all, eight parameters were considered; they were divided into four layers according to the structure of the XAJ model and then calibrated layer by layer. The SCA procedure tends to improve on the performance of the traditional method of calibration (that is, using a single objective function, such as the root mean square error, RMSE). The compared results demonstrate that the SCA yields better model performance than RMSE-based calibration.


Author(s):  
P. Tamijiselvy ◽  
N. Kavitha ◽  
K. M. Keerthana ◽  
D. Menakha

The degree of aortic calcification has been shown to be a risk indicator for vascular events, including cardiovascular events. The developed method is a fully automated data mining algorithm to segment and measure calcification using low-dose chest CT in smokers aged 50 to 70. Subjects with increased cardiovascular risk can be identified by using data mining algorithms. This paper presents a method for automatic detection of coronary artery calcifications in low-dose chest CT scans using effective clustering algorithms in three phases: pre-processing, segmentation, and clustering. The fuzzy c-means algorithm achieves an accuracy of 80.23%, demonstrating that fuzzy c-means detects cardiovascular disease at an early stage.


2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al. ◽  

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful and pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), which is a top-down hierarchical clustering algorithm and can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome such shortcomings, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on actual data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.


2015 ◽  
Vol 16 (SE) ◽  
pp. 133-138
Author(s):  
Mohammad Eiman Jamnezhad ◽  
Reza Fattahi

Clustering is one of the most significant research areas in the field of data mining and is considered an important tool in the fast-developing era of information explosion. Clustering systems are used more and more often in text mining, especially for analyzing texts and extracting the knowledge they contain. Data are grouped into clusters in such a way that the data in the same group are similar and those in other groups are dissimilar. It aims at maximizing intra-class similarity and minimizing inter-class similarity. Clustering is useful for obtaining interesting patterns and structures from a large set of data. It can be applied in many areas, namely, DNA analysis, marketing studies, web documents, and classification. This paper aims to study and compare three text document clustering methods, namely, k-means, k-medoids, and SOM, through the F-measure.
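
The abstract does not state which F-measure it uses. One definition commonly used to score a document clustering against reference classes takes, for each class, the best F1 score over all clusters and averages these values weighted by class size. The sketch below assumes that definition; the function name `clustering_f_measure` and the label lists are hypothetical and not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(true_labels, cluster_labels):
    """Class-size-weighted best-match F-measure (assumed definition).

    For class i and cluster j with overlap n_ij:
        precision = n_ij / |cluster j|,  recall = n_ij / |class i|
        F(i, j)   = 2 * precision * recall / (precision + recall)
    The score is sum_i (|class i| / n) * max_j F(i, j).
    """
    n = len(true_labels)
    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_labels)
    overlap = Counter(zip(true_labels, cluster_labels))

    score = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = overlap[(cls, clu)]
            if n_ij == 0:
                continue  # no shared documents, F(i, j) = 0
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (n_i / n) * best
    return score
```

Under this definition a clustering that reproduces the reference classes exactly scores 1.0, and merging everything into one cluster is penalized through low precision. The same function can be applied to the output of k-means, k-medoids, or SOM, since it only needs the two label lists.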


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Lopamudra Dey ◽  
Sanjay Chakraborty

The significance and application of clustering are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different types of indices are used to solve different types of problems, and the selection of indices depends on the kind of data available. This paper first proposes a canonical PSO-based K-means clustering algorithm, then analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, canonical PSO-based K-means, simple PSO-based K-means, DBSCAN, and hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing proper compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents their evaluation results in mathematical and graphical forms.
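
The abstract names intracluster and intercluster indices without defining them. As a rough illustration, compactness can be measured by the mean distance of points to their own cluster centroid and separability by the smallest distance between two centroids. Both definitions are assumptions made for this sketch, and the function names are hypothetical; the paper may use different indices.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def intracluster_distance(clusters):
    """Mean Euclidean distance from each point to its own centroid.

    Smaller values indicate more compact clusters (assumed definition).
    """
    total = 0.0
    count = 0
    for pts in clusters:
        c = centroid(pts)
        for p in pts:
            total += math.dist(p, c)
            count += 1
    return total / count


def intercluster_distance(clusters):
    """Smallest Euclidean distance between two cluster centroids.

    Larger values indicate better separated clusters (assumed definition).
    Requires at least two clusters.
    """
    cents = [centroid(pts) for pts in clusters]
    return min(
        math.dist(a, b)
        for i, a in enumerate(cents)
        for b in cents[i + 1:]
    )
```

Each function takes a list of clusters, where a cluster is a list of points. Comparing the two values across algorithms gives a simple compactness-versus-separation picture of the kind the abstract describes.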

