A Three-Way Clustering Method Based on Ensemble Strategy and Three-Way Decision

Information ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 59 ◽  
Author(s):  
Pingxin Wang ◽  
Qiang Liu ◽  
Gang Xu ◽  
Kangkang Wang

Three-way decision is a class of effective ways and heuristics commonly used in human problem solving and information processing. As an application of three-way decision in clustering, three-way clustering uses a core region and a fringe region to represent a cluster. Elements identified as belonging to a cluster are assigned to its core region, and uncertain elements are assigned to its fringe region, in order to reduce decision risk. In this paper, we propose a three-way clustering algorithm based on the ideas of cluster ensemble and three-way decision. In the proposed method, we use hard clustering methods to produce different clustering results and label matching to align all clustering results to a given order. The intersection of the clusters with the same label is regarded as the core region. The difference between the union and the intersection of the clusters with the same label is regarded as the fringe region of the specific cluster. Therefore, a three-way clustering is naturally formed. The results on UCI data sets show that such a strategy is effective in improving the structure of clustering results.
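The core/fringe construction described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `align_labels` and `three_way_clusters` are hypothetical, the label matching is done by brute force over label permutations (the abstract does not say which matching procedure is used), and the base partitions are supplied by hand rather than produced by a hard clustering method.

```python
from itertools import permutations


def align_labels(reference, labels):
    """Relabel `labels` so that it agrees with `reference` on as many items as possible.

    Brute force over all label permutations (an assumption made for this sketch);
    only practical for a small number of clusters.
    """
    ks = sorted(set(reference) | set(labels))
    best_map, best_score = None, -1
    for perm in permutations(ks):
        mapping = dict(zip(ks, perm))
        score = sum(1 for r, l in zip(reference, labels) if mapping[l] == r)
        if score > best_score:
            best_map, best_score = mapping, score
    return [best_map[l] for l in labels]


def three_way_clusters(partitions):
    """Return {label: (core, fringe)} from several hard partitions of the same items.

    Each partition is a list of cluster labels, one per item. The first
    partition fixes the label order that the others are aligned to.
    """
    reference = partitions[0]
    aligned = [reference] + [align_labels(reference, p) for p in partitions[1:]]
    result = {}
    for k in sorted(set(reference)):
        members = [{i for i, lab in enumerate(p) if lab == k} for p in aligned]
        core = set.intersection(*members)        # items every partition agrees on
        fringe = set.union(*members) - core      # items only some partitions assign
        result[k] = (core, fringe)
    return result


partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],  # same structure as the others, but with swapped label names
    [0, 0, 0, 0, 1, 1],
]
regions = three_way_clusters(partitions)
```

In this toy input, items 2 and 3 are assigned inconsistently across the three partitions, so they end up in the fringe region of both clusters, while the remaining items form the two core regions.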

Author(s):  
C. James Li ◽  
C. Jansuwan

The projection network, being a non-linear dynamic system itself, has been shown to be superior to static classifiers such as neural networks in some applications where noise is significant. However, it is a supervised classifier by nature. To extend its utility to unsupervised classification, this study proposes an unsupervised pattern classifier integrating a clustering algorithm based on DBSCAN and a dynamic classifier based on the projection network. The former is used to form clusters out of unlabeled data and eliminate outliers. Then, significant clusters in terms of size are identified. Subsequently, a system of projection networks is established to recognize all the significant clusters. The unsupervised classifier is tested with three well-known benchmark data sets (ignoring data labels during training), namely Fisher’s iris data, the heart disease data and the credit screening data, and the results are compared to those of supervised classifiers based on the projection network. The difference in performance is small. However, the ability to classify without supervision comes at the price of a more complex classifier system and the need for data pre-conditioning. The former is because more than one cluster could be formed for a class and therefore more computational units are needed for the classifier, and the latter is because increased similarity of data after clustering increases the chances of numerical instability in the least squares algorithm used to initialize the classifier.


Author(s):  
Yasunori Endo ◽  
Tomoyuki Suzuki ◽  
Naohiko Kinoshita ◽  
Yukihiro Hamasuna ◽  
...  

The fuzzy non-metric model (FNM) is a representative non-hierarchical clustering method, which is very useful because the belongingness or the membership degree of each datum to each cluster can be calculated directly from the dissimilarities between data, and the cluster centers are not used. However, the original FNM cannot handle data with uncertainty. In this study, we refer to the data with uncertainty as “uncertain data,” e.g., incomplete data or data that have errors. Previously, a method was proposed based on the concept of a tolerance vector for handling uncertain data, and some clustering methods were constructed according to this concept, e.g., fuzzy c-means for data with tolerance. These methods can handle uncertain data in the framework of optimization. Thus, in the present study, we apply the concept to FNM. First, we propose a new clustering algorithm based on FNM using the concept of tolerance, which we refer to as the fuzzy non-metric model for data with tolerance. Second, we show that the proposed algorithm can handle incomplete data sets. Third, we verify the effectiveness of the proposed algorithm based on comparisons with conventional methods for incomplete data sets in some numerical examples.


2007 ◽  
Vol 17 (01) ◽  
pp. 71-103 ◽  
Author(s):  
NARGESS MEMARSADEGHI ◽  
DAVID M. MOUNT ◽  
NATHAN S. NETANYAHU ◽  
JACQUELINE LE MOIGNE

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.


2014 ◽  
Vol 2014 ◽  
pp. 1-7 ◽  
Author(s):  
Chao Tong ◽  
Jianwei Niu ◽  
Bin Dai ◽  
Zhongyu Xie

In complex networks, cluster structure, identified by the heterogeneity of nodes, has become a common and important topological property. Network clustering methods are thus significant for the study of complex networks. Currently, many typical clustering algorithms have weaknesses such as inaccuracy and slow convergence. In this paper, we propose a clustering algorithm that calculates the core influence of nodes. The clustering process is a simulation of the process of cluster formation in sociology. The algorithm detects the nodes with core influence through their betweenness centrality, and builds the cluster’s core structure using discriminant functions. Next, the algorithm obtains the final cluster structure by clustering the rest of the nodes in the network with an optimization method. Experiments on different datasets show that the clustering accuracy of this algorithm is superior to that of the classical clustering algorithm (Fast-Newman algorithm). It clusters faster and plays a positive role in revealing the real cluster structure of complex networks precisely.


Holzforschung ◽  
2018 ◽  
Vol 72 (12) ◽  
pp. 1051-1056 ◽  
Author(s):  
Ignacio Bobadilla ◽  
Roberto D. Martínez ◽  
Miguel Esteban ◽  
Daniel F. Llana

Density estimation by non-destructive or semi-destructive methods is applied mainly to softwood species. The instruments are expensive, the methods are complicated and the determination coefficients are low. In the present study, the simple core hollow drilling approach is revisited. Data of 600 cores or cylindrical specimens from 300 pieces of 10 different softwood and hardwood species were evaluated in the density range from 350 to 975 kg m⁻³. The data were obtained from complete pieces and from the cores from core drilling; the difference between the two data sets is 1.7%. At higher densities, the differences are greater. A model was proposed for estimating piece density, with a determination coefficient of 0.98. It is concluded that core drilling is a cheap and reliable method for density estimation and that the data are equally reliable for radial (R) or tangential (T) probing. The cylindrical cores obtained are suitable for moisture content (MC) and species determination.


2013 ◽  
Vol 2013 ◽  
pp. 1-9 ◽  
Author(s):  
Qin Wu ◽  
Xingqin Qi ◽  
Eddie Fuller ◽  
Cun-Quan Zhang

Within graph theory and network analysis, the centrality of a vertex measures the relative importance of that vertex within a graph. Centrality plays a key role in network analysis and has been widely studied using different methods. Inspired by the idea of vertex centrality, a novel centrality guided clustering (CGC) is proposed in this paper. Unlike traditional clustering methods, which usually choose the initial center of a cluster randomly, the CGC clustering algorithm starts from a “LEADER” (a vertex with the highest centrality score), and a new “member” is added into the same cluster as the “LEADER” when some criterion is satisfied. The CGC algorithm also supports overlapping membership. Experiments on three benchmark social network data sets are presented and the results indicate that the proposed CGC algorithm works well in social network clustering.
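The leader-first idea in this abstract can be illustrated with a small Python sketch. It is not the CGC algorithm itself: the abstract names neither the centrality measure nor the membership criterion, so this sketch assumes degree centrality and admits a vertex when it is a direct, still-unassigned neighbor of the leader. The function name `leader_clusters` is hypothetical, and unlike CGC the sketch produces non-overlapping clusters.

```python
def leader_clusters(adjacency):
    """Greedy leader-based clustering of an undirected graph.

    adjacency: dict mapping each node to the set of its neighbors.
    Returns a list of (leader, members) pairs.
    """
    unassigned = set(adjacency)
    clusters = []
    while unassigned:
        # Leader = unassigned vertex with the highest degree (assumed centrality);
        # iterating in sorted order makes ties resolve to the smallest node.
        leader = max(sorted(unassigned), key=lambda v: len(adjacency[v]))
        # Assumed criterion: unassigned direct neighbors join the leader's cluster.
        members = {leader} | (adjacency[leader] & unassigned)
        clusters.append((leader, members))
        unassigned -= members
    return clusters


# Two star-shaped groups joined by the edge 3-4.
graph = {
    0: {1, 2, 3},
    1: {0},
    2: {0},
    3: {0, 4},
    4: {3, 5, 6},
    5: {4},
    6: {4},
}
clusters = leader_clusters(graph)
```

On this toy graph, vertices 0 and 4 both have degree 3; vertex 0 is picked first and collects 1, 2 and 3, after which vertex 4 leads the remaining group.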


2001 ◽  
Vol 118 (2) ◽  
pp. 183-192 ◽  
Author(s):  
Holger Möttig ◽  
Jana Kusch ◽  
Thomas Zimmer ◽  
Annette Scholle ◽  
Klaus Benndorf

The α subunits of CNG channels of retinal photoreceptors (rod) and olfactory neurons (olf) are proteins that consist of a cytoplasmic NH2 terminus, a transmembrane core region (including the segments S1–S6), and a cytoplasmic COOH terminus. The COOH terminus contains a cyclic nucleotide monophosphate binding domain (NBD) that is linked by the C-linker (CL) to the core region. The binding of cyclic nucleotides to the NBD promotes channel opening by an allosteric mechanism. We examined why the sensitivity to cGMP is 22 times higher in olf than in rod by constructing chimeric channels and determining the [cGMP] causing half maximum channel activity (EC50). The characteristic difference in the EC50 value between rod and olf was introduced by the NH2 terminus and the core-CL region, whereas the NBD showed a paradoxical effect. The difference of the free energy difference Δ(ΔG) was determined for each of these three regions with all possible combinations of the other two regions. For rod regions with respect to corresponding olf regions, the open channel conformation was destabilized by the NH2 terminus (Δ(ΔG) = −1.0 to −2.0 RT) and the core-CL region (Δ(ΔG) = −2.0 to −2.9 RT), whereas it was stabilized by the NBD (Δ(ΔG) = 0.3 to 1.1 RT). The NH2 terminus deletion mutants of rod and olf differed by Δ(ΔG) of only 0.9 RT, whereas the wild-type channels differed by the much larger value of 3.1 RT. The results show that in rod and olf, the NH2 terminus, the core-CL region, and the NBD differ by characteristic Δ(ΔG) values that do not depend on the specific composition of the other two regions and that the NH2 terminus generates the main portion of Δ(ΔG) between the wild-type channels.
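The two wild-type figures quoted in this abstract (a 22-fold difference in cGMP sensitivity and a Δ(ΔG) of 3.1 RT) can be checked against each other. The relation below is an assumption made for this check, since the abstract does not state how Δ(ΔG) was computed: it supposes that the magnitude of the free energy difference is RT times the natural logarithm of the EC50 ratio.

```latex
\frac{\lvert \Delta(\Delta G) \rvert}{RT}
  = \ln\!\left(\frac{EC_{50}^{\mathrm{rod}}}{EC_{50}^{\mathrm{olf}}}\right)
  = \ln 22
  \approx 3.09
```

Under that assumption, the result rounds to the 3.1 RT reported for the wild-type channels.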


Author(s):  
Yu-Chiun Chiou ◽  
Shih-Ta Chou

This paper proposes three ant clustering algorithms (ACAs): ACA-1, ACA-2 and ACA-3. The core logic of the proposed ACAs is to modify the ant colony metaheuristic by reformulating the clustering problem into a network problem. For a clustering problem of N objects and K clusters, a fully connected network of N nodes is formed, with each link cost representing the dissimilarity of the two nodes it connects. K ants then collect their own nodes according to the link costs, following the pheromone trail laid by previous ants. The three proposed ACAs have been validated on a small-scale problem solved by a total enumeration method. The solution effectiveness at different problem scales consistently shows that ACA-2 performs best among the three ACAs. A further comparison of ACA-2 with other commonly used clustering methods, including the agglomerative hierarchy clustering algorithm (AHCA), the K-means algorithm (KMA) and the genetic clustering algorithm (GCA), shows that ACA-2 significantly outperforms them in solution effectiveness in most cases and also performs considerably better in solution stability as the problem scale or the number of clusters gets larger.


Author(s):  
Shofwatul Uyun ◽  
Subanar Subanar

Cluster analysis can be defined as identifying groups of similar objects to discover distributions of patterns and interesting correlations in large data sets. Clustering analysis is important in the fields of pattern recognition and pattern classification. Over the years, many methods have been developed for clustering data. In general, clustering methods can be divided into two categories, i.e., fuzzy clustering and hard clustering. Fuzzy C-Means is one of many clustering methods based on the fuzzy approach, while K-Means and K-Medoid are clustering methods based on the crisp approach. This study aims to apply the Fuzzy C-Means, K-Means and K-Medoid methods to clustering stock data in a food and beverage company. The main goal is to find a clustering method that can produce optimal clusters. The resulting clusters are validated using Dunn's Index (DI). It is expected that the results of this research can be used to support decision making in the food and beverage company.
Keywords: Clustering, Fuzzy C-Means, K-Means, K-Medoid, Cluster Validity, Dunn's Index (DI)
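For readers unfamiliar with the validity measure named in this abstract, here is a minimal Python sketch of one common form of Dunn's Index: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. The abstract does not say which variant or distance the authors used, so Euclidean distance and this particular formulation are assumptions, and the function name `dunn_index` is hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(clusters):
    """Dunn's Index for a hard clustering.

    clusters: list of clusters, each a list of points (tuples of floats).
    Needs at least two clusters and at least one cluster with two distinct points.
    """
    # Smallest distance between two points that lie in different clusters.
    min_between = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points of the same cluster (cluster diameter).
    max_within = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return min_between / max_within


clusters = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(3.0, 0.0), (3.0, 1.0)],
]
di = dunn_index(clusters)
```

Higher values indicate compact, well-separated clusters. In the toy example, the closest points of the two clusters are 3 units apart and each cluster has diameter 1, so the index is 3.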


2018 ◽  
Vol 29 (1) ◽  
pp. 814-830 ◽  
Author(s):  
Hasan Rashaideh ◽  
Ahmad Sawaie ◽  
Mohammed Azmi Al-Betar ◽  
Laith Mohammad Abualigah ◽  
Mohammed M. Al-laham ◽  
...  

The text clustering problem (TCP) is a leading process in many key areas such as information retrieval, text mining, and natural language processing. This presents the need for a potent document clustering algorithm that can be used effectively to navigate, summarize, and arrange information to congregate large data sets. This paper encompasses an adaptation of the grey wolf optimizer (GWO) for TCP, referred to as TCP-GWO. The TCP demands a degree of accuracy beyond that which is possible with metaheuristic swarm-based algorithms. The main issue to be addressed is how to split text documents on the basis of GWO into homogeneous clusters that are sufficiently precise and functional. Specifically, TCP-GWO, also referred to as the document clustering algorithm, used the average distance of documents to the cluster centroid (ADDC) as an objective function to repeatedly optimize the distance between the clusters of the documents. The accuracy and efficiency of the proposed TCP-GWO were demonstrated on a sufficiently large number of documents of variable sizes, documents that were randomly selected from a set of six publicly available data sets. Documents of high complexity were also included in the evaluation process to assess the recall detection rate of the document clustering algorithm. The experimental results for a test set of over a part of 1300 documents showed that failure to correctly cluster a document occurred in less than 20% of cases with a recall rate of more than 65% for a highly complex data set. The high F-measure rate and ability to cluster documents in an effective manner are important advances resulting from this research. The proposed TCP-GWO method was compared to other well-established text clustering methods using randomly selected data sets. Interestingly, TCP-GWO outperforms the comparative methods in terms of precision, recall, and F-measure rates.
In a nutshell, the results illustrate that the proposed TCP-GWO is able to excel compared to the other comparative clustering methods in terms of measurement criteria, whereby more than 55% of the documents were correctly clustered with a high level of accuracy.
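The ADDC objective mentioned in this abstract can be sketched as follows. This is one common formulation (the mean, over clusters, of the mean distance from each document vector to its cluster centroid) and is offered as an assumption, since the abstract gives no formula. Euclidean distance, dense tuple vectors, and the function names `centroid` and `addc` are likewise choices made for this illustration.

```python
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


def addc(clusters):
    """Average distance of documents to their cluster centroid.

    clusters: list of non-empty clusters, each a list of document vectors.
    Computes the mean distance to the centroid inside each cluster, then
    averages those values over all clusters.
    """
    per_cluster = []
    for docs in clusters:
        c = centroid(docs)
        per_cluster.append(sum(dist(c, d) for d in docs) / len(docs))
    return sum(per_cluster) / len(per_cluster)


clusters = [
    [(0.0, 0.0), (2.0, 0.0)],  # centroid (1, 0); both documents are at distance 1
    [(1.0, 1.0)],              # single document; distance to its centroid is 0
]
value = addc(clusters)
```

An optimizer such as the one described would evaluate a function like this for each candidate assignment of documents to clusters; how the value is then used to rank candidates is not specified in the abstract.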

