scholarly journals Exploring centroids initialization within Deep Convolutional Embedded Clustering

2019 ◽  
Author(s):  
Leonardo Nogueira ◽  
Adriane Serapião

Deep clustering uses a deep neural network to learn deep feature representation for performing clustering tasks. In this paper, we explored the Deep Convolutional Embedded Clustering (DCEC) method, which employs a stan- dart clustering method to get initial weight for the neural model training incor- porated to other clustering methods. The original DCEC uses K-Means with Euclidean distance for the clusters center initialization step. We have applied K-Means with Mahalanobis distance instead of Euclidean distance. In order to improve the DCEC performance, we have included the standart K-Harmonic Means clustering algorithm as well, which tries overcome the dependency of the K-Means performance on the clusters center initialization. The Kernel ba- sed K-Harmonic Means was also introduced in this study to reduce the effect of outliers and noise. We evaluated the performance of these clustering appro- aches within DCEC over benchmark image datasets and the results were better than the baseline.

2019 ◽  
Vol 19 (04) ◽  
pp. 1950019 ◽  
Author(s):  
Maissa Hamouda ◽  
Karim Saheb Ettabaa ◽  
Med Salim Bouhlel

Convolutional neural networks (CNN) can learn deep feature representation for hyperspectral imagery (HSI) interpretation and attain excellent accuracy of classification if we have many training samples. Due to its superiority in feature representation, several works focus on it, among which a reliable classification approach based on CNN, used filters generated from cluster framework, like k Means algorithm, yielded good results. However, the kernels number to be manually assigned. To solve this problem, a HSI classification framework based on CNN, where the convolutional filters to be adaptatively learned from the data, by grouping without knowing the cluster number, has recently proposed. This framework, based on the two algorithms CNN and kMeans, showed high accuracy results. So, in the same context, we propose an architecture based on the depth convolution al neural networks principle, where kernels are adaptatively learned, using CkMeans network, to generate filters without knowing the number of clusters, for hyperspectral classification. With adaptive kernels, the proposed framework automatic kernels selection by CkMeans algorithm (AKSCCk) achieves a better classification accuracy compared to the previous frameworks. The experimental results show the effectiveness and feasibility of AKSCCk approach.


2018 ◽  
Vol 140 (8) ◽  
Author(s):  
Jeffrey W. Herrmann ◽  
Michael Morency ◽  
Azrah Anparasan ◽  
Erica L. Gralla

Understanding how humans decompose design problems will yield insights that can be applied to develop better support for human designers. However, there are few established methods for identifying the decompositions that human designers use. This paper discusses a method for identifying subproblems by analyzing when design variables were discussed concurrently by human designers. Four clustering techniques for grouping design variables were tested on a range of synthetic datasets designed to resemble data collected from design teams, and the accuracy of the clusters created by each algorithm was evaluated. A spectral clustering method was accurate for most problems and generally performed better than hierarchical (with Euclidean distance metric), Markov, or association rule clustering methods. The method's success should enable researchers to gain new insights into how human designers decompose complex design problems.


2016 ◽  
Vol 43 (1) ◽  
pp. 54-74 ◽  
Author(s):  
Baojun Ma ◽  
Hua Yuan ◽  
Ye Wu

Clustering is a powerful unsupervised tool for sentiment analysis from text. However, the clustering results may be affected by any step of the clustering process, such as data pre-processing strategy, term weighting method in Vector Space Model and clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques with respect to the task of sentiment analysis. Different from previous studies, in particular, we investigate the combination effects of these factors with a series of comprehensive experimental studies. The experimental results indicate that, first, the K-means-type clustering algorithms show clear advantages on balanced review datasets, while performing rather poorly on unbalanced datasets by considering clustering accuracy. Second, the comparatively newly designed weighting models are better than the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, adjective and adverb words extraction strategy can offer obvious improvements on clustering performance, while strategies of adopting stemming and stopword removal will bring negative influences on sentiment clustering. The experimental results would be valuable for both the study and usage of clustering methods in online review sentiment analysis.


2021 ◽  
pp. 1-18
Author(s):  
Trang T.D. Nguyen ◽  
Loan T.T. Nguyen ◽  
Anh Nguyen ◽  
Unil Yun ◽  
Bay Vo

Spatial clustering is one of the main techniques for spatial data mining and spatial data analysis. However, existing spatial clustering methods primarily focus on points distributed in planar space with the Euclidean distance measurement. Recently, NS-DBSCAN has been developed to perform clustering of spatial point events in Network Space based on a well-known clustering algorithm, named Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The NS-DBSCAN algorithm has efficiently solved the problem of clustering network constrained spatial points. When compared to the NC_DT (Network-Constraint Delaunay Triangulation) clustering algorithm, the NS-DBSCAN algorithm efficiently solves the problem of clustering network constrained spatial points by visualizing the intrinsic clustering structure of spatial data by constructing density ordering charts. However, the main drawback of this algorithm is when the data are processed, objects that are not specifically categorized into types of clusters cannot be removed, which is undeniably a waste of time, particularly when the dataset is large. In an attempt to have this algorithm work with great efficiency, we thus recommend removing edges that are longer than the threshold and eliminating low-density points from the density ordering table when forming clusters and also take other effective techniques into consideration. In this paper, we develop a theorem to determine the maximum length of an edge in a road segment. Based on this theorem, an algorithm is proposed to greatly improve the performance of the density-based clustering algorithm in network space (NS-DBSCAN). Experiments using our proposed algorithm carried out in collaboration with Ho Chi Minh City, Vietnam yield the same results but shows an advantage of it over NS-DBSCAN in execution time.


2021 ◽  
Vol 13 (9) ◽  
pp. 4648
Author(s):  
Rana Muhammad Adnan ◽  
Kulwinder Singh Parmar ◽  
Salim Heddam ◽  
Shamsuddin Shahid ◽  
Ozgur Kisi

The accurate estimation of suspended sediments (SSs) carries significance in determining the volume of dam storage, river carrying capacity, pollution susceptibility, soil erosion potential, aquatic ecological impacts, and the design and operation of hydraulic structures. The presented study proposes a new method for accurately estimating daily SSs using antecedent discharge and sediment information. The novel method is developed by hybridizing the multivariate adaptive regression spline (MARS) and the Kmeans clustering algorithm (MARS–KM). The proposed method’s efficacy is established by comparing its performance with the adaptive neuro-fuzzy system (ANFIS), MARS, and M5 tree (M5Tree) models in predicting SSs at two stations situated on the Yangtze River of China, according to the three assessment measurements, RMSE, MAE, and NSE. Two modeling scenarios are employed; data are divided into 50–50% for model training and testing in the first scenario, and the training and test data sets are swapped in the second scenario. In Guangyuan Station, the MARS–KM showed a performance improvement compared to ANFIS, MARS, and M5Tree methods in term of RMSE by 39%, 30%, and 18% in the first scenario and by 24%, 22%, and 8% in the second scenario, respectively, while the improvement in RMSE of ANFIS, MARS, and M5Tree was 34%, 26%, and 27% in the first scenario and 7%, 16%, and 6% in the second scenario, respectively, at Beibei Station. Additionally, the MARS–KM models provided much more satisfactory estimates using only discharge values as inputs.


2007 ◽  
Vol 16 (06) ◽  
pp. 919-934
Author(s):  
YONGGUO LIU ◽  
XIAORONG PU ◽  
YIDONG SHEN ◽  
ZHANG YI ◽  
XIAOFENG LIAO

In this article, a new genetic clustering algorithm called the Improved Hybrid Genetic Clustering Algorithm (IHGCA) is proposed to deal with the clustering problem under the criterion of minimum sum of squares clustering. In IHGCA, the improvement operation including five local iteration methods is developed to tune the individual and accelerate the convergence speed of the clustering algorithm, and the partition-absorption mutation operation is designed to reassign objects among different clusters. By experimental simulations, its superiority over some known genetic clustering methods is demonstrated.


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1234
Author(s):  
Lei Zha ◽  
Yu Yang ◽  
Zicheng Lai ◽  
Ziwei Zhang ◽  
Juan Wen

In recent years, neural networks for single image super-resolution (SISR) have applied more profound and deeper network structures to extract extra image details, which brings difficulties in model training. To deal with deep model training problems, researchers utilize dense skip connections to promote the model’s feature representation ability by reusing deep features of different receptive fields. Benefiting from the dense connection block, SRDensenet has achieved excellent performance in SISR. Despite the fact that the dense connected structure can provide rich information, it will also introduce redundant and useless information. To tackle this problem, in this paper, we propose a Lightweight Dense Connected Approach with Attention for Single Image Super-Resolution (LDCASR), which employs the attention mechanism to extract useful information in channel dimension. Particularly, we propose the recursive dense group (RDG), consisting of Dense Attention Blocks (DABs), which can obtain more significant representations by extracting deep features with the aid of both dense connections and the attention module, making our whole network attach importance to learning more advanced feature information. Additionally, we introduce the group convolution in DABs, which can reduce the number of parameters to 0.6 M. Extensive experiments on benchmark datasets demonstrate the superiority of our proposed method over five chosen SISR methods.


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

Abstract We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.


Sign in / Sign up

Export Citation Format

Share Document