Clustering for Probability Density Functions by New k-Medoids Method

2018 ◽  
Vol 2018 ◽  
pp. 1-7 ◽  
Author(s):  
D. Ho-Kieu ◽  
T. Vo-Van ◽  
T. Nguyen-Trang

This paper proposes a novel and efficient clustering algorithm for probability density functions based on k-medoids. It also suggests a scheme for selecting effective initial medoids, which significantly reduces computational time, and presents a general proof of convergence for the proposed algorithm. The effectiveness and feasibility of the proposed algorithm are verified on both artificial and real datasets and compared with various existing algorithms in terms of adjusted Rand index, computational time, and number of iterations. The numerical results show that the proposed algorithm performs very well and has potential real-world applications.
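
The abstract does not give the algorithm's details, so the following is only a minimal sketch of generic k-medoids applied to densities, not the authors' method. It assumes each density is sampled on a uniform grid, uses the L1 distance between densities, and initializes medoids with a farthest-first heuristic; the function names, the distance choice, and the initialization are all illustrative assumptions.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Normal density evaluated on the grid x (used only to build demo data)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def l1_distance_matrix(pdfs, dx):
    """Pairwise L1 distances between densities sampled on a uniform grid.

    pdfs has shape (n, m): one row per density, m grid points with spacing dx.
    The integral of |f - g| is approximated by a Riemann sum.
    """
    diff = np.abs(pdfs[:, None, :] - pdfs[None, :, :])
    return diff.sum(axis=2) * dx

def k_medoids(dist, k, max_iter=100):
    """Plain alternating k-medoids on a precomputed distance matrix."""
    # Assumed initialization (not the paper's scheme): start from the most
    # central item, then repeatedly add the item farthest from chosen medoids.
    chosen = [int(np.argmin(dist.sum(axis=1)))]
    while len(chosen) < k:
        nearest = dist[:, chosen].min(axis=1)
        chosen.append(int(np.argmax(nearest)))
    medoids = np.array(chosen)

    for _ in range(max_iter):
        # Assignment step: each density joins its nearest medoid.
        labels = np.argmin(dist[:, medoids], axis=1)
        # Update step: the new medoid minimizes total distance within its cluster.
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[np.argmin(within)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids

    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Demo data (hypothetical): two groups of unit-variance normal densities.
grid = np.linspace(-6.0, 12.0, 1801)
means = [0.0, 0.2, 0.4, 5.0, 5.2, 5.4]
pdfs = np.array([gaussian_pdf(grid, mu, 1.0) for mu in means])
dist = l1_distance_matrix(pdfs, grid[1] - grid[0])
medoids, labels = k_medoids(dist, k=2)
```

Because a medoid is always one of the input densities, each cluster representative is itself a valid density, and the procedure needs only the pairwise distance matrix.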

2018 ◽  
Vol 2 (3) ◽  
pp. 174
Author(s):  
Diem Ngoc Tran ◽  
Tom Vinant ◽  
Théo Marc Colombani ◽  
Diem Ho-Kieu

This paper presents, for the first time in the R environment, code implementing a non-hierarchical algorithm for clustering one-dimensional probability density functions. The code consists of two primary steps: executing the main clustering algorithm and evaluating the clustering quality. It is validated on one simulated data set and two applications. The numerical results are highly comparable to those obtained with MATLAB in terms of computational time. The code is intended mainly for educational purposes and aims to make the algorithm available in more environments, giving those interested in clustering multiple options. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


2013 ◽  
Vol 2013 ◽  
pp. 1-17
Author(s):  
M. S. Woolfson ◽  
C. Bigan ◽  
J. A. Crowe ◽  
B. R. Hayes-Gill

A blind source separation method is described to extract sources from data mixtures where the underlying sources are sparse and correlated. The approach used is to detect and analyze segments of time where one source exists on its own. The method does not assume independence of sources and probability density functions are not assumed for any of the sources. A comparison is made between the proposed method and the Fast-ICA and Clusterwise PCA methods. It is shown that the proposed method works best for cases where the underlying sources are strongly correlated because Fast-ICA assumes zero correlation between sources and Clusterwise PCA can be sensitive to overlap between sources. However, for cases of sources that are sparse and weakly correlated with each other, there is a tendency for Fast-ICA and Clusterwise PCA to have better performances than the proposed method, the reason being that these methods appear to be more robust to changes in input parameters to the algorithms. In addition, because of the deflationary nature of the proposed method, there is a tendency for estimates to be more affected by noise than Fast-ICA when the number of sources increases. The paper concludes with a discussion concerning potential applications for the proposed method.


2019 ◽  
Vol 2019 ◽  
pp. 1-16
Author(s):  
D. Pham-Toan ◽  
T. Vo-Van ◽  
A. T. Pham-Chau ◽  
T. Nguyen-Trang ◽  
D. Ho-Kieu

This paper proposes an evolutionary computing based automatic partitioned clustering of probability density functions, the so-called binary adaptive elitist differential evolution for clustering of probability density functions (baeDE-CDFs). Herein, k-medoids based representative probability density functions (PDFs) are preferred to k-means based ones for their ability to avoid outliers effectively. Moreover, treating the clustering problem as an evolutionary optimization problem permits determining the number of clusters “on the run”. Notably, applying the adaptive elitist differential evolution (aeDE) algorithm with binary chromosome representation not only decreases the computational burden remarkably but also increases the quality of the solution significantly. Multiple numerical examples are designed and examined to verify the proposed algorithm’s performance, and the numerical results are evaluated using numerous criteria to give a comprehensive conclusion. Comparisons with other algorithms in the literature show that the proposed algorithm delivers outstanding performance in both solution quality and computational time in a statistically significant way.


2020 ◽  
Vol 15 (3) ◽  
pp. 249-261
Author(s):  
Dinh Phamtoan ◽  
Tai Vovan

This study introduces a measure called the coefficient of within-cluster proximity (CWP) to evaluate the similarity of probability density functions (DFs) within clusters. After surveying the lower and upper bounds and the computational problems of CWP, a fuzzy clustering algorithm for DFs is proposed. This algorithm can determine a suitable number of clusters and find the probability that each DF belongs to a specific cluster. The convergence of the algorithm is considered in theory and illustrated by numerical examples. The algorithm is applied to image recognition. The results show its strong advantages over other algorithms. They also indicate the potential of the proposed approach for application to data of different types.
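
The abstract does not define CWP or the update rules, so the sketch below does not reproduce the proposed algorithm. It only illustrates what a membership probability per DF and cluster can look like, using the standard fuzzy c-means membership formula as a stand-in; the function name, the fuzzifier `m`, and the distance values are hypothetical.

```python
import numpy as np

def fuzzy_memberships(dist_to_reps, m=2.0, eps=1e-12):
    """Standard fuzzy c-means memberships from distances to cluster representatives.

    dist_to_reps has shape (n, k): distance from each density to each of the
    k cluster representatives. m > 1 is the fuzzifier. Returns an (n, k)
    array whose rows sum to 1.
    """
    # Clip zeros so a density that coincides with a representative does not
    # cause a division by zero; it then receives membership close to 1.
    d = np.maximum(np.asarray(dist_to_reps, dtype=float), eps)
    power = 2.0 / (m - 1.0)
    # ratio[i, c, j] = (d[i, c] / d[i, j]) ** power
    ratio = (d[:, :, None] / d[:, None, :]) ** power
    return 1.0 / ratio.sum(axis=2)

# Hypothetical distances of three densities to two representatives.
example_dist = [[0.1, 1.9],
                [1.8, 0.2],
                [1.0, 1.0]]
u = fuzzy_memberships(example_dist)
```

A density equidistant from both representatives gets equal membership in each cluster, while one lying close to a single representative gets membership near 1 for that cluster.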


2021 ◽  
Vol 13 (12) ◽  
pp. 2307
Author(s):  
J. Javier Gorgoso-Varela ◽  
Rafael Alonso Ponce ◽  
Francisco Rodríguez-Puerta

The diameter distributions of trees in 50 temporary sample plots (TSPs) established in Pinus halepensis Mill. stands were recovered from LiDAR metrics by using six probability density functions (PDFs): the Weibull (2P and 3P), Johnson’s SB, beta, generalized beta and gamma-2P functions. The parameters were recovered from the first and the second moments of the distributions (mean and variance, respectively) by using parameter recovery models (PRM). Linear models were used to predict both moments from LiDAR data. In recovering the functions, the location parameters of the distributions were predetermined as the minimum diameter inventoried, and scale parameters were established as the maximum diameters predicted from LiDAR metrics. The Kolmogorov–Smirnov (KS) statistic (Dn), the number of acceptances by the KS test, the Cramér–von Mises (W²) statistic, bias and mean square error (MSE) were used to evaluate the goodness of fit. The fits for the six recovered functions were compared with the fits to all measured data from 58 TSPs (LiDAR metrics could only be extracted from 50 of the plots). In the fitting phase, the location parameters were fixed at a suitable value determined according to the forestry literature (0.75·dmin). The linear models used to recover the two moments of the distributions and the maximum diameters determined from LiDAR data were accurate, with R² values of 0.750, 0.724 and 0.873 for dg, dmed and dmax. Reasonable results were obtained with all six recovered functions. The goodness-of-fit statistics indicated that the beta function was the most accurate, followed by the generalized beta function. The Weibull-3P function provided the poorest fits, and the Weibull-2P and Johnson’s SB functions also yielded poor fits to the data.
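
As an illustration of recovering parameters from the first two moments, the sketch below applies the textbook method of moments to a two-parameter Weibull distribution with location zero. It is not the paper's parameter recovery model: the predetermined location and scale parameters and the LiDAR-based moment predictions are not reproduced, and the function names and numeric values are hypothetical.

```python
import math

def weibull_moments(shape, scale):
    """Mean and variance of a two-parameter Weibull(shape, scale) distribution."""
    g1 = math.gamma(1.0 + 1.0 / shape)
    g2 = math.gamma(1.0 + 2.0 / shape)
    return scale * g1, scale ** 2 * (g2 - g1 ** 2)

def recover_weibull_2p(mean, variance, lo=0.1, hi=50.0, tol=1e-10):
    """Recover (shape, scale) from a mean and variance by the method of moments.

    The squared coefficient of variation depends only on the shape parameter
    and decreases as the shape grows, so the shape is found by bisection on
    the assumed bracket [lo, hi]; the scale then follows from the mean.
    """
    def cv2(shape):
        g1 = math.gamma(1.0 + 1.0 / shape)
        g2 = math.gamma(1.0 + 2.0 / shape)
        return g2 / g1 ** 2 - 1.0

    target = variance / mean ** 2
    if not (cv2(hi) <= target <= cv2(lo)):
        raise ValueError("moments fall outside the searched shape range")

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cv2(mid) > target:
            lo = mid  # spread too large at mid, so the shape must be larger
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = mean / math.gamma(1.0 + 1.0 / shape)
    return shape, scale

# Round trip with hypothetical parameters: compute moments, then recover them.
true_shape, true_scale = 2.5, 20.0
mean, variance = weibull_moments(true_shape, true_scale)
shape, scale = recover_weibull_2p(mean, variance)
```

In a workflow like the one described, the mean and variance passed to the recovery step would come from the fitted linear models rather than from known parameters.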

