Detecting the Number of Clusters in n-Way Probabilistic Clustering

2010 ◽  
Vol 32 (11) ◽  
pp. 2006-2021 ◽  
Author(s):  
Zhaoshui He ◽  
Andrzej Cichocki ◽  
Shengli Xie ◽  
Kyuwan Choi


2018 ◽  
Author(s):  
Sara Mahallati ◽  
James C. Bezdek ◽  
Milos R. Popovic ◽  
Taufik A. Valiante

Abstract: Sorting spikes from extracellular recordings into clusters associated with distinct single units (putative neurons) is a fundamental step in analyzing neuronal populations. Such spike sorting is intrinsically unsupervised, as the number of neurons is not known a priori. Therefore, any spike sorting is an unsupervised learning problem that requires one of two approaches: specification of a fixed value c for the number of clusters to seek, or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. In this paper, we investigate the first approach and evaluate the utility of several methods for providing lower dimensional visualization of the cluster structure, as well as their effect on subsequent spike clustering. We also introduce a visualization technique called improved visual assessment of cluster tendency (iVAT) to estimate possible cluster structures in data without the need for dimensionality reduction. Experiments are conducted on two datasets with ground truth labels. In data with a relatively small number of clusters, iVAT is beneficial in estimating the number of clusters to inform the initialization of clustering algorithms. With larger numbers of clusters, iVAT gives a useful estimate of the coarse cluster structure but sometimes fails to indicate the presumptive number of clusters. We show that noise associated with recording extracellular neuronal potentials can disrupt computational clustering schemes, highlighting the benefit of probabilistic clustering models. Our results show that t-Distributed Stochastic Neighbor Embedding (t-SNE) provides representations of the data that yield more accurate visualization of potential cluster structure to inform the clustering stage. Moreover, the clusters obtained using t-SNE features were more reliable than the clusters obtained using the other methods, which indicates that t-SNE can potentially be used both for visualization and to extract features to be used by any clustering algorithm.
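To make the iVAT idea above more concrete, here is a minimal sketch in Python. It assumes the commonly described formulation of VAT/iVAT: order the points by repeatedly taking the unselected point closest to those already selected, and replace each dissimilarity by the path-based (minimax) distance. The function names and the toy 1-D data are hypothetical and are not taken from the paper; in a real analysis the reordered matrix would be shown as an image, where clusters appear as dark blocks along the diagonal.

```python
def vat_order(d):
    """Return a VAT-style ordering of the points for dissimilarity matrix d."""
    n = len(d)
    # Start from a point that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = [j for j in range(n) if j != start]
    while remaining:
        # Next: the unselected point closest to any already selected point.
        nxt = min(remaining, key=lambda j: min(d[i][j] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def minimax_distances(d):
    """Path-based distance: the smallest achievable 'largest hop' between i and j."""
    n = len(d)
    m = [row[:] for row in d]
    # Floyd-Warshall recursion with (min, max) in place of (min, +).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(m[i][k], m[k][j])
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


def ivat_image(d):
    """Reorder the path-based distances so clusters form diagonal blocks."""
    order = vat_order(d)
    m = minimax_distances(d)
    return order, [[m[i][j] for j in order] for i in order]


# Hypothetical toy data: two well-separated groups, deliberately interleaved.
points = [0.0, 10.0, 1.0, 11.0, 2.0, 12.0]
d = [[abs(a - b) for b in points] for a in points]
order, image = ivat_image(d)
```

In this toy case the reordered matrix has two 3 x 3 diagonal blocks of small values, which is the visual cue one would read as "two clusters".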


Author(s):  
CHRISTIAN BORGELT

Resampling methods are among the best approaches to determine the number of clusters in prototype-based clustering. The core idea is that with the right choice for the number of clusters basically the same cluster structures should be obtained from subsamples of the given data set, while a wrong choice should produce considerably varying cluster structures. In this paper I give an overview how such resampling approaches can be transferred to fuzzy and probabilistic clustering. I study several cluster comparison measures, which can be parameterized with t-norms, and report experiments that provide some guidance which of them may be the best choice.
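As an illustration of what a t-norm-parameterized comparison measure can look like, the sketch below computes a Rand-style agreement between two fuzzy partitions of the same points. It is an assumption-laden example, not necessarily one of the measures studied in the paper: the degree to which two points share a cluster is taken as the sum over clusters of T(u_ik, u_jk), and the index is one minus the mean absolute difference of these degrees. All function names are hypothetical.

```python
from itertools import combinations


def t_min(a, b):
    """Minimum t-norm."""
    return min(a, b)


def t_prod(a, b):
    """Product t-norm."""
    return a * b


def coincidence(u, i, j, tnorm):
    """Degree to which points i and j share a cluster under memberships u."""
    return sum(tnorm(a, b) for a, b in zip(u[i], u[j]))


def fuzzy_rand(u, v, tnorm=t_min):
    """Rand-style agreement of two membership matrices over the same points.

    u and v are lists of per-point membership rows; they may use different
    cluster labels and even different numbers of clusters.
    """
    pairs = list(combinations(range(len(u)), 2))
    disagreement = sum(
        abs(coincidence(u, i, j, tnorm) - coincidence(v, i, j, tnorm))
        for i, j in pairs
    )
    return 1.0 - disagreement / len(pairs)


# Hypothetical crisp partitions of four points (memberships are 0 or 1).
a = [[1, 0], [1, 0], [0, 1], [0, 1]]
b = [[1, 0], [1, 0], [0, 1], [1, 0]]
a_relabelled = [[0, 1], [0, 1], [1, 0], [1, 0]]

agreement = fuzzy_rand(a, b)
```

For crisp memberships this reduces to the ordinary Rand index. In a resampling scheme, one would cluster several subsamples for each candidate number of clusters and compare the resulting partitions on the points they share; a stable, high agreement suggests a suitable choice.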


1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

Abstract: A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
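The following Python sketch shows one way AIC can be used to compare candidate partitions of multinomial count data. The abstract does not spell out the likelihood, so the model here is an assumption: all rows in a cluster share one multinomial probability vector, estimated from the pooled counts, which gives (number of categories - 1) free parameters per cluster. The counts and function names are hypothetical.

```python
import math


def pooled_log_likelihood(rows):
    """Multinomial log-likelihood of rows sharing one probability vector.

    The multinomial coefficient is dropped because it is identical for every
    partition of the same data and so does not affect the comparison.
    """
    totals = [sum(col) for col in zip(*rows)]
    n = sum(totals)
    return sum(t * math.log(t / n) for t in totals if t > 0)


def partition_aic(counts, partition):
    """AIC = 2 * (number of parameters) - 2 * (maximized log-likelihood)."""
    n_categories = len(counts[0])
    log_lik = sum(
        pooled_log_likelihood([counts[i] for i in cluster])
        for cluster in partition
    )
    n_params = len(partition) * (n_categories - 1)
    return 2 * n_params - 2 * log_lik


# Hypothetical counts: four items scored on two categories.
counts = [[90, 10], [88, 12], [10, 90], [12, 88]]
candidates = {
    1: [[0, 1, 2, 3]],
    2: [[0, 1], [2, 3]],
    4: [[0], [1], [2], [3]],
}
aic = {c: partition_aic(counts, p) for c, p in candidates.items()}
best_c = min(aic, key=aic.get)
```

The partition with the lowest AIC is preferred: splitting further improves the fit only slightly, and the penalty for extra parameters outweighs that gain.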


2018 ◽  
Author(s):  
Riana Brown ◽  
Sam G. B. Roberts ◽  
Thomas V. Pollet

Personality factors affect the properties of ‘offline’ social networks, but how they are associated with the structural properties of online networks is still unclear. We investigated how the six HEXACO personality factors (Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness and Openness to Experience) relate to Facebook use and three objectively measured Facebook network characteristics - network size, density, and number of clusters. Participants (n = 107, mean age = 20.6, 66% female) extracted their Facebook networks using the GetNet app, completed the 60-item HEXACO questionnaire and the Facebook Usage Questionnaire. Users high in Openness to Experience spent less time on Facebook. Extraversion was positively associated with network size and the number of network clusters (but not after controlling for size). These findings suggest that personality factors are associated with Facebook use and the size and structure of Facebook networks, and that personality is an important influence on both online and offline sociality.


2018 ◽  
Vol 14 (1) ◽  
pp. 11-23 ◽  
Author(s):  
Lin Zhang ◽  
Yanling He ◽  
Huaizhi Wang ◽  
Hui Liu ◽  
Yufei Huang ◽  
...  

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data have unique features, such as low read coverage, which call for novel clustering approaches.
Objective: Besides handling the low read coverage, it is also necessary to preserve the integer property of the data when clustering count-based RNA methylation sequencing data.
Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters so as to avoid the common model selection problem in clustering analysis.
Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, which are two known regulatory components of the RNA m6A methyltransferase complex.
Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of the DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
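Two ingredients of such a model can be sketched in a few lines of Python: the beta-binomial likelihood of a count, and the Chinese restaurant process prior that a Dirichlet process induces over cluster assignments. This is a generic illustration under standard textbook definitions, not the DPBBM package's code or API; the function names, and the reading of k and n as methylated and total reads at a site, are assumptions.

```python
import math


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, a, b):
    """log P(k successes in n trials) when the success rate is Beta(a, b).

    Assumed interpretation: k methylated reads out of n total reads at a site,
    with (a, b) shared by all sites in the same cluster.
    """
    log_choose = (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    )
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def crp_prior(cluster_sizes, alpha):
    """Prior probability of joining each existing cluster, then a new one.

    Under a Dirichlet process with concentration alpha, an item joins an
    existing cluster in proportion to its size and opens a new cluster in
    proportion to alpha, so the number of clusters is not fixed in advance.
    """
    total = sum(cluster_sizes) + alpha
    return [size / total for size in cluster_sizes] + [alpha / total]


# Hypothetical example: 3 methylated reads out of 10 at a low-coverage site.
log_p = beta_binomial_logpmf(3, 10, a=2.0, b=5.0)
prior = crp_prior([3, 1], alpha=1.0)
```

In a Gibbs sweep, each site's new assignment would be drawn with weights proportional to this prior times the likelihood of the site's counts under each cluster; working with counts directly keeps low-coverage sites from being treated as if their methylation level were known precisely.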


2021 ◽  
pp. 0308518X2110127
Author(s):  
Jiangping Zhou ◽  
Sam KS Ho ◽  
Shuyu Lei ◽  
Valarie CK Pang

The impacts of coronavirus disease 2019 (COVID-19) on society and economy are wide-ranging, long-lasting, and global. The experience of multiple countries or regions in fighting the pandemic indicates that there could be multiple COVID-19 surges, where a growing number of cases can be observed in the more recent surge(s). Were COVID-19 cases and clusters of cases (across surges) randomly distributed in space? Did population density and activity centres influence clusters of cases and associated venues? Based on information on the associated venues of the four surges of COVID-19 cases between January 2020 and February 2021 as well as population density, visuals were made to distinguish the relationships between population density, activity centres, and clusters of cases in Hong Kong. Different spatial patterns were observed across the four surges: fewer cases were observed in the first surge with a more evenly distributed pattern of clusters; the second surge as compared to the first surge saw a wider distribution and an increase in the number/layer of clusters; compared to the second surge, the third surge suffered from many more cases but saw a decrease in the general number of clusters; and compared to the previous three surges, the fourth surge had the largest number of cases, yet even fewer clusters were observed, with several clusters again concentrated in specific areas, as in the previous surge. Across the four surges, a few locales saw recurrent clusters of cases and a few communities were without cases.


2021 ◽  
Vol 11 (3) ◽  
pp. 1241
Author(s):  
Sergio D. Saldarriaga-Zuluaga ◽  
Jesús M. López-Lezama ◽  
Nicolás Muñoz-Galeano

Microgrids constitute complex systems that integrate distributed generation (DG) and feature different operational modes. The optimal coordination of directional over-current relays (DOCRs) in microgrids is a challenging task, especially if topology changes are taken into account. This paper proposes an adaptive protection approach that takes advantage of multiple setting groups that are available in commercial DOCRs to account for network topology changes in microgrids. Because the number of possible topologies is greater than the available setting groups, unsupervised learning techniques are explored to classify network topologies into a number of clusters that is equal to the number of setting groups. Subsequently, optimal settings are calculated for every topology cluster. Every setting is saved in the DOCRs as a different setting group that would be activated when a corresponding topology takes place. Several tests are performed on a benchmark IEC (International Electrotechnical Commission) microgrid, evidencing the applicability of the proposed approach.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Baicheng Lyu ◽  
Wenhua Wu ◽  
Zhiqiang Hu

Abstract: With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and keeps the number of parameters to be adjusted to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
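The abstract does not describe the BCALoD procedure itself, so the Python sketch below is not that algorithm. It is a generic density-linkage example, included only to show how a cutoff distance and a cutoff density can let the number of clusters emerge from the data: points with too few neighbours are treated as noise, and the remaining points are linked whenever they lie within the cutoff distance. Unlike BCALoD as described, it uses a single global cutoff rather than one per cluster. All names and data are hypothetical.

```python
import math


def local_density(points, cutoff):
    """Number of other points within `cutoff` of each point."""
    return [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= cutoff)
        for i, p in enumerate(points)
    ]


def density_clusters(points, cutoff, min_density):
    """Link dense points lying within `cutoff`; return labels and cluster count."""
    density = local_density(points, cutoff)
    labels = [-1] * len(points)  # -1 marks low-density points treated as noise
    n_clusters = 0
    for seed in range(len(points)):
        if labels[seed] != -1 or density[seed] < min_density:
            continue
        # Grow a new cluster from this unlabelled dense point.
        labels[seed] = n_clusters
        stack = [seed]
        while stack:
            i = stack.pop()
            for j, q in enumerate(points):
                if (labels[j] == -1 and density[j] >= min_density
                        and math.dist(points[i], q) <= cutoff):
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    return labels, n_clusters


# Hypothetical 2-D data: two small groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels, n_clusters = density_clusters(points, cutoff=1.5, min_density=1)
```

Here the number of clusters is an output rather than an input, and the isolated point is left unassigned instead of forcing it into a cluster.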

