Detecting the Number of Clusters in n-Way Probabilistic Clustering

Zhaoshui He; Andrzej Cichocki;  Shengli Xie;  Kyuwan Choi

doi:10.1109/tpami.2010.15

Cluster Tendency Assessment in Neuronal Spike Data

10.1101/285064 ◽

2018 ◽

Cited By ~ 2

Author(s):

Sara Mahallati ◽

James C. Bezdek ◽

Milos R. Popovic ◽

Taufik A. Valiante

Keyword(s):

Clustering Algorithm ◽

A Priori ◽

Clustering Algorithms ◽

Cluster Structure ◽

Extracellular Recording ◽

Ground Truth ◽

Visual Assessment ◽

Spike Sorting ◽

Number Of Clusters ◽

Probabilistic Clustering

Sorting spikes from extracellular recordings into clusters associated with distinct single units (putative neurons) is a fundamental step in analyzing neuronal populations. Such spike sorting is intrinsically unsupervised, as the number of neurons is not known a priori. Therefore, any spike sorting is an unsupervised learning problem that requires one of two approaches: specification of a fixed value c for the number of clusters to seek, or generation of candidate partitions for several possible values of c, followed by selection of a best candidate based on various post-clustering validation criteria. In this paper, we investigate the first approach and evaluate the utility of several methods for providing lower-dimensional visualization of the cluster structure and their effect on subsequent spike clustering. We also introduce a visualization technique called improved visual assessment of cluster tendency (iVAT) to estimate possible cluster structures in data without the need for dimensionality reduction. Experiments are conducted on two datasets with ground truth labels. In data with a relatively small number of clusters, iVAT is beneficial in estimating the number of clusters to inform the initialization of clustering algorithms. With larger numbers of clusters, iVAT gives a useful estimate of the coarse cluster structure but sometimes fails to indicate the presumptive number of clusters. We show that noise associated with recording extracellular neuronal potentials can disrupt computational clustering schemes, highlighting the benefit of probabilistic clustering models. Our results show that t-Distributed Stochastic Neighbor Embedding (t-SNE) provides representations of the data that yield more accurate visualization of potential cluster structure to inform the clustering stage. Moreover, the clusters obtained using t-SNE features were more reliable than the clusters obtained using the other methods, which indicates that t-SNE can potentially be used both for visualization and to extract features to be used by any clustering algorithm.
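The abstract above does not spell out how a visual assessment of cluster tendency is computed. As a rough illustration only, the sketch below implements the reordering step of the original VAT idea as it is commonly described: objects are reordered so that mutually close objects end up adjacent, and clusters then show up as dark blocks along the diagonal of the reordered dissimilarity matrix. The function names `vat_order` and `reorder` are hypothetical, and the path-based distance transform that iVAT is understood to add on top of this step is not shown.

```python
import numpy as np

def vat_order(D):
    """Return a VAT-style ordering for a symmetric dissimilarity matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Start from one endpoint of the largest dissimilarity in D.
    i, _ = np.unravel_index(np.argmax(D), D.shape)
    order = [int(i)]
    remaining = [j for j in range(n) if j != i]
    # nearest[j] = smallest dissimilarity from j to any object already ordered
    nearest = D[i].copy()
    while remaining:
        # Append the unordered object closest to the ordered set (Prim-like step).
        j = min(remaining, key=lambda q: nearest[q])
        order.append(j)
        remaining.remove(j)
        nearest = np.minimum(nearest, D[j])
    return order

def reorder(D, order):
    """Apply the same permutation to rows and columns of D."""
    D = np.asarray(D, dtype=float)
    return D[np.ix_(order, order)]

# Toy data (an assumption for illustration): two interleaved groups on a line.
x = np.array([0.0, 10.0, 0.1, 10.1, 0.2, 10.2])
D = np.abs(x[:, None] - x[None, :])
order = vat_order(D)
D_star = reorder(D, order)  # displaying D_star as an image would show two blocks
```

Counting the dark diagonal blocks in an image of `D_star` gives a visual estimate of the number of clusters, which is the quantity the abstract proposes to feed into the clustering stage.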

RESAMPLING FOR FUZZY CLUSTERING

International Journal of Uncertainty Fuzziness and Knowledge-Based Systems ◽

10.1142/s0218488507004893 ◽

2007 ◽

Vol 15 (05) ◽

pp. 595-614 ◽

Cited By ~ 8

Author(s):

CHRISTIAN BORGELT

Keyword(s):

Fuzzy Clustering ◽

Data Set ◽

Number Of Clusters ◽

Resampling Methods ◽

Probabilistic Clustering ◽

The Core ◽

Core Idea ◽

The Right ◽

The Given ◽

Comparison Measures

Resampling methods are among the best approaches to determine the number of clusters in prototype-based clustering. The core idea is that, with the right choice for the number of clusters, basically the same cluster structures should be obtained from subsamples of the given data set, while a wrong choice should produce considerably varying cluster structures. In this paper I give an overview of how such resampling approaches can be transferred to fuzzy and probabilistic clustering. I study several cluster comparison measures, which can be parameterized with t-norms, and report experiments that provide some guidance on which of them may be the best choice.
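The core idea described above can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it uses crisp k-means with a deterministic farthest-point initialisation instead of fuzzy or probabilistic clustering, and the plain Rand index on shared points instead of the t-norm-parameterized comparison measures studied in the paper. All function names and parameters (`kmeans_labels`, `rand_index`, `stability`, `n_pairs`, `frac`) are hypothetical.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance of each point to its closest chosen center.
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different cluster)."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

def stability(X, k, n_pairs=10, frac=0.8, seed=0):
    """Mean agreement between clusterings of random subsample pairs."""
    rng = np.random.default_rng(seed)
    n = len(X)
    m = int(frac * n)
    scores = []
    for _ in range(n_pairs):
        s1 = rng.choice(n, size=m, replace=False)
        s2 = rng.choice(n, size=m, replace=False)
        l1 = np.full(n, -1)
        l2 = np.full(n, -1)
        l1[s1] = kmeans_labels(X[s1], k)
        l2[s2] = kmeans_labels(X[s2], k)
        # Compare only the points that occur in both subsamples.
        shared = np.intersect1d(s1, s2)
        scores.append(rand_index(l1[shared], l2[shared]))
    return float(np.mean(scores))
```

In use, one would evaluate `stability(X, k)` for a range of candidate values of k and prefer values for which the score stays close to 1, keeping in mind that very small k can also look stable for trivial reasons.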

Cluster Analysis of Antigenic Profiles of Tumors: Selection of Number of Clusters Using Akaike’s Information Criterion

Methods of Information in Medicine ◽

10.1055/s-0038-1634783 ◽

1990 ◽

Vol 29 (03) ◽

pp. 200-204 ◽

Cited By ~ 7

Author(s):

J. A. Koziol

Keyword(s):

Cluster Analysis ◽

Basic Problem ◽

Information Criterion ◽

Akaike's Information Criterion ◽

Cell Surface Antigens ◽

Number Of Clusters ◽

Multinomial Data ◽

Tumor Types ◽

Selection Of

A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
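For orientation, the standard textbook form of Akaike's information criterion, applied to the choice of the number of clusters, can be written as below. The abstract does not state the likelihood or the parameter count used for the multinomial model, so the symbols here are assumptions: \hat{L}_c is the maximized likelihood of a model with c clusters and p_c is its number of free parameters.

```latex
\mathrm{AIC}(c) = -2 \ln \hat{L}_c + 2\, p_c ,
\qquad
\hat{c} = \operatorname*{arg\,min}_{c} \; \mathrm{AIC}(c)
```

Under this reading, adding clusters is accepted only while the gain in log-likelihood outweighs the penalty for the additional parameters.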

HEXACO personality factors and their associations with Facebook use and Facebook network characteristics

10.31234/osf.io/3zvhq ◽

2018 ◽

Author(s):

Riana Brown ◽

Sam G. B. Roberts ◽

Thomas V. Pollet

Keyword(s):

Social Networks ◽

Network Size ◽

Openness To Experience ◽

Personality Factors ◽

Number Of Clusters ◽

Network Characteristics ◽

Online Networks ◽

Facebook Use ◽

Objectively Measured ◽

Size And Structure

Personality factors affect the properties of ‘offline’ social networks, but how they are associated with the structural properties of online networks is still unclear. We investigated how the six HEXACO personality factors (Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness and Openness to Experience) relate to Facebook use and three objectively measured Facebook network characteristics - network size, density, and number of clusters. Participants (n = 107, mean age = 20.6, 66% female) extracted their Facebook networks using the GetNet app, completed the 60-item HEXACO questionnaire and the Facebook Usage Questionnaire. Users high in Openness to Experience spent less time on Facebook. Extraversion was positively associated with network size and the number of network clusters (but not after controlling for size). These findings suggest that personality factors are associated with Facebook use and the size and structure of Facebook networks, and that personality is an important influence on both online and offline sociality.

Method for determining optimal number of clusters in K-means clustering algorithm

Journal of Computer Applications ◽

10.3724/sp.j.1087.2010.01995 ◽

2010 ◽

Vol 30 (8) ◽

pp. 1995-1998 ◽

Cited By ~ 18

Author(s):

Shi-bing ZHOU ◽

Zhen-yuan XU ◽

Xu-qing TANG

Keyword(s):

Clustering Algorithm ◽

Optimal Number ◽

Number Of Clusters ◽

Optimal Number Of Clusters

Clustering Count-based RNA Methylation Data Using a Nonparametric Generative Model

Current Bioinformatics ◽

10.2174/1574893613666180601080008 ◽

2018 ◽

Vol 14 (1) ◽

pp. 11-23 ◽

Cited By ~ 3

Author(s):

Lin Zhang ◽

Yanling He ◽

Huaizhi Wang ◽

Hui Liu ◽

Yufei Huang ◽

...

Keyword(s):

Clustering Analysis ◽

Methylation Level ◽

Optimal Number ◽

Generative Model ◽

Methylation Data ◽

Sequencing Data ◽

Number Of Clusters ◽

Rna Methylation ◽

Clustering Effect ◽

Optimal Number Of Clusters

Background: The RNA methylome has been discovered as an important layer of gene regulation and can be profiled directly with count-based measurements from high-throughput sequencing data. Although the detailed regulatory circuit of the epitranscriptome remains uncharted, a clustering effect in methylation status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation profiles and may reflect the epitranscriptomic regulation. Count-based RNA methylation sequencing data has unique features, such as low read coverage, which call for novel clustering approaches. Objective: Besides handling the low read coverage, it is also necessary to preserve the integer property of the data in clustering analysis of count-based RNA methylation sequencing data. Method: We proposed a nonparametric generative model together with its Gibbs sampling solution for clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the clustering effect in methylation level with the original count-based measurements rather than an estimated continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically determine an optimal number of clusters so as to avoid the common model selection problem in clustering analysis. Results: When tested on the simulated system, the method demonstrated improved clustering performance over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and WTAP, which are two known regulatory components of the RNA m6A methyltransferase complex. Conclusion: Our proposed DPBBM method not only properly handles the count-based measurements of RNA methylation data from sites of very low read coverage, but also learns an optimal number of clusters adaptively from the data analyzed.
Availability: The source code and documents of DPBBM R package are freely available through the Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.
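To make the building block of the mixture model concrete, the sketch below evaluates the log-probability of a single beta-binomial observation, i.e. k methylated reads out of n total reads at one site, with shape parameters alpha and beta for one cluster. This is the generic textbook form of the distribution, written with the standard library only. It is not code from the DPBBM package, and the function names are hypothetical.

```python
from math import lgamma, exp

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_logpmf(k, n, alpha, beta):
    """Log-probability of k successes in n trials under a beta-binomial model."""
    # log of the binomial coefficient C(n, k)
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

# Example values (assumed for illustration): 3 methylated reads out of 10.
logp = beta_binomial_logpmf(3, 10, 2.0, 5.0)
```

Because the likelihood is defined directly on the integer counts, sites with very few reads contribute broad, weakly informative terms rather than a noisy point estimate of the methylation level, which matches the motivation given in the abstract.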

Population density, activity centres, and pandemic: Visualizing clusters of COVID-19 cases in Hong Kong

Environment and Planning A Economy and Space ◽

10.1177/0308518x211012700 ◽

2021 ◽

pp. 0308518X2110127

Author(s):

Jiangping Zhou ◽

Sam KS Ho ◽

Shuyu Lei ◽

Valarie CK Pang

Keyword(s):

Hong Kong ◽

Population Density ◽

Spatial Patterns ◽

Number Of Clusters ◽

The Third ◽

General Number

The impacts of coronavirus disease 2019 (COVID-19) on society and economy are wide-ranging, long-lasting, and global. The experience of multiple countries or regions in fighting the pandemic indicates that there could be multiple COVID-19 surges, where a growing number of cases can be observed in the more recent surge(s). Were COVID-19 cases and clusters of cases (across surges) randomly distributed in spaces? Did population density and activity centres influence clusters of cases and associated venues? Based on information on the associated venues of the four surges of COVID-19 cases between January 2020 and February 2021 as well as population density, visuals were made to distinguish the relationships between population density, activity centres, and clusters of cases in Hong Kong. Different spatial patterns were observed across the four surges: fewer cases were observed in the first surge with a more evenly distributed pattern of clusters; the second surge as compared to the first surge saw a wider distribution and an increase in the number/layer of clusters; compared to the second surge, the third surge suffered from many more cases but saw a decrease in the general number of clusters; and compared to the previous three surges, the fourth surge had the largest number of cases, yet even fewer clusters were observed, where several clusters are again concentrated in specific areas similar to the previous surge. Across the four surges, a few locales could see recurrent clusters of cases and a few communities were without cases.

Functional Clustering Based on Weighted Partitioning around Medoid Algorithm with Estimation of Number of Clusters

2021 IEEE 6th International Conference on Cloud Computing and Big Data Analytics (ICCCBDA) ◽

10.1109/icccbda51879.2021.9442491 ◽

2021 ◽

Author(s):

Jianan Zhang

Keyword(s):

Number Of Clusters ◽

Functional Clustering

Optimal Coordination of Over-Current Relays in Microgrids Using Unsupervised Learning Techniques

Applied Sciences ◽

10.3390/app11031241 ◽

2021 ◽

Vol 11 (3) ◽

pp. 1241

Author(s):

Sergio D. Saldarriaga-Zuluaga ◽

Jesús M. López-Lezama ◽

Nicolás Muñoz-Galeano

Keyword(s):

Unsupervised Learning ◽

Distributed Generation ◽

Network Topology ◽

International Electrotechnical Commission ◽

Number Of Clusters ◽

Learning Techniques ◽

Topology Changes ◽

Network Topologies ◽

Optimal Coordination ◽

Operational Modes

Microgrids constitute complex systems that integrate distributed generation (DG) and feature different operational modes. The optimal coordination of directional over-current relays (DOCRs) in microgrids is a challenging task, especially if topology changes are taken into account. This paper proposes an adaptive protection approach that takes advantage of multiple setting groups that are available in commercial DOCRs to account for network topology changes in microgrids. Because the number of possible topologies is greater than the available setting groups, unsupervised learning techniques are explored to classify network topologies into a number of clusters that is equal to the number of setting groups. Subsequently, optimal settings are calculated for every topology cluster. Every setting is saved in the DOCRs as a different setting group that would be activated when a corresponding topology takes place. Several tests are performed on a benchmark IEC (International Electrotechnical Commission) microgrid, evidencing the applicability of the proposed approach.

A novel bidirectional clustering algorithm based on local density

Scientific Reports ◽

10.1038/s41598-021-93244-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Baicheng Lyu ◽

Wenhua Wu ◽

Zhiqiang Hu

Keyword(s):

Clustering Algorithm ◽

Local Density ◽

Clustering Algorithms ◽

Cluster Number ◽

Denoising Method ◽

Number Of Clusters ◽

Data Points ◽

Cutoff Distance ◽

Large Clusters ◽

Small Clusters

With the wide application of cluster analysis, the number of clusters is gradually increasing, as is the difficulty in selecting the judgment indicators of cluster numbers. Also, small clusters are crucial to discovering the extreme characteristics of data samples, but current clustering algorithms focus mainly on analyzing large clusters. In this paper, a bidirectional clustering algorithm based on local density (BCALoD) is proposed. BCALoD establishes the connection between data points based on local density, can automatically determine the number of clusters, is more sensitive to small clusters, and can reduce the adjusted parameters to a minimum. On the basis of the robustness of cluster number to noise, a denoising method suitable for BCALoD is proposed. A different cutoff distance and cutoff density are assigned to each data cluster, which results in improved clustering performance. The clustering ability of BCALoD is verified on randomly generated datasets and city light satellite images.
