Incremental Hierarchical Clustering for Data Insertion and Its Evaluation

Kakeru Narita; Teruhisa Hochin; Yoshihiro Hayashi; Hiroki Nomiya

doi:10.4018/ijsi.2020040101

Adaptive K-Means Algorithm with Dynamically Changing Cluster Centers and K-Value

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.532-533.1373 ◽

2012 ◽

Vol 532-533 ◽

pp. 1373-1377 ◽

Cited By ~ 1

Author(s):

Ai Ping Deng ◽

Ben Xiao ◽

Hui Yong Yuan

Keyword(s):

Nearest Neighbor ◽

Experimental Results ◽

Data Set ◽

Number Of Clusters ◽

K Value ◽

Testing Data ◽

Different Types ◽

Data Points ◽

Shared Nearest Neighbor

In allusion to the disadvantage of having to obtain the number of clusters in advance and the sensitivity to selecting initial clustering centers in the K-means algorithm, an improved K-means algorithm is proposed, that the cluster centers and the number of clusters are dynamically changing. The new algorithm determines the cluster centers by calculating the density of data points and shared nearest neighbor similarity, and controls the clustering categories by using the average shared nearest neighbor self-similarity.The experimental results of IRIS testing data set show that the algorithm can select the cluster cennters and can distinguish between different types of cluster efficiently.

Download Full-text

The identification of stages in diachronic data: variability-based neighbour clustering

Corpora ◽

10.3366/e1749503208000075 ◽

2008 ◽

Vol 3 (1) ◽

pp. 59-81 ◽

Cited By ~ 42

Author(s):

Stefan Th. Gries ◽

Martin Hilpert

Keyword(s):

Case Studies ◽

Hierarchical Clustering ◽

The Other ◽

Data Driven ◽

Clustering Method ◽

Chronological Order ◽

Bottom Up ◽

Data Variability ◽

Data Points ◽

Corpus Data

In this paper, we introduce a data-driven bottom-up clustering method for the identification of stages in diachronic corpus data that differ from each other quantitatively. Much like regular approaches to hierarchical clustering, it is based on identifying and merging the most cohesive groups of data points, but, unlike regular approaches to clustering, it allows for the merging of temporally adjacent data, thus, in effect, preserving the chronological order. We exemplify the method with two case studies, one on verbal complementation of shall, the other on the development of the perfect in English.

Download Full-text

An Improved Clustering Method Based on Data Field

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.457-458.919 ◽

2013 ◽

Vol 457-458 ◽

pp. 919-925

Author(s):

Yu Hua Liu ◽

Cui Xu ◽

Ke Xu ◽

Jian Zhi Jin

Keyword(s):

Globular Clusters ◽

Local Solution ◽

Experimental Results ◽

Clustering Method ◽

Number Of Clusters ◽

Initial Cluster ◽

Data Field ◽

Noise Data ◽

Small Clusters ◽

Irregular Cluster

By analyzing the problem of k-means, we find the traditional k-means algorithm suffers from some shortcomings, such as requiring the user to give out the number of clusters k in advance, being sensitive to the initial cluster centers, being sensitive to the noise and isolated data, only being applied to the type found in globular clusters, and being easily trapped into a local solution et cetera. This improved algorithm uses the potential of data to find the center data and eliminate the noise data. It decomposes big or extended cluster into several small clusters, then merges adjacent small clusters into a big cluster using the information provided by the Safety Area. Experimental results demonstrate that the improved k-means algorithm can determine the number of clusters, distinguish irregular cluster to a certain extent, decrease the dependence on the initial cluster centers, eliminate the effects of the noise data and get a better clustering accuracy.

Download Full-text

Robust Multi-view Learning via Half-quadratic Minimization

Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2018/455 ◽

2018 ◽

Cited By ~ 1

Author(s):

Yonghua Zhu ◽

Xiaofeng Zhu ◽

Wei Zheng

Keyword(s):

Error Estimation ◽

State Of The Art ◽

Data Distribution ◽

Experimental Results ◽

Clustering Methods ◽

Clustering Method ◽

Number Of Clusters ◽

Single View ◽

Quadratic Minimization ◽

Quadratic Theory

Although multi-view clustering is capable to usemore information than single view clustering, existing multi-view clustering methods still have issues to be addressed, such as initialization sensitivity, the specification of the number of clusters,and the influence of outliers. In this paper, we propose a robust multi-view clustering method to address these issues. Specifically, we first propose amulti-view based sum-of-square error estimation tomake the initialization easy and simple as well asuse a sum-of-norm regularization to automaticallylearn the number of clusters according to data distribution. We further employ robust estimators constructed by the half-quadratic theory to avoid theinfluence of outliers for conducting robust estimations of both sum-of-square error and the numberof clusters. Experimental results on both syntheticand real datasets demonstrate that our method outperforms the state-of-the-art methods.

Download Full-text

Improved Parameterless K-Means

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch010 ◽

2013 ◽

pp. 156-168

Author(s):

Wan Maseri Binti Wan Mohd ◽

A.H. Beg ◽

Tutut Herawan ◽

A. Noraziah ◽

K. F. Rabbi

Keyword(s):

Unsupervised Learning ◽

Globular Clusters ◽

Clustering Algorithm ◽

Experimental Results ◽

Maximum Distance ◽

Initial Number ◽

Number Of Clusters ◽

New Approach ◽

Data Points ◽

Run Time

K-means is an unsupervised learning and partitioning clustering algorithm. It is popular and widely used for its simplicity and fastness. K-means clustering produce a number of separate flat (non-hierarchical) clusters and suitable for generating globular clusters. The main drawback of the k-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of K-means algorithm with auto-generate an initial number of clusters (k) and a new approach of defining initial Centroid for effective and efficient clustering process. The underlined mechanism has been analyzed and experimented. The experimental results show that the number of iteration is reduced to 50% and the run time is lower and constantly based on maximum distance of data points, regardless of how many data points.

Download Full-text

Improved Parameterless K-Means

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2011070101 ◽

2011 ◽

Vol 1 (3) ◽

pp. 1-14 ◽

Cited By ~ 5

Author(s):

Wan Maseri Binti Wan Mohd ◽

A.H. Beg ◽

Tutut Herawan ◽

A. Noraziah ◽

K. F. Rabbi

Keyword(s):

Unsupervised Learning ◽

Globular Clusters ◽

Clustering Algorithm ◽

Experimental Results ◽

Maximum Distance ◽

Initial Number ◽

Number Of Clusters ◽

New Approach ◽

Main Drawback ◽

Data Points

K-means is an unsupervised learning and partitioning clustering algorithm. It is popular and widely used for its simplicity and fastness. K-means clustering produce a number of separate flat (non-hierarchical) clusters and suitable for generating globular clusters. The main drawback of the k-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of K-means algorithm with auto-generate an initial number of clusters (k) and a new approach of defining initial Centroid for effective and efficient clustering process. The underlined mechanism has been analyzed and experimented. The experimental results show that the number of iteration is reduced to 50% and the run time is lower and constantly based on maximum distance of data points, regardless of how many data points.

Download Full-text

Discovering New Intents via Constrained Deep Adaptive Clustering with Cluster Refinement

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6353 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8360-8367

Author(s):

Ting-En Lin ◽

Hua Xu ◽

Hanlei Zhang

Keyword(s):

Prior Knowledge ◽

Experimental Results ◽

Feature Engineering ◽

Clustering Method ◽

Dialogue System ◽

High Confidence ◽

Number Of Clusters ◽

Adaptive Clustering ◽

Benchmark Datasets ◽

Definition Of

Identifying new user intents is an essential task in the dialogue system. However, it is hard to get satisfying clustering results since the definition of intents is strongly guided by prior knowledge. Existing methods incorporate prior knowledge by intensive feature engineering, which not only leads to overfitting but also makes it sensitive to the number of clusters. In this paper, we propose constrained deep adaptive clustering with cluster refinement (CDAC+), an end-to-end clustering method that can naturally incorporate pairwise constraints as prior knowledge to guide the clustering process. Moreover, we refine the clusters by forcing the model to learn from the high confidence assignments. After eliminating low confidence assignments, our approach is surprisingly insensitive to the number of clusters. Experimental results on the three benchmark datasets show that our method can yield significant improvements over strong baselines. 1

Download Full-text

A hierarchical clustering method for random intervals based on a similarity measure

Computational Statistics ◽

10.1007/s00180-021-01121-3 ◽

2021 ◽

Author(s):

Ana Belén Ramos-Guajardo

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Real Life ◽

Stopping Criterion ◽

Clustering Method ◽

Bootstrap Test ◽

Empirical Performance ◽

Random Intervals ◽

Expected Values

AbstractA new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.

Download Full-text

Lightweight Blockchain Processing. Case Study: Scanned Document Tracking on Tezos Blockchain

Applied Sciences ◽

10.3390/app11157169 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7169

Author(s):

Mohamed Allouche ◽

Tarek Frikha ◽

Mihai Mitrea ◽

Gérard Memmi ◽

Faten Chaabane

Keyword(s):

Load Balancing ◽

Relative Error ◽

Execution Time ◽

General Purpose ◽

Experimental Results ◽

Raspberry Pi ◽

Embedded Platform ◽

Memory Resources ◽

Processing Solution

To bridge the current gap between the Blockchain expectancies and their intensive computation constraints, the present paper advances a lightweight processing solution, based on a load-balancing architecture, compatible with the lightweight/embedding processing paradigms. In this way, the execution of complex operations is securely delegated to an off-chain general-purpose computing machine while the intimate Blockchain operations are kept on-chain. The illustrations correspond to an on-chain Tezos configuration and to a multiprocessor ARM embedded platform (integrated into a Raspberry Pi). The performances are assessed in terms of security, execution time, and CPU consumption when achieving a visual document fingerprint task. It is thus demonstrated that the advanced solution makes it possible for a computing intensive application to be deployed under severely constrained computation and memory resources, as set by a Raspberry Pi 3. The experimental results show that up to nine Tezos nodes can be deployed on a single Raspberry Pi 3 and that the limitation is not derived from the memory but from the computation resources. The execution time with a limited number of fingerprints is 40% higher than using a classical PC solution (value computed with 95% relative error lower than 5%).

Download Full-text

Conical SNA using fuzzy k-medoids based on user experience

International Journal of Electrical Engineering Education ◽

10.1177/0020720920988490 ◽

2021 ◽

pp. 002072092098849

Author(s):

Poonam Rani ◽

MPS Bhatia ◽

Devendra K Tayal

Keyword(s):

Social Networks ◽

Social Network ◽

User Experience ◽

Three Dimensional ◽

Experimental Results ◽

Dynamic Parameters ◽

Clustering Method ◽

Cone Model ◽

Intelligent Approach ◽

Conical Model

The paper presents an intelligent approach for the comparison of social networks through a cone model by using the fuzzy k-medoids clustering method. It makes use of a geometrical three-dimensional conical model, which astutely represents the user experience views. It uses both the static as well as the dynamic parameters of social networks. In this, we propose an algorithm that investigates which social network is more fruitful. For the experimental results, the proposed work is employed on the data collected from students from different universities through the Google forms, where students are required to rate their experience of using different social networks on different scales.

Download Full-text