Evaluation of the Gower coefficient modifications in hierarchical clustering

Zdeněk Šulc; Martin Matějka; Jiří Procházka; Hana Řezanková

doi:10.51936/eqvy9516

Evaluation of the Gower coefficient modifications in hierarchical clustering

Advances in Methodology and Statistics ◽

10.51936/eqvy9516 ◽

2017 ◽

Vol 14 (1) ◽

Author(s):

Zdeněk Šulc ◽

Martin Matějka ◽

Jiří Procházka ◽

Hana Řezanková

Keyword(s):

Hierarchical Clustering ◽

Mixed Type ◽

Similarity Measures ◽

Rand Index ◽

Clustering Methods ◽

Single Linkage ◽

Linkage Methods ◽

Hierarchical Clustering Methods ◽

Nominal Variables

This paper thoroughly examines three recently introduced modifications of the Gower coefficient, which were designed for hierarchical clustering of data with mixed-type variables. In contrast to the original Gower coefficient, which for nominal variables only recognizes whether two categories match, the examined modifications offer three different approaches to measuring the similarity between categories. The examined dissimilarity measures are compared and evaluated with regard to the quality of the clusters they produce, measured by three internal indices (Dunn, silhouette, McClain), and with regard to their classification ability, measured by the Rand index. The comparison is performed on 810 generated datasets. In the analysis, the performance of the similarity measures is evaluated across different data characteristics (the number of variables, the number of categories, the distance between clusters, etc.) and across different hierarchical clustering methods (average, complete, McQuitty, and single linkage). As a result, two modifications are recommended for use in practice.
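
For orientation, the original Gower coefficient that the abstract contrasts against can be sketched as follows. This is a minimal illustration with equal variable weights, not the paper's code. The function name, the `kinds` and `ranges` arguments, and the toy records are assumptions made for the example, and the three examined modifications are not reproduced here.

```python
def gower_dissimilarity(a, b, kinds, ranges):
    """Original Gower dissimilarity between two mixed-type records.

    kinds[k] is "num" or "nom" (hypothetical labels for this sketch);
    ranges[k] is the observed range (max - min) of numeric variable k
    and is ignored for nominal variables.
    """
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "nom":
            # Simple matching: two categories either agree or they do not.
            total += 0.0 if x == y else 1.0
        else:
            # Range-normalized absolute difference, which lies in [0, 1].
            total += abs(x - y) / rng if rng else 0.0
    # Equal weights: average the per-variable dissimilarities.
    return total / len(kinds)


# Toy records with the variables (age, income, colour, region).
kinds = ["num", "num", "nom", "nom"]
ranges = [40.0, 50000.0, None, None]
r1 = (30, 40000, "red", "north")
r2 = (50, 40000, "blue", "north")

d = gower_dissimilarity(r1, r2, kinds, ranges)
print(d)  # (20/40 + 0 + 1 + 0) / 4 = 0.375
```

The nominal branch is the part the modifications replace: instead of a 0/1 match indicator, they assign a graded similarity to each pair of categories.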

Teleconsultation demand classification and service analysis

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-021-01610-x ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Wenjia Chen ◽

Jinlin Li

Keyword(s):

Hierarchical Clustering ◽

Cosine Similarity ◽

Graph Representation ◽

Clustering Methods ◽

Clustering Method ◽

Network Graph ◽

Before And After ◽

Hierarchical Clustering Methods ◽

Demand Classification

Abstract Background: To enhance teleconsultation management, demands can be classified into different patterns, and the service for each demand pattern can then be improved. Methods: For effective teleconsultation classification, a novel ensemble hierarchical clustering method is proposed in this study. In the proposed method, individual clustering results are first obtained by different hierarchical clustering methods and then combined through one-hot encoding, the calculation and division of cosine similarity, and network graph representation. In the network graph built from the high cosine similarities, connected demand series are categorized into one pattern. For verification, 43 teleconsultation demand series are used as sample data, and the efficiency and quality of teleconsultation services are analyzed before and after the demand classification. Results: The teleconsultation demands are classified into three categories: erratic, lumpy, and slow. Under the fixed strategies, the service analysis after demand classification reveals deficiencies of teleconsultation services that the analysis before demand classification cannot. Conclusion: The proposed ensemble hierarchical clustering method can effectively categorize teleconsultation demands, and effective demand categorization can enhance teleconsultation management.
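
The ensemble step described in the Methods can be sketched roughly as below. This is an illustrative reading of the abstract, not the authors' implementation: the "division" of cosine similarity is interpreted here as a simple threshold, the threshold value 0.6 is an arbitrary assumption, and all function names and the toy labelings are hypothetical.

```python
import math
from itertools import combinations


def one_hot(labelings):
    """Concatenate one-hot encodings of each item's label in every base clustering."""
    vectors = [[] for _ in labelings[0]]
    for labels in labelings:
        classes = sorted(set(labels))
        for vec, lab in zip(vectors, labels):
            vec.extend(1.0 if lab == c else 0.0 for c in classes)
    return vectors


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def ensemble_patterns(labelings, threshold):
    """Link items whose cosine similarity reaches the threshold, then
    return one pattern id per item (the connected components of the graph)."""
    vectors = one_hot(labelings)
    n = len(vectors)
    neighbours = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if cosine(vectors[i], vectors[j]) >= threshold:
            neighbours[i].add(j)
            neighbours[j].add(i)
    pattern = [None] * n
    current = 0
    for start in range(n):
        if pattern[start] is not None:
            continue
        # Depth-first traversal of one connected component.
        pattern[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in neighbours[node]:
                if pattern[nb] is None:
                    pattern[nb] = current
                    stack.append(nb)
        current += 1
    return pattern


# Three hypothetical base clusterings of five demand series.
base = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
]
patterns = ensemble_patterns(base, threshold=0.6)
print(patterns)
```

With one-hot vectors of this kind, the cosine similarity of two items equals the fraction of base clusterings that place them in the same cluster, so the threshold acts as a required level of agreement.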

Comparison of Single Linkage, Complete Linkage, and Average Linkage Methods on Community Welfare Analysis in Cities and Regencies in East Java

Jurnal Matematika Statistika dan Komputasi ◽

10.20956/j.v18i1.14228 ◽

2021 ◽

Vol 18 (1) ◽

pp. 130-140

Author(s):

Yanuwar Reinaldi ◽

Nurissaidah Ulinnuha ◽

Moh. Hafiyusholeh

Keyword(s):

Hierarchical Clustering ◽

National Development ◽

Clustering Methods ◽

Single Linkage ◽

Complete Linkage ◽

Average Linkage ◽

Linkage Methods ◽

Silhouette Index ◽

Linkage Method ◽

Index Value

Community welfare is an important concern for any region and is central to national development. The welfare of the people in Indonesia is fairly unequal, especially in East Java. One way to map the areas of East Java by the welfare of their people is clustering. Hierarchical clustering is one family of methods for grouping data; within it, single linkage, complete linkage, and average linkage are suitable methods, and this study compares them to determine which performs best. The results show that the average linkage method with three clusters performs best, with a silhouette index value of 0.6054. The first cluster contains 23 regions, the cities/districts with the highest community welfare; the second cluster contains 11 regions, the cities/districts with moderate social welfare; and the third cluster contains 4 regions, the cities/districts with the lowest community welfare.
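
The silhouette index used to rank the linkage methods can be computed from a distance matrix and a set of cluster labels. The sketch below is a plain-Python illustration on made-up one-dimensional data, not the study's calculation. The function name and the toy points are assumptions, and singleton clusters are given a width of 0 by convention.

```python
def silhouette_index(dist, labels):
    """Mean silhouette width for a full distance matrix and cluster labels.

    Assumes at least two clusters and that not all points coincide.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy data: four points on a line, forming two obvious groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_index(dist, [0, 0, 1, 1])  # matches the visible groups
bad = silhouette_index(dist, [0, 1, 0, 1])   # splits each group
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, so a comparison of linkage methods would keep the partition with the highest index.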

Cutting condition analysis of micro drill catalog database by using various non-hierarchical clustering methods.

The Proceedings of Mechanical Engineering Congress Japan ◽

10.1299/jsmemecj.2020.s14403 ◽

2020 ◽

Vol 2020 (0) ◽

pp. S14403

Author(s):

Yoshito NOHARA ◽

Toshiki HIROGAKI ◽

Eiichi AOYAMA ◽

Hiroyuki KODAMA

Keyword(s):

Hierarchical Clustering ◽

Cutting Condition ◽

Clustering Methods ◽

Hierarchical Clustering Methods ◽

Micro Drill

Machine-learning identification of asteroid groups

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/stz1795 ◽

2019 ◽

Vol 488 (1) ◽

pp. 1377-1386 ◽

Cited By ~ 2

Author(s):

V Carruba ◽

S Aljbaae ◽

A Lucchini

Keyword(s):

Machine Learning ◽

Hierarchical Clustering ◽

Clustering Algorithms ◽

Family Members ◽

Parent Body ◽

Clustering Methods ◽

New Family ◽

Hierarchical Clustering Methods ◽

Rotational Failure ◽

Asteroid Families

ABSTRACT Asteroid families are groups of asteroids that share a common origin. They can be the outcome of a collision or the result of the rotational failure of a parent body or its satellites. Collisional asteroid families have been identified for several decades using hierarchical clustering methods (HCMs) in proper element domains. In this method, the distance of an asteroid from a reference body is computed and, if it is less than a critical value, the asteroid is added to the family list. The process is then repeated with the new object as a reference, until no new family members are found. Recently, new machine-learning clustering algorithms have been introduced for the purpose of cluster classification. Here, we apply supervised-learning hierarchical clustering algorithms to the identification of asteroid families. The accuracy, precision, and recall values of the results obtained with the new method, when compared with the classical HCM, show that this approach is able to find family members with an accuracy above 89.5 per cent, and that all asteroids previously identified as family members by traditional methods are consistently retrieved. Values of the area under the Receiver Operating Characteristic curves are also optimal, consistently above 85 per cent. Overall, we identify 6 new families and 13 new clumps, in regions where the method can be applied, that appear to be consistent and homogeneous in terms of physical and taxonomic properties. Machine-learning clustering algorithms can, therefore, be very efficient and fast tools for the problem of asteroid family identification.
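
The classical HCM loop described in the abstract (measure the distance from a reference body, admit neighbours under a critical value, repeat with each new member as reference) can be sketched as follows. This is only an illustration: plain Euclidean distance on made-up 2D points stands in for the distance metric actually used in proper element space, and the function name, cutoff, and data are assumptions.

```python
import math


def hcm_family(points, seed, cutoff):
    """Grow a family from a seed object by repeatedly adding neighbours.

    Any object closer than `cutoff` to a current member joins the family
    and later serves as a reference body itself.
    """
    family = {seed}
    frontier = [seed]
    while frontier:
        ref = frontier.pop()
        for idx, candidate in enumerate(points):
            if idx not in family and math.dist(points[ref], candidate) < cutoff:
                family.add(idx)
                frontier.append(idx)
    return sorted(family)


# Toy coordinates: a chain of nearby objects plus one distant outlier.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (1.4, 0.0)]
members = hcm_family(points, seed=0, cutoff=0.6)
print(members)
```

Object 4 lies farther than the cutoff from the seed but still joins the family through the chain of intermediate members, which is the characteristic behaviour of this procedure.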

Using hierarchical clustering to explore patterns of deprivation among English local authorities

Journal of Public Health ◽

10.1093/pubmed/fdz182 ◽

2019 ◽

Vol 42 (4) ◽

pp. 772-777

Author(s):

Steven L Senior

Keyword(s):

Hierarchical Clustering ◽

Demographic Characteristics ◽

Homogeneous Group ◽

Statistical Testing ◽

Clustering Methods ◽

Local Authorities ◽

Multiple Deprivation ◽

Hierarchical Clustering Methods ◽

Different Levels ◽

Indices Of Multiple Deprivation

Abstract Background: The English Indices of Multiple Deprivation (IMD) is widely used as a measure of deprivation. However, similarly ranked areas can differ substantially in the underlying domains of deprivation. These domains contain a richer set of data that might be useful for classifying local authorities. Clustering methods offer a set of techniques to identify groups of areas with similar patterns of deprivation. Methods: Hierarchical agglomerative (i.e. bottom-up) clustering methods were applied to domain scores for 152 upper tier local authorities. Advances in statistical testing allow clusters to be identified that are unlikely to have arisen from random partitioning of a homogeneous group. The resulting clusters are described in terms of their subdomain scores and basic geographic and demographic characteristics. Results: Five statistically significant clusters of local authorities were identified. These clusters only partially reflect different levels of overall deprivation. In particular, two clusters share similar overall IMD scores but have contrasting patterns of deprivation. Conclusion: Hierarchical clustering methods identify five distinct clusters that do not correspond closely to quintiles of deprivation. This approach may help to distinguish between places that face similar underlying challenges, and places that appear similar in terms of overall deprivation scores, but that face different challenges.
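
As a reminder of what "agglomerative (i.e. bottom-up)" means in practice, a naive version of the merge loop is sketched below. It is a teaching example on made-up data: the function name and the `linkage` options are assumptions, the implementation is far slower than library routines, and the statistical significance testing mentioned in the abstract is not included.

```python
def agglomerate(dist, n_clusters, linkage="average"):
    """Naive agglomerative clustering on a full distance matrix.

    Starts with every item in its own cluster and repeatedly merges the
    two closest clusters until `n_clusters` remain.
    """
    clusters = [[i] for i in range(len(dist))]

    def between(c1, c2):
        pairwise = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":
            return min(pairwise)
        if linkage == "complete":
            return max(pairwise)
        return sum(pairwise) / len(pairwise)  # average linkage

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: between(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return clusters


# Toy data: five points on a line.
values = [0.0, 1.0, 10.0, 11.0, 30.0]
dist = [[abs(p - q) for q in values] for p in values]
groups = agglomerate(dist, n_clusters=3)
print(groups)
```

The choice of linkage only changes how the distance between two clusters is summarized; on data this well separated, all three options give the same partition.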

Efficient algorithms for agglomerative hierarchical clustering methods

Journal of Classification ◽

10.1007/bf01890115 ◽

1984 ◽

Vol 1 (1) ◽

pp. 7-24 ◽

Cited By ~ 364

Author(s):

William H. E. Day ◽

Herbert Edelsbrunner

Keyword(s):

Hierarchical Clustering ◽

Efficient Algorithms ◽

Agglomerative Hierarchical Clustering ◽

Clustering Methods ◽

Hierarchical Clustering Methods

Research on NMF Based Hierarchical Clustering Methods

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.439-440.1306 ◽

2010 ◽

Vol 439-440 ◽

pp. 1306-1311

Author(s):

Fang Li ◽

Qun Xiong Zhu

Keyword(s):

Hierarchical Clustering ◽

Clustering Algorithm ◽

Clustering Methods ◽

Agglomerative Clustering ◽

Clustering Method ◽

Hierarchical Agglomerative Clustering ◽

Hierarchical Clustering Methods

An LSI-based hierarchical agglomerative clustering algorithm is studied. To address the problems of the LSI-based hierarchical agglomerative clustering method, an NMF-based hierarchical clustering method is proposed and analyzed. Two ways of implementing the NMF-based method are introduced. Finally, the results of two groups of experiments on the TanCorp document corpus show that the proposed method is effective.

An unsupervised method of classifying remotely sensed images using Kohonen self‐organizing maps and agglomerative hierarchical clustering methods

International Journal of Remote Sensing ◽

10.1080/01431160701442146 ◽

2008 ◽

Vol 29 (11) ◽

pp. 3171-3207 ◽

Cited By ~ 51

Author(s):

M. L. Gonçalves ◽

M. L. A. Netto ◽

J. A. F. Costa ◽

J. Zullo Júnior

Keyword(s):

Hierarchical Clustering ◽

Remotely Sensed ◽

Agglomerative Hierarchical Clustering ◽

Clustering Methods ◽

Self Organizing Maps ◽

Unsupervised Method ◽

Hierarchical Clustering Methods ◽

Remotely Sensed Images ◽

Self Organizing

Determination of genetic structure of germplasm collections: are traditional hierarchical clustering methods appropriate for molecular marker data?

Theoretical and Applied Genetics ◽

10.1007/s00122-011-1576-x ◽

2011 ◽

Vol 123 (2) ◽

pp. 195-205 ◽

Cited By ~ 62

Author(s):

T. L. Odong ◽

J. van Heerwaarden ◽

J. Jansen ◽

T. J. L. van Hintum ◽

F. A. van Eeuwijk

Keyword(s):

Genetic Structure ◽

Molecular Marker ◽

Hierarchical Clustering ◽

Clustering Methods ◽

Marker Data ◽

Germplasm Collections ◽

Hierarchical Clustering Methods

Hierarchical kt jet clustering for parallel architectures

Acta Universitatis Sapientiae Informatica ◽

10.1515/ausi-2017-0012 ◽

2017 ◽

Vol 9 (2) ◽

pp. 195-213

Author(s):

Richárd Forster ◽

Ágnes Fülöp

Keyword(s):

Hierarchical Clustering ◽

Particle Physics ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

High Energy ◽

Theoretical Physics ◽

High Energy Particle ◽

Clustering Methods ◽

Hierarchical Clustering Methods ◽

Using Data

Abstract The reconstruction and analysis of measured data play an important role in high energy particle physics research, leading to new results in both experimental and theoretical physics. This requires algorithm improvements and high computing capacity. Clustering algorithms make it possible to learn the jet structure more accurately. A more granular parallelization of the kt clustering algorithm was explored by combining it with the hierarchical clustering methods used in network evaluations. The kt method makes it possible to follow the development of particles produced in high-energy nucleus-nucleus collisions. The hierarchical clustering algorithms work on graphs, so the particle information used by the standard kt algorithm was first transformed into an appropriate graph representing the network of particles. Testing was done using data samples from the ALICE offline library, which contains the modules required to simulate the ALICE detector, a dedicated Pb-Pb detector. The proposed algorithm was compared to the FastJet toolkit's standard longitudinally invariant kt implementation. Parallelizing the standard non-optimized version of this algorithm on the available CPU architecture proved to be 1.6 times faster than the standard implementation, while the solution proposed in this paper achieved 12 times faster computing performance and is scalable enough to run efficiently on GPUs.
