Scalable Recursive Top-Down Hierarchical Clustering Approach with Implicit Model Selection for Textual Data Sets

Author(s):  
Markus Muhr ◽  
Vedran Sabol ◽  
Michael Granitzer
2021 ◽  
Vol 8 (10) ◽  
pp. 43-50
Author(s):  
Truong et al.

Clustering is a fundamental technique in data mining and machine learning. Recently, many researchers have become interested in the problem of clustering categorical data, and several new approaches have been proposed. One of the successful, pioneering clustering algorithms is the Minimum-Minimum Roughness algorithm (MMR), a top-down hierarchical clustering algorithm that can handle the uncertainty in clustering categorical data. However, MMR tends to choose the category with less value leaf node with more objects, leading to undesirable clustering results. To overcome this shortcoming, this paper proposes an improved version of the MMR algorithm for clustering categorical data, called IMMR (Improved Minimum-Minimum Roughness). Experimental results on real data sets taken from UCI show that the IMMR algorithm outperforms MMR in clustering categorical data.
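The abstract does not define roughness, so the following is only a minimal sketch of the standard rough-set (Pawlak) roughness measure on which MMR-style algorithms are built. It is not the authors' MMR or IMMR procedure. The function name `roughness`, the dictionary-based row layout, and the toy data are assumptions made for illustration.

```python
from collections import defaultdict

def roughness(rows, attr_i, value, attr_j):
    """Pawlak roughness of X = {rows with attr_i == value} with respect to attr_j.

    Hypothetical helper: rows are dicts mapping attribute name -> category.
    """
    target = {idx for idx, row in enumerate(rows) if row[attr_i] == value}
    # Equivalence classes induced by attr_j: objects sharing the same category.
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[row[attr_j]].add(idx)
    # Lower approximation: classes entirely inside X.
    lower = sum(len(c) for c in classes.values() if c <= target)
    # Upper approximation: classes that overlap X at all.
    upper = sum(len(c) for c in classes.values() if c & target)
    return 1.0 - lower / upper if upper else 0.0

# Toy categorical data (made up for this example).
rows = [
    {"color": "red", "size": "S"},
    {"color": "red", "size": "S"},
    {"color": "blue", "size": "S"},
    {"color": "blue", "size": "L"},
]
print(roughness(rows, "color", "blue", "size"))  # 0.75
```

A roughness of 0 means the value set is described exactly by the other attribute; values near 1 mean it is described poorly. A top-down algorithm of this family would compare such scores across attributes to decide where to split.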


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The goal of clustering analysis is to group a set of data points into several clusters based on similarity or distance. In many traditional clustering algorithms, the similarity or distance is a scalar. A vector, such as the data gravitational force, contains more information than a scalar and can be applied in clustering analysis to improve clustering performance. This paper therefore proposes a three-stage hierarchical clustering approach called GHC, which takes advantage of the vector nature of the data gravitational force, inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed from the top k data gravitations between each data point and its neighbors in the local region. In the second stage, the sparse graph is partitioned into many subgraphs by the gravitational influence coefficient. In the last stage, the final clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. Experiments on synthetic and real-world data sets were conducted to evaluate GHC, and the results show that it achieves better performance than the other existing clustering algorithms compared.
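The first stage can be illustrated with a rough sketch. The abstract does not give the force formula, so the code below assumes unit masses, a gravitational constant of 1, an inverse-square law, and distinct points. The name `top_k_gravity_graph` is hypothetical, and the later stages (the gravitational influence coefficient and the linkage criterion) are not reproduced here.

```python
import numpy as np

def top_k_gravity_graph(points, k):
    """Keep, for each point, the k strongest gravitational pulls acting on it.

    Returns {i: [(j, force_vector), ...]}. Assumes unit masses, G = 1 and
    pairwise distinct points (all assumptions, not taken from the paper).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]  # diff[i, j] = x_j - x_i
    dist = np.linalg.norm(diff, axis=2)
    np.fill_diagonal(dist, np.inf)                  # no self-attraction
    magnitude = 1.0 / dist**2                       # inverse-square law
    graph = {}
    for i in range(n):
        strongest = np.argsort(-magnitude[i])[:k]   # indices of the k largest forces
        graph[i] = [
            (int(j), magnitude[i, j] * diff[i, j] / dist[i, j])  # vector toward j
            for j in strongest
        ]
    return graph

graph = top_k_gravity_graph([[0, 0], [1, 0], [5, 0]], k=1)
```

Each edge keeps the full force vector rather than only its magnitude, which is the property the abstract highlights as carrying more information than a scalar distance.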


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1021
Author(s):  
Zhanserik Nurlan ◽  
Tamara Zhukabayeva ◽  
Mohamed Othman

Wireless sensor networks (WSNs) are networks of thousands of nodes installed in a defined physical environment to sense and monitor its state. The viability of such a network depends directly on, and is limited by, the power of the batteries supplying its nodes, which is a disadvantage of such networks. To extend the lifetime of WSNs, researchers regularly develop routing protocols that minimize and optimize the energy consumption of sensor network nodes. This article introduces a new heterogeneity-aware routing protocol known as Extended Z-SEP Routing Protocol with Hierarchical Clustering Approach for Wireless Heterogeneous Sensor Network, or EZ-SEP, in which nodes connect to a base station (BS) via a hybrid method: a certain number of nodes communicate with the base station directly, while the remaining ones form a cluster to transfer data. Parameters of the field are unknown, and the field is partitioned into zones depending on the node energy. We reviewed the Z-SEP protocol with respect to the election of the cluster head (CH) and its communication with the BS, and present a novel extended mechanism for selecting the CH based on residual energy. In addition, EZ-SEP is evaluated under various estimation schemes, such as base station repositioning, altered field density, and variable node energy, for comparison with its parent algorithm. EZ-SEP was executed and compared with routing protocols such as Z-SEP, SEP, and LEACH. The proposed algorithm was simulated in MATLAB R2016b. Simulation results show that the proposed extended version performs better than Z-SEP: the stability period improves due to a 48% increase in the number of active nodes, network efficiency improves through a 16% higher packet delivery coefficient, and it optimizes the average power consumption by 34.
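The abstract does not spell out the cluster-head election rule. The following is a deliberately simplified sketch of one energy-based choice, picking the node with the most residual energy in each cluster. This is not the EZ-SEP mechanism itself, which may well be probabilistic as in SEP and LEACH. The field names `id`, `cluster`, and `energy` and the function `elect_cluster_heads` are hypothetical.

```python
def elect_cluster_heads(nodes):
    """Return {cluster: node id}, choosing the node with the most residual energy.

    Simplified stand-in for an energy-aware election; ties keep the first node seen.
    """
    best = {}
    for node in nodes:
        cluster = node["cluster"]
        if cluster not in best or node["energy"] > best[cluster]["energy"]:
            best[cluster] = node
    return {cluster: node["id"] for cluster, node in best.items()}

# Made-up residual energies in joules.
nodes = [
    {"id": "n1", "cluster": "A", "energy": 0.42},
    {"id": "n2", "cluster": "A", "energy": 0.57},
    {"id": "n3", "cluster": "B", "energy": 0.31},
]
print(elect_cluster_heads(nodes))  # {'A': 'n2', 'B': 'n3'}
```

Re-running the election each round with updated energies rotates the cluster-head role away from depleted nodes, which is the general motivation behind energy-based selection.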


2021 ◽  
Vol 13 (13) ◽  
pp. 2489
Author(s):  
Lanlan Rao ◽  
Jian Xu ◽  
Dmitry S. Efremenko ◽  
Diego G. Loyola ◽  
Adrian Doicu

To retrieve aerosol properties from satellite measurements, micro-physical aerosol models have to be assumed. Because aerosols are spatially and temporally inhomogeneous, choosing an appropriate aerosol model is an important task. In this paper, we use a Bayesian algorithm that takes model uncertainties into account to retrieve the aerosol optical depth and layer height from synthetic and real TROPOMI O2A band measurements. The results show that when there is insufficient information to select an appropriate micro-physical model, the Bayesian algorithm improves the accuracy of the solution.

