High-performance implementations of a clustering algorithm for finding network communities

Author(s):  
Alex Restrepo ◽  
Andres Solano ◽  
Jerry Scripps ◽  
Christian Trefftz ◽  
Jonathan Engelsma ◽  
...  


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management, as it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. After the documents are indexed, query matching is performed for user queries using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
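As a concrete illustration of the query-matching step, the Bhattacharyya distance between two discrete term distributions can be computed as below; the example distributions are invented for illustration and do not come from the paper:

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability
    distributions p and q (smaller means more similar)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc) if bc > 0 else float("inf")

# Hypothetical term distributions for a query and two indexed documents.
query = [0.6, 0.3, 0.1]
doc_a = [0.5, 0.35, 0.15]
doc_b = [0.1, 0.2, 0.7]

# The document with the smaller distance to the query is the better match.
best = min(("doc_a", doc_a), ("doc_b", doc_b),
           key=lambda d: bhattacharyya_distance(query, d[1]))[0]
```

The distance is zero for identical distributions and grows as the distributions diverge, which makes it a natural ranking score for retrieved documents.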


2018 ◽  
Vol 89 (16) ◽  
pp. 3244-3259 ◽  
Author(s):  
Sumit Mandal ◽  
Simon Annaheim ◽  
Andre Capt ◽  
Jemma Greve ◽  
Martin Camenzind ◽  
...  

Fabric systems used in firefighters' thermal protective clothing should offer optimal thermal protective and thermo-physiological comfort performances. However, fabric systems that have very high thermal protective performance have very low thermo-physiological comfort performance. As these performances are inversely related, a categorization tool based on these two performances can help to find the best balance between them. Thus, this study is aimed at developing a tool for categorizing fabric systems used in protective clothing. For this, a set of commercially available fabric systems was evaluated and categorized. The thermal protective and thermo-physiological comfort performances were measured by standard tests and indexed on a normalized scale between 0 (low performance) and 1 (high performance). The dataset of indices was first divided into three clusters using the k-means algorithm. Here, each cluster had a centroid representing a typical Thermal Protective Performance Index (TPPI) value and a typical Thermo-physiological Comfort Performance Index (TCPI) value. Using the ISO 11612:2015 and EN 469:2014 guidelines related to the TPPI requirements, the clustered fabric systems were divided into two groups: Group 1 (high thermal protective performance-based fabric systems) and Group 2 (low thermal protective performance-based fabric systems). The fabric systems in each of these TPPI groups were further categorized based on the typical TCPI values obtained from the k-means clustering algorithm. In this study, these categorized fabric systems showed either high or low thermal protective performance with low, medium, or high thermo-physiological comfort performance. Finally, a tool for using these categorized fabric systems was prepared and presented graphically. 
The allocations of the fabric systems within the categorization tool have been verified based on their properties (e.g., thermal resistance, weight, evaporative resistance) and construction parameters (e.g., woven, nonwoven, layers), which significantly affect the performance. In this way, we identified key characteristics among the categorized fabric systems which can be used to upgrade or develop high-performance fabric systems. Overall, the categorization tool developed in this study could help clothing manufacturers or textile engineers select and/or develop appropriate fabric systems with maximum thermal protective performance and thermo-physiological comfort performance. Thermal protective clothing manufactured using this type of newly developed fabric system could provide better occupational health and safety for firefighters.
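The clustering step can be sketched with a plain Lloyd's k-means over (TPPI, TCPI) pairs; the index values below are made up for illustration, and this is not the paper's actual implementation:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on small lists of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean).
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Hypothetical (TPPI, TCPI) pairs for six fabric systems.
fabrics = [(0.9, 0.15), (0.85, 0.2), (0.5, 0.5),
           (0.55, 0.45), (0.2, 0.9), (0.25, 0.85)]
centroids, clusters = kmeans(fabrics, k=3)
```

Each centroid then stands for a typical TPPI/TCPI combination, which is what the categorization tool reads off per cluster.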


2021 ◽  
Author(s):  
Carlos Hinojosa ◽  
Esteban Vera ◽  
Henry Arguello

Accurate land cover segmentation of spectral images is challenging and has drawn widespread attention in remote sensing due to its inherent complexity. Although significant efforts have been made to develop a variety of methods, most of them rely on supervised strategies. Subspace clustering methods, such as Sparse Subspace Clustering (SSC), have become a popular tool for unsupervised learning due to their high performance. However, the computational complexity of SSC methods prevents their use on large spectral remotely sensed datasets. Furthermore, since SSC ignores the spatial information in the spectral images, its discrimination capability is limited, hampering the spatial homogeneity of the clustering results. To address these two relevant issues, in this paper, we propose a fast algorithm that obtains a sparse representation coefficient matrix by first selecting a small set of pixels that best represent their neighborhood. Then, it performs spatial filtering to enforce the connectivity of neighboring pixels and uses fast spectral clustering to get the final segmentation. Extensive simulations with our method demonstrate its effectiveness in land cover segmentation, obtaining remarkably high clustering performance compared with state-of-the-art SSC-based algorithms and even novel unsupervised-deep-learning-based methods. Moreover, the proposed method is up to three orders of magnitude faster than SSC when clustering more than 2×10^4 spectral pixels.
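The first step, selecting a small set of pixels that best represent their neighborhood, might look like the following sketch; the window size and the "closest to the neighborhood mean spectrum" criterion are assumptions for illustration, not the paper's exact selection rule:

```python
import numpy as np

def select_representatives(img, win=3):
    """For each non-overlapping win x win neighborhood of a (H, W, B)
    spectral image, keep the single pixel whose spectrum is closest to
    the neighborhood's mean spectrum."""
    h, w, b = img.shape
    reps = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = img[i:i + win, j:j + win].reshape(-1, b)
            mean = patch.mean(axis=0)
            # Pick the pixel minimizing Euclidean distance to the mean spectrum.
            reps.append(patch[np.argmin(np.linalg.norm(patch - mean, axis=1))])
    return np.array(reps)

# Toy 6x6 image with 5 spectral bands.
img = np.random.default_rng(0).random((6, 6, 5))
reps = select_representatives(img, win=3)
```

Reducing a 6×6 image to 4 representative pixels is exactly the kind of subsampling that makes the subsequent sparse coding tractable on large scenes.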


2012 ◽  
Vol 433-440 ◽  
pp. 3223-3229
Author(s):  
Davood Keykhosravi ◽  
Ali Hosseinalipour

Clustering in wireless sensor networks is one of the crucial methods for increasing network lifetime, and many clustering algorithms exist; one well-known cluster-based algorithm for wireless sensor networks is LEACH. In this paper we propose a new clustering method for increasing network lifetime. In the proposed method, clustering is done symmetrically, and the node that is best with respect to remaining energy and distance to the other nodes is selected as cluster head. Although the protocol does not use GPS, the geographical positions of the nodes can still be determined easily. However, failures at higher levels of the hierarchy, e.g. at a cluster head, cause more damage to the system because they also cut off access to the nodes under that head's supervision. We therefore also propose an efficient mechanism to recover sensors from a failed cluster. The performance of the proposed algorithm was evaluated via computer simulation and compared with other clustering algorithms; the simulation results show the high performance of the proposed clustering algorithm.
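The cluster-head selection criterion (remaining energy weighed against distance to the other nodes) can be sketched as follows; the energy-to-mean-distance score is a hypothetical reading of the abstract, not the paper's exact formula:

```python
import math

def elect_cluster_head(nodes):
    """Pick the node with the best ratio of remaining energy to mean
    distance from the other nodes. Each node is a dict with keys
    'id', 'energy', and 'pos' (an (x, y) tuple)."""
    def score(n):
        dists = [math.dist(n["pos"], m["pos"]) for m in nodes if m is not n]
        return n["energy"] / (sum(dists) / len(dists) + 1e-9)
    return max(nodes, key=score)

# Hypothetical cluster of three nodes.
nodes = [
    {"id": 0, "energy": 0.9, "pos": (5.0, 5.0)},  # high energy, central
    {"id": 1, "energy": 0.9, "pos": (0.0, 0.0)},  # high energy, peripheral
    {"id": 2, "energy": 0.3, "pos": (5.0, 5.5)},  # low energy, central
]
head = elect_cluster_head(nodes)
```

Under this score the central, high-energy node wins, which matches the intuition that a good head is both well-supplied and cheap for its members to reach.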


2013 ◽  
Vol 427-429 ◽  
pp. 2449-2453
Author(s):  
Rong Ze Xia ◽  
Yan Jia ◽  
Hu Li

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must be hand-labelled before classification, which is a time-consuming task. Unsupervised methods such as k-means can also handle the text categorization problem, but traditional k-means is easily affected by isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm; the improved k-means is used to cluster the vectors in our vector space model. We then use an SVM to categorize the vectors preprocessed by the improved k-means. Experiments show that our algorithm can outperform the traditional SVM text categorization method.
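One plausible way to make k-means robust to isolated observations, as the improved algorithm aims to be, is to filter them out before clustering; the nearest-neighbour rule below is an illustrative assumption, not the paper's actual modification:

```python
import numpy as np

def remove_isolated(X, factor=2.0):
    """Drop isolated observations before k-means: a point is isolated if
    its nearest-neighbour distance exceeds factor x the median such
    distance over the dataset."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)                          # ignore self-distance
    nn = d.min(axis=1)                                   # nearest-neighbour distance
    return X[nn <= factor * np.median(nn)]

# Four clustered document vectors plus one isolated observation.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
filtered = remove_isolated(X)
```

Clustering the filtered vectors and then training the SVM on the resulting cluster labels is the general shape of the pipeline the abstract describes.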


Author(s):  
David Pfander ◽  
Gregor Daiß ◽  
Dirk Pflüger

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our compute kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager-worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a 10-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. On the node-level, we provide results for two GPUs, Nvidia's Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all compute kernels and devices, demonstrating the performance portability of our approach.
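The idea of density-based clustering on a grid can be illustrated with a toy version that uses a regular 2-D histogram in place of a sparse grid density estimate (the paper's OpenCL/MPI implementation is far more involved); dense cells are flood-filled into clusters:

```python
import numpy as np

def grid_density_clusters(points, bins=8, threshold=2):
    """Toy grid-based density clustering: estimate density with a 2-D
    histogram, mark cells with at least `threshold` points as dense,
    and flood-fill edge-connected dense cells into clusters."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    dense = hist >= threshold
    labels = np.zeros_like(hist, dtype=int)
    n_clusters = 0
    for i in range(bins):
        for j in range(bins):
            if dense[i, j] and labels[i, j] == 0:
                n_clusters += 1
                stack = [(i, j)]          # flood fill from this dense cell
                while stack:
                    a, b = stack.pop()
                    if (0 <= a < bins and 0 <= b < bins
                            and dense[a, b] and labels[a, b] == 0):
                        labels[a, b] = n_clusters
                        stack += [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]
    return labels, n_clusters

# Two well-separated point masses yield two clusters.
pts = np.array([[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5)
labels, n_clusters = grid_density_clusters(pts, bins=4, threshold=2)
```

As in the paper's method, no cluster count is fixed in advance and non-convex shapes fall out naturally from which cells happen to be dense and connected.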

