High-performance implementations of a clustering algorithm for finding network communities

Author(s):  
Alex Restrepo ◽  
Andres Solano ◽  
Jerry Scripps ◽  
Christian Trefftz ◽  
Jonathan Engelsma ◽  
...  


2021 ◽  
pp. 016555152110184
Author(s):  
Gunjan Chandwani ◽  
Anil Ahlawat ◽  
Gaurav Dubey

Document retrieval plays an important role in knowledge management, as it helps us discover relevant information in existing data. This article proposes a cluster-based inverted indexing algorithm for document retrieval. First, pre-processing removes unnecessary and redundant words from the documents. Then, the documents are indexed by the cluster-based inverted indexing algorithm, which is developed by integrating the piecewise fuzzy C-means (piFCM) clustering algorithm with inverted indexing. After the documents are indexed, query matching is performed for user queries using the Bhattacharyya distance. Finally, query optimisation is done with the Pearson correlation coefficient, and the relevant documents are retrieved. The performance of the proposed algorithm is analysed on the WebKB and Twenty Newsgroups data sets. The analysis shows that the proposed algorithm offers high performance, with a precision of 1, a recall of 0.70 and an F-measure of 0.8235. The proposed document retrieval system retrieves the most relevant documents and speeds up the storing and retrieval of information.
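As a concrete illustration of the query-matching step, the Bhattacharyya distance between two discrete term distributions can be computed as below; the example distributions are invented for illustration and do not come from the paper:

```python
import math

def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete probability
    distributions p and q (smaller means more similar)."""
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))  # Bhattacharyya coefficient
    return -math.log(bc) if bc > 0 else float("inf")

# Hypothetical term distributions for a query and two indexed documents.
query = [0.6, 0.3, 0.1]
doc_a = [0.5, 0.35, 0.15]
doc_b = [0.1, 0.2, 0.7]

# The document with the smaller distance to the query is the better match.
best = min(("doc_a", doc_a), ("doc_b", doc_b),
           key=lambda d: bhattacharyya_distance(query, d[1]))[0]
```

The distance is zero for identical distributions and grows as the distributions diverge, which makes it a natural ranking score for retrieved documents.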


2018 ◽  
Vol 89 (16) ◽  
pp. 3244-3259 ◽  
Author(s):  
Sumit Mandal ◽  
Simon Annaheim ◽  
Andre Capt ◽  
Jemma Greve ◽  
Martin Camenzind ◽  
...  

Fabric systems used in firefighters' thermal protective clothing should offer optimal thermal protective and thermo-physiological comfort performances. However, fabric systems that have very high thermal protective performance have very low thermo-physiological comfort performance. As these performances are inversely related, a categorization tool based on these two performances can help to find the best balance between them. Thus, this study is aimed at developing a tool for categorizing fabric systems used in protective clothing. For this, a set of commercially available fabric systems was evaluated and categorized. The thermal protective and thermo-physiological comfort performances were measured by standard tests and indexed on a normalized scale between 0 (low performance) and 1 (high performance). The dataset of indices was first divided into three clusters using the k-means algorithm. Here, each cluster had a centroid representing a typical Thermal Protective Performance Index (TPPI) value and a typical Thermo-physiological Comfort Performance Index (TCPI) value. Using the ISO 11612:2015 and EN 469:2014 guidelines related to the TPPI requirements, the clustered fabric systems were divided into two groups: Group 1 (high thermal protective performance-based fabric systems) and Group 2 (low thermal protective performance-based fabric systems). The fabric systems in each of these TPPI groups were further categorized based on the typical TCPI values obtained from the k-means clustering algorithm. In this study, these categorized fabric systems showed either high or low thermal protective performance with low, medium, or high thermo-physiological comfort performance. Finally, a tool for using these categorized fabric systems was prepared and presented graphically. 
The allocations of the fabric systems within the categorization tool have been verified based on their properties (e.g., thermal resistance, weight, evaporative resistance) and construction parameters (e.g., woven, nonwoven, layers), which significantly affect the performance. In this way, we identified key characteristics among the categorized fabric systems which can be used to upgrade or develop high-performance fabric systems. Overall, the categorization tool developed in this study could help clothing manufacturers or textile engineers select and/or develop appropriate fabric systems with maximum thermal protective performance and thermo-physiological comfort performance. Thermal protective clothing manufactured using this type of newly developed fabric system could provide better occupational health and safety for firefighters.
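The clustering step can be sketched with a plain Lloyd's k-means over (TPPI, TCPI) pairs; the index values below are made up for illustration, and this is not the paper's actual implementation:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on small lists of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize from the data
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean).
            nearest = min(range(k),
                          key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centroids[j]
                     for j, cl in enumerate(clusters)]
    return centroids, clusters

# Hypothetical (TPPI, TCPI) pairs for six fabric systems.
fabrics = [(0.9, 0.15), (0.85, 0.2), (0.5, 0.5),
           (0.55, 0.45), (0.2, 0.9), (0.25, 0.85)]
centroids, clusters = kmeans(fabrics, k=3)
```

Each centroid then stands for a typical TPPI/TCPI combination, which is what the categorization tool reads off per cluster.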


2021 ◽  
Author(s):  
Carlos Hinojosa ◽  
Esteban Vera ◽  
Henry Arguello

Accurate land cover segmentation of spectral images is challenging and has drawn widespread attention in remote sensing due to its inherent complexity. Although significant efforts have been made to develop a variety of methods, most of them rely on supervised strategies. Subspace clustering methods, such as Sparse Subspace Clustering (SSC), have become a popular tool for unsupervised learning due to their high performance. However, the computational complexity of SSC methods prevents their use on large spectral remotely sensed datasets. Furthermore, since SSC ignores the spatial information in the spectral images, its discrimination capability is limited, hampering the spatial homogeneity of the clustering results. To address these two relevant issues, in this paper, we propose a fast algorithm that obtains a sparse representation coefficient matrix by first selecting a small set of pixels that best represent their neighborhood. Then, it performs spatial filtering to enforce the connectivity of neighboring pixels and uses fast spectral clustering to get the final segmentation. Extensive simulations with our method demonstrate its effectiveness in land cover segmentation, obtaining remarkably high clustering performance compared with state-of-the-art SSC-based algorithms and even novel unsupervised-deep-learning-based methods. Moreover, the proposed method is up to three orders of magnitude faster than SSC when clustering more than 2×10^4 spectral pixels.
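The first step, selecting a small set of pixels that best represent their neighborhood, might look like the following sketch; the window size and the "closest to the neighborhood mean spectrum" criterion are assumptions for illustration, not the paper's exact selection rule:

```python
import numpy as np

def select_representatives(img, win=3):
    """For each non-overlapping win x win neighborhood of a (H, W, B)
    spectral image, keep the single pixel whose spectrum is closest to
    the neighborhood's mean spectrum."""
    h, w, b = img.shape
    reps = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            patch = img[i:i + win, j:j + win].reshape(-1, b)
            mean = patch.mean(axis=0)
            # Pick the pixel minimizing Euclidean distance to the mean spectrum.
            reps.append(patch[np.argmin(np.linalg.norm(patch - mean, axis=1))])
    return np.array(reps)

# Toy 6x6 image with 5 spectral bands.
img = np.random.default_rng(0).random((6, 6, 5))
reps = select_representatives(img, win=3)
```

Reducing a 6×6 image to 4 representative pixels is exactly the kind of subsampling that makes the subsequent sparse coding tractable on large scenes.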


2012 ◽  
Vol 433-440 ◽  
pp. 3223-3229
Author(s):  
Davood Keykhosravi ◽  
Ali Hosseinalipour

Clustering in wireless sensor networks is one of the crucial methods for increasing network lifetime, and many clustering algorithms exist; one well-known cluster-based algorithm for wireless sensor networks is LEACH. In this paper we propose a new clustering method for increasing network lifetime. In the proposed method, clustering is done symmetrically, and the node that is best with respect to remaining energy and distance to the other nodes is selected as cluster head. Although the protocol does not use GPS, the geographical positions of the nodes can still be determined easily. However, failures at higher levels of the hierarchy, e.g. at a cluster head, cause more damage to the system because they also cut off access to the nodes under that head's supervision. We therefore also propose an efficient mechanism to recover sensors from a failed cluster. The performance of the proposed algorithm was evaluated via computer simulation and compared with other clustering algorithms; the simulation results show the high performance of the proposed clustering algorithm.
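The cluster-head selection criterion (remaining energy weighed against distance to the other nodes) can be sketched as follows; the energy-to-mean-distance score is a hypothetical reading of the abstract, not the paper's exact formula:

```python
import math

def elect_cluster_head(nodes):
    """Pick the node with the best ratio of remaining energy to mean
    distance from the other nodes. Each node is a dict with keys
    'id', 'energy', and 'pos' (an (x, y) tuple)."""
    def score(n):
        dists = [math.dist(n["pos"], m["pos"]) for m in nodes if m is not n]
        return n["energy"] / (sum(dists) / len(dists) + 1e-9)
    return max(nodes, key=score)

# Hypothetical cluster of three nodes.
nodes = [
    {"id": 0, "energy": 0.9, "pos": (5.0, 5.0)},  # high energy, central
    {"id": 1, "energy": 0.9, "pos": (0.0, 0.0)},  # high energy, peripheral
    {"id": 2, "energy": 0.3, "pos": (5.0, 5.5)},  # low energy, central
]
head = elect_cluster_head(nodes)
```

Under this score the central, high-energy node wins, which matches the intuition that a good head is both well-supplied and cheap for its members to reach.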


2013 ◽  
Vol 427-429 ◽  
pp. 2449-2453
Author(s):  
Rong Ze Xia ◽  
Yan Jia ◽  
Hu Li

Traditional supervised classification methods such as the support vector machine (SVM) can achieve high performance in text categorization. However, the samples must be hand-labelled before classification, which is a time-consuming task. Unsupervised methods such as k-means can also handle the text categorization problem, but traditional k-means is easily affected by isolated observations. In this paper, we propose a new text categorization method. First, we improve the traditional k-means clustering algorithm; the improved k-means is used to cluster the vectors in our vector space model. We then use an SVM to categorize the vectors preprocessed by the improved k-means. Experiments show that our algorithm can outperform the traditional SVM text categorization method.
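One plausible way to make k-means robust to isolated observations, as the improved algorithm aims to be, is to filter them out before clustering; the nearest-neighbour rule below is an illustrative assumption, not the paper's actual modification:

```python
import numpy as np

def remove_isolated(X, factor=2.0):
    """Drop isolated observations before k-means: a point is isolated if
    its nearest-neighbour distance exceeds factor x the median such
    distance over the dataset."""
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)  # pairwise distances
    np.fill_diagonal(d, np.inf)                          # ignore self-distance
    nn = d.min(axis=1)                                   # nearest-neighbour distance
    return X[nn <= factor * np.median(nn)]

# Four clustered document vectors plus one isolated observation.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
filtered = remove_isolated(X)
```

Clustering the filtered vectors and then training the SVM on the resulting cluster labels is the general shape of the pipeline the abstract describes.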


Author(s):  
David Pfander ◽  
Gregor Daiß ◽  
Dirk Pflüger

Clustering is an important task in data mining that has become more challenging due to the ever-increasing size of available datasets. To cope with these big data scenarios, a high-performance clustering approach is required. Sparse grid clustering is a density-based clustering method that uses a sparse grid density estimation as its central building block. The underlying density estimation approach enables the detection of clusters with non-convex shapes and without a predetermined number of clusters. In this work, we introduce a new distributed and performance-portable variant of the sparse grid clustering algorithm that is suited for big data settings. Our compute kernels were implemented in OpenCL to enable portability across a wide range of architectures. For distributed environments, we added a manager-worker scheme that was implemented using MPI. In experiments on two supercomputers, Piz Daint and Hazel Hen, with up to 100 million data points in a 10-dimensional dataset, we show the performance and scalability of our approach. The dataset with 100 million data points was clustered in 1198 s using 128 nodes of Piz Daint. This translates to an overall performance of 352 TFLOPS. On the node-level, we provide results for two GPUs, Nvidia's Tesla P100 and the AMD FirePro W8100, and one processor-based platform that uses Intel Xeon E5-2680v3 processors. In these experiments, we achieved between 43% and 66% of the peak performance across all compute kernels and devices, demonstrating the performance portability of our approach.
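The idea of density-based clustering on a grid can be illustrated with a toy version that uses a regular 2-D histogram in place of a sparse grid density estimate (the paper's OpenCL/MPI implementation is far more involved); dense cells are flood-filled into clusters:

```python
import numpy as np

def grid_density_clusters(points, bins=8, threshold=2):
    """Toy grid-based density clustering: estimate density with a 2-D
    histogram, mark cells with at least `threshold` points as dense,
    and flood-fill edge-connected dense cells into clusters."""
    hist, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    dense = hist >= threshold
    labels = np.zeros_like(hist, dtype=int)
    n_clusters = 0
    for i in range(bins):
        for j in range(bins):
            if dense[i, j] and labels[i, j] == 0:
                n_clusters += 1
                stack = [(i, j)]          # flood fill from this dense cell
                while stack:
                    a, b = stack.pop()
                    if (0 <= a < bins and 0 <= b < bins
                            and dense[a, b] and labels[a, b] == 0):
                        labels[a, b] = n_clusters
                        stack += [(a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)]
    return labels, n_clusters

# Two well-separated point masses yield two clusters.
pts = np.array([[0.0, 0.0]] * 5 + [[1.0, 1.0]] * 5)
labels, n_clusters = grid_density_clusters(pts, bins=4, threshold=2)
```

As in the paper's method, no cluster count is fixed in advance and non-convex shapes fall out naturally from which cells happen to be dense and connected.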

