Efficient Vector Partitioning Algorithms for Graph Clustering

Hiroaki Shiokawa; Yasunori Futamura

doi:10.26421/jdi1.2-1

Journal of Data Intelligence ◽

10.26421/jdi1.2-1 ◽

2020 ◽

Vol 1 (2) ◽

pp. 101-123

Author(s):

Hiroaki Shiokawa ◽

Yasunori Futamura

Keyword(s):

Social Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Ground Truth ◽

Graph Clustering ◽

Mining Communities ◽

Fine Grained ◽

Efficient Vector ◽

Public Datasets ◽

Many Core

This paper addresses the problem of finding clusters in graph-structured data such as Web graphs and social networks. Graph clustering is one of the fundamental techniques for understanding the structures present in such complex graphs. In the Web and data mining communities, modularity-based graph clustering algorithms have been used successfully in many applications. However, it is difficult for modularity-based methods to find fine-grained clusters hidden in large-scale graphs; the methods fail to reproduce the ground truth. In this paper, we present a novel modularity-based algorithm, CAV, that shows better clustering results than the traditional algorithm. The proposed algorithm incorporates cohesiveness-aware vector partitioning into graph spectral analysis to improve clustering accuracy. This paper also presents a novel efficient algorithm, P-CAV, that further improves the clustering speed of CAV; P-CAV is an extension of CAV that uses thread-based parallelization on a many-core CPU. Our extensive experiments on synthetic and public datasets demonstrate the performance superiority of our approaches over state-of-the-art approaches.
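
The abstract above refers to modularity-based clustering without stating the objective. As a rough illustration only, the sketch below computes the standard Newman modularity score of a given partition for an undirected, unweighted graph without self-loops. It is not the CAV or P-CAV algorithm; the function name `modularity` and its arguments are hypothetical names chosen for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition (illustrative sketch, not CAV).

    edges: list of (u, v) pairs, undirected, unweighted, no self-loops.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # sum of node degrees per community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(edges, two_groups))
```

Placing every node in one community gives a score of zero under this definition, which is why modularity-based methods search for partitions that score higher than that trivial baseline.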

Varied density based graph clustering algorithm for social networks

2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) ◽

10.1109/i-smac.2017.8058404 ◽

2017 ◽

Author(s):

M Venkata Sowjanya ◽

T Maruthi Padmaja

Keyword(s):

Social Networks ◽

Clustering Algorithm ◽

Graph Clustering

Learning from Web Data Using Adversarial Discriminative Neural Networks for Fine-Grained Classification

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.3301273 ◽

2019 ◽

Vol 33 ◽

pp. 273-280 ◽

Cited By ~ 2

Author(s):

Xiaoxiao Sun ◽

Liyi Chen ◽

Jufeng Yang

Keyword(s):

Neural Networks ◽

Large Scale ◽

State Of The Art ◽

Training Data ◽

Web Data ◽

Fine Grained ◽

Learning Framework ◽

Attractive Option ◽

Public Datasets ◽

Noisy Labels

Fine-grained classification focuses on recognizing the subordinate categories of one field, which requires a large number of labeled images that are expensive to annotate. Utilizing web data has been an attractive option to meet the demands of training data for convolutional neural networks (CNNs), especially when well-labeled data is insufficient. However, directly training on such easily obtained images often leads to unsatisfactory performance due to factors such as noisy labels. This has been conventionally addressed by reducing the noise level of web data. In this paper, we take a fundamentally different view and propose an adversarial discriminative loss to advocate representation coherence between standard and web data. This is further encapsulated in a simple, scalable and end-to-end trainable multi-task learning framework. We experiment on three public datasets using large-scale web data to evaluate the effectiveness and generalizability of the proposed approach. Extensive experiments demonstrate that our approach performs favorably against the state-of-the-art methods.

Efficient detection of communities with significant overlaps in networks: Partial community merger algorithm

Network Science ◽

10.1017/nws.2017.32 ◽

2017 ◽

Vol 6 (1) ◽

pp. 71-96 ◽

Cited By ~ 1

Author(s):

ELVIS H. W. XU ◽

PAK MING HUI

Keyword(s):

Social Networks ◽

Online Social Networks ◽

Large Scale ◽

Partial Information ◽

Ground Truth ◽

Ego Networks ◽

Detection Algorithms ◽

Efficient Detection ◽

Merger Process ◽

Key Issues

Detecting communities in large-scale social networks is a challenging task where each vertex may belong to multiple communities. Such behavior of vertices and the implied strong overlaps among communities render many detection algorithms invalid. We develop a Partial Community Merger Algorithm (PCMA) for detecting communities with significant overlaps as well as slightly overlapping and disjoint ones. It is a bottom-up approach based on properly reassembling partial information of communities revealed in ego networks of vertices to reconstruct complete communities. We propose a novel similarity measure of communities and an efficient merger process to address the two key issues, noise control and merger order, in implementing this approach. PCMA is tested against two benchmarks and overall it outperforms all compared algorithms in both accuracy and efficiency. It is applied to two huge online social networks, Friendster and Sina Weibo. Millions of communities are detected and they are of higher quality than the corresponding metadata groups. We find that the latter should not be regarded as the ground truth of structural communities. The significant overlapping pattern found in the detected communities confirms the need for new algorithms, such as PCMA, to handle multiple memberships of vertices in social networks.

A Big Graph Clustering Algorithm Based on MapReduce

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.1049-1050.1467 ◽

2014 ◽

Vol 1049-1050 ◽

pp. 1467-1470 ◽

Cited By ~ 1

Author(s):

Yong Lin Leng ◽

Qing Chen Zhang

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Graph Clustering ◽

Calculation Model ◽

Large Scale Data ◽

Structure Similarity ◽

Universal Structure ◽

Time And Space Complexity ◽

Ap Clustering ◽

Graph Nodes

Graph clustering is an important technique in graph analysis, and measuring the similarity between graph nodes is the premise of graph clustering. SimRank is a universal structural similarity calculation model proposed by Jeh and Widom. Because SimRank uses an iterative method to calculate the similarity between nodes, its time and space complexity is very high. With the rapid increase of data, a single machine cannot meet the requirements of large-scale data calculation. In this paper, a distributed SimRank algorithm based on MapReduce is proposed and used to measure the similarity of graph nodes. A distributed AP clustering algorithm is then designed to cluster the graph nodes. Experiments were conducted to compare clustering running time and speedup, and the results show that the method can efficiently measure graph node similarity and cluster large graphs effectively.
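
To make the iterative nature of SimRank concrete, here is a minimal single-machine sketch of the basic Jeh and Widom recurrence. It is not the distributed MapReduce version the abstract proposes; the function name `simrank`, the `decay` and `iterations` parameters, and the dictionary-based graph representation are assumptions made for this example.

```python
from itertools import product


def simrank(in_neighbors, decay=0.8, iterations=10):
    """Basic iterative SimRank (illustrative sketch, single machine).

    in_neighbors: dict mapping every node to a list of its in-neighbors;
    every node of the graph must appear as a key.
    """
    nodes = list(in_neighbors)
    # Start from the identity: a node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0
           for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
                continue
            ia, ib = in_neighbors[a], in_neighbors[b]
            if not ia or not ib:
                # A node without in-neighbors is similar to no other node.
                new_sim[(a, b)] = 0.0
                continue
            total = sum(sim[(i, j)] for i in ia for j in ib)
            new_sim[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Nodes "a" and "b" are both pointed to by "c" and by nothing else.
graph = {"c": [], "a": ["c"], "b": ["c"]}
scores = simrank(graph)
print(scores[("a", "b")])
```

Each iteration visits every pair of nodes and every pair of their in-neighbors, which illustrates the high time and space cost that the abstract cites as motivation for a distributed version.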

Hierarchical semantic interaction-based deep hashing network for cross-modal retrieval

PeerJ Computer Science ◽

10.7717/peerj-cs.552 ◽

2021 ◽

Vol 7 ◽

pp. e552

Author(s):

Shubai Chen ◽

Song Wu ◽

Li Wang

Keyword(s):

High Performance ◽

Large Scale ◽

High Efficiency ◽

Specific Information ◽

Linear Interaction ◽

Fine Grained ◽

Semantic Correlation ◽

Deep Hashing ◽

Public Datasets ◽

Semantic Interaction

Due to the high efficiency of hashing technology and the high abstraction of deep networks, deep hashing has achieved appealing effectiveness and efficiency for large-scale cross-modal retrieval. However, how to efficiently measure the similarity of fine-grained multi-labels for multi-modal data and how to thoroughly explore the layer-specific information in the intermediate layers of networks are still two challenges for high-performance cross-modal hashing retrieval. Thus, in this paper, we propose a novel Hierarchical Semantic Interaction-based Deep Hashing Network (HSIDHN) for large-scale cross-modal retrieval. In the proposed HSIDHN, the multi-scale and fusion operations are first applied to each layer of the network. A Bidirectional Bi-linear Interaction (BBI) policy is then designed to achieve the hierarchical semantic interaction among different layers, such that the capability of hash representations can be enhanced. Moreover, a dual-similarity measurement (“hard” similarity and “soft” similarity) is designed to calculate the semantic similarity of different modality data, aiming to better preserve the semantic correlation of multi-labels. Extensive experimental results on two large-scale public datasets show that the performance of our HSIDHN is competitive with state-of-the-art deep cross-modal hashing methods.

DYNAMIC TRIP ATTRACTION ESTIMATION WITH LOCATION BASED SOCIAL NETWORK DATA BALANCING BETWEEN TIME OF DAY VARIATIONS AND ZONAL DIFFERENCES

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprsannals-ii-4-w2-193-2015 ◽

2015 ◽

Vol II-4/W2 ◽

pp. 193-198 ◽

Cited By ~ 2

Author(s):

N. W. Hu ◽

P. J. Jin

Keyword(s):

Social Network ◽

Large Scale ◽

Ground Truth ◽

Time Of Day ◽

Mobility Patterns ◽

Sensitive Data ◽

Ground Truth Data ◽

Fine Grained ◽

Social Network Data ◽

Location Based Social Network

The emergence of location based social network (LBSN) services makes it accessible and affordable to study individuals’ mobility patterns at a fine-grained level. Via mobile devices, LBSN provides large-scale location-sensitive data with spatial and temporal context dimensions, which has the potential to provide traffic patterns with significantly higher spatial and temporal resolution at a much lower cost than can be achieved by traditional methods. In this paper, Foursquare LBSN data is applied to analyze the trip attraction for the urban area in Austin, Texas, USA. We explore one time-dependent function to validate the LBSN data with the origin-destination matrix regarded as the ground truth data. The objective of this paper is to investigate one new validation method for trip distribution. The results illustrate the promising potential of studying the dynamic trip attraction estimation with LBSN data for urban trip pattern analysis and monitoring.

GPOGC: Gaussian Pigeon-Oriented Graph Clustering Algorithm for Social Networks Cluster

IEEE Access ◽

10.1109/access.2019.2926816 ◽

2019 ◽

Vol 7 ◽

pp. 99254-99262 ◽

Cited By ~ 3

Author(s):

Yang Sun ◽

Shoulin Yin ◽

Hang Li ◽

Lin Teng ◽

Shahid Karim

Keyword(s):

Social Networks ◽

Clustering Algorithm ◽

Oriented Graph ◽

Graph Clustering

GLEAM: a graph clustering framework based on potential game optimization for large-scale social networks

Knowledge and Information Systems ◽

10.1007/s10115-017-1105-6 ◽

2017 ◽

Vol 55 (3) ◽

pp. 741-770 ◽

Cited By ~ 35

Author(s):

Zhan Bu ◽

Jie Cao ◽

Hui-Jia Li ◽

Guangliang Gao ◽

Haicheng Tao

Keyword(s):

Social Networks ◽

Large Scale ◽

Graph Clustering ◽

Potential Game

Exploiting multi–core and many–core parallelism for subspace clustering

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2019-0006 ◽

2019 ◽

Vol 29 (1) ◽

pp. 81-91

Author(s):

Amitava Datta ◽

Amardeep Kaur ◽

Tobias Lauer ◽

Sami Chabbouh

Keyword(s):

Graphics Processing Units ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Subspace Clustering ◽

Research Problem ◽

Fine Grained ◽

Linear Speedup ◽

Many Core ◽

Graphics Processing ◽

Gpu Implementation

Finding clusters in high dimensional data is a challenging research problem. Subspace clustering algorithms aim to find clusters in all possible subspaces of the dataset, where a subspace is a subset of dimensions of the data. But the exponential increase in the number of subspaces with the dimensionality of data renders most of the algorithms inefficient as well as ineffective. Moreover, these algorithms have ingrained data dependency in the clustering process, which means that parallelization becomes difficult and inefficient. SUBSCALE is a recent subspace clustering algorithm which is scalable with the dimensions and contains independent processing steps which can be exploited through parallelism. In this paper, we aim to leverage the computational power of widely available multi-core processors to improve the runtime performance of the SUBSCALE algorithm. The experimental evaluation shows linear speedup. Moreover, we develop an approach using graphics processing units (GPUs) for fine-grained data parallelism to accelerate the computation further. First tests of the GPU implementation show very promising results.

Fast graph clustering in large-scale systems based on spectral coarsening

International Journal of Modern Physics B ◽

10.1142/s0217979221501319 ◽

2021 ◽

pp. 2150131

Author(s):

Dasong Sun

Keyword(s):

Complex Networks ◽

Large Scale ◽

Clustering Algorithm ◽

Graph Clustering ◽

Superior Performance ◽

Computational Time ◽

Single Node ◽

Multiple Datasets ◽

Spectral Algorithm ◽

The Individual

Complex networks depict the relationships among individuals in a population. Analyzing how individuals interact within different groups or clusters helps to mine the characteristics of complex networks and to predict potential collaboration between individuals. However, existing algorithms have high complexity and cost much computational time. In this paper, an efficient graph clustering algorithm based on spectral coarsening is proposed to deal with the large time complexity of the traditional spectral algorithm. We first find the subsets of nodes most likely to belong to the same cluster in the original network and merge each subset into a single node. The scale of the network decreases as the network is coarsened. Then, the spectral clustering algorithm is performed on the coarsened network, retaining its advantages while improving time efficiency. Finally, experimental results on multiple datasets demonstrate that the proposed algorithm has superior performance compared with current state-of-the-art methods.
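
The merge step described in the abstract can be pictured with the sketch below, which collapses groups of nodes into supernodes and sums the edge weights between them. The abstract does not say how the groups are chosen, so the grouping is simply passed in as input here; the function name `coarsen` and its arguments are hypothetical and this is not the paper's implementation.

```python
from collections import defaultdict


def coarsen(edges, group_of):
    """Collapse node groups into supernodes (illustrative sketch).

    edges: list of (u, v, weight) for an undirected graph.
    group_of: dict mapping each node to its supernode label
              (assumed given; the selection rule is not shown here).
    Returns (cross, internal): summed weights between distinct
    supernodes and summed weights inside each supernode.
    """
    cross = defaultdict(float)
    internal = defaultdict(float)
    for u, v, w in edges:
        gu, gv = group_of[u], group_of[v]
        if gu == gv:
            internal[gu] += w
        else:
            # Order the pair so (A, B) and (B, A) share one entry.
            cross[tuple(sorted((gu, gv)))] += w
    return dict(cross), dict(internal)


edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (0, 2, 1.0)]
groups = {0: "A", 1: "A", 2: "B", 3: "B"}
print(coarsen(edges, groups))
```

A spectral method would then run on the smaller weighted graph returned in `cross`, which is where the time savings mentioned in the abstract would come from.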
