An Empirical Seed Initialization Idea for K-Means Algorithm Inspired by CLIQUE Algorithm

Author(s):  
Alok Chakrabarty
2020 ◽  
Vol 14 (4) ◽  
pp. 573-585
Author(s):  
Guimu Guo ◽  
Da Yan ◽  
M. Tamer Özsu ◽  
Zhe Jiang ◽  
Jalal Khalil

Given a user-specified minimum degree threshold $\gamma$, a $\gamma$-quasiclique is a subgraph $g = (V_g, E_g)$ where each vertex $v \in V_g$ connects to at least a $\gamma$ fraction of the other vertices (i.e., $\lceil \gamma \cdot (|V_g| - 1) \rceil$ vertices) in $g$. Quasi-clique is one of the most natural definitions for dense structures useful in finding communities in social networks and discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that directly using G-thinker results in the straggler problem due to (i) the drastic load imbalance among different tasks and (ii) the difficulty of predicting the task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by utilizing a novel timeout strategy to effectively decompose long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges in a 16-node cluster.
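
The quasi-clique definition in the abstract above can be checked directly for a given vertex set. The following is a minimal sketch of that check only, not the paper's Quick or G-thinker implementation; the function name `is_quasi_clique` and the adjacency-dictionary representation are assumptions made for illustration.

```python
import math


def is_quasi_clique(adj, vertices, gamma):
    """Check whether `vertices` induces a gamma-quasi-clique.

    adj      -- hypothetical representation: dict mapping vertex -> set of neighbours
    vertices -- candidate vertex set V_g
    gamma    -- minimum degree threshold, 0 <= gamma <= 1
    """
    vs = set(vertices)
    if not vs:
        return False
    # Each vertex must reach at least ceil(gamma * (|V_g| - 1)) other members.
    # Note: gamma is a float, so values that are not exactly representable
    # may need exact arithmetic (e.g. fractions.Fraction) near the boundary.
    required = math.ceil(gamma * (len(vs) - 1))
    for v in vs:
        inside_degree = len(adj.get(v, set()) & (vs - {v}))
        if inside_degree < required:
            return False
    return True


# A 4-cycle a-b-c-d-a: every vertex is adjacent to 2 of the 3 others.
cycle = {
    "a": {"b", "d"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"a", "c"},
}
```

With this graph, a threshold of 0.5 requires ceil(1.5) = 2 neighbours inside the set, which every vertex has, while 0.75 requires ceil(2.25) = 3, which none has.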


2006 ◽  
Vol 180 (2) ◽  
pp. 676-682 ◽  
Author(s):  
Xiaohong Shi ◽  
LuoLiang ◽  
Yan Wan ◽  
Jin Xu

Algorithmica ◽  
2007 ◽  
Vol 55 (1) ◽  
pp. 42-59 ◽  
Author(s):  
Benjamin Birnbaum ◽  
Kenneth J. Goldman

2013 ◽  
Vol 312 ◽  
pp. 714-718
Author(s):  
Zi Qi Zhao ◽  
Xiao Jun Ye ◽  
Chun Ping Li

Multidimensional clustering analysis includes a class of cell-based clustering methods that are fast and time-efficient, of which CLIQUE is the main representative. Building on the time-efficient CLIQUE clustering algorithm, a multidimensional k-anonymity algorithm, KLIQUE, can be obtained. Because KLIQUE is based on CLIQUE, it retains the time-complexity characteristics of CLIQUE and can exploit CLIQUE's advantage in processing large volumes of multidimensional data.
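
As a rough illustration of what "cell-based" means here, the sketch below partitions points into equal-width cells and keeps the cells that hold enough points. It is an assumption-laden simplification: it works in the full space only, whereas CLIQUE searches subspaces, and it is not the KLIQUE algorithm from the paper. The names `dense_cells`, `cell_width`, and `min_points` are hypothetical.

```python
from collections import Counter


def dense_cells(points, cell_width, min_points):
    """Return {cell index tuple: point count} for cells with >= min_points points.

    points     -- iterable of equal-length numeric tuples
    cell_width -- side length of each grid cell (same in every dimension)
    min_points -- density threshold for a cell to count as dense
    """
    # Map each point to the integer index of the cell that contains it.
    counts = Counter(
        tuple(int(x // cell_width) for x in point) for point in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_points}


sample = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.9), (5.5, 5.1)]
dense = dense_cells(sample, cell_width=1.0, min_points=2)
```

Each point is touched once, so the cost grows linearly with the number of points, which is the kind of time efficiency the abstract attributes to cell-based methods.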


2008 ◽  
Vol 156 (13) ◽  
pp. 2439-2448 ◽  
Author(s):  
Andrea Scozzari ◽  
Fabio Tardella

Mathematics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 97
Author(s):  
Kristjan Reba ◽  
Matej Guid ◽  
Kati Rozman ◽  
Dušanka Janežič ◽  
Janez Konc

Finding a maximum clique is important in research areas such as computational chemistry, social network analysis, and bioinformatics. It is possible to compare the maximum clique size between protein graphs to determine their similarity and function. In this paper, improvements based on machine learning (ML) are added to a dynamic algorithm for finding the maximum clique in a protein graph, Maximum Clique Dynamic (MaxCliqueDyn; short: MCQD). This algorithm was published in 2007 and has been widely used in bioinformatics since then. It uses an empirically determined parameter, Tlimit, that determines the algorithm’s flow. We have extended the MCQD algorithm with an initial phase of a machine learning-based prediction of the Tlimit parameter that is best suited for each input graph. Such adaptability to graph types based on state-of-the-art machine learning is a novel approach that has not been used in most graph-theoretic algorithms. We show empirically that the resulting new algorithm MCQD-ML improves search speed on certain types of graphs, in particular molecular docking graphs used in drug design where they determine energetically favorable conformations of small molecules in a protein binding site. In such cases, the speed-up is twofold.


2012 ◽  
Vol 452-453 ◽  
pp. 381-385
Author(s):  
Shao Peng Sun ◽  
Kai Hu Hou ◽  
Li Hua Chen

Many current data cleansing algorithms are designed for low-dimensional data and cannot meet the accuracy requirements of enterprise data warehouses that process high-dimensional data. This paper adopts the CLIQUE algorithm to process high-dimensional data. To address the insufficient processing precision of this algorithm, its meshing and pruning steps are improved using dynamic incremental algorithms. Test results show that the improved algorithm increases the accuracy of the clustering result and can effectively identify similar clusters and abnormal points, which supports high-dimensional data cleansing.


Author(s):  
Phu Ngoc Vo ◽  
Tran Vo Thi Ngoc

Many different areas of computer science have developed over the years. Data mining is one field in which many algorithms, methods, and models have been built and successfully applied in commercial applications and research. Social networks have seen heavy investment and rapid growth in recent years because they are used by very large numbers of users worldwide and have been applied successfully in many business fields. As a result, many different techniques for social networks have been developed, and social network analysis is now crucial. To support this process, this book chapter presents simple concepts of data mining and social networking, and also presents a novel data mining model for social network analysis that uses the CLIQUE algorithm.

