Detection of Complexes in Biological Networks Through Diversified Dense Subgraph Mining

2017, Vol 24 (9), pp. 923-941
Author(s): Xiuli Ma, Guangyu Zhou, Jingbo Shang, Jingjing Wang, Jian Peng, ...

2020, Vol 14 (4), pp. 573-585
Author(s): Guimu Guo, Da Yan, M. Tamer Özsu, Zhe Jiang, Jalal Khalil

Given a user-specified minimum degree threshold γ, a γ-quasi-clique is a subgraph g = (V_g, E_g) in which each vertex v ∈ V_g connects to at least a γ fraction of the other vertices in g (i.e., at least ⌈γ · (|V_g| − 1)⌉ vertices). The quasi-clique is one of the most natural definitions of a dense structure and is useful for finding communities in social networks and for discovering significant biomolecule structures and pathways. However, mining maximal quasi-cliques is notoriously expensive. In this paper, we design parallel algorithms for mining maximal quasi-cliques on G-thinker, a distributed graph mining framework that decomposes mining into compute-intensive tasks to fully utilize CPU cores. We found that using G-thinker directly results in a straggler problem due to (i) drastic load imbalance among tasks and (ii) the difficulty of predicting task running time. We address these challenges by redesigning G-thinker's execution engine to prioritize long-running tasks for execution, and by using a novel timeout strategy that decomposes long-running tasks to improve load balancing. While this system redesign applies to many other expensive dense subgraph mining problems, this paper verifies the idea by adapting the state-of-the-art quasi-clique algorithm, Quick, to our redesigned G-thinker. Extensive experiments verify that our new solution scales well with the number of CPU cores, achieving a 201× runtime speedup when mining a graph with 3.77M vertices and 16.5M edges on a 16-node cluster.
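The γ-quasi-clique definition above can be checked directly for a candidate vertex set. The following is a minimal sketch of that membership test only, not the Quick algorithm or any part of G-thinker; the function name `is_quasi_clique` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil


def is_quasi_clique(adj, vertices, gamma):
    """Return True if `vertices` induces a gamma-quasi-clique in `adj`.

    adj      -- dict mapping each vertex to the set of its neighbours
                (hypothetical representation, undirected graph)
    vertices -- candidate vertex set V_g
    gamma    -- minimum degree threshold in (0, 1]
    """
    vs = set(vertices)
    if not vs:
        return False
    # Convert via str() so that 0.7 becomes exactly 7/10; this avoids
    # floating-point error pushing the ceiling up by one.
    g = Fraction(str(gamma))
    required = ceil(g * (len(vs) - 1))
    # Every vertex must reach at least `required` other vertices inside vs.
    return all(len(adj[v] & (vs - {v})) >= required for v in vs)


# A 4-cycle 0-1-2-3-0 with an extra pendant vertex 4 attached to vertex 0.
adj = {
    0: {1, 3, 4},
    1: {0, 2},
    2: {1, 3},
    3: {0, 2},
    4: {0},
}

cycle_ok = is_quasi_clique(adj, {0, 1, 2, 3}, 0.6)      # needs degree >= 2
cycle_strict = is_quasi_clique(adj, {0, 1, 2, 3}, 0.7)  # needs degree >= 3
with_pendant = is_quasi_clique(adj, {0, 1, 2, 3, 4}, 0.6)
```

The check itself is cheap; the expense described in the abstract comes from searching over the exponentially many candidate vertex sets for maximal ones.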


2013, Vol 34 (11), pp. 1252-1262
Author(s): Anita Keszler, Tamás Szirányi, Zsolt Tuza

2013, Vol 40 (2), pp. 243-278
Author(s): Stephan Günnemann, Ines Färber, Brigitte Boden, Thomas Seidl

Pattern mining is a key mechanism for managing large-scale data. Frequent subgraph mining (FSM), a subprocess of pattern mining that relies on subgraph isomorphism, is a well-studied problem in data mining. Graphs are a standard structure in many domains, such as protein-protein interaction networks in biology, wired or wireless interconnection networks, and web data. FSM is the task of finding all subgraphs in a given database, either a single large graph or a collection of many graphs, whose support is greater than a given threshold. Many databases consist of small graphs used to solve complex problems. The classification of graphs depends on the application requirements. A good mining architecture can save considerable memory and time. This paper follows the GraMi structure for the analysis of frequent subgraph mining and introduces a 20% threshold policy for the enhancement of directed pattern graphs. The constraint satisfaction problem (CSP) is discussed and analyzed using the GraMi approach. The proposed model is compared to GraMi on a Twitter dataset in terms of time and memory consumed. The proposed algorithm shows an improvement of 3-4% for both parameters. The results show that the proposed approach improves on GraMi, with a 6.6% reduction in time and a 21% improvement in memory consumption.
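The support threshold in the FSM definition above can be illustrated on the simplest case. The sketch below counts support for single-edge patterns over a database of many small labeled graphs, treating edges as undirected; it is not the GraMi procedure or the paper's proposed model, and the function name `frequent_edge_patterns` and the `(labels, edges)` graph representation are assumptions made for illustration.

```python
from collections import Counter


def frequent_edge_patterns(graphs, min_support):
    """Return single-edge label patterns whose support exceeds min_support.

    graphs      -- list of (labels, edges) pairs, where `labels` maps a
                   vertex id to its label and `edges` is a list of
                   (u, v) pairs (hypothetical representation)
    min_support -- threshold; a pattern is kept if it occurs in strictly
                   more than this many graphs
    """
    support = Counter()
    for labels, edges in graphs:
        # Collect each pattern once per graph, so support counts graphs
        # containing the pattern rather than total occurrences.
        patterns = {tuple(sorted((labels[u], labels[v]))) for u, v in edges}
        support.update(patterns)
    return {p: s for p, s in support.items() if s > min_support}


database = [
    ({0: "A", 1: "B", 2: "C"}, [(0, 1), (1, 2)]),
    ({0: "A", 1: "B"}, [(0, 1)]),
    ({0: "B", 1: "A", 2: "A"}, [(0, 1), (1, 2)]),
]

frequent = frequent_edge_patterns(database, min_support=1)
```

Larger patterns require subgraph isomorphism tests rather than label comparison, which is where the cost of FSM comes from.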

