Network‐based semisupervised clustering

Author(s):  
Luca Frigau ◽  
Giulia Contu ◽  
Francesco Mola ◽  
Claudio Conversano
2016 ◽  
Vol 24 (4) ◽  
pp. 992-999 ◽  
Author(s):  
Irene Diaz-Valenzuela ◽  
M. Amparo Vila ◽  
Maria J. Martin-Bautista

2011 ◽  
Vol 19 (3) ◽  
pp. 562-574 ◽  
Author(s):  
Gleb Beliakov ◽  
Simon James ◽  
Gang Li

2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Mingwei Leng ◽  
Jianjun Cheng ◽  
Jinjin Wang ◽  
Zhengquan Zhang ◽  
Hanhai Zhou ◽  
...  

Most existing semisupervised clustering algorithms that rely on a small labeled dataset achieve low accuracy on multidensity and imbalanced datasets, and labeling data is expensive and time-consuming in many real-world applications. This paper addresses active data selection and semisupervised clustering on multidensity and imbalanced datasets and proposes an active semisupervised clustering algorithm. The proposed algorithm uses an active data-selection mechanism to minimize the amount of labeled data required, and it uses multiple thresholds to expand the labeled dataset when the data are multidensity and imbalanced. Three standard datasets and one synthetic dataset are used to evaluate the proposed algorithm. The experimental results show that it achieves higher accuracy and more stable performance than other clustering and semisupervised clustering algorithms, especially on multidensity and imbalanced datasets.
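The abstract does not spell out how the thresholds are applied, so the following is only an illustrative sketch of the general idea of expanding a small labeled set with one distance threshold per class. The function name `expand_labels`, the per-class `thresholds` mapping, and the nearest-labeled-neighbor rule are all assumptions made for this example, not the paper's algorithm, and the active data-selection step is not shown.

```python
import math

def expand_labels(points, labels, thresholds):
    """Grow a labeled set using one distance threshold per class.

    Hypothetical sketch: an unlabeled point takes the class of its nearest
    labeled point, provided the distance is within that class's threshold.
    Passes repeat until no new point is labeled.
    """
    labels = dict(labels)  # point index -> class; copy so the input is untouched
    changed = True
    while changed:
        changed = False
        for i, p in enumerate(points):
            if i in labels:
                continue
            best = None  # (distance, class) of the closest admissible labeled point
            for j, cls in labels.items():
                d = math.dist(p, points[j])
                if d <= thresholds[cls] and (best is None or d < best[0]):
                    best = (d, cls)
            if best is not None:
                labels[i] = best[1]
                changed = True
    return labels

# A dense class "A" (tight threshold) and a sparse class "B" (loose threshold).
points = [(0, 0), (0.3, 0), (0.6, 0), (10, 0), (11.5, 0), (13, 0), (5, 0)]
seeds = {0: "A", 3: "B"}
expanded = expand_labels(points, seeds, {"A": 0.5, "B": 2.0})
print(expanded)
```

A single global threshold would either fail to reach the sparse points of class "B" or be loose enough to merge unrelated points into the dense class "A"; using a separate threshold per class is one simple way to cope with differing densities. The point at `(5, 0)` stays unlabeled because it lies outside both thresholds.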


Author(s):  
Zhiyong Li ◽  
Xinyi Hu ◽  
Ke Li ◽  
Fanyin Zhou ◽  
Feng Shen

2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Rongfeng Zheng ◽  
Jiayong Liu ◽  
Weina Niu ◽  
Liang Liu ◽  
Kai Li ◽  
...  

The recent explosive growth in network traffic has increased the processing pressure on network intrusion detection systems. In addition, there is a lack of reliable methods for preprocessing network traffic generated by benign applications that do not steal users’ data from their devices. To alleviate these problems, this study analyzed the differences between the benign traffic produced by benign applications and the malicious traffic produced by malware. To capture these differences, it proposed a new set of statistical features for training a clustering model. Furthermore, to mine the communication channels generated by benign applications in batches, a semisupervised clustering method was adopted. Using a small number of labeled samples, the method aggregated historical network traffic into two types of clusters. A cluster that did not contain labeled malicious samples was regarded as a benign traffic cluster. Four types of clustering algorithms were compared experimentally, and the density-based spatial clustering of applications with noise (DBSCAN) algorithm was selected to mine benign communication channels. The method was also compared with two other methods, and the results showed that the benign channels it mined were more reliable. Finally, using this method, 1,811 benign transport layer security (TLS) channels were mined from 18,357 TLS communication channels. The flows carried by these benign channels accounted for 65.37% of all network flows, and no malicious flow was included in the results, which demonstrates the effectiveness of the method.
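The rule that a cluster with no labeled malicious sample is treated as benign can be sketched as follows. This is a minimal illustration that assumes cluster assignments are already available as a list of integer ids, with `-1` marking noise points (the convention used by scikit-learn's DBSCAN). The function names, the choice to exclude noise points from the benign set, and the toy data are assumptions for this example and are not taken from the paper.

```python
def benign_clusters(cluster_ids, malicious_idx, noise_label=-1):
    """Return the ids of clusters that contain no labeled malicious sample.

    Noise points are never treated as a benign cluster (an assumption of
    this sketch).
    """
    tainted = {cluster_ids[i] for i in malicious_idx}
    return {c for c in set(cluster_ids) if c != noise_label and c not in tainted}

def benign_channels(channels, cluster_ids, malicious_idx):
    """Keep only the channels that fall in a benign cluster."""
    keep = benign_clusters(cluster_ids, malicious_idx)
    return [ch for ch, c in zip(channels, cluster_ids) if c in keep]

# Toy data: six channels, three clusters plus one noise point.
channels = ["a", "b", "c", "d", "e", "f"]
cluster_ids = [0, 0, 1, 1, -1, 2]
malicious_idx = [2]  # channel "c" is a labeled malicious sample

mined = benign_channels(channels, cluster_ids, malicious_idx)
print(mined)
```

Here cluster 1 is discarded because it contains the labeled malicious channel, and the noise point is dropped, so only the channels in clusters 0 and 2 are reported as benign.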

