A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Information ◽  
2016 ◽  
Vol 7 (1) ◽  
pp. 6 ◽  
Author(s):  
Yong Wang ◽  
Wenlong Ke ◽  
Xiaoling Tao
2014 ◽  
Vol 989-994 ◽  
pp. 4510-4513
Author(s):  
Hong Zhi Wang ◽  
Jian Ping Zhang ◽  
Zun Yi Shang

In network traffic classification, the conventional PCA method still retains many features because most features have nearly uniform contribution rates. To overcome this problem, this paper proposes a novel feature selection method that reduces the dimensionality of network traffic data. The contribution rate of each feature in each component is calculated using a new weight criterion, and a maxima-order principle is proposed to determine which features are selected. Using three multi-class classification methods, performance is compared on actual traffic data with 10-fold cross-validation. Experiments show that the proposed method achieves higher classification accuracy than the conventional PCA method.
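
The abstract does not state the new weight criterion or the maxima-order principle, so the following is only a generic sketch of how per-component feature contributions can be turned into a ranking. It assumes the PCA loadings (unit-norm components) and explained-variance ratios have already been computed elsewhere; the function name `rank_features_by_loading` and the squared-loading weighting are illustrative assumptions, not the authors' method.

```python
def rank_features_by_loading(loadings, explained_variance_ratio):
    """Rank features by variance-weighted squared PCA loadings.

    loadings[k][j] is the weight of feature j in unit-norm component k.
    This weighting is an illustrative assumption, not the paper's criterion.
    """
    n_features = len(loadings[0])
    scores = [0.0] * n_features
    for component, ratio in zip(loadings, explained_variance_ratio):
        for j, weight in enumerate(component):
            # Squared loadings of a unit-norm component sum to 1, so each
            # one can be read as that feature's share of the component.
            scores[j] += ratio * weight * weight
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Two hypothetical components over three features.
order, scores = rank_features_by_loading(
    [[0.8, 0.6, 0.0], [0.0, 0.0, 1.0]],
    [0.7, 0.3],
)
```

Keeping only the first few entries of `order` gives a reduced feature set that can then be passed to any classifier for comparison against plain PCA.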


Symmetry ◽  
2020 ◽  
Vol 12 (2) ◽  
pp. 301 ◽  
Author(s):  
Jie Cao ◽  
Da Wang ◽  
Zhaoyang Qu ◽  
Hongyu Sun ◽  
Bin Li ◽  
...  

Network traffic classification based on machine learning is an important branch of pattern recognition in computer science. It is a key technology for dynamic intelligent network management and enhanced network controllability. However, traffic classification methods still face severe challenges: the optimal set of features is difficult to determine, and the classification method is highly dependent on an effective combination of features. Meanwhile, it is also important to balance the empirical risk and generalization ability of the classifier. In this paper, an improved network traffic classification model based on a support vector machine is proposed. First, a filter-wrapper hybrid feature selection method is proposed to solve the false deletion of combined features caused by traditional feature selection methods. Second, to balance the empirical risk and generalization ability of the support vector machine (SVM) traffic classification model, an improved parameter optimization algorithm is proposed. The algorithm can dynamically adjust the quadratic search area, reduce the density of quadratic mesh generation, improve the search efficiency of the algorithm, and prevent over-fitting while optimizing the parameters. The experiments show that the improved traffic classification model achieves higher classification accuracy, lower dimension, and shorter elapsed time, and performs significantly better than traditional SVM and three other typical supervised ML algorithms.
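
The abstract gives no details of the improved parameter optimization algorithm. As a rough illustration of the general idea of searching a coarse grid first and then a smaller area around the best point, here is a minimal coarse-to-fine grid search over the SVM parameters C and gamma. The function name, the log2 search ranges, and the `score` callback (which would normally return cross-validated accuracy) are all assumptions made for this sketch.

```python
import math


def coarse_to_fine_search(score, log2_c=(-5.0, 15.0), log2_g=(-15.0, 3.0),
                          points=5, rounds=4):
    """Maximise score(C, gamma) on a log2 grid that shrinks each round.

    `score` is a hypothetical callback, e.g. cross-validated accuracy.
    """
    c_lo, c_hi = log2_c
    g_lo, g_hi = log2_g
    best = (-math.inf, None, None)  # (score, log2 C, log2 gamma)
    for _ in range(rounds):
        c_step = (c_hi - c_lo) / (points - 1)
        g_step = (g_hi - g_lo) / (points - 1)
        for i in range(points):
            for j in range(points):
                lc = c_lo + i * c_step
                lg = g_lo + j * g_step
                s = score(2.0 ** lc, 2.0 ** lg)
                if s > best[0]:
                    best = (s, lc, lg)
        # Shrink the search area to one grid cell on each side of the best point.
        _, bc, bg = best
        c_lo, c_hi = bc - c_step, bc + c_step
        g_lo, g_hi = bg - g_step, bg + g_step
    return 2.0 ** best[1], 2.0 ** best[2], best[0]


# Synthetic score with a single peak at C = 2**3, gamma = 2**-4.
def toy_score(c, g):
    return -((math.log2(c) - 3.0) ** 2 + (math.log2(g) + 4.0) ** 2)


best_c, best_g, best_score = coarse_to_fine_search(toy_score)
```

With the defaults this evaluates 100 parameter pairs over four rounds, far fewer than a single dense grid of comparable final resolution would need.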


2014 ◽  
Vol 701-702 ◽  
pp. 3-7
Author(s):  
Liu Bo

Whether the simulated test traffic corresponds to the real situation has a great impact on the results of a network test or simulation. In actual use, network traffic is the superposition of different traffic streams. However, because generating different traffic streams is complex and time-consuming, it is difficult to generate network traffic when simulating a large-scale network. This paper proposes a traffic generation method based on a genetic algorithm. By building a self-similar traffic model, the optimal values of the model’s parameters are obtained. A case study shows the effectiveness of the method for network reliability.


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
Liang Fu Lu ◽  
Zheng-Hai Huang ◽  
Mohammed A. Ambusaidi ◽  
Kui-Xiang Gou

With the rapid growth of data communications in size and complexity, the threat of malicious activities and computer crimes has increased accordingly. Thus, investigating efficient data processing techniques for network operation and management over large-scale network traffic is highly necessary. Some mathematical approaches to flow-level traffic data have been proposed because of the importance of analyzing the structure and situation of the network. Unlike state-of-the-art studies, we first propose a new decomposition model for packet-level traffic data based on the accelerated proximal gradient method. In addition, we present the iterative scheme of the algorithm for the network anomaly detection problem, termed NAD-APG. Based on this approach, we carry out intrusion detection for packet-level network traffic data regardless of whether it is polluted by noise. Finally, we design a prototype system for detecting network anomalies such as Probe and R2L attacks. The experiments show that our approach is effective in revealing the patterns of network traffic data and detecting attacks in large-scale network traffic. Moreover, the experiments demonstrate the robustness of the algorithm even when the network traffic is polluted by large-volume anomalies and noise.
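
The decomposition model and the NAD-APG iteration are not given in the abstract. To show what an accelerated proximal gradient loop looks like in general, the sketch below applies it to a small L1-regularized least-squares problem, where the sparse solution plays the role of isolated anomalies. The objective, the function names, and the step-size bound are assumptions chosen for illustration and should not be read as the authors' model.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def apg_lasso(A, b, lam, iterations=300):
    """Minimise 0.5 * ||A x - b||^2 + lam * ||x||_1 by accelerated proximal gradient."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm is an upper bound on the Lipschitz
    # constant of the gradient, so 1 / bound is a safe step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    y = x[:]
    t = 1.0
    for _ in range(iterations):
        residual = [sum(A[i][j] * y[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth part, then the L1 proximal step.
        x_new = [soft_threshold(y[j] - step * grad[j], step * lam) for j in range(n)]
        # Momentum update that gives the method its acceleration.
        t_new = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = [x_new[j] + ((t - 1.0) / t_new) * (x_new[j] - x[j]) for j in range(n)]
        x, t = x_new, t_new
    return x


# With A as the identity, the minimiser is soft_threshold(b, lam) per entry.
solution = apg_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

In this toy case the large entry survives (shrunk by `lam`) while the small one is set to zero, which is the behavior that makes L1 penalties useful for separating sparse spikes from background.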


Sensors ◽  
2020 ◽  
Vol 20 (21) ◽  
pp. 6336 ◽  
Author(s):  
Mnahi Alqahtani ◽  
Hassan Mathkour ◽  
Mohamed Maher Ben Ismail

Nowadays, Internet of Things (IoT) technology has various network applications and has attracted the interest of many research and industrial communities. Particularly, the number of vulnerable or unprotected IoT devices has drastically increased, along with the amount of suspicious activity, such as IoT botnet and large-scale cyber-attacks. In order to address this security issue, researchers have deployed machine and deep learning methods to detect attacks targeting compromised IoT devices. Despite these efforts, developing an efficient and effective attack detection approach for resource-constrained IoT devices remains a challenging task for the security research community. In this paper, we propose an efficient and effective IoT botnet attack detection approach. The proposed approach relies on a Fisher-score-based feature selection method along with a genetic-based extreme gradient boosting (GXGBoost) model in order to determine the most relevant features and to detect IoT botnet attacks. The Fisher score is a representative filter-based feature selection method used to determine significant features and discard irrelevant features through the minimization of intra-class distance and the maximization of inter-class distance. On the other hand, GXGBoost is an optimal and effective model, used to classify the IoT botnet attacks. Several experiments were conducted on a public botnet dataset of IoT devices. The evaluation results obtained using holdout and 10-fold cross-validation techniques showed that the proposed approach had a high detection rate using only three out of the 115 data traffic features and improved the overall performance of the IoT botnet attack detection process.
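
As a small illustration of the filter step described above, the following sketch computes a Fisher score per feature as the ratio of between-class scatter to within-class scatter. This is one common formulation of the score; the exact variant used in the paper is not stated in the abstract, and the function name and toy data are assumptions.

```python
def fisher_scores(rows, labels):
    """Return one Fisher score per feature (higher means more discriminative).

    rows is a list of feature vectors, labels the matching class labels.
    """
    n_features = len(rows[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        overall_mean = sum(column) / len(column)
        between = 0.0  # inter-class distance, weighted by class size
        within = 0.0   # intra-class variance, weighted by class size
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        scores.append(between / within if within > 0 else float("inf"))
    return scores


# Toy data: feature 0 separates the two classes, feature 1 does not.
toy_rows = [[0.0, 5.0], [1.0, 3.0], [10.0, 4.0], [11.0, 6.0]]
toy_labels = [0, 0, 1, 1]
toy_scores = fisher_scores(toy_rows, toy_labels)
```

Sorting features by score and keeping the top few yields the reduced feature set that would then be passed to the classifier.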

