A Feature Selection Method for Large-Scale Network Traffic Classification Based on Spark

Yong Wang; Wenlong Ke; Xiaoling Tao

doi:10.3390/info7010006

A Novel Feature Selection Method Based on Principal Component Analysis for Network Traffic Classification

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.989-994.4510 ◽

2014 ◽

Vol 989-994 ◽

pp. 4510-4513

Author(s):

Hong Zhi Wang ◽

Jian Ping Zhang ◽

Zun Yi Shang

Keyword(s):

Feature Selection ◽

Network Traffic ◽

Feature Selection Method ◽

Principal Component ◽

Selection Method ◽

Performance Comparison ◽

Traffic Classification ◽

Network Traffic Classification ◽

Validation Experiment ◽

PCA Method

In network traffic classification, the conventional PCA method still retains many features because most features have nearly uniform contribution rates. To overcome this problem, this paper proposes a novel feature selection method that reduces the dimensionality of network traffic data. The contribution rate of each feature in each component is calculated by a new weight criterion, and a maxima-order principle is proposed to determine which features are selected. Using three multi-class classification methods, performance is compared on actual traffic data with 10-fold cross-validation. The experiments show that the proposed method achieves higher classification accuracy than the conventional PCA method.


An Improved Network Traffic Classification Model Based on a Support Vector Machine

Symmetry ◽

10.3390/sym12020301 ◽

2020 ◽

Vol 12 (2) ◽

pp. 301 ◽

Cited By ~ 1

Author(s):

Jie Cao ◽

Da Wang ◽

Zhaoyang Qu ◽

Hongyu Sun ◽

Bin Li ◽

...

Keyword(s):

Support Vector Machine ◽

Feature Selection ◽

Network Traffic ◽

Feature Selection Method ◽

Selection Method ◽

Classification Model ◽

Support Vector ◽

Traffic Classification ◽

Generalization Ability ◽

Network Traffic Classification

Network traffic classification based on machine learning is an important branch of pattern recognition in computer science. It is a key technology for dynamic intelligent network management and enhanced network controllability. However, traffic classification methods still face severe challenges: the optimal set of features is difficult to determine, and the classification method is highly dependent on an effective combination of features. Meanwhile, it is also important to balance the empirical risk and generalization ability of the classifier. In this paper, an improved network traffic classification model based on a support vector machine is proposed. First, a filter-wrapper hybrid feature selection method is proposed to solve the false deletion of combined features caused by traditional feature selection methods. Second, to balance the empirical risk and generalization ability of the support vector machine (SVM) traffic classification model, an improved parameter optimization algorithm is proposed. The algorithm can dynamically adjust the quadratic search area, reduce the density of quadratic mesh generation, improve search efficiency, and prevent over-fitting while optimizing the parameters. The experiments show that the improved traffic classification model achieves higher classification accuracy, lower dimensionality, and shorter elapsed time, and performs significantly better than the traditional SVM and three other typical supervised ML algorithms.
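
The idea of shrinking the search area while optimizing parameters can be illustrated with a generic coarse-to-fine grid search. This is only a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. The function name `coarse_to_fine_search`, the `score` callback, and the reading of the two parameters as exponents of the SVM hyperparameters C and gamma are all assumptions made for illustration.

```python
def coarse_to_fine_search(score, c_range, g_range, steps=5, rounds=3):
    """Maximise score(c, g) over a 2-D box by repeated grid refinement.

    c_range, g_range: (low, high) bounds, e.g. exponents of C and gamma
    (an assumption; any two numeric parameters work).
    Each round evaluates a steps x steps grid, then shrinks the box to one
    grid cell on each side of the best point found so far.
    """
    c_lo, c_hi = c_range
    g_lo, g_hi = g_range
    best = None  # (value, c, g)
    for _ in range(rounds):
        c_step = (c_range[1] - c_range[0]) / (steps - 1)
        g_step = (g_range[1] - g_range[0]) / (steps - 1)
        for i in range(steps):
            for j in range(steps):
                c = c_range[0] + i * c_step
                g = g_range[0] + j * g_step
                value = score(c, g)
                if best is None or value > best[0]:
                    best = (value, c, g)
        # Narrow the search area around the current best point,
        # clamped to the original bounds.
        c_range = (max(c_lo, best[1] - c_step), min(c_hi, best[1] + c_step))
        g_range = (max(g_lo, best[2] - g_step), min(g_hi, best[2] + g_step))
    return best


# Toy objective standing in for cross-validated accuracy (hypothetical).
def toy_score(c, g):
    return -((c - 1.3) ** 2 + (g + 2.7) ** 2)


best_value, best_c, best_g = coarse_to_fine_search(
    toy_score, (-5.0, 5.0), (-5.0, 5.0), steps=5, rounds=4
)
```

In practice `score` would train and cross-validate a classifier for each parameter pair. Refining a small grid repeatedly needs far fewer evaluations than one dense grid at the same final resolution.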


Performance Evaluation of Hadoop-based Large-scale Network Traffic Analysis Cluster

MATEC Web of Conferences ◽

10.1051/matecconf/20165605015 ◽

2016 ◽

Vol 56 ◽

pp. 05015

Author(s):

Ran Tao ◽

Yuanyuan Qiao ◽

Wenli Zhou

Keyword(s):

Performance Evaluation ◽

Network Traffic ◽

Large Scale ◽

Traffic Analysis ◽

Network Traffic Analysis ◽

Large Scale Network ◽

Scale Network


Research of the Method for Traffic Generating Based on Genetic Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.701-702.3 ◽

2014 ◽

Vol 701-702 ◽

pp. 3-7

Author(s):

Liu Bo

Keyword(s):

Genetic Algorithm ◽

Network Traffic ◽

Large Scale ◽

Network Reliability ◽

Actual Usage ◽

Large Scale Network ◽

Scale Network ◽

Self Similar ◽

Optimal Values

Whether the simulated test traffic corresponds to the real situation has a great impact on the results of a network test or simulation. In actual network usage, the network traffic is the superposition of different traffic streams. However, because generating different traffic streams is complex and time-consuming, it is difficult to generate the network traffic in a simulation of a large-scale network. This paper proposes a traffic generation method based on a genetic algorithm. By building a self-similar traffic model, the optimal values of the model's parameters are obtained. A case study shows the effectiveness of the method for network reliability.


Network Traffic Classification Using Feature Selection and Parameter Optimization

Journal of Communications ◽

10.12720/jcm.10.10.828-835 ◽

2015 ◽

Cited By ~ 2

Author(s):

Jie Cao ◽


Zhiyi Fang ◽

Dan Zhang ◽

Guannan Qu

Keyword(s):

Feature Selection ◽

Parameter Optimization ◽

Network Traffic ◽

Traffic Classification ◽

Network Traffic Classification


A Large-Scale Network Data Analysis via Sparse and Low Rank Reconstruction

Discrete Dynamics in Nature and Society ◽

10.1155/2014/323764 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

Liang Fu Lu ◽

Zheng-Hai Huang ◽

Mohammed A. Ambusaidi ◽

Kui-Xiang Gou

Keyword(s):

Network Traffic ◽

Large Scale ◽

Low Rank ◽

Prototype System ◽

Traffic Data ◽

Proximal Gradient Method ◽

Large Scale Network ◽

Network Operation ◽

Efficient Data ◽

Scale Network

With the rapid growth of data communications in size and complexity, the threat of malicious activities and computer crimes has increased accordingly. Thus, efficient data processing techniques for network operation and management over large-scale network traffic are highly required. Some mathematical approaches to flow-level traffic data have been proposed because of the importance of analyzing the structure and situation of the network. Unlike state-of-the-art studies, we first propose a new decomposition model based on the accelerated proximal gradient method for packet-level traffic data. In addition, we present the iterative scheme of the algorithm for the network anomaly detection problem, termed NAD-APG. Based on this approach, we carry out intrusion detection for packet-level network traffic data regardless of whether it is polluted by noise. Finally, we design a prototype system for detecting network anomalies such as Probe and R2L attacks. The experiments show that our approach is effective in revealing the patterns of network traffic data and detecting attacks in large-scale network traffic. Moreover, the experiments demonstrate the robustness of the algorithm even when the network traffic is polluted by large-volume anomalies and noise.


IoT Botnet Attack Detection Based on Optimized Extreme Gradient Boosting and Feature Selection

Sensors ◽

10.3390/s20216336 ◽

2020 ◽

Vol 20 (21) ◽

pp. 6336 ◽

Cited By ~ 1

Author(s):

Mnahi Alqahtani ◽

Hassan Mathkour ◽

Mohamed Maher Ben Ismail

Keyword(s):

Feature Selection ◽

Large Scale ◽

Feature Selection Method ◽

Selection Method ◽

Attack Detection ◽

Gradient Boosting ◽

Fisher Score ◽

Detection Approach ◽

Extreme Gradient Boosting ◽

IoT Devices

Nowadays, Internet of Things (IoT) technology has various network applications and has attracted the interest of many research and industrial communities. Particularly, the number of vulnerable or unprotected IoT devices has drastically increased, along with the amount of suspicious activity, such as IoT botnet and large-scale cyber-attacks. In order to address this security issue, researchers have deployed machine and deep learning methods to detect attacks targeting compromised IoT devices. Despite these efforts, developing an efficient and effective attack detection approach for resource-constrained IoT devices remains a challenging task for the security research community. In this paper, we propose an efficient and effective IoT botnet attack detection approach. The proposed approach relies on a Fisher-score-based feature selection method along with a genetic-based extreme gradient boosting (GXGBoost) model in order to determine the most relevant features and to detect IoT botnet attacks. The Fisher score is a representative filter-based feature selection method used to determine significant features and discard irrelevant features through the minimization of intra-class distance and the maximization of inter-class distance. On the other hand, GXGBoost is an optimal and effective model, used to classify the IoT botnet attacks. Several experiments were conducted on a public botnet dataset of IoT devices. The evaluation results obtained using holdout and 10-fold cross-validation techniques showed that the proposed approach had a high detection rate using only three out of the 115 data traffic features and improved the overall performance of the IoT botnet attack detection process.
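
The Fisher score described above, which favours features with a small intra-class spread and a large inter-class spread, can be sketched in a few lines. This is a minimal illustration of the standard per-feature Fisher score and not the authors' implementation. The function names `fisher_scores` and `top_k_features` and the toy data are hypothetical.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher score per feature column of X.

    X: list of rows (each a list of floats); y: list of class labels.
    Score = between-class scatter / within-class scatter, per feature.
    """
    n_features = len(X[0])
    # Group rows by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in groups.values():
            values = [row[j] for row in rows]
            class_mean = sum(values) / len(values)
            class_var = sum((v - class_mean) ** 2 for v in values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += len(values) * class_var
        # Guard against division by zero when a feature is constant
        # inside every class.
        if within == 0.0:
            scores.append(float("inf") if between > 0.0 else 0.0)
        else:
            scores.append(between / within)
    return scores


def top_k_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X_toy = [[1.0, 5.0], [1.2, 1.0], [3.0, 5.1], [3.2, 0.9]]
y_toy = [0, 0, 1, 1]
toy_scores = fisher_scores(X_toy, y_toy)
selected = top_k_features(X_toy, y_toy, 1)
```

Because each feature is scored independently of any classifier, this filter step is cheap, which is consistent with the abstract's focus on resource-constrained devices.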


Intelligent generation of fuzzy rules for network firewalls based on the analysis of large-scale network traffic dumps

International Journal of Hybrid Intelligent Systems ◽

10.3233/his-170236 ◽

2017 ◽

Vol 13 (3-4) ◽

pp. 195-206

Author(s):

Andrii Shalaginov ◽

Katrin Franke

Keyword(s):

Network Traffic ◽

Large Scale ◽

Fuzzy Rules ◽

Large Scale Network ◽

Scale Network


A systematic approach of feature selection for encrypted network traffic classification

2018 Annual IEEE International Systems Conference (SysCon) ◽

10.1109/syscon.2018.8369567 ◽

2018 ◽

Cited By ~ 4

Author(s):

Donald McGaughey ◽

Trevor Semeniuk ◽

Ron Smith ◽

Scott Knight

Keyword(s):

Feature Selection ◽

Network Traffic ◽

Systematic Approach ◽

Traffic Classification ◽

Network Traffic Classification ◽

Selection For


An efficient feature selection method for network video traffic classification

2017 IEEE 17th International Conference on Communication Technology (ICCT) ◽

10.1109/icct.2017.8359902 ◽

2017 ◽

Cited By ~ 1

Author(s):

Yuning Dong ◽

Quantao Yue ◽

Mao Feng

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Traffic Classification ◽

Video Traffic
