scholarly journals Real Network Traffic Collection and Deep Learning for Mobile App Identification

2020 ◽  
Vol 2020 ◽  
pp. 1-14 ◽  
Author(s):  
Xin Wang ◽  
Shuhui Chen ◽  
Jinshu Su

The proliferation of mobile devices over recent years has led to a dramatic increase in mobile traffic. Demand for enabling accurate mobile app identification is coming as it is an essential step to improve a multitude of network services: accounting, security monitoring, traffic forecasting, and quality-of-service. However, traditional traffic classification techniques do not work well for mobile traffic. Besides, multiple machine learning solutions developed in this field are severely restricted by their handcrafted features as well as unreliable datasets. In this paper, we propose a framework for real network traffic collection and labeling in a scalable way. A dedicated Android traffic capture tool is developed to build datasets with perfect ground truth. Using our established dataset, we make an empirical exploration on deep learning methods for the task of mobile app identification, which can automate the feature engineering process in an end-to-end fashion. We introduce three of the most representative deep learning models and design and evaluate our dedicated classifiers, namely, a SDAE, a 1D CNN, and a bidirectional LSTM network, respectively. In comparison with two other baseline solutions, our CNN and RNN models with raw traffic inputs are capable of achieving state-of-the-art results regardless of TLS encryption. Specifically, the 1D CNN classifier obtains the best performance with an accuracy of 91.8% and macroaverage F-measure of 90.1%. To further understand the trained model, sample-specific interpretations are performed, showing how it can automatically learn important and advanced features from the uppermost bytes of an app’s raw flows.

2021 ◽  
Vol 2 (2) ◽  
Author(s):  
Kate Highnam ◽  
Domenic Puzio ◽  
Song Luo ◽  
Nicholas R. Jennings

AbstractBotnets and malware continue to avoid detection by static rule engines when using domain generation algorithms (DGAs) for callouts to unique, dynamically generated web addresses. Common DGA detection techniques fail to reliably detect DGA variants that combine random dictionary words to create domain names that closely mirror legitimate domains. To combat this, we created a novel hybrid neural network, Bilbo the “bagging” model, that analyses domains and scores the likelihood they are generated by such algorithms and therefore are potentially malicious. Bilbo is the first parallel usage of a convolutional neural network (CNN) and a long short-term memory (LSTM) network for DGA detection. Our unique architecture is found to be the most consistent in performance in terms of AUC, $$F_1$$ F 1 score, and accuracy when generalising across different dictionary DGA classification tasks compared to current state-of-the-art deep learning architectures. We validate using reverse-engineered dictionary DGA domains and detail our real-time implementation strategy for scoring real-world network logs within a large enterprise. In 4 h of actual network traffic, the model discovered at least five potential command-and-control networks that commercial vendor tools did not flag.


Author(s):  
Yunxuan Li ◽  
Jian Lu ◽  
Lin Zhang ◽  
Yi Zhao

The Didi Dache app is China’s biggest taxi booking mobile app and is popular in cities. Unsurprisingly, short-term traffic demand forecasting is critical to enabling Didi Dache to maximize use by drivers and ensure that riders can always find a car whenever and wherever they may need a ride. In this paper, a short-term traffic demand forecasting model, Wave SVM, is proposed. It combines the complementary advantages of Daubechies5 wavelets analysis and least squares support vector machine (LS-SVM) models while it overcomes their respective shortcomings. This method includes four stages: in the first stage, original data are preprocessed; in the second stage, these data are decomposed into high-frequency and low-frequency series by wavelet; in the third stage, the prediction stage, the LS-SVM method is applied to train and predict the corresponding high-frequency and low-frequency series; in the last stage, the diverse predicted sequences are reconstructed by wavelet. The real taxi-hailing orders data are applied to evaluate the model’s performance and practicality, and the results are encouraging. The Wave SVM model, compared with the prediction error of state-of-the-art models, not only has the best prediction performance but also appears to be the most capable of capturing the nonstationary characteristics of the short-term traffic dynamic systems.


Stroke ◽  
2020 ◽  
Vol 51 (Suppl_1) ◽  
Author(s):  
Benjamin Zahneisen ◽  
Matus Straka ◽  
Shalini Bammer ◽  
Greg Albers ◽  
Roland Bammer

Introduction: Ruling out hemorrhage (stroke or traumatic) prior to administration of thrombolytics is critical for Code Strokes. A triage software that identifies hemorrhages on head CTs and alerts radiologists would help to streamline patient care and increase diagnostic confidence and patient safety. ML approach: We trained a deep convolutional network with a hybrid 3D/2D architecture on unenhanced head CTs of 805 patients. Our training dataset comprised 348 positive hemorrhage cases (IPH=245, SAH=67, Sub/Epi-dural=70, IVH=83) (128 female) and 457 normal controls (217 female). Lesion outlines were drawn by experts and stored as binary masks that were used as ground truth data during the training phase (random 80/20 train/test split). Diagnostic sensitivity and specificity were defined on a per patient study level, i.e. a single, binary decision for presence/absence of a hemorrhage on a patient’s CT scan. Final validation was performed in 380 patients (167 positive). Tool: The hemorrhage detection module was prototyped in Python/Keras. It runs on a local LINUX server (4 CPUs, no GPUs) and is embedded in a larger image processing platform dedicated to stroke. Results: Processing time for a standard whole brain CT study (3-5mm slices) was around 2min. Upon completion, an instant notification (by email and/or mobile app) was sent to users to alert them about the suspected presence of a hemorrhage. Relative to neuroradiologist gold standard reads the algorithm’s sensitivity and specificity is 90.4% and 92.5% (95% CI: 85%-94% for both). Detection of acute intracranial hemorrhage can be automatized by deploying deep learning. It yielded very high sensitivity/specificity when compared to gold standard reads by a neuroradiologist. Volumes as small as 0.5mL could be detected reliably in the test dataset. The software can be deployed in busy practices to prioritize worklists and alert health care professionals to speed up therapeutic decision processes and interventions.


Symmetry ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1453
Author(s):  
Renjian Lyu ◽  
Mingshu He ◽  
Yu Zhang ◽  
Lei Jin ◽  
Xinlei Wang

Deep learning has been applied in the field of network intrusion detection and has yielded good results. In malicious network traffic classification tasks, many studies have achieved good performance with respect to the accuracy and recall rate of classification through self-designed models. In deep learning, the design of the model architecture greatly influences the results. However, the design of the network model architecture usually requires substantial professional knowledge. At present, the focus of research in the field of traffic monitoring is often directed elsewhere. Therefore, in the classification task of the network intrusion detection field, there is much room for improvement in the design and optimization of the model architecture. A neural architecture search (NAS) can automatically search the architecture of the model under the premise of a given optimization goal. For this reason, we propose a model that can perform NAS in the field of network traffic classification and search for the optimal architecture suitable for traffic detection based on the network traffic dataset. Each layer of our depth model is constructed according to the principle of maximum coding rate attenuation, which has strong consistency and symmetry in structure. Compared with some manually designed network architectures, classification indicators, such as Top-1 accuracy and F1 score, are also greatly improved while ensuring the lightweight nature of the model. In addition, we introduce a surrogate model in the search task. Compared to using the traditional NAS model to search the network traffic classification model, our NAS model greatly improves the search efficiency under the premise of ensuring that the results are not substantially different. We also manually adjust some operations in the search space of the architecture search to find a set of model operations that are more suitable for traffic classification. Finally, we apply the searched model to other traffic datasets to verify the universality of the model. Compared with several common network models in the traffic field, the searched model (NAS-Net) performs better, and the classification effect is more accurate.


2020 ◽  
Vol 12 (1) ◽  
pp. 1-11
Author(s):  
Arivudainambi D. ◽  
Varun Kumar K.A. ◽  
Vinoth Kumar R. ◽  
Visu P.

Ransomware is a malware which affects the systems data with modern encryption techniques, and the data is recovered once a ransom amount is paid. In this research, the authors show how ransomware propagates and infects devices. Live traffic classifications of ransomware have been meticulously analyzed. Further, a novel method for the classification of ransomware traffic by using deep learning methods is presented. Based on classification, the detection of ransomware is approached with the characteristics of the network traffic and its communications. In more detail, the behavior of popular ransomware, Crypto Wall, is analyzed and based on this knowledge, a real-time ransomware live traffic classification model is proposed.


Sign in / Sign up

Export Citation Format

Share Document