scholarly journals Automatic Benchmark Generation Framework for Malware Detection

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Guanghui Liang ◽  
Jianmin Pang ◽  
Zheng Shan ◽  
Runqing Yang ◽  
Yihang Chen

To address emerging security threats, various malware detection methods have been proposed every year. Therefore, a small but representative set of malware samples are usually needed for detection model, especially for machine-learning-based malware detection models. However, current manual selection of representative samples from large unknown file collection is labor intensive and not scalable. In this paper, we firstly propose a framework that can automatically generate a small data set for malware detection. With this framework, we extract behavior features from a large initial data set and then use a hierarchical clustering technique to identify different types of malware. An improved genetic algorithm based on roulette wheel sampling is implemented to generate final test data set. The final data set is only one-eighteenth the volume of the initial data set, and evaluations show that the data set selected by the proposed framework is much smaller than the original one but does not lose nearly any semantics.

Author(s):  
Félix Iglesias Vázquez ◽  
Robert Annessi ◽  
Tanja Zseby

Covert timing channels are security threats that have concerned the expert community from the beginnings of secure computer networks. In this paper we explore the nature of covert timing channels by studying the behavior of a selection of features used for their detection. Insights are obtained from experimental studies based on ten covert timing channels techniques published in the literature, which include popular and novel approaches. The study digs into the shapes of flows containing covert timing channels from a statistical perspective as well as using supervised and unsupervised machine learning algorithms. Our experiments reveal which features are recommended for building detection methods and draw meaningful representations to understand the problem space. Covert timing channels show high histogramdistance based outlierness, but insufficient to clearly discriminate them from normal traffic. On the other hand, traffic features do show dependencies that allow separating subspaces and facilitate the identification of covert timing channels. The conducted study shows the detection difficulties due to the high shape variability of normal traffic and suggests the implementation of semi-supervised techniques to develop accurate and reliable detectors.  


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Lin Liu ◽  
Wang Ren ◽  
Feng Xie ◽  
Shengwei Yi ◽  
Junkai Yi ◽  
...  

The malicious APK (Android Application Package) makers use some techniques such as code obfuscation and code encryption to avoid existing detection methods, which poses new challenges for accurate virus detection and makes it more and more difficult to detect the malicious code. A report indicates that a new malicious app for Android is created every 10 seconds. To combat this serious malware activity, a scalable malware detection approach is needed, which can effectively and efficiently identify the malware apps. Common static detection methods often rely on Hash matching and analysis of viruses, which cannot quickly detect new malicious Android applications and their variants. In this paper, a malicious Android application detection method is proposed, which is implemented by the deep network fusion model. The hybrid model only needs to use the sample training model to achieve high accuracy in the identification of the malicious applications, which is more suitable for the detection of the new malicious Android applications than the existing methods. This method extracts the static features in the core code of the Android application by decompiling APK files, then performs code vectorization processing, and uses the deep learning network for classification and discrimination. Our experiments with a data set containing 10,170 apps show that the decisions from the hybrid model can increase the malware detection rate significantly on a real device, which verifies the superiority of this method in the detection of malicious codes.


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Xin Wang ◽  
Dafang Zhang ◽  
Xin Su ◽  
Wenjia Li

In recent years, Android malware has continued to grow at an alarming rate. More recent malicious apps’ employing highly sophisticated detection avoidance techniques makes the traditional machine learning based malware detection methods far less effective. More specifically, they cannot cope with various types of Android malware and have limitation in detection by utilizing a single classification algorithm. To address this limitation, we propose a novel approach in this paper that leverages parallel machine learning and information fusion techniques for better Android malware detection, which is named Mlifdect. To implement this approach, we first extract eight types of features from static analysis on Android apps and build two kinds of feature sets after feature selection. Then, a parallel machine learning detection model is developed for speeding up the process of classification. Finally, we investigate the probability analysis based and Dempster-Shafer theory based information fusion approaches which can effectively obtain the detection results. To validate our method, other state-of-the-art detection works are selected for comparison with real-world Android apps. The experimental results demonstrate that Mlifdect is capable of achieving higher detection accuracy as well as a remarkable run-time efficiency compared to the existing malware detection solutions.


MENDEL ◽  
2019 ◽  
Vol 25 (2) ◽  
pp. 1-10 ◽  
Author(s):  
Ivan Zelinka ◽  
Eslam Amer

Current commercial antivirus detection engines still rely on signature-based methods. However, with the huge increase in the number of new malware, current detection methods become not suitable. In this paper, we introduce a malware detection model based on ensemble learning. The model is trained using the minimum number of signification features that are extracted from the file header. Evaluations show that the ensemble models slightly outperform individual classification models. Experimental evaluations show that our model can predict unseen malware with an accuracy rate of 0.998 and with a false positive rate of 0.002. The paper also includes a comparison between the performance of the proposed model and with different machine learning techniques. We are emphasizing the use of machine learning based approaches to replace conventional signature-based methods.


Electronics ◽  
2021 ◽  
Vol 10 (23) ◽  
pp. 2948
Author(s):  
Corentin Rodrigo ◽  
Samuel Pierre ◽  
Ronald Beaubrun ◽  
Franjieh El Khoury

Android has become the leading operating system for mobile devices, and the most targeted one by malware. Therefore, many analysis methods have been proposed for detecting Android malware. However, few of them use proper datasets for evaluation. In this paper, we propose BrainShield, a hybrid malware detection model trained on the Omnidroid dataset to reduce attacks on Android devices. The latter is the most diversified dataset in terms of the number of different features, and contains the largest number of samples, 22,000 samples, for model evaluation in the Android malware detection field. BrainShield’s implementation is based on a client/server architecture and consists of three fully connected neural networks: (1) the first is used for static analysis and reaches an accuracy of 92.9% trained on 840 static features; (2) the second is a dynamic neural network that reaches an accuracy of 81.1% trained on 3722 dynamic features; and (3) the third neural network proposed is hybrid, reaching an accuracy of 91.1% trained on 7081 static and dynamic features. Simulation results show that BrainShield is able to improve the accuracy and the precision of well-known malware detection methods.


2020 ◽  
Vol 309 ◽  
pp. 02002 ◽  
Author(s):  
Zhigang Zhang ◽  
Chaowen Chang ◽  
Peisheng Han ◽  
Hongtao Zhang

Malware is one of the most serious network security threats. To detect unknown variants of malware, many researches have proposed various methods of malware detection based on machine learning in recent years. However, modern malware is often protected by software packers, obfuscation, and other technologies, which bring challenges to malware analysis and detection. In this paper, we propose a system call based malware detection technology. By comparing malware and benign software in a sandbox environment, a sensitive system call context is extracted based on information gain, which reduces obfuscation caused by a normal system call. By using the deep belief network, we train a malware detection model with sensitive system call context to improve the detection accuracy.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Zixin Wang ◽  
Jing Gao ◽  
Qingcheng Zeng ◽  
Yuhui Sun

Due to the repeated bearing of mechanical operations and natural factors, the container will suffer various types of damage during use. Adopting effective container damage detection methods plays a vital role in prolonging the service life and using function. This paper proposes a multitype damage detection model for containers based on transfer learning and MobileNetV2. In addition, a data set containing nine typical types of container damage is established. To ensure the validity and practicability of the model, we conducted tests and verifications in the actual port environment. The results show that the model can identify multiple types of container damage. Compared with the existing models, the damage detection model proposed in this paper can ensure the identification effect of various types of container damage, which is more suitable for the actual container detection situation. This method can provide a new idea of damage detection for container management in ports.


2021 ◽  
Author(s):  
shouqiang Liu ◽  
Mingyue Jiang ◽  
Liming Chen ◽  
Yang Wang

Abstract Novel coronavirus pneumonia (COVID-19) is a highly infectious and fatal pneumonia-type disease that poses a great threat to the public safety of society. A fast and efficient method for screening COVID19-positive patients is essential. At present, the main detection methods are nucleic acid detection of manual diagnosis and medical imaging (CT image/X-ray image), both of which take a long time to obtain the diagnosis result. This paper discusses the common processing methods for the problem of insufficient medical image data. Then, transfer learning and convolutional neural network were used to construct the screening and diagnosis model of COVID-19, and different migration models were analyzed and compared to select a better pre-training model, which was trained and analyzed under small data sets. Finally, it analyzes and discusses how to train a highly reliable model to quickly help doctors provide advice in the critical moment of epidemic prevention and control when only a small sample data set is available.


2016 ◽  
Vol 15 ◽  
pp. 163-171
Author(s):  
M. G. Shcherbakovskiy

The article discusses the reasonsfor an expert to participate in legal proceedings. The gnoseological reason for that consists of the bad quality of materials subject to examination that renders the examination either completely impossible or compromises objective, reasoned and reliable assessment of the findings. The procedural reason consists ofa proscription for an expert to collect evidence himself or herself. The author investigates into the ways of how an expert can participate in legal proceedings. If the defense invites an expert to participate in the proceedings, then it is recommended that his or her involvement should be in the presence of attesting witnesses and recorded in the protocol. In the course of the legal proceedings an expert has the following tasks: adding initial data, acquiring new initial data, understanding the situation of the incident, acquiring new objects to be studied, including samples for examination. An expert’s participation in legal proceedings differs from the participation of a specialist or an examination on the scene of the incident. The author describes the tasks that an expert solves in the course of legal proceedings, the peculiarities ofan investigation experiment practices, the selection of samples for an examination, inspection, interrogation.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Using aerial cameras, satellite remote sensing or unmanned aerial vehicles (UAV) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods results from the numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were maximized to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy can improve of 10% in terms of overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified by another dataset of Hurricane Irma, and it is concluded that the paper method is feasible.


Sign in / Sign up

Export Citation Format

Share Document