Learning-Based Detection for Malicious Android Application Using Code Vectorization

Security and Communication Networks ◽

10.1155/2021/9964224 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Lin Liu ◽

Wang Ren ◽

Feng Xie ◽

Shengwei Yi ◽

Junkai Yi ◽

...

Keyword(s):

Hybrid Model ◽

Malware Detection ◽

Virus Detection ◽

Training Model ◽

Malicious Code ◽

Detection Methods ◽

Android Application ◽

Fusion Model ◽

Data Set ◽

Android Applications

The malicious APK (Android Application Package) makers use some techniques such as code obfuscation and code encryption to avoid existing detection methods, which poses new challenges for accurate virus detection and makes it more and more difficult to detect the malicious code. A report indicates that a new malicious app for Android is created every 10 seconds. To combat this serious malware activity, a scalable malware detection approach is needed, which can effectively and efficiently identify the malware apps. Common static detection methods often rely on Hash matching and analysis of viruses, which cannot quickly detect new malicious Android applications and their variants. In this paper, a malicious Android application detection method is proposed, which is implemented by the deep network fusion model. The hybrid model only needs to use the sample training model to achieve high accuracy in the identification of the malicious applications, which is more suitable for the detection of the new malicious Android applications than the existing methods. This method extracts the static features in the core code of the Android application by decompiling APK files, then performs code vectorization processing, and uses the deep learning network for classification and discrimination. Our experiments with a data set containing 10,170 apps show that the decisions from the hybrid model can increase the malware detection rate significantly on a real device, which verifies the superiority of this method in the detection of malicious codes.

Download Full-text

Automatic Benchmark Generation Framework for Malware Detection

Security and Communication Networks ◽

10.1155/2018/4947695 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 2

Author(s):

Guanghui Liang ◽

Jianmin Pang ◽

Zheng Shan ◽

Runqing Yang ◽

Yihang Chen

Keyword(s):

Initial Data ◽

Malware Detection ◽

Detection Methods ◽

Small Data ◽

Security Threats ◽

Improved Genetic Algorithm ◽

Data Set ◽

Detection Model ◽

Manual Selection ◽

Selection Of

To address emerging security threats, various malware detection methods have been proposed every year. Therefore, a small but representative set of malware samples are usually needed for detection model, especially for machine-learning-based malware detection models. However, current manual selection of representative samples from large unknown file collection is labor intensive and not scalable. In this paper, we firstly propose a framework that can automatically generate a small data set for malware detection. With this framework, we extract behavior features from a large initial data set and then use a hierarchical clustering technique to identify different types of malware. An improved genetic algorithm based on roulette wheel sampling is implemented to generate final test data set. The final data set is only one-eighteenth the volume of the initial data set, and evaluations show that the data set selected by the proposed framework is much smaller than the original one but does not lose nearly any semantics.

Download Full-text

Towards Accurate Run-Time Hardware-Assisted Stealthy Malware Detection: A Lightweight, Yet Effective Time Series CNN-Based Approach

Cryptography ◽

10.3390/cryptography5040028 ◽

2021 ◽

Vol 5 (4) ◽

pp. 28

Author(s):

Hossein Sayadi ◽

Yifeng Gao ◽

Hosein Mohammadi Makrani ◽

Jessica Lin ◽

Paulo Cesar Costa ◽

...

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

State Of The Art ◽

Malware Detection ◽

Detection Performance ◽

Malicious Code ◽

Detection Methods ◽

Series Data ◽

Run Time

According to recent security analysis reports, malicious software (a.k.a. malware) is rising at an alarming rate in numbers, complexity, and harmful purposes to compromise the security of modern computer systems. Recently, malware detection based on low-level hardware features (e.g., Hardware Performance Counters (HPCs) information) has emerged as an effective alternative solution to address the complexity and performance overheads of traditional software-based detection methods. Hardware-assisted Malware Detection (HMD) techniques depend on standard Machine Learning (ML) classifiers to detect signatures of malicious applications by monitoring built-in HPC registers during execution at run-time. Prior HMD methods though effective have limited their study on detecting malicious applications that are spawned as a separate thread during application execution, hence detecting stealthy malware patterns at run-time remains a critical challenge. Stealthy malware refers to harmful cyber attacks in which malicious code is hidden within benign applications and remains undetected by traditional malware detection approaches. In this paper, we first present a comprehensive review of recent advances in hardware-assisted malware detection studies that have used standard ML techniques to detect the malware signatures. Next, to address the challenge of stealthy malware detection at the processor’s hardware level, we propose StealthMiner, a novel specialized time series machine learning-based approach to accurately detect stealthy malware trace at run-time using branch instructions, the most prominent HPC feature. StealthMiner is based on a lightweight time series Fully Convolutional Neural Network (FCN) model that automatically identifies potentially contaminated samples in HPC-based time series data and utilizes them to accurately recognize the trace of stealthy malware. Our analysis demonstrates that using state-of-the-art ML-based malware detection methods is not effective in detecting stealthy malware samples since the captured HPC data not only represents malware but also carries benign applications’ microarchitectural data. The experimental results demonstrate that with the aid of our novel intelligent approach, stealthy malware can be detected at run-time with 94% detection performance on average with only one HPC feature, outperforming the detection performance of state-of-the-art HMD and general time series classification methods by up to 42% and 36%, respectively.

Download Full-text

Android Malware Detection Using Fine-Grained Features

Scientific Programming ◽

10.1155/2020/5190138 ◽

2020 ◽

Vol 2020 ◽

pp. 1-13

Author(s):

Xu Jiang ◽

Baolei Mao ◽

Jun Guan ◽

Xingli Huang

Keyword(s):

Malware Detection ◽

Detection Methods ◽

Practical Implementation ◽

Security Threat ◽

Android Malware ◽

Fine Grained ◽

Android Malware Detection ◽

Android Applications ◽

The Difference ◽

First Time

Nowadays, Android applications declare as many permissions as possible to provide more function for the users, which also poses severe security threat to them. Although many Android malware detection methods based on permissions have been developed, they are ineffective when malicious applications declare few dangerous permissions or when the dangerous permissions declared by malicious applications are similar with those declared by benign applications. This limitation is attributed to the use of too few information for classification. We propose a new method named fine-grained dangerous permission (FDP) method for detecting Android malicious applications, which gathers features that better represent the difference between malicious applications and benign applications. Among these features, the fine-grained feature of dangerous permissions applied in components is proposed for the first time. We evaluate 1700 benign applications and 1600 malicious applications and demonstrate that FDP achieves a TP rate of 94.5%. Furthermore, compared with other related detection approaches, FDP can detect more malware families and only requires 15.205 s to analyze one application on average, which demonstrates its applicability for practical implementation.

Download Full-text

Application Research on the Medical imaging testing of Novel Coronavirus Pneumonia COVID-19 based on Transfer Learning

10.21203/rs.3.rs-771648/v1 ◽

2021 ◽

Author(s):

shouqiang Liu ◽

Mingyue Jiang ◽

Liming Chen ◽

Yang Wang

Keyword(s):

Medical Imaging ◽

Transfer Learning ◽

Image Data ◽

Training Model ◽

Small Sample ◽

Detection Methods ◽

Nucleic Acid Detection ◽

Small Data ◽

Data Set ◽

Novel Coronavirus

Abstract Novel coronavirus pneumonia (COVID-19) is a highly infectious and fatal pneumonia-type disease that poses a great threat to the public safety of society. A fast and efficient method for screening COVID19-positive patients is essential. At present, the main detection methods are nucleic acid detection of manual diagnosis and medical imaging (CT image/X-ray image), both of which take a long time to obtain the diagnosis result. This paper discusses the common processing methods for the problem of insufficient medical image data. Then, transfer learning and convolutional neural network were used to construct the screening and diagnosis model of COVID-19, and different migration models were analyzed and compared to select a better pre-training model, which was trained and analyzed under small data sets. Finally, it analyzes and discusses how to train a highly reliable model to quickly help doctors provide advice in the critical moment of epidemic prevention and control when only a small sample data set is available.

Download Full-text

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences ◽

10.3390/app9061128 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1128 ◽

Cited By ~ 12

Author(s):

Yundong Li ◽

Wei Hu ◽

Han Dong ◽

Xueyan Zhang

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hurricane Sandy ◽

Training Data ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Data Set ◽

Augmentation Strategies ◽

Post Disaster

Using aerial cameras, satellite remote sensing or unmanned aerial vehicles (UAV) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods results from the numerous labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were maximized to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy can improve of 10% in terms of overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified by another dataset of Hurricane Irma, and it is concluded that the paper method is feasible.

Download Full-text

Malware Detection Based on Code Visualization and Two-Level Classification

Information ◽

10.3390/info12030118 ◽

2021 ◽

Vol 12 (3) ◽

pp. 118

Author(s):

Vassilios Moussas ◽

Antonios Andreatos

Keyword(s):

Web Applications ◽

Detection System ◽

Malware Detection ◽

Image Features ◽

Malicious Code ◽

Malware Analysis ◽

Malicious Software ◽

Antivirus Software ◽

Malware Classification ◽

Artificial Neural Network Ann

Malware creators generate new malicious software samples by making minor changes in previously generated code, in order to reuse malicious code, as well as to go unnoticed from signature-based antivirus software. As a result, various families of variations of the same initial code exist today. Visualization of compiled executables for malware analysis has been proposed several years ago. Visualization can greatly assist malware classification and requires neither disassembly nor code execution. Moreover, new variations of known malware families are instantly detected, in contrast to traditional signature-based antivirus software. This paper addresses the problem of identifying variations of existing malware visualized as images. A new malware detection system based on a two-level Artificial Neural Network (ANN) is proposed. The classification is based on file and image features. The proposed system is tested on the ‘Malimg’ dataset consisting of the visual representation of well-known malware families. From this set some important image features are extracted. Based on these features, the ANN is trained. Then, this ANN is used to detect and classify other samples of the dataset. Malware families creating a confusion are classified by a second level of ANNs. The proposed two-level ANN method excels in simplicity, accuracy, and speed; it is easy to implement and fast to run, thus it can be applied to antivirus software, smart firewalls, web applications, etc.

Download Full-text

Classification of jujube defects in small data sets based on transfer learning

Neural Computing and Applications ◽

10.1007/s00521-021-05715-2 ◽

2021 ◽

Author(s):

Jianping Ju ◽

Hong Zheng ◽

Xiaohang Xu ◽

Zhongyuan Guo ◽

Zhaohui Zheng ◽

...

Keyword(s):

Transfer Learning ◽

Loss Function ◽

Training Model ◽

Parameter Distribution ◽

Test Accuracy ◽

Small Data ◽

Data Sets ◽

Data Set ◽

Small Data Sets

AbstractAlthough convolutional neural networks have achieved success in the field of image classification, there are still challenges in the field of agricultural product quality sorting such as machine vision-based jujube defects detection. The performance of jujube defect detection mainly depends on the feature extraction and the classifier used. Due to the diversity of the jujube materials and the variability of the testing environment, the traditional method of manually extracting the features often fails to meet the requirements of practical application. In this paper, a jujube sorting model in small data sets based on convolutional neural network and transfer learning is proposed to meet the actual demand of jujube defects detection. Firstly, the original images collected from the actual jujube sorting production line were pre-processed, and the data were augmented to establish a data set of five categories of jujube defects. The original CNN model is then improved by embedding the SE module and using the triplet loss function and the center loss function to replace the softmax loss function. Finally, the depth pre-training model on the ImageNet image data set was used to conduct training on the jujube defects data set, so that the parameters of the pre-training model could fit the parameter distribution of the jujube defects image, and the parameter distribution was transferred to the jujube defects data set to complete the transfer of the model and realize the detection and classification of the jujube defects. The classification results are visualized by heatmap through the analysis of classification accuracy and confusion matrix compared with the comparison models. The experimental results show that the SE-ResNet50-CL model optimizes the fine-grained classification problem of jujube defect recognition, and the test accuracy reaches 94.15%. The model has good stability and high recognition accuracy in complex environments.

Download Full-text

Viruses and Type 1 Diabetes: From Enteroviruses to the Virome

Microorganisms ◽

10.3390/microorganisms9071519 ◽

2021 ◽

Vol 9 (7) ◽

pp. 1519

Author(s):

Sonia R. Isaacs ◽

Dylan B. Foskett ◽

Anna J. Maxwell ◽

Emily J. Ward ◽

Clare L. Faulkner ◽

...

Keyword(s):

Type 1 Diabetes ◽

Viral Infections ◽

Antiviral Drug ◽

Virus Detection ◽

Detection Methods ◽

Islet Autoimmunity ◽

Comprehensive Overview ◽

Drug Candidates

For over a century, viruses have left a long trail of evidence implicating them as frequent suspects in the development of type 1 diabetes. Through vigorous interrogation of viral infections in individuals with islet autoimmunity and type 1 diabetes using serological and molecular virus detection methods, as well as mechanistic studies of virus-infected human pancreatic β-cells, the prime suspects have been narrowed down to predominantly human enteroviruses. Here, we provide a comprehensive overview of evidence supporting the hypothesised role of enteroviruses in the development of islet autoimmunity and type 1 diabetes. We also discuss concerns over the historical focus and investigation bias toward enteroviruses and summarise current unbiased efforts aimed at characterising the complete population of viruses (the “virome”) contributing early in life to the development of islet autoimmunity and type 1 diabetes. Finally, we review the range of vaccine and antiviral drug candidates currently being evaluated in clinical trials for the prevention and potential treatment of type 1 diabetes.

Download Full-text

An Optimized Stacking Ensemble Model for Phishing Websites Detection

Electronics ◽

10.3390/electronics10111285 ◽

2021 ◽

Vol 10 (11) ◽

pp. 1285

Author(s):

Mohammed Al-Sarem ◽

Faisal Saeed ◽

Zeyad Ghaleb Al-Mekhlafi ◽

Badiea Abdulkarem Mohammed ◽

Tawfik Al-Hadhrami ◽

...

Keyword(s):

Machine Learning ◽

Random Forests ◽

Ensemble Method ◽

Detection Methods ◽

Detection Accuracy ◽

Ensemble Model ◽

Security Attacks ◽

Data Set ◽

Machine Learning Methods ◽

Ensemble Machine Learning

Security attacks on legitimate websites to steal users’ information, known as phishing attacks, have been increasing. This kind of attack does not just affect individuals’ or organisations’ websites. Although several detection methods for phishing websites have been proposed using machine learning, deep learning, and other approaches, their detection accuracy still needs to be enhanced. This paper proposes an optimized stacking ensemble method for phishing website detection. The optimisation was carried out using a genetic algorithm (GA) to tune the parameters of several ensemble machine learning methods, including random forests, AdaBoost, XGBoost, Bagging, GradientBoost, and LightGBM. The optimized classifiers were then ranked, and the best three models were chosen as base classifiers of a stacking ensemble method. The experiments were conducted on three phishing website datasets that consisted of both phishing websites and legitimate websites—the Phishing Websites Data Set from UCI (Dataset 1); Phishing Dataset for Machine Learning from Mendeley (Dataset 2, and Datasets for Phishing Websites Detection from Mendeley (Dataset 3). The experimental results showed an improvement using the optimized stacking ensemble method, where the detection accuracy reached 97.16%, 98.58%, and 97.39% for Dataset 1, Dataset 2, and Dataset 3, respectively.

Download Full-text

Embed2Detect: temporally clustered embedded words for event detection in social media

Machine Learning ◽

10.1007/s10994-021-05988-7 ◽

2021 ◽

Author(s):

Hansi Hettiarachchi ◽

Mariam Adedoyin-Olowe ◽

Jagdev Bhogal ◽

Mohamed Medhat Gaber

Keyword(s):

Social Media ◽

Event Detection ◽

High Volume ◽

Detection Methods ◽

Word Embeddings ◽

Agglomerative Clustering ◽

Data Set ◽

Social Media Data ◽

Social Media Platforms ◽

Media Data

AbstractSocial media is becoming a primary medium to discuss what is happening around the world. Therefore, the data generated by social media platforms contain rich information which describes the ongoing events. Further, the timeliness associated with these data is capable of facilitating immediate insights. However, considering the dynamic nature and high volume of data production in social media data streams, it is impractical to filter the events manually and therefore, automated event detection mechanisms are invaluable to the community. Apart from a few notable exceptions, most previous research on automated event detection have focused only on statistical and syntactical features in data and lacked the involvement of underlying semantics which are important for effective information retrieval from text since they represent the connections between words and their meanings. In this paper, we propose a novel method termed Embed2Detect for event detection in social media by combining the characteristics in word embeddings and hierarchical agglomerative clustering. The adoption of word embeddings gives Embed2Detect the capability to incorporate powerful semantical features into event detection and overcome a major limitation inherent in previous approaches. We experimented our method on two recent real social media data sets which represent the sports and political domain and also compared the results to several state-of-the-art methods. The obtained results show that Embed2Detect is capable of effective and efficient event detection and it outperforms the recent event detection methods. For the sports data set, Embed2Detect achieved 27% higher F-measure than the best-performed baseline and for the political data set, it was an increase of 29%.

Download Full-text