Native malware detection in smartphones with Android OS using static analysis, feature selection and ensemble classifiers

Abstract: Since the advent of encryption, the volume of malware transmitted over encrypted networks has increased steadily. Traditional detection approaches such as packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, other features such as packet size, arrival time, source and destination addresses, and similar metadata can be used to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest, and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. The models are trained on the training set and then evaluated against the testing set to assess their respective performance. We further tune the hyperparameters of the algorithms to achieve better results. Random forest and extreme gradient boosting performed exceptionally well in our experiments, achieving area under the curve values of 0.9928 and 0.9998, respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems.

Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
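The abstract above reports its results as area under the (ROC) curve. As a minimal sketch of what that metric measures, the following computes it by direct pairwise comparison of scores. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    is scored above a randomly chosen negative (label 0); ties count half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical test-set labels (1 = malicious, 0 = benign) and classifier scores.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, y_score))  # 0.75
```

The pairwise form is quadratic in the number of samples, so it suits small illustrations; a rank-based computation would be preferable on a full test set.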

Feature Selection and Software Defect Prediction by Different Ensemble Classifiers

10.1007/978-3-030-86472-9_28 ◽

2021 ◽

pp. 307-313

Author(s):

Natalya Shakhovska ◽

Vitaliy Yakovyna

Keyword(s):

Feature Selection ◽

Defect Prediction ◽

Software Defect Prediction ◽

Ensemble Classifiers ◽

Software Defect

Identification of Cryptographic Vulnerability and Malware Detection in Android

International Journal of Information Security and Privacy ◽

10.4018/ijisp.2017070102 ◽

2017 ◽

Vol 11 (3) ◽

pp. 15-28 ◽

Cited By ~ 3

Author(s):

Anjali Kumawat ◽

Anil Kumar Sharma ◽

Sunita Kumawat

Keyword(s):

Dynamic Analysis ◽

Static Analysis ◽

Malware Detection ◽

Android Application ◽

Malicious Attacks ◽

Web Page ◽

Static And Dynamic Analysis ◽

Android System ◽

User Friendly ◽

The Web

Android-based smartphones are becoming increasingly popular. Smartphone users are concerned about security, malicious attacks, and cryptographic vulnerabilities in applications. As the number of Android devices grows, Android malware is also increasing rapidly. The authors therefore propose the “Identification of cryptographic vulnerability and malware detection in Android” system. They have designed a user-friendly Android application through which users and developers can easily test whether an application is benign or vulnerable. The application is tested first using static analysis, followed by dynamic analysis. The authors have implemented static and dynamic analysis of Android applications to detect vulnerable and malicious apps. They have also created a web page, so users can use either the application or the web page.

Android Malware Detection Using Genetic Algorithm based Optimized Feature Selection and Machine Learning

2019 42nd International Conference on Telecommunications and Signal Processing (TSP) ◽

10.1109/tsp.2019.8769039 ◽

2019 ◽

Cited By ~ 2

Author(s):

Anam Fatima ◽

Ritesh Maurya ◽

Malay Kishore Dutta ◽

Radim Burget ◽

Jan Masek

Keyword(s):

Machine Learning ◽

Genetic Algorithm ◽

Feature Selection ◽

Malware Detection ◽

Android Malware ◽

Android Malware Detection

Byte2vec: Malware Representation and Feature Selection for Android

The Computer Journal ◽

10.1093/comjnl/bxz121 ◽

2019 ◽

Vol 63 (8) ◽

pp. 1125-1138

Author(s):

Mahmood Yousefi-Azar ◽

Len Hamey ◽

Vijay Varadharajan ◽

Shiping Chen

Keyword(s):

Feature Selection ◽

Feature Vector ◽

False Negative ◽

Malware Detection ◽

False Negative Rate ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Context Vector ◽

Wide Range ◽

Selection For

Abstract: Malware detection based on static features and without code disassembling is a challenging path of research. Obfuscation makes the static analysis of malware even more challenging. This paper extends static malware detection beyond byte-level $n$-grams and detecting important strings. We propose a model (Byte2vec) with the capabilities of both binary file feature representation and feature selection for malware detection. Byte2vec embeds the semantic similarity of byte-level codes into a feature vector (byte vector) and also into a context vector. The learned feature vectors of Byte2vec, using skip-gram with negative-sampling topology, are combined with byte-level term-frequency (tf) for malware detection. We also show that the distance between a feature vector and its corresponding context vector provides a useful measure to rank features. The top ranked features are successfully used for malware detection. We show that this feature selection algorithm is an unsupervised version of mutual information (MI). We test the proposed scheme on four freely available Android malware datasets including one obfuscated malware dataset. The model is trained only on clean APKs. The results show that the model outperforms MI in a low-dimensional feature space and is competitive with MI and other state-of-the-art models in higher dimensions. In particular, our tests show very promising results on a wide range of obfuscated malware with a false negative rate of only 0.3% and a false positive rate of 2.0%. The detection results on obfuscated malware show the advantage of the unsupervised feature selection algorithm compared with the MI-based method.
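Two ingredients named in this abstract, byte-level term frequency and ranking features by the distance between a feature vector and its context vector, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the abstract does not state which distance is used or which direction of the ranking marks a feature as important, so Euclidean distance and descending order are placeholders, and the two-dimensional vectors are toy values rather than learned skip-gram embeddings.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def byte_tf(data: bytes) -> List[float]:
    """Normalized frequency of each of the 256 byte values in a binary."""
    if not data:
        return [0.0] * 256
    counts = Counter(data)  # iterating bytes yields ints in 0..255
    return [counts.get(b, 0) / len(data) for b in range(256)]


def rank_by_distance(
    feature_vecs: Dict[int, Sequence[float]],
    context_vecs: Dict[int, Sequence[float]],
) -> List[Tuple[int, float]]:
    """Rank byte values by the distance between their two embeddings.

    Assumption: Euclidean distance, largest first. The source only says
    the distance is a useful ranking measure.
    """
    distances = [
        (byte, math.dist(feature_vecs[byte], context_vecs[byte]))
        for byte in feature_vecs
    ]
    return sorted(distances, key=lambda item: item[1], reverse=True)


# Toy embeddings for two byte values (hypothetical, not learned).
feature_vecs = {0x00: [0.0, 0.0], 0x41: [1.0, 0.0]}
context_vecs = {0x00: [3.0, 4.0], 0x41: [1.0, 0.0]}

print(rank_by_distance(feature_vecs, context_vecs))  # [(0, 5.0), (65, 0.0)]
print(byte_tf(b"\x00\x00\x01\x02")[:3])              # [0.5, 0.25, 0.25]
```

In a fuller pipeline the top-ranked byte values would select which tf entries feed the classifier; how the paper combines them is not specified in the abstract.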

An Efficient Ensemble Learning Method for Gene Microarray Classification

BioMed Research International ◽

10.1155/2013/478410 ◽

2013 ◽

Vol 2013 ◽

pp. 1-10 ◽

Cited By ~ 9

Author(s):

Alireza Osareh ◽

Bita Shadgar

Keyword(s):

Feature Selection ◽

Ensemble Learning ◽

Feature Selection Method ◽

Support Vector ◽

Gene Microarray ◽

Ensemble Classifiers ◽

Classifier Ensembles ◽

Rotation Forest ◽

Ensemble Techniques ◽

Effective Diagnosis

Gene microarray analysis and classification have proven to be an effective approach to diagnosing diseases and cancers. However, basic classification techniques have been shown to have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. Meanwhile, classifier ensembles have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, preserving both desirable features of an ensemble architecture, namely accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other non-ensemble and ensemble techniques, including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging, are also deployed. Experimental results show that combining the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also those generated by two widely used conventional ensemble learning methods, namely Bagging and AdaBoost.
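RotBoost builds on AdaBoost, whose core step is reweighting the training samples after each weak learner. The sketch below shows one round of the standard discrete AdaBoost update only; it does not implement the Rotation Forest feature rotation or the paper's ICA-based variant. The function name `adaboost_round` and the toy inputs are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_round(
    weights: Sequence[float],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> Tuple[float, List[float]]:
    """One discrete AdaBoost round with labels and predictions in {-1, +1}.

    Assumes `weights` sum to 1. Returns the weak learner's vote weight
    (alpha) and the renormalized sample weights for the next round.
    """
    # Weighted error: total weight of the misclassified samples.
    err = sum(w for w, y, h in zip(weights, labels, predictions) if y != h)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")

    alpha = 0.5 * math.log((1.0 - err) / err)

    # y * h is +1 for a correct prediction (weight shrinks)
    # and -1 for a mistake (weight grows).
    updated = [
        w * math.exp(-alpha * y * h)
        for w, y, h in zip(weights, labels, predictions)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Four samples with uniform weights; the weak learner misclassifies the last one.
alpha, new_weights = adaboost_round(
    weights=[0.25, 0.25, 0.25, 0.25],
    labels=[1, 1, -1, -1],
    predictions=[1, 1, -1, 1],
)
print(round(alpha, 4))                      # 0.5493
print([round(w, 4) for w in new_weights])   # [0.1667, 0.1667, 0.1667, 0.5]
```

After the update, the misclassified sample carries half of the total weight, which is what forces the next weak learner to focus on it.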
