Native malware detection in smartphones with android OS using static analysis, feature selection and ensemble classifiers

Author(s):  
S. Morales-Ortega ◽  
P.J. Escamilla-Ambrosio ◽  
A. Rodriguez-Mota ◽  
L.D. Coronado-De-Alba
Author(s):  
Harsha A K

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detecting malware, such as packet content analysis, are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features, such as packet size, arrival time, source and destination addresses, and similar metadata, to detect malware. Such information can be used to train machine learning classifiers to distinguish malicious from benign packets. In this paper, we offer an efficient malware detection approach using machine learning classification algorithms such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyperparameters of the algorithms in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems.

Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.
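The abstract above reports its results as area under the curve (AUC) values. As a minimal sketch of what that metric measures, the following computes ROC AUC directly from labels and classifier scores, as the fraction of (malicious, benign) pairs in which the malicious sample receives the higher score. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not taken from the paper, and the quadratic pairwise loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    (label 1) outscores a randomly chosen negative (label 0).
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical test-set labels (1 = malicious) and classifier scores.
    y_true = [0, 0, 1, 1]
    y_score = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y_true, y_score))
```

An AUC of 1.0 means every malicious sample is ranked above every benign one, and 0.5 corresponds to random ranking.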


2017 ◽  
Vol 11 (3) ◽  
pp. 15-28 ◽  
Author(s):  
Anjali Kumawat ◽  
Anil Kumar Sharma ◽  
Sunita Kumawat

Android-based smartphones are increasingly popular. Smartphone users are concerned about security, malicious attacks, and cryptographic vulnerabilities in applications. As the number of Android devices grows, Android malware is also increasing rapidly. The authors therefore propose a system called “Identification of cryptographic vulnerability and malware detection in Android”. They have designed a user-friendly Android application through which users and developers can easily test whether an application is benign or vulnerable. The application under test is first examined using static analysis, and dynamic analysis is then carried out. The authors have implemented both static and dynamic analysis of Android applications to detect vulnerable and malicious apps. They have also created a web page, so users can use either the application or the web page.


2019 ◽  
Vol 63 (8) ◽  
pp. 1125-1138
Author(s):  
Mahmood Yousefi-Azar ◽  
Len Hamey ◽  
Vijay Varadharajan ◽  
Shiping Chen

Abstract: Malware detection based on static features and without code disassembling is a challenging path of research. Obfuscation makes the static analysis of malware even more challenging. This paper extends static malware detection beyond byte level $n$-grams and detecting important strings. We propose a model (Byte2vec) with the capabilities of both binary file feature representation and feature selection for malware detection. Byte2vec embeds the semantic similarity of byte level codes into a feature vector (byte vector) and also into a context vector. The learned feature vectors of Byte2vec, using skip-gram with negative-sampling topology, are combined with byte-level term-frequency (tf) for malware detection. We also show that the distance between a feature vector and its corresponding context vector provides a useful measure to rank features. The top ranked features are successfully used for malware detection. We show that this feature selection algorithm is an unsupervised version of mutual information (MI). We test the proposed scheme on four freely available Android malware datasets including one obfuscated malware dataset. The model is trained only on clean APKs. The results show that the model outperforms MI in a low-dimensional feature space and is competitive with MI and other state-of-the-art models in higher dimensions. In particular, our tests show very promising results on a wide range of obfuscated malware with a false negative rate of only 0.3% and a false positive rate of 2.0%. The detection results on obfuscated malware show the advantage of the unsupervised feature selection algorithm compared with the MI-based method.
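The byte-level $n$-gram and term-frequency features mentioned in the abstract above can be illustrated with a short sketch. This is not the Byte2vec model itself, which learns embeddings with skip-gram and negative sampling; it only shows one plausible way to compute relative frequencies of byte $n$-grams from a binary file's raw contents without disassembly. The function name `byte_ngram_tf` and the sample bytes are hypothetical.

```python
from collections import Counter


def byte_ngram_tf(data: bytes, n: int = 1) -> dict:
    """Return the term frequency of each byte n-gram in `data`.

    Each n-gram's count is divided by the total number of n-grams,
    so the returned values sum to 1 for non-empty input."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = len(data) - n + 1
    if total <= 0:
        return {}  # input shorter than one n-gram
    # Slide a window of width n over the raw bytes and count each n-gram.
    counts = Counter(data[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical stand-in for the raw bytes of a binary file.
    sample = b"\x00\x01\x00\x01\x02"
    for gram, tf in sorted(byte_ngram_tf(sample, n=2).items()):
        print(gram.hex(), round(tf, 3))
```

With `n=1` this yields at most 256 features per file; larger `n` grows the feature space quickly, which is one reason feature selection matters in this setting.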


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Alireza Osareh ◽  
Bita Shadgar

Gene microarray analysis and classification have proven effective for diagnosing diseases and cancers. However, basic classification techniques have intrinsic drawbacks in achieving accurate gene classification and cancer diagnosis. Classifier ensembles, meanwhile, have received increasing attention in various applications. Here, we address the gene classification issue using the RotBoost ensemble methodology. This method combines the Rotation Forest and AdaBoost techniques, which in turn preserves both desirable features of an ensemble architecture, that is, accuracy and diversity. To select a concise subset of informative genes, five different feature selection algorithms are considered. To assess the efficiency of RotBoost, other nonensemble/ensemble techniques including Decision Trees, Support Vector Machines, Rotation Forest, AdaBoost, and Bagging are also deployed. Experimental results reveal that combining the fast correlation-based feature selection method with the ICA-based RotBoost ensemble is highly effective for gene classification. In fact, the proposed method can create ensemble classifiers that outperform not only the classifiers produced by conventional machine learning but also the classifiers generated by two widely used conventional ensemble learning methods, that is, Bagging and AdaBoost.


2021 ◽  
Vol 37 ◽  
pp. 301139
Author(s):  
Nitin Naik ◽  
Paul Jenkins ◽  
Nick Savage ◽  
Longzhi Yang ◽  
Tossapon Boongoen ◽  
...  
