scholarly journals A Simhash-Based Integrative Features Extraction Algorithm for Malware Detection

Algorithms ◽  
2018 ◽  
Vol 11 (8) ◽  
pp. 124 ◽  
Author(s):  
Yihong Li ◽  
Fangzheng Liu ◽  
Zhenyu Du ◽  
Dubing Zhang

In the malware detection process, obfuscated malicious codes cannot be efficiently and accurately detected solely in the dynamic or static feature space. Aiming at this problem, an integrative feature extraction algorithm based on simhash was proposed, which combines the static information e.g., API (Application Programming Interface) calls and dynamic information (such as file, registry and network behaviors) of malicious samples to form integrative features. The experiment extracts the integrative features of some static information and dynamic information, and then compares the classification, time and obfuscated-detection performance of the static, dynamic and integrated features, respectively, by using several common machine learning algorithms. The results show that the integrative features have better time performance than the static features, and better classification performance than the dynamic features, and almost the same obfuscated-detection performance as the dynamic features. This algorithm can provide some support for feature extraction of malware detection.

2013 ◽  
Vol 347-350 ◽  
pp. 2241-2245
Author(s):  
Xiao Yuan Jing ◽  
Xiang Long Ge ◽  
Yong Fang Yao ◽  
Feng Nan Yu

When the number of labeled training samples is very small, the sample information people can use would be very little and the recognition rates of traditional image recognition methods are not satisfactory. However, there is often some related information contained in other databases that is helpful to feature extraction. Thus, it is considered to take full advantage of the data information in other databases by transfer learning. In this paper, the idea of transferring the samples is employed and further we propose a feature extraction approach based on sample set reconstruction. We realize the approach by reconstructing the training sample set using the difference information among the samples of other databases. Experimental results on three widely used face databases AR, FERET, CAS-PEAL are presented to demonstrate the efficacy of the proposed approach in classification performance.


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
TaeGuen Kim ◽  
BooJoong Kang ◽  
Eul Gyu Im

As the number of Android malware has been increased rapidly over the years, various malware detection methods have been proposed so far. Existing methods can be classified into two categories: static analysis-based methods and dynamic analysis-based methods. Both approaches have some limitations: static analysis-based methods are relatively easy to be avoided through transformation techniques such as junk instruction insertions, code reordering, and so on. However, dynamic analysis-based methods also have some limitations that analysis overheads are relatively high and kernel modification might be required to extract dynamic features. In this paper, we propose a dynamic analysis framework for Android malware detection that overcomes the aforementioned shortcomings. The framework uses a suffix tree that contains API (Application Programming Interface) subtraces and their probabilistic confidence values that are generated using HMMs (Hidden Markov Model) to reduce the malware detection overhead, and we designed the framework with the client-server architecture since the suffix tree is infeasible to be deployed in mobile devices. In addition, an application rewriting technique is used to trace API invocations without any modifications in the Android kernel. In our experiments, we measured the detection accuracy and the computational overheads to evaluate its effectiveness and efficiency of the proposed framework.


PLoS ONE ◽  
2021 ◽  
Vol 16 (11) ◽  
pp. e0260232
Author(s):  
Emad T. Elkabbash ◽  
Reham R. Mostafa ◽  
Sherif I. Barakat

Smartphone usage is nearly ubiquitous worldwide, and Android provides the leading open-source operating system, retaining the most significant market share and active user population of all open-source operating systems. Hence, malicious actors target the Android operating system to capitalize on this consumer reliance and vulnerabilities present in the system. Hackers often use confidential user data to exploit users for advertising, extortion, and theft. Notably, most Android malware detection tools depend on conventional machine-learning algorithms; hence, they lose the benefits of metaheuristic optimization. Here, we introduce a novel detection system based on optimizing the random vector functional link (RVFL) using the artificial Jellyfish Search (JS) optimizer following dimensional reduction of Android application features. JS is used to determine the optimal configurations of RVFL to improve classification performance. RVFL+JS minimizes the runtime of the execution of the optimized models with the best performance metrics, based on a dataset consisting of 11,598 multi-class applications and 471 static and dynamic features.


2021 ◽  
Vol 11 (17) ◽  
pp. 7825
Author(s):  
Kunti Robiatul Mahmudah ◽  
Fatma Indriani ◽  
Yukiko Takemori-Sakai ◽  
Yasunori Iwata ◽  
Takashi Wada ◽  
...  

Typically, classification is conducted on a dataset that consists of numerical features and target classes. For instance, a grayscale image, which is usually represented as a matrix of integers varying from 0 to 255, enables one to apply various classification algorithms to image classification tasks. However, datasets represented as binary features cannot use many standard machine learning algorithms optimally, yet their amount is not negligible. On the other hand, oversampling algorithms such as synthetic minority oversampling technique (SMOTE) and its variants are often used if the dataset for classification is imbalanced. However, since SMOTE and its variants synthesize new minority samples based on the original samples, the diversity of the samples synthesized from binary features is highly limited due to the poor representation of original features. To solve this problem, a preprocessing approach is studied. By converting binary features into numerical ones using feature extraction methods, succeeding oversampling methods can fully display their potential in improving the classifiers’ performances. Through comprehensive experiments using benchmark datasets and real medical datasets, it was observed that a converted dataset consisting of numerical features is better for oversampling methods (maximum improvements of accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, it is confirmed that feature extraction and oversampling synergistically contribute to the improvement of classification performance.


Electronics ◽  
2020 ◽  
Vol 9 (3) ◽  
pp. 435 ◽  
Author(s):  
Moutaz Alazab

Many Internet of Things (IoT) services are currently tracked and regulated via mobile devices, making them vulnerable to privacy attacks and exploitation by various malicious applications. Current solutions are unable to keep pace with the rapid growth of malware and are limited by low detection accuracy, long discovery time, complex implementation, and high computational costs associated with the processor speed, power, and memory. Therefore, an automated intelligence technique is necessary for detecting apps containing malware and effectively predicting cyberattacks in mobile marketplaces. In this study, a system for classifying mobile marketplaces applications using real-world datasets is proposed, which analyzes the source code to identify malicious apps. A rich feature set of application programming interface (API) calls is proposed to capture the regularities in apps containing malicious content. Two feature-selection methods—Chi-Square and ANOVA—were examined in conjunction with ten supervised machine-learning algorithms. The detection accuracy of each classifier was evaluated to identify the most reliable classifier for malware detection using various feature sets. Chi-Square was found to have a higher detection accuracy as compared to ANOVA. The proposed system achieved a detection accuracy of 98.1% with a classification time of 1.22 s. Furthermore, the proposed system required a reduced number of API calls (500 instead of 9000) to be incorporated as features.


2011 ◽  
Vol 33 (7) ◽  
pp. 1625-1631 ◽  
Author(s):  
Lin Lian ◽  
Guo-hui Li ◽  
Hai-tao Wang ◽  
hao Tian ◽  
Shu-kui Xu

Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in terms of polarity classification is very important in everyday life, with the existence of polarity, many people can find out whether the respected document has positive or negative sentiment so that it can help in choosing and making decisions. Sentiment analysis usually done manually. Therefore, an automatic sentiment analysis classification process is needed. However, it is rare to find studies that discuss extraction features and which learning models are suitable for unstructured sentiment analysis types with the Amazon food review case. This research explores some extraction features such as Word Bags, TF-IDF, Word2Vector, as well as a combination of TF-IDF and Word2Vector with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes to find out a combination of feature extraction and learning models that can help add variety to the analysis of polarity sentiments. By assisting with document preparation such as html tags and punctuation and special characters, using snowball stemming, TF-IDF results obtained with SVM are suitable for obtaining a polarity classification in unstructured sentiment analysis for the case of Amazon food review with a performance result of 87,3 percent.


Sign in / Sign up

Export Citation Format

Share Document