A Simhash-Based Integrative Features Extraction Algorithm for Malware Detection

Yihong Li; Fangzheng Liu; Zhenyu Du; Dubing Zhang

doi:10.3390/a11080124

Algorithms ◽

10.3390/a11080124 ◽

2018 ◽

Vol 11 (8) ◽

pp. 124 ◽

Cited By ~ 1

Author(s):

Yihong Li ◽

Fangzheng Liu ◽

Zhenyu Du ◽

Dubing Zhang

Keyword(s):

Feature Extraction ◽

Malware Detection ◽

Application Programming Interface ◽

Classification Performance ◽

Detection Performance ◽

Machine Learning Algorithms ◽

Dynamic Features ◽

Dynamic Information ◽

Static Information ◽

Extraction Algorithm

In malware detection, obfuscated malicious code cannot be detected efficiently and accurately in the dynamic or the static feature space alone. To address this problem, an integrative feature extraction algorithm based on simhash is proposed, which combines static information (e.g., API (Application Programming Interface) calls) and dynamic information (such as file, registry and network behaviors) of malicious samples to form integrative features. The experiment extracts integrative features from the static and dynamic information, and then compares the classification, time and obfuscated-detection performance of the static, dynamic and integrative features using several common machine learning algorithms. The results show that the integrative features have better time performance than the static features, better classification performance than the dynamic features, and almost the same obfuscated-detection performance as the dynamic features. This algorithm can provide some support for feature extraction in malware detection.
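The abstract does not spell out how the fingerprint is built, so the following is only a minimal sketch of a standard simhash over feature tokens, assuming static and dynamic features are both available as strings. The token values, the 64-bit width, the use of MD5 as the per-token hash, and the function names `simhash` and `hamming_distance` are illustrative assumptions, not the authors' implementation.

```python
import hashlib


def simhash(tokens, bits=64):
    """Return a `bits`-wide simhash fingerprint of an iterable of string tokens."""
    weights = [0] * bits
    for token in tokens:
        # Hash each token and keep the first bits/8 bytes (MD5 gives 16 bytes).
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        # Each set bit votes +1 for its position, each clear bit votes -1.
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is 1 where the votes are positive.
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


# Hypothetical feature tokens: static API names plus dynamic behavior records.
static_tokens = ["CreateFileW", "RegSetValueExW", "connect"]
dynamic_tokens = ["file:write", "registry:set", "network:connect"]
integrated = simhash(static_tokens + dynamic_tokens)
```

Because the per-bit votes are summed, the fingerprint does not depend on token order, and similar token sets tend to give fingerprints with a small Hamming distance.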

Feature Extraction Algorithm Based on Sample Set Reconstruction

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.347-350.2241 ◽

2013 ◽

Vol 347-350 ◽

pp. 2241-2245

Author(s):

Xiao Yuan Jing ◽

Xiang Long Ge ◽

Yong Fang Yao ◽

Feng Nan Yu

Keyword(s):

Feature Extraction ◽

Image Recognition ◽

Training Sample ◽

Classification Performance ◽

Related Information ◽

Training Samples ◽

Extraction Algorithm ◽

The Difference ◽

Sample Set ◽

Traditional Image

When the number of labeled training samples is very small, little sample information is available and the recognition rates of traditional image recognition methods are unsatisfactory. However, other databases often contain related information that is helpful for feature extraction, and transfer learning makes it possible to take full advantage of it. In this paper, the idea of transferring samples is employed, and we propose a feature extraction approach based on sample set reconstruction. The approach reconstructs the training sample set using the difference information among the samples of other databases. Experimental results on three widely used face databases (AR, FERET and CAS-PEAL) demonstrate the efficacy of the proposed approach in classification performance.

Runtime Detection Framework for Android Malware

Mobile Information Systems ◽

10.1155/2018/8094314 ◽

2018 ◽

Vol 2018 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

TaeGuen Kim ◽

BooJoong Kang ◽

Eul Gyu Im

Keyword(s):

Dynamic Analysis ◽

Static Analysis ◽

Suffix Tree ◽

Malware Detection ◽

Application Programming Interface ◽

Detection Methods ◽

Detection Accuracy ◽

Dynamic Features ◽

Android Malware ◽

Android Malware Detection

As the amount of Android malware has increased rapidly over the years, various malware detection methods have been proposed. Existing methods fall into two categories: static analysis-based methods and dynamic analysis-based methods. Both have limitations. Static analysis-based methods are relatively easy to evade through transformation techniques such as junk instruction insertion and code reordering. Dynamic analysis-based methods incur relatively high analysis overheads and may require kernel modification to extract dynamic features. In this paper, we propose a dynamic analysis framework for Android malware detection that overcomes these shortcomings. To reduce the malware detection overhead, the framework uses a suffix tree that contains API (Application Programming Interface) subtraces and their probabilistic confidence values, which are generated using HMMs (Hidden Markov Models). We designed the framework with a client-server architecture because deploying the suffix tree on mobile devices is infeasible. In addition, an application rewriting technique is used to trace API invocations without any modification to the Android kernel. In our experiments, we measured the detection accuracy and the computational overheads to evaluate the effectiveness and efficiency of the proposed framework.
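To illustrate the idea of looking up API subtraces with attached confidence values, here is a rough sketch that uses a plain dictionary-based trie as a stand-in for the suffix tree. The class `TraceTrie`, the confidence numbers, the API names, and the choice to sum confidences over all matches are assumptions made for illustration; the abstract does not say how the framework aggregates confidences, and the HMM training step is not shown.

```python
class TraceTrie:
    """Trie of API subtraces; a node may carry a confidence under the key '$conf'."""

    def __init__(self):
        self.root = {}

    def insert(self, subtrace, confidence):
        """Store one subtrace (a list of API names) with its confidence value."""
        node = self.root
        for call in subtrace:
            node = node.setdefault(call, {})
        node["$conf"] = confidence

    def score(self, trace):
        """Sum the confidences of every stored subtrace that occurs contiguously in trace."""
        total = 0.0
        for start in range(len(trace)):
            node = self.root
            for call in trace[start:]:
                if call not in node:
                    break
                node = node[call]
                total += node.get("$conf", 0.0)
        return total


# Hypothetical subtraces and confidences (in the paper these come from HMMs).
trie = TraceTrie()
trie.insert(["getDeviceId"], 0.2)
trie.insert(["getDeviceId", "sendTextMessage"], 0.9)

runtime_trace = ["onCreate", "getDeviceId", "sendTextMessage"]
trace_score = trie.score(runtime_trace)  # matches both stored subtraces
```

In a client-server split like the one described, a structure of this kind would live on the server, with the device sending only the traced API invocations.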

Android malware classification based on random vector functional link and artificial Jellyfish Search optimizer

PLoS ONE ◽

10.1371/journal.pone.0260232 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0260232

Author(s):

Emad T. Elkabbash ◽

Reham R. Mostafa ◽

Sherif I. Barakat

Keyword(s):

Operating System ◽

Open Source ◽

Random Vector ◽

Performance Metrics ◽

Detection System ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Dynamic Features ◽

Functional Link ◽

Android Malware

Smartphone usage is nearly ubiquitous worldwide, and Android provides the leading open-source operating system, retaining the most significant market share and active user population of all open-source operating systems. Hence, malicious actors target the Android operating system to capitalize on this consumer reliance and vulnerabilities present in the system. Hackers often use confidential user data to exploit users for advertising, extortion, and theft. Notably, most Android malware detection tools depend on conventional machine-learning algorithms; hence, they lose the benefits of metaheuristic optimization. Here, we introduce a novel detection system based on optimizing the random vector functional link (RVFL) using the artificial Jellyfish Search (JS) optimizer following dimensional reduction of Android application features. JS is used to determine the optimal configurations of RVFL to improve classification performance. RVFL+JS minimizes the runtime of the execution of the optimized models with the best performance metrics, based on a dataset consisting of 11,598 multi-class applications and 471 static and dynamic features.

Classification of Imbalanced Data Represented as Binary Features

Applied Sciences ◽

10.3390/app11177825 ◽

2021 ◽

Vol 11 (17) ◽

pp. 7825

Author(s):

Kunti Robiatul Mahmudah ◽

Fatma Indriani ◽

Yukiko Takemori-Sakai ◽

Yasunori Iwata ◽

Takashi Wada ◽

...

Keyword(s):

Feature Extraction ◽

Imbalanced Data ◽

Extraction Methods ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

Benchmark Datasets ◽

Binary Features ◽

Classification Tasks

Typically, classification is conducted on a dataset that consists of numerical features and target classes. For instance, a grayscale image, which is usually represented as a matrix of integers varying from 0 to 255, enables one to apply various classification algorithms to image classification tasks. However, many standard machine learning algorithms cannot be applied optimally to datasets represented as binary features, even though such datasets are not rare. In addition, oversampling algorithms such as the synthetic minority oversampling technique (SMOTE) and its variants are often used if the dataset for classification is imbalanced. However, since SMOTE and its variants synthesize new minority samples based on the original samples, the diversity of the samples synthesized from binary features is highly limited due to the poor representation of the original features. To solve this problem, a preprocessing approach is studied. By converting binary features into numerical ones using feature extraction methods, subsequent oversampling methods can fully display their potential in improving classifier performance. Through comprehensive experiments using benchmark datasets and real medical datasets, it was observed that a converted dataset consisting of numerical features is better for oversampling methods (maximum improvements of accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, it is confirmed that feature extraction and oversampling synergistically contribute to the improvement of classification performance.
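The interpolation step that SMOTE relies on can be sketched as follows. This is a simplified, standard-library-only version for illustration, not the implementation used in the paper; the function names `sq_dist` and `smote`, the neighbour count `k`, the fixed seed, and the sample data are assumptions.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote(minority, n_new, k=3, seed=0):
    """Synthesize n_new points, each on a segment between a minority sample and a near neighbour."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # The k nearest other minority samples (requires at least two samples).
        neighbours = sorted(
            (p for p in minority if p is not x), key=lambda p: sq_dist(x, p)
        )[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Hypothetical minority-class samples whose features are all 0 or 1.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=5)
```

Every synthetic point lies on a line segment between two existing samples. With binary inputs those segments only connect corners of the unit hypercube, which is one way to see why the abstract describes the resulting diversity as limited.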

Automated Malware Detection in Mobile App Stores Based on Robust Feature Generation

Electronics ◽

10.3390/electronics9030435 ◽

2020 ◽

Vol 9 (3) ◽

pp. 435 ◽

Cited By ~ 3

Author(s):

Moutaz Alazab

Keyword(s):

Malware Detection ◽

Application Programming Interface ◽

Mobile App ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Detection Accuracy ◽

Chi Square ◽

Real World Datasets ◽

Malicious Apps ◽

Processor Speed

Many Internet of Things (IoT) services are currently tracked and regulated via mobile devices, making them vulnerable to privacy attacks and exploitation by various malicious applications. Current solutions are unable to keep pace with the rapid growth of malware and are limited by low detection accuracy, long discovery time, complex implementation, and high computational costs associated with processor speed, power, and memory. Therefore, an automated intelligence technique is necessary for detecting apps containing malware and effectively predicting cyberattacks in mobile marketplaces. In this study, a system for classifying mobile marketplace applications using real-world datasets is proposed, which analyzes the source code to identify malicious apps. A rich feature set of application programming interface (API) calls is proposed to capture the regularities in apps containing malicious content. Two feature-selection methods (Chi-Square and ANOVA) were examined in conjunction with ten supervised machine-learning algorithms. The detection accuracy of each classifier was evaluated to identify the most reliable classifier for malware detection using various feature sets. Chi-Square was found to yield a higher detection accuracy than ANOVA. The proposed system achieved a detection accuracy of 98.1% with a classification time of 1.22 s. Furthermore, the proposed system required fewer API calls (500 instead of 9000) to be incorporated as features.
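As a rough illustration of chi-square feature selection over API-call features, the sketch below scores each binary feature with the Pearson chi-square statistic for a 2x2 contingency table and keeps the top k. The exact formulation used in the study is not given in the abstract; the function names, the API names, and the toy data are assumptions.

```python
def chi_square_score(feature, labels):
    """Pearson chi-square statistic for a binary feature against binary labels."""
    n = len(labels)
    a = sum(1 for f, y in zip(feature, labels) if f and y)      # present, malicious
    b = sum(1 for f, y in zip(feature, labels) if f and not y)  # present, benign
    c = sum(1 for f, y in zip(feature, labels) if not f and y)  # absent, malicious
    d = n - a - b - c                                           # absent, benign
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant feature or a single class carries no signal
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(columns, labels, k):
    """Return the names of the k features with the highest chi-square score."""
    scores = {name: chi_square_score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: 1 = app calls the API; label 1 = malicious.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "sendTextMessage": [1, 1, 1, 0, 0, 0],
    "getDeviceId": [1, 0, 1, 0, 1, 0],
    "onCreate": [1, 1, 1, 1, 1, 1],
}
top_features = select_top_k(columns, labels, k=2)
```

On this toy data the perfectly discriminating call scores 6.0, the weakly associated one about 0.67, and the call present in every app scores 0.0, so the first two are kept.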

Iris feature extraction algorithm based on odd symmetry 2D Log-Gabor

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.00976 ◽

2009 ◽

Vol 29 (4) ◽

pp. 976-978 ◽

Cited By ~ 1

Author(s):

Lin-tao Lü ◽

Tao Yang

Keyword(s):

Feature Extraction ◽

Feature Extraction Algorithm ◽

Extraction Algorithm ◽

Iris Feature Extraction ◽

Log Gabor

Corresponding Feature Extraction Algorithm between Infrared and Visible Images Using MSER

Journal of Electronics Information Technology ◽

10.3724/sp.j.1146.2010.01111 ◽

2011 ◽

Vol 33 (7) ◽

pp. 1625-1631 ◽

Cited By ~ 1

Author(s):

Lin Lian ◽

Guo-hui Li ◽

Hai-tao Wang ◽

Hao Tian ◽

Shu-kui Xu

Keyword(s):

Feature Extraction ◽

Feature Extraction Algorithm ◽

Extraction Algorithm ◽

Visible Images

Document Preprocessing with TF-IDF to Improve the Polarity Classification Performance of Unstructured Sentiment Analysis

Kinetik Game Technology Information System Computer Network Computing Electronics and Control ◽

10.22219/kinetik.v5i3.1066 ◽

2020 ◽

pp. 235-242

Author(s):

Farrikh Alzami ◽

Erika Devi Udayanti ◽

Dwi Puji Prabowo ◽

Rama Aria Megantara

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Random Forest ◽

Sentiment Analysis ◽

Classification Performance ◽

Document Preparation ◽

Learning Models ◽

Polarity Classification ◽

Negative Sentiment ◽

Machine Learning Models

Sentiment analysis in the form of polarity classification is important in everyday life: knowing whether a given document carries positive or negative sentiment helps people choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, few studies discuss which feature extraction methods and learning models are suitable for unstructured sentiment analysis in the Amazon food review case. This research explores feature extraction methods such as bag of words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, together with several machine learning models such as Random Forest, SVM, KNN and Naïve Bayes, to find a combination of feature extraction and learning model that can add variety to polarity sentiment analysis. With document preparation covering HTML tags, punctuation and special characters, and with Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
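For readers unfamiliar with TF-IDF, the weighting can be sketched in a few lines. This uses the basic definition (relative term frequency times the natural log of inverse document frequency) with whitespace tokenization; the study's actual preprocessing, smoothing, and library are not stated in the abstract, and the function name `tf_idf` and the sample reviews are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * ln(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical review snippets.
reviews = ["great taste great price", "bad taste"]
vectors = tf_idf(reviews)
```

A term that appears in every document, such as "taste" here, receives weight zero, while terms that are frequent in one review and rare elsewhere are weighted most heavily. Vectors of this kind are what a classifier such as an SVM would then consume.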

Image-based 3D Building Reconstruction Using A-KAZE Feature Extraction Algorithm

Proceedings of the 35th International Symposium on Automation and Robotics in Construction (ISARC) ◽

10.22260/isarc2018/0127 ◽

2018 ◽

Author(s):

Hyeonwoo Seong ◽

Hyunchul Choi ◽

Hyojoo Son ◽

Changwan Kim

Keyword(s):

Feature Extraction ◽

Feature Extraction Algorithm ◽

Extraction Algorithm ◽

Kaze Feature ◽

Building Reconstruction

Indentation Mark Feature Extraction Algorithm Based on Local Gradient Directional Ternary Pattern and CNN

Proceedings of the 2019 International Symposium on Signal Processing Systems - SSPS 2019 ◽

10.1145/3364908.3365299 ◽

2019 ◽

Author(s):

Haitao Dong ◽

Ying Liu ◽

Fuping Wang ◽

Keng Pang Lim

Keyword(s):

Feature Extraction ◽

Indentation Mark ◽

Feature Extraction Algorithm ◽

Extraction Algorithm ◽

Local Gradient
