A Novel Malware Classification Method Based on Crucial Behavior

Mathematical Problems in Engineering ◽

10.1155/2020/6804290 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Fei Xiao ◽

Yi Sun ◽

Donggao Du ◽

Xuelei Li ◽

Min Luo

Keyword(s):

Extraction Method ◽

Information Gain ◽

Malware Detection ◽

High Accuracy ◽

Inverse Document Frequency ◽

Malware Classification ◽

Execution Traces ◽

Document Frequency ◽

Call Graphs ◽

Dynamic Execution

Recently, some graph-based methods have been proposed for malware detection. However, current malware is generally characterized by sophisticated behaviors, which makes graph-based malware detection extremely challenging. To address this issue, we propose a graph repartition algorithm that transforms API call graphs into fragment behaviors based on programs’ dynamic execution traces. The proposed algorithm relies on the N-order subgraph (NSG) for constructing the appropriate fragment behavior. Moreover, we improve the term frequency-inverse document frequency (TF-IDF)-like measure and information gain (IG) to extract the crucial N-order subgraph (CNSG). This novel behavioral representation and improved extraction method can accurately represent crucial behaviors of malware. Experiments on 4,400 samples demonstrate that the proposed method achieves a high accuracy of 99.75% in malware detection and promising performance of 95.27% in malware classification.
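The abstract names TF-IDF and information gain but does not give the improved formulas, so the following is only a sketch of the standard, unmodified measures. It assumes each sample has already been reduced to a list of hashable subgraph identifiers; the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def tf_idf(samples):
    """Standard TF-IDF per sample; `samples` is a list of lists of terms."""
    n = len(samples)
    # Document frequency: number of samples containing each term at least once.
    df = Counter(term for s in samples for term in set(s))
    scores = []
    for s in samples:
        tf = Counter(s)
        total = len(s)
        scores.append({t: (c / total) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(samples, labels, term):
    """Reduction in label entropy from knowing whether `term` is present."""
    present = [y for s, y in zip(samples, labels) if term in s]
    absent = [y for s, y in zip(samples, labels) if term not in s]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - conditional


# Hypothetical subgraph IDs per sample; label 1 = malware, 0 = benign.
samples = [["g1", "g2"], ["g1", "g3"], ["g1", "g2"], ["g1", "g4"]]
labels = [1, 0, 1, 0]
print(information_gain(samples, labels, "g2"))
print(information_gain(samples, labels, "g1"))
```

In this toy data, a subgraph that appears in every sample carries no information gain and a zero IDF, while one that appears only in the malware samples separates the classes perfectly.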


A comparative study of sentiment analysis using SVM and SentiWordNet

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i3.pp902-909 ◽

2019 ◽

Vol 13 (3) ◽

pp. 902-909 ◽

Cited By ~ 7

Author(s):

Mohammad Fikri ◽

Riyanarto Sarno

Keyword(s):

Sentiment Analysis ◽

Extraction Method ◽

Support Vector ◽

The Internet ◽

Imbalanced Dataset ◽

Rule Based ◽

Inverse Document Frequency ◽

Feature Extraction Method ◽

Document Frequency ◽

SVM Algorithm

Sentiment analysis has grown rapidly alongside the number of internet-based services emerging in Indonesia. In this research, sentiment analysis is performed with a rule-based method that uses SentiWordNet and with a Support Vector Machine (SVM) algorithm that uses Term Frequency–Inverse Document Frequency (TF-IDF) as the feature extraction method. Since the number of sentences in the positive, negative and neutral classes is imbalanced, an oversampling method is applied. For the imbalanced dataset, the rule-based SentiWordNet method and the SVM algorithm achieve accuracies of 56% and 76%, respectively. For the balanced dataset, they achieve accuracies of 52% and 89%, respectively.
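The abstract does not say which oversampling variant was used. As a minimal sketch, assuming plain random oversampling (duplicating minority-class sentences until every class matches the largest one), the balancing step could look like this; `random_oversample` and the toy sentences are hypothetical.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate randomly chosen minority-class items until classes are equal."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        # Sample with replacement to fill the gap up to the majority size.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


texts = ["good", "great", "nice", "bad", "awful", "okay"]
labels = ["pos", "pos", "pos", "neg", "neg", "neu"]
balanced_texts, balanced_labels = random_oversample(texts, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training split only, so that duplicated sentences do not leak into the evaluation data.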


IntDroid

ACM Transactions on Software Engineering and Methodology ◽

10.1145/3442588 ◽

2021 ◽

Vol 30 (3) ◽

pp. 1-32

Author(s):

Deqing Zou ◽

Yueming Wu ◽

Siru Yang ◽

Anki Chauhan ◽

Wei Yang ◽

...

Keyword(s):

Social Network ◽

Social Network Analysis ◽

Network Analysis ◽

Malware Detection ◽

High Accuracy ◽

Graph Representation ◽

Android Malware ◽

Android Malware Detection ◽

Call Graphs ◽

High Scalability

Android, the most popular mobile operating system, has attracted millions of users around the world. Meanwhile, the number of new Android malware instances has grown exponentially in recent years. On the one hand, existing Android malware detection systems have shown that distilling program semantics into a graph representation and detecting malicious programs through graph matching can achieve high accuracy. However, these traditional graph-based approaches always perform expensive program analysis and suffer from low scalability. On the other hand, because of its high scalability, social network analysis has been applied to large-scale malware detection. However, the social-network-analysis-based method only considers simple semantic information (i.e., centrality) for achieving market-wide mobile malware scanning, which may limit the detection effectiveness when benign apps show behaviors similar to malware. In this article, we aim to combine the high accuracy of the traditional graph-based method with the high scalability of the social-network-analysis-based method for Android malware detection. Instead of using traditional heavyweight static analysis, we treat function call graphs of apps as complex social networks and apply social-network-based centrality analysis to unearth the central nodes within call graphs. After obtaining the central nodes, the average intimacies between sensitive API calls and central nodes are computed to represent the semantic features of the graphs. We implement our approach in a tool called IntDroid and evaluate it on a dataset of 3,988 benign samples and 4,265 malicious samples. Experimental results show that IntDroid is capable of detecting Android malware with an F-measure of 97.1% while maintaining a true-positive rate of 99.1%. Although IntDroid is not as scalable as a social-network-analysis-based method (i.e., MalScan), it is more than six times faster than MaMaDroid, a traditional graph-based method. Moreover, in a corpus of apps collected from the Google Play market, IntDroid is able to identify 28 zero-day malware samples that evade detection by existing tools, one of which has been downloaded and installed by more than ten million users. This app has also been flagged as malware by six anti-virus scanners in VirusTotal, one of which is Symantec Mobile Insight.
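The abstract does not state which centrality measure is used or how intimacy is computed. As an illustration of the first step only, the sketch below assumes degree centrality on an undirected view of the call graph; the edge list and method names are hypothetical, and the intimacy computation is not reproduced.

```python
def degree_centrality(edges):
    """Degree centrality for each node of a call graph given as (caller, callee) pairs.

    The graph is treated as undirected and self-calls are ignored.
    """
    neighbors = {}
    for caller, callee in edges:
        neighbors.setdefault(caller, set())
        neighbors.setdefault(callee, set())
        if caller != callee:
            neighbors[caller].add(callee)
            neighbors[callee].add(caller)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Normalize by the maximum possible degree, n - 1.
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


# Hypothetical call edges extracted from an app.
edges = [
    ("onCreate", "sendSms"),
    ("onCreate", "readContacts"),
    ("onCreate", "logEvent"),
    ("sendSms", "logEvent"),
]
centrality = degree_centrality(edges)
central_nodes = sorted(centrality, key=centrality.get, reverse=True)[:2]
print(central_nodes)
```

Degree centrality needs only one pass over the edge list, which is the kind of lightweight graph statistic that makes centrality-based scanning cheaper than full graph matching.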


Comparison of Malware Classification Methods using Convolutional Neural Network based on API Call Stream

International Journal of Network Security & Its Applications ◽

10.5121/ijnsa.2021.13201 ◽

2021 ◽

Vol 13 (2) ◽

pp. 1-19

Author(s):

Matthew Schofield ◽

Gulsum Alicioglu ◽

Bo Sun ◽

Russell Binaco ◽

Paul Turner ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Machine Learning Algorithms ◽

Inverse Document Frequency ◽

Detection Techniques ◽

Malware Classification ◽

Document Frequency ◽

Program Interface ◽

Type Classification

Malicious software is constantly being developed and improved, so detection and classification of malware is an ever-evolving problem. Since traditional malware detection techniques fail to detect new or unknown malware, machine learning algorithms have been used to overcome this disadvantage. We present a Convolutional Neural Network (CNN) for malware type classification based on API (Application Program Interface) calls. This research uses a database of 7107 instances of API call streams and 8 different malware types: Adware, Backdoor, Downloader, Dropper, Spyware, Trojan, Virus, and Worm. We used a 1-dimensional CNN by mapping API calls as categorical and term frequency-inverse document frequency (TF-IDF) vectors and compared the results to other classification techniques. The proposed 1-D CNN outperformed other classification techniques with 91% overall accuracy for both categorical and TF-IDF vectors.
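As a rough sketch of what a categorical mapping of API call streams can look like, the code below assigns each API name an integer ID and pads or truncates every stream to a fixed length. The padding and unknown tokens, the helper names and the sample streams are assumptions; the abstract does not describe the encoding in this detail.

```python
def build_vocab(streams):
    """Assign an integer ID to every distinct API call seen in the streams."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for stream in streams:
        for call in stream:
            # setdefault leaves existing IDs untouched and adds new calls.
            vocab.setdefault(call, len(vocab))
    return vocab


def encode(stream, vocab, max_len):
    """Map a stream to IDs, truncating or right-padding to `max_len`."""
    ids = [vocab.get(call, vocab["<unk>"]) for call in stream[:max_len]]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


# Hypothetical training streams of Windows API call names.
train_streams = [
    ["CreateFileW", "WriteFile"],
    ["CreateFileW", "RegSetValueExW", "WriteFile"],
]
vocab = build_vocab(train_streams)
print(encode(["WriteFile", "Sleep"], vocab, max_len=4))
```

Fixed-length integer sequences like these are the usual input to an embedding layer followed by 1-D convolutions.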


Malware Detection Based on Code Visualization and Two-Level Classification

Information ◽

10.3390/info12030118 ◽

2021 ◽

Vol 12 (3) ◽

pp. 118

Author(s):

Vassilios Moussas ◽

Antonios Andreatos

Keyword(s):

Web Applications ◽

Detection System ◽

Malware Detection ◽

Image Features ◽

Malicious Code ◽

Malware Analysis ◽

Malicious Software ◽

Antivirus Software ◽

Malware Classification ◽

Artificial Neural Network (ANN)

Malware creators generate new malicious software samples by making minor changes to previously generated code, both to reuse malicious code and to evade signature-based antivirus software. As a result, various families of variations of the same initial code exist today. Visualization of compiled executables for malware analysis was proposed several years ago. Visualization can greatly assist malware classification and requires neither disassembly nor code execution. Moreover, new variations of known malware families are detected instantly, in contrast to traditional signature-based antivirus software. This paper addresses the problem of identifying variations of existing malware visualized as images. A new malware detection system based on a two-level Artificial Neural Network (ANN) is proposed. The classification is based on file and image features. The proposed system is tested on the ‘Malimg’ dataset, which consists of visual representations of well-known malware families. Some important image features are extracted from this set and used to train the ANN, which is then used to detect and classify other samples of the dataset. Malware families that are confused with one another are classified by a second level of ANNs. The proposed two-level ANN method excels in simplicity, accuracy, and speed; it is easy to implement and fast to run, so it can be applied to antivirus software, smart firewalls, web applications, etc.


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

Database ◽

10.1093/database/baz085 ◽

2019 ◽

Vol 2019 ◽

Author(s):

Peter Brown ◽

Aik-Choon Tan ◽

Mohamed A El-Esawi ◽

Thomas Liehr ◽

Oliver Blanck ◽

...

Keyword(s):

Literature Search ◽

Relevant Literature ◽

Biomedical Literature ◽

Medical Subject Headings ◽

Document Similarity ◽

Inverse Document Frequency ◽

Research Fields ◽

Experience Levels ◽

Document Frequency ◽

Systematic Biases

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency–Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.
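For readers unfamiliar with the Okapi BM25 baseline, the sketch below scores tokenized documents against a query using one common form of the formula. The parameter values `k1=1.5` and `b=0.75` are conventional defaults rather than values from the abstract, the IDF uses the non-negative "+1" variant, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: long documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["gene", "expression", "cancer"],
    ["protein", "folding"],
    ["cancer", "therapy", "cancer"],
]
print(bm25_scores(["cancer"], docs))
```

Unlike plain TF-IDF, the term-frequency contribution saturates as a term repeats, and document length is normalized explicitly, which is one reason the two baselines can rank different articles first.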


AFCGDroid: Deep Learning Based Android Malware Detection Using Attributed Function Call Graphs

Journal of Physics Conference Series ◽

10.1088/1742-6596/1693/1/012080 ◽

2020 ◽

Vol 1693 ◽

pp. 012080

Author(s):

Tong Lu ◽

Xiaoyuan Liu ◽

Jingwei Chen ◽

Naitian Hu ◽

Bo Liu

Keyword(s):

Deep Learning ◽

Malware Detection ◽

Android Malware ◽

Android Malware Detection ◽

Function Call ◽

Call Graphs


Poisson mixtures

Natural Language Engineering ◽

10.1017/s1351324900000139 ◽

1995 ◽

Vol 1 (2) ◽

pp. 163-190 ◽

Cited By ~ 146

Author(s):

Kenneth W. Church ◽

William A. Gale

Keyword(s):

Negative Binomial ◽

Probability Distributions ◽

Hidden Variables ◽

Heterogeneous Structure ◽

Text Compression ◽

Inverse Document Frequency ◽

Poisson Mixtures ◽

Document Frequency ◽

Wide Range ◽

Better Than

Shannon (1948) showed that a wide range of practical problems can be reduced to the problem of estimating probability distributions of words and ngrams in text. It has become standard practice in text compression, speech recognition, information retrieval and many other applications of Shannon's theory to introduce a “bag-of-words” assumption. But obviously, word rates vary from genre to genre, author to author, topic to topic, document to document, section to section, and paragraph to paragraph. The proposed Poisson mixture captures much of this heterogeneous structure by allowing the Poisson parameter θ to vary over documents subject to a density function φ. φ is intended to capture dependencies on hidden variables such as genre, author, topic, etc. (The Negative Binomial is a well-known special case where φ is a Γ distribution.) Poisson mixtures fit the data better than standard Poissons, producing more accurate estimates of the variance over documents (σ²), entropy (H), inverse document frequency (IDF), and adaptation (Pr(x ≥ 2 | x ≥ 1)).
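To make the baseline concrete: under a single Poisson with rate θ per document, the probability that a word appears at least once is 1 − e^(−θ), so the predicted IDF is −log₂(1 − e^(−θ)), and the predicted adaptation is Pr(x ≥ 2) / Pr(x ≥ 1). The sketch below computes these standard-Poisson predictions for comparison with an observed IDF; the function names and example counts are hypothetical, and the mixture model itself is not implemented.

```python
import math


def poisson_idf(theta):
    """IDF predicted by a single Poisson with mean `theta` occurrences per document."""
    return -math.log2(1.0 - math.exp(-theta))


def poisson_adaptation(theta):
    """Pr(x >= 2 | x >= 1) under a single Poisson with mean `theta`."""
    p0 = math.exp(-theta)      # Pr(x = 0)
    p1 = theta * p0            # Pr(x = 1)
    return (1.0 - p0 - p1) / (1.0 - p0)


def observed_idf(doc_freq, n_docs):
    """IDF measured directly from document frequency."""
    return -math.log2(doc_freq / n_docs)


# Hypothetical word: 1,000 occurrences over 10,000 documents, found in only 200.
theta = 1000 / 10000
print(poisson_idf(theta), observed_idf(200, 10000))
print(poisson_adaptation(theta))
```

For a bursty word like this one, the observed IDF exceeds the single-Poisson prediction because its occurrences cluster in fewer documents than a constant rate would imply; this is the kind of gap a mixture over θ is meant to close.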


Inverse document frequency-based sensitivity scoring for privacy analysis

Signal Image and Video Processing ◽

10.1007/s11760-021-02013-1 ◽

2021 ◽

Author(s):

Onder Coban ◽

Ali Inan ◽

Selma Ayse Ozel

Keyword(s):

Inverse Document Frequency ◽

Document Frequency ◽

Privacy Analysis


Efficient natural language classification algorithm for detecting duplicate unsupervised features

Informatics and Automation - Информатика и автоматизация ◽

10.15622/ia.2021.3.5 ◽

2021 ◽

Vol 20 (3) ◽

pp. 623-653

Author(s):

Saud Altaf ◽

Sofia Iqbal ◽

Muhammad Waseem Soomro

Keyword(s):

Natural Language ◽

Short Term Memory ◽

Short Term ◽

Vocabulary Size ◽

Language Understanding ◽

Inverse Document Frequency ◽

Classification Technique ◽

Document Frequency ◽

Text Features ◽

Long Short Term Memory

This paper focuses on capturing the meaning of Natural Language Understanding (NLU) text features to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to identify the most suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two types of datasets: Bosch bug reports and Wikipedia articles. This study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results report the performance of Term Frequency–Inverse Document Frequency (TF-IDF) features on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.


Malware Detection Based on Graph Attention Networks for Intelligent Transportation Systems

Electronics ◽

10.3390/electronics10202534 ◽

2021 ◽

Vol 10 (20) ◽

pp. 2534

Author(s):

Cagatay Catal ◽

Hakan Gunduz ◽

Alper Ozcan

Keyword(s):

Intelligent Transportation Systems ◽

Malware Detection ◽

Application Programming Interface ◽

Intelligent Transportation ◽

Transportation Systems ◽

Convolutional Network ◽

Attention Networks ◽

Security Issues ◽

Call Graphs ◽

Network Metrics

Intelligent Transportation Systems (ITS) aim to make transportation smarter, safer, more reliable, and more environmentally friendly without detrimentally affecting the service quality. ITS can face security issues due to their complex, dynamic, and non-linear properties. One of the most critical security problems is attacks that damage the infrastructure of the entire ITS. Attackers can inject malware code that triggers dangerous actions such as information theft and unwanted system moves. The main objective of this study is to improve the performance of malware detection models using Graph Attention Networks. To detect malware attacks targeting ITS, a Graph Attention Network (GAN)-based framework is proposed. The inputs to this framework are the Application Programming Interface (API) call graphs obtained from malware and benign Android apk files. During graph creation, network metrics and the Node2Vec model are used to generate the node features. A GAN-based model is combined with different types of node features during the experiments, and its performance is compared against a Graph Convolutional Network (GCN). Experimental results demonstrate that the integration of the GAN and Node2Vec models provides the best performance in terms of F-measure and accuracy, and that the use of an attention mechanism in GAN improves the performance. Furthermore, node features generated with Node2Vec resulted in a 3% increase in classification accuracy compared to the features generated with network metrics.
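As background on the attention mechanism, the sketch below computes attention coefficients for one node in the usual graph-attention formulation: each neighbor j of node i gets a score LeakyReLU(aᵀ[W·hᵢ ‖ W·hⱼ]), and the scores are normalized with a softmax over the neighborhood. The weights `W` and `a` would normally be learned; here they are hand-picked, and the tiny graph and feature values are hypothetical. Whether the paper uses exactly this single-head form is an assumption.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, h):
    """Multiply matrix W (list of rows) by vector h."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_weights(features, neighbors, W, a, node):
    """Return {neighbor: alpha} for `node` using single-head graph attention."""
    z = {i: matvec(W, h) for i, h in features.items()}
    scores = {}
    for j in neighbors[node]:
        concat = z[node] + z[j]  # list concatenation: [W h_i || W h_j]
        scores[j] = leaky_relu(sum(ai * ci for ai, ci in zip(a, concat)))
    # Softmax over the neighborhood, shifted by the max for numerical stability.
    m = max(scores.values())
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    total = sum(exps.values())
    return {j: e / total for j, e in exps.items()}


# Hypothetical 3-node API-call graph with 2-dimensional node features.
features = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
neighbors = {"A": ["A", "B", "C"]}  # includes a self-loop, as is common
W = [[1.0, 0.0], [0.0, 1.0]]        # identity transform for illustration
a = [0.5, 0.5, 1.0, -1.0]           # attention vector over the concatenation
alpha = attention_weights(features, neighbors, W, a, "A")
print(alpha)
```

The resulting coefficients weight each neighbor's transformed features when node A's representation is updated, whereas a GCN layer uses fixed, degree-based weights.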
