Automated Static Code Analysis for Classifying Android Applications Using Machine Learning

Mobile devices have revolutionized many aspects of our lives. Without realizing it, we often run programs on them that access private information and transmit it over the network. Integrity concerns arise when mobile applications use untrusted data as input to security-sensitive computations. Program-analysis tools for integrity and confidentiality enforcement have become a necessity. Static-analysis tools are particularly attractive because they do not require installing and executing the program, and they have the potential of never missing a vulnerability. Nevertheless, such tools often have high false-positive rates. To reduce the number of false positives, static analysis has to be very precise, which requires a more refined model of the application and therefore conflicts with the performance and scalability of the analysis. This chapter proposes Phoenix, a novel solution that combines static analysis with machine learning to identify programs exhibiting suspicious operations. The approach has been applied widely to mobile applications, with impressive results.

Combining Static Code Analysis and Machine Learning for Automatic Detection of Security Vulnerabilities in Mobile Apps

Mobile Application Development, Usability, and Security - Advances in Multimedia and Interactive Technologies | 10.4018/978-1-5225-0945-5.ch004 | 2017 | pp. 68-94

Author(s): Marco Pistoia, Omer Tripp, David Lubensky

Keyword(s): Machine Learning, Static Analysis, Private Information, Program Analysis, Mobile Applications, Mobile Apps, Security Vulnerabilities, Code Analysis, Analysis Tools, Static Code Analysis

Using Machine Learning Techniques to Classify and Predict Static Code Analysis Tool Warnings

2018 IEEE/ACS 15th International Conference on Computer Systems and Applications (AICCSA) | 10.1109/aiccsa.2018.8612819 | 2018 | Cited By ~ 1

Author(s): Enas A. Alikhashashneh, Rajeev R. Raje, James H. Hill

Keyword(s): Machine Learning, Machine Learning Techniques, Analysis Tool, Code Analysis, Static Code Analysis, Learning Techniques

Adversarial EXEmples

ACM Transactions on Privacy and Security | 10.1145/3473039 | 2021 | Vol 24 (4) | pp. 1-31

Author(s): Luca Demetrio, Scott E. Coull, Battista Biggio, Giovanni Lagorio, Alessandro Armando, ...

Keyword(s): Machine Learning, Domain Knowledge, Black Box, Mitigation Strategies, File Format, Subject Matter Experts, Code Analysis, Static Code Analysis, Executable File, Functional Areas

Recent work has shown that adversarial Windows malware samples, referred to as adversarial EXEmples in this article, can bypass machine learning-based detection relying on static code analysis by perturbing relatively few input bytes. To preserve malicious functionality, previous attacks either add bytes to existing non-functional areas of the file, potentially limiting their effectiveness, or require running computationally demanding validation steps to discard malware variants that do not correctly execute in sandbox environments. In this work, we overcome these limitations by developing a unifying framework that not only encompasses and generalizes previous attacks against machine-learning models, but also includes three novel attacks based on practical, functionality-preserving manipulations to the Windows Portable Executable file format. These attacks, named Full DOS, Extend, and Shift, inject the adversarial payload by respectively manipulating the DOS header, extending it, and shifting the content of the first section. Our experimental results show that these attacks outperform existing ones in both white-box and black-box scenarios, achieving a better tradeoff in terms of evasion rate and size of the injected payload, while also enabling evasion of models that have been shown to be robust to previous attacks. To facilitate reproducibility of our findings, we open source our framework and all the corresponding attack implementations as part of the secml-malware Python library. We conclude this work by discussing the limitations of current machine learning-based malware detectors, along with potential mitigation strategies based on embedding domain knowledge coming from subject-matter experts directly into the learning process.

Analysis of the Tools for Static Code Analysis

2021 20th International Symposium INFOTEH-JAHORINA (INFOTEH) | 10.1109/infoteh51037.2021.9400688 | 2021

Author(s): Danilo Nikolic, Darko Stefanovic, Dusanka Dakic, Srdan Sladojevic, Sonja Ristic

Keyword(s): Code Analysis, Static Code Analysis

Detecting Malicious Android Applications Based On API calls and Permissions Using Machine learning Algorithms

2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) | 10.1109/miucc52538.2021.9447594 | 2021

Author(s): Seif ElDein Mohamed, Mostafa Ashaf, Amr Ehab, Omar Shereef, Haytham Metwaie, ...

Keyword(s): Machine Learning, Learning Algorithms, Machine Learning Algorithms, Android Applications

Enhanced Bug Prediction in JavaScript Programs with Hybrid Call-Graph Based Invocation Metrics

Technologies | 10.3390/technologies9010003 | 2020 | Vol 9 (1) | pp. 3

Author(s): Gábor Antal, Zoltán Tóth, Péter Hegedűs, Rudolf Ferenc

Keyword(s): Software Maintenance, Positive Impact, Source Code, Code Analysis, Static Source, Static Code Analysis, Function Calls, Hybrid Code, Code Metrics, Scripting Language

Bug prediction aims at finding source code elements in a software system that are likely to contain defects. Knowing the most error-prone parts of the program, one can efficiently allocate the limited testing and code review resources. Therefore, bug prediction can support software maintenance and evolution to a great extent. In this paper, we propose a function-level JavaScript bug prediction model based on static source code metrics, extended with hybrid (static and dynamic) code-analysis-based metrics of the number of incoming and outgoing function calls (HNII and HNOI). Our motivation is that JavaScript is a highly dynamic scripting language for which static code analysis might be very imprecise; therefore, using purely static source code features for bug prediction might not be enough. Based on a study where we extracted 824 buggy and 1943 non-buggy functions from the publicly available BugsJS dataset for the ESLint JavaScript project, we can confirm the positive impact of hybrid code metrics on the prediction performance of the ML models. Depending on the ML algorithm, applied hyper-parameters, and target measures we consider, hybrid invocation metrics bring a 2–10% increase in model performance (i.e., precision, recall, F-measure). Interestingly, replacing the static NOI and NII metrics with their hybrid counterparts HNOI and HNII in itself improves model performance; however, using them all together yields the best results.
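
To make the invocation metrics in this abstract concrete, here is a minimal sketch of how incoming and outgoing call counts might be derived from call-graph edges. It rests on assumptions that are not taken from the paper: the hybrid call graph is modeled as the plain union of statically and dynamically observed edges, counts are over distinct callers and callees, and the function names and edges are hypothetical.

```python
from collections import defaultdict


def invocation_counts(edges):
    """Return {function: (incoming, outgoing)} over distinct call edges.

    `edges` is an iterable of (caller, callee) pairs.
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    for caller, callee in edges:
        outgoing[caller].add(callee)
        incoming[callee].add(caller)
    functions = set(incoming) | set(outgoing)
    return {f: (len(incoming[f]), len(outgoing[f])) for f in functions}


# Hypothetical edges: the static analysis misses the callback `onToken`,
# which only shows up when the program is actually run.
static_edges = {("main", "parse"), ("main", "report")}
dynamic_edges = {("main", "parse"), ("parse", "onToken"), ("report", "onToken")}

# Static-only counts (in the spirit of NII / NOI).
static_counts = invocation_counts(static_edges)

# Hybrid counts (in the spirit of HNII / HNOI), assuming a simple edge union.
hybrid_counts = invocation_counts(static_edges | dynamic_edges)

for name in sorted(hybrid_counts):
    print(name, static_counts.get(name, (0, 0)), hybrid_counts[name])
```

In this toy graph the dynamically observed callback is absent from the static counts altogether, which illustrates the kind of imprecision the abstract attributes to purely static analysis of JavaScript.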
