A Review of Malware Classification Methods using Machine Learning

Review of classification studies for machine learning in the development of intelligent management decision support systems

Technology of technosphere safety ◽

10.25257/tts.2020.3.89.20-29 ◽

2020 ◽

Vol 89 ◽

pp. 20-29

Author(s):

Sh. K. Kadiev ◽

R. Sh. Khabibulin ◽

P. P. Godlevskiy ◽

V. L. Semikov ◽

...

Keyword(s):

Machine Learning ◽

Decision Support ◽

Mathematical Models ◽

Decision Support Systems ◽

Support Systems ◽

Management Decision ◽

Classification Methods ◽

Advantages And Disadvantages ◽

Intelligent Management ◽

Management Decision Support

Introduction. This paper reviews research on classification as a machine learning method. Articles containing mathematical models and algorithms for classification were selected. The use of classification in intelligent management decision support systems across various subject areas is also a relevant topic. Goal and objectives. The purpose of the study is to analyze papers on classification as a machine learning method. This requires solving the following tasks: 1) identify the most widely used classification methods in machine learning; 2) highlight the advantages and disadvantages of each selected method; 3) analyze whether classification methods can be used in intelligent management decision support systems for forecasting, preventing and eliminating emergencies. Methods. General scientific and special methods were used to obtain the results: analysis, synthesis, generalization, and the classification method. Results and discussion. The analysis identified studies that provide a mathematical formulation and software implementations. Issues of classification in applying machine learning to the development of intelligent decision support systems are considered. Conclusion. The analysis showed that a sufficient number of algorithms have been used to perform classification when sorting acquired knowledge within a subject area. Accurate classification is one of the fundamental problems in developing management decision support systems, including those for fire and emergency prevention and response. Timely and effective decision-making by officials of operational shifts in disaster management is also relevant. Key words: decision support, analysis, classification, machine learning, algorithm, mathematical models.

COVID-19 Public Sentiment Insights and Machine Learning for Tweets Classification

Information ◽

10.3390/info11060314 ◽

2020 ◽

Vol 11 (6) ◽

pp. 314 ◽

Cited By ~ 17

Author(s):

Jim Samuel ◽

G. G. Md. Nawaz Ali ◽

Md. Mokhlesur Rahman ◽

Ek Esawi ◽

Yana Samuel

Keyword(s):

Machine Learning ◽

The United States ◽

Classification Methods ◽

Reasonable Accuracy ◽

Bayes Method ◽

Inaccurate Information ◽

Public Sentiment ◽

Research Article ◽

Textual Data ◽

Data Visualizations

Along with the Coronavirus pandemic, another crisis has manifested itself in the form of mass fear and panic phenomena, fueled by incomplete and often inaccurate information. There is therefore a tremendous need to address and better understand COVID-19’s informational crisis and gauge public sentiment, so that appropriate messaging and policy decisions can be implemented. In this research article, we identify public sentiment associated with the pandemic using Coronavirus-specific Tweets and R statistical software, along with its sentiment analysis packages. We demonstrate insights into the progress of fear sentiment over time as COVID-19 approached peak levels in the United States, using descriptive textual analytics supported by necessary textual data visualizations. Furthermore, we provide a methodological overview of two essential machine learning (ML) classification methods, in the context of textual analytics, and compare their effectiveness in classifying Coronavirus Tweets of varying lengths. We observe a strong classification accuracy of 91% for short Tweets with the Naïve Bayes method. We also observe that the logistic regression classification method provides a reasonable accuracy of 74% with shorter Tweets, and that both methods show relatively weaker performance for longer Tweets. This research provides insights into Coronavirus fear sentiment progression, and outlines associated methods, implications, limitations and opportunities.
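
The Naïve Bayes text classification mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' R workflow: the toy Tweets, the labels "fear" and "other", and the function names are assumptions made for illustration, and the model is a plain multinomial Naïve Bayes with add-one (Laplace) smoothing.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data; the study itself used real Coronavirus Tweets.
TRAIN = [
    ("scared of the virus", "fear"),
    ("panic buying everywhere", "fear"),
    ("afraid and worried today", "fear"),
    ("grateful for the nurses", "other"),
    ("enjoying a sunny walk", "other"),
    ("happy to be home", "other"),
]

def train_naive_bayes(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

def predict(text, model):
    """Return the class with the highest log posterior for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

MODEL = train_naive_bayes(TRAIN)
print(predict("worried about the virus", MODEL))
print(predict("happy sunny walk", MODEL))
```

Short texts contain few words, so each word carries a large share of the evidence. The sketch does not attempt to reproduce the accuracy figures reported above.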

Machine Learning Comparison and Parameter Setting Methods for the Detection of Dump Sites for Construction and Demolition Waste Using the Google Earth Engine

Remote Sensing ◽

10.3390/rs13040787 ◽

2021 ◽

Vol 13 (4) ◽

pp. 787

Author(s):

Lei Zhou ◽

Ting Luo ◽

Mingyi Du ◽

Qiang Chen ◽

Yang Liu ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Google Earth ◽

Construction And Demolition Waste ◽

Parameterization Scheme ◽

Classification Methods ◽

Demolition Waste ◽

Optimal Method ◽

Identification Method ◽

Google Earth Engine

Machine learning has been successfully used for object recognition within images. Due to the complexity of the spectrum and texture of construction and demolition waste (C&DW), it is difficult to construct an automatic identification method for C&DW based on machine learning and remote sensing data sources. Machine learning includes many types of algorithms; however, different algorithms and parameters have different identification effects on C&DW. Exploring the optimal method for automatic remote sensing identification of C&DW is an important approach for the intelligent supervision of C&DW. This study investigates the megacity of Beijing, which is facing high risk of C&DW pollution. To improve the classification accuracy of C&DW, buildings, vegetation, water, and crops were selected as comparative training samples based on the Google Earth Engine (GEE), and Sentinel-2 was used as the data source. Three classification methods of typical machine learning algorithms (classification and regression trees (CART), random forest (RF), and support vector machine (SVM)) were selected to classify the C&DW from remote sensing images. Using empirical methods, the experimental trial method, and the grid search method, the optimal parameterization scheme of the three classification methods was studied to determine the optimal method of remote sensing identification of C&DW based on machine learning. Through accuracy evaluation and ground verification, the overall recognition accuracies of CART, RF, and SVM for C&DW were 73.12%, 98.05%, and 85.62%, respectively, under the optimal parameterization scheme determined in this study. Among these algorithms, RF was a better C&DW identification method than were CART and SVM when the number of decision trees was 50. This study explores the robust machine learning method for automatic remote sensing identification of C&DW and provides a scientific basis for intelligent supervision and resource utilization of C&DW.
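
The grid search method named in this abstract can be sketched as an exhaustive loop over candidate parameter values. The example below is only an illustration under stated assumptions: the one-feature threshold classifier, the sample values, and the parameter names `threshold` and `flip` are hypothetical and stand in for the CART, RF and SVM parameters tuned in the study.

```python
from itertools import product

def grid_search(param_grid, score_fn):
    """Try every parameter combination and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical validation samples: (feature value, 1 = waste, 0 = other).
SAMPLES = [
    (0.10, 0), (0.20, 0), (0.30, 0), (0.35, 0),
    (0.40, 1), (0.55, 1), (0.70, 1), (0.80, 1),
]

def threshold_accuracy(threshold, flip):
    """Accuracy of a one-feature threshold rule on SAMPLES."""
    correct = 0
    for value, label in SAMPLES:
        predicted = int(value >= threshold)
        if flip:
            predicted = 1 - predicted
        correct += int(predicted == label)
    return correct / len(SAMPLES)

GRID = {"threshold": [0.1, 0.25, 0.4, 0.6], "flip": [False, True]}
BEST_PARAMS, BEST_SCORE = grid_search(GRID, threshold_accuracy)
print(BEST_PARAMS, BEST_SCORE)
```

The cost of grid search grows with the product of the grid sizes, which is one reason a study might combine it with empirical and trial-based parameter choices.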

Machine learning classification methods informing the management of inconclusive reactors at bovine tuberculosis surveillance tests in England

Preventive Veterinary Medicine ◽

10.1016/j.prevetmed.2021.105565 ◽

2021 ◽

pp. 105565

Author(s):

M. Pilar Romero ◽

Yu-Mei Chang ◽

Lucy A. Brunton ◽

Jessica Parry ◽

Alison Prosser ◽

...

Keyword(s):

Machine Learning ◽

Bovine Tuberculosis ◽

Classification Methods ◽

Machine Learning Classification

Applying Machine Learning for Improving Performance Classification on Driving Behavior

IJITEE (International Journal of Information Technology and Electrical Engineering) ◽

10.22146/ijitee.56919 ◽

2021 ◽

Vol 4 (1) ◽

pp. 8

Author(s):

Ahmad Iwan Fadli ◽

Selo Sulistyo ◽

Sigit Wibowo

Keyword(s):

Machine Learning ◽

Traffic Accident ◽

Large Scale ◽

Detection System ◽

Difficult Problem ◽

Sensor Data ◽

Driving Safety ◽

Support Vector ◽

Classification Methods ◽

Machine Learning Classification

Traffic accidents are a very difficult problem to handle on a large scale in a country. Indonesia is one of the most populous developing countries, and vehicles are its main means of transportation for daily activities. It is also the country with the largest number of car users in Southeast Asia, so driving safety needs to be considered. Using a machine learning classification method to determine whether a driver is driving safely can help reduce the risk of driving accidents. We created a detection system that classifies whether a driver is driving safely or unsafely using trip sensor data, which include gyroscope, acceleration, and GPS readings. The classification methods used in this study are Random Forest (RF), Support Vector Machine (SVM), and Multilayer Perceptron (MLP), with data preprocessing improved by feature extraction and oversampling methods. The study shows that, with the proposed preprocessing stages, RF performs best, with 98% accuracy, 98% precision, and 97% sensitivity, compared with SVM and MLP.
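
The three figures reported in this abstract (accuracy, precision and sensitivity) follow from the counts of true and false positives and negatives. The sketch below shows the arithmetic on made-up labels; the function name `binary_metrics` and the example vectors are assumptions, not data from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and sensitivity (recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    def ratio(numerator, denominator):
        # Return 0.0 instead of failing when a denominator is empty.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
    }

# Hypothetical labels: 1 = unsafe trip, 0 = safe trip.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
METRICS = binary_metrics(Y_TRUE, Y_PRED)
print(METRICS)
```

With 3 true positives, 1 false positive, 1 false negative and 5 true negatives, this gives an accuracy of 0.8 and a precision and sensitivity of 0.75 each.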

Integration of synthetic minority oversampling technique for imbalanced class

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v13.i1.pp102-108 ◽

2019 ◽

Vol 13 (1) ◽

pp. 102

Author(s):

Noviyanti Santoso ◽

Wahyu Wibowo ◽

Hilda Hikmawati

Keyword(s):

Machine Learning ◽

Data Mining ◽

Support Vector Machine ◽

Class Imbalance ◽

Original Data ◽

Support Vector ◽

Classification Methods ◽

Problematic Issue ◽

Imbalanced Class ◽

F Measure

In data mining, class imbalance is a problematic issue that calls for solutions. This is probably because machine learning algorithms are built on the assumption that the number of instances in each class is balanced, so when the classes are imbalanced, the prediction results may be inappropriate. Solutions offered for class imbalance include oversampling, undersampling, and the synthetic minority oversampling technique (SMOTE). Both oversampling and undersampling have disadvantages, so SMOTE is an alternative that overcomes them. Integrating SMOTE into data mining classification methods such as Naive Bayes, Support Vector Machine (SVM), and Random Forest (RF) is expected to improve accuracy. In this research, the SMOTE-processed data gave better accuracy than the original data. Among the three classification methods used, RF gave the highest average AUC, F-measure, and G-mean scores.
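
The core idea of SMOTE is to create a synthetic minority sample on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the implementation used in the study: it picks base samples at random rather than generating a fixed number per sample, and the data points, the function name `smote` and its parameters are assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest minority neighbours of the base sample, excluding itself.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

# Hypothetical minority-class samples with two features each.
MINORITY = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
NEW_POINTS = smote(MINORITY, n_new=5, k=2, seed=1)
print(NEW_POINTS)
```

Because every synthetic point is an interpolation, it stays inside the region spanned by the existing minority samples. Plain oversampling, by contrast, only duplicates existing points.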

Machine Learning–based Analysis of English Lateral Allophones

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2019-0029 ◽

2019 ◽

Vol 29 (2) ◽

pp. 393-405 ◽

Cited By ~ 1

Author(s):

Magdalena Piotrowska ◽

Gražina Korvel ◽

Bożena Kostek ◽

Tomasz Ciszewski ◽

Andrzej Czyżewski

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Nearest Neighbor ◽

Native Speakers ◽

Automatic Evaluation ◽

Classification Methods ◽

K Nearest Neighbor ◽

Self Organizing Maps ◽

Automatic Methods ◽

Audio Video

Automatic classification methods, such as artificial neural networks (ANNs), the k-nearest neighbor (kNN) and self-organizing maps (SOMs), are applied to allophone analysis based on recorded speech. A list of 650 words was created for that purpose, containing positionally and/or contextually conditioned allophones. For each word, a group of 16 native and non-native speakers were audio-video recorded, from which seven native speakers’ and phonology experts’ speech was selected for analyses. For the purpose of the present study, a sub-list of 103 words containing the English alveolar lateral phoneme /l/ was compiled. The list includes ‘dark’ (velarized) allophonic realizations (which occur before a consonant or at the end of the word before silence) and 52 ‘clear’ allophonic realizations (which occur before a vowel), as well as voicing variants. The recorded signals were segmented into allophones and parametrized using a set of descriptors, originating from the MPEG 7 standard, plus dedicated time-based parameters as well as modified MFCC features proposed by the authors. Classification methods such as ANNs, the kNN and the SOM were employed to automatically detect the two types of allophones. Various sets of features were tested to achieve the best performance of the automatic methods. In the final experiment, a selected set of features was used for automatic evaluation of the pronunciation of dark /l/ by non-native speakers.
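
Of the methods listed in this abstract, k-nearest neighbor is the simplest to show in a few lines: a query is assigned the majority label among its k closest training samples. The sketch below assumes hypothetical two-dimensional feature vectors. They are placeholders, not the MPEG 7 descriptors or MFCC features used by the authors.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Return the majority label among the k training samples
    closest to the query (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical feature vectors for the two allophone types.
TRAIN = [
    ((0.20, 0.90), "dark"),
    ((0.30, 0.80), "dark"),
    ((0.25, 0.85), "dark"),
    ((0.80, 0.20), "clear"),
    ((0.70, 0.30), "clear"),
    ((0.75, 0.25), "clear"),
]

print(knn_predict(TRAIN, (0.28, 0.82)))
print(knn_predict(TRAIN, (0.72, 0.28)))
```

An odd k avoids tied votes in a two-class problem such as dark versus clear /l/.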

Ensemble Machine Learning Approach for Android Malware Classification Using Hybrid Features

Advances in Intelligent Systems and Computing - Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017 ◽

10.1007/978-3-319-59162-9_20 ◽

2017 ◽

pp. 191-200 ◽

Cited By ~ 1

Author(s):

Abdurrahman Pektaş ◽

Tankut Acarman

Keyword(s):

Machine Learning ◽

Learning Approach ◽

Hybrid Features ◽

Android Malware ◽

Malware Classification ◽

Ensemble Machine Learning ◽

Machine Learning Approach

Cost-sensitive meta-learning framework

Journal of Modelling in Management ◽

10.1108/jm2-03-2021-0065 ◽

2021 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Samar Ali Shilbayeh ◽

Sunil Vadera

Keyword(s):

Machine Learning ◽

Learning System ◽

Classification Algorithms ◽

Data Sets ◽

Classification Methods ◽

Content Type ◽

Learning Framework ◽

Cost Sensitive Classification ◽

Meta Learning ◽

Training Examples

Purpose: This paper aims to describe the use of a meta-learning framework for recommending cost-sensitive classification methods, with the aim of answering an important question that arises in machine learning, namely, “Among all the available classification algorithms, and in considering a specific type of data and cost, which is the best algorithm for my problem?” Design/methodology/approach: The framework is based on the idea of applying machine learning techniques to discover knowledge about the performance of different machine learning algorithms. It includes components that repeatedly apply different classification methods to data sets and measure their performance. The characteristics of the data sets, combined with the algorithms and their performance, provide the training examples. A decision tree algorithm is applied to the training examples to induce the knowledge, which can then be used to recommend algorithms for new data sets. The paper contributes to both meta-learning and cost-sensitive machine learning. Neither field is new; the contribution is a recommender that suggests the optimal cost-sensitive approach for a given data problem. The proposed solution is implemented in WEKA and evaluated by applying it to different data sets and comparing the results with existing studies available in the literature. Findings: The results show that the developed meta-learning solution produces better results than METAL, a well-known meta-learning system. The developed solution takes the misclassification cost into consideration during the learning process, which is not available in the compared system. Originality/value: The paper presents a major piece of new information in writing for the first time. Meta-learning work has been done before, but this paper presents a new meta-learning framework that is cost-sensitive.
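
One step of such a framework can be sketched as follows: score each algorithm on a data set by its total misclassification cost, then record the cheapest one together with the data set's characteristics as a meta-level training example. This is a hedged illustration, not the WEKA implementation described above. The cost matrix, the confusion matrices, the algorithm names and the characteristic names are all hypothetical.

```python
def total_cost(confusion, cost):
    """Sum of counts times costs; rows are true classes,
    columns are predicted classes."""
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )

def meta_example(characteristics, confusions, cost):
    """Build one meta-level training example: data set characteristics
    labelled with the algorithm that has the lowest total cost."""
    costs = {name: total_cost(cm, cost) for name, cm in confusions.items()}
    best = min(costs, key=costs.get)
    return {**characteristics, "best_algorithm": best}, costs

# Hypothetical cost matrix: a missed positive (row 1, column 0) costs 5.
COST = [[0, 1],
        [5, 0]]

# Hypothetical confusion matrices of two algorithms on one data set.
CONFUSIONS = {
    "tree": [[50, 10], [2, 38]],   # 88% accurate
    "bayes": [[58, 2], [6, 34]],   # 92% accurate
}

EXAMPLE, COSTS = meta_example(
    {"n_instances": 100, "n_classes": 2}, CONFUSIONS, COST
)
print(EXAMPLE, COSTS)
```

In this toy case the more accurate algorithm is the more expensive one (cost 32 against 20), which is why a cost-sensitive recommender can disagree with one that ranks by accuracy alone. Many such examples would then be passed to a decision tree learner.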

A Comprehensive Survey on Identification of Malware Types and Malware Classification Using Machine Learning Techniques

10.1109/icosec51865.2021.9591763 ◽

2021 ◽

Author(s):

Nagababu Pachhala ◽

S. Jothilakshmi ◽

Bhanu Prakash Battula

Keyword(s):

Machine Learning ◽

Machine Learning Techniques ◽

Malware Classification ◽

Learning Techniques ◽

Comprehensive Survey
