Malware Classification Using Simhash Encoding and PCA (MCSP)

Young-Man Kwon; Jae-Ju An; Myung-Jae Lim; Seongsoo Cho; Won-Mo Gal

doi:10.3390/sym12050830

Symmetry ◽

2020 ◽

Vol 12 (5) ◽

pp. 830

Keyword(s):

Covariance Matrix ◽

Computer Systems ◽

Classification Methods ◽

Imbalanced Dataset ◽

Malware Classification ◽

Maximum Accuracy ◽

Average Accuracy ◽

Executable File ◽

Linear Transform ◽

Malicious Program

Malware is any malicious program that can attack the security of computer systems for various purposes, and its threat has increased significantly in recent years. To protect our computer systems, we need to analyze an executable file to decide whether or not it is malicious. In this paper, we propose two malware classification methods: malware classification using Simhash and PCA (MCSP), and malware classification using Simhash and linear transform (MCSLT). The former combines Simhash encoding with PCA, which relies on the symmetric covariance matrix; the latter combines Simhash encoding with a linear transform layer. To verify the performance of our methods, we compared them with a baseline, malware classification using Simhash and CNN (MCSC), with both tanh and ReLU activations. We used a highly imbalanced dataset of 10,736 samples. Our MCSP method showed the best performance, with a maximum accuracy of 98.74%, an average accuracy of 98.59%, and an average F1 score of 99.2%. In addition, MCSLT outperformed MCSC in both accuracy and F1 score.
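To make the MCSP pipeline more concrete, the pure-Python sketch below computes a Simhash over the overlapping byte 4-grams of a file and then extracts the leading principal component of the resulting bit vectors by power iteration on their symmetric covariance matrix. The n-gram size, the MD5 feature hash, the 64-bit width, the unit feature weights, and the toy byte strings are all assumptions for illustration; this is not the authors' encoding or PCA configuration, and a real pipeline would keep several components and use a linear algebra library.

```python
import hashlib
import random


def simhash_bits(data: bytes, n: int = 4, width: int = 64) -> list[int]:
    """Width-bit Simhash of `data`, using overlapping byte n-grams as features.

    Assumptions (illustrative, not from the paper): unit weight per n-gram,
    MD5 as the feature hash, truncated to `width` (<= 128) bits.
    """
    mask = (1 << width) - 1
    counts = [0] * width
    for i in range(max(len(data) - n + 1, 0)):
        h = int.from_bytes(hashlib.md5(data[i:i + n]).digest(), "big") & mask
        for bit in range(width):
            # Vote +1 if this bit of the feature hash is set, -1 otherwise.
            counts[bit] += 1 if (h >> bit) & 1 else -1
    # A Simhash bit is 1 wherever the accumulated vote is positive.
    return [1 if c > 0 else 0 for c in counts]


def hamming(a: list[int], b: list[int]) -> int:
    """Number of differing bits; similar inputs tend to give small distances."""
    return sum(x != y for x, y in zip(a, b))


def first_principal_component(rows, iters=200, seed=0):
    """Mean vector and leading eigenvector of the symmetric sample covariance matrix."""
    m, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / m for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (m - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    v = [x / norm for x in v]
    for _ in range(iters):  # power iteration: v <- C v / ||C v||
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:  # all samples identical: no variance to explain
            break
        v = [x / norm for x in w]
    return mean, v


def project(row, mean, v):
    """Score of one sample along the principal component."""
    return sum((x - mu) * c for x, mu, c in zip(row, mean, v))


# Hypothetical toy "files": two near-identical byte streams and one different one.
samples = [b"MZ\x90\x00" * 50, b"MZ\x90\x00" * 49 + b"\xcc" * 4, b"\x7fELF" * 50]
vectors = [simhash_bits(s) for s in samples]
mean, pc1 = first_principal_component(vectors)
scores = [project(vec, mean, pc1) for vec in vectors]
```

Because each Simhash bit is a signed vote over many n-gram hashes, files that share most of their n-grams tend to agree on most bits, which is what makes the fixed-length bit vector a usable input for PCA or a linear layer.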

MalDeep: A Deep Learning Classification Framework against Malware Variants Based on Texture Visualization

Security and Communication Networks ◽

10.1155/2019/4895984 ◽

2019 ◽

Vol 2019 ◽

pp. 1-11

Author(s):

Yuntao Zhao ◽

Chunyu Xu ◽

Bo Bo ◽

Yongxin Feng

Keyword(s):

Deep Learning ◽

Classification Accuracy ◽

Feature Space ◽

Image Texture ◽

Accuracy Rate ◽

Texture Representation ◽

Malware Classification ◽

Classification Framework ◽

Average Accuracy ◽

New Feature

The increasing sophistication of malware variants, which use techniques such as encryption, polymorphism, and obfuscation, calls for new detection and classification techniques. In this paper, we propose MalDeep, a novel deep learning framework for classifying malware variants based on texture visualization. Through code mapping, texture partitioning, and texture extraction, we study malware classification in a new feature space of image texture representations, without decryption or disassembly. We then build a malware classifier on a convolutional neural network with two convolutional layers, two downsampling layers, and multiple fully connected layers. We train the model on the Microsoft Malware Classification Challenge dataset, which includes 9 malware families and 10,868 variant samples. The experimental results show that MalDeep achieves a higher accuracy rate for malware classification; for some backdoor families in particular, its classification accuracy exceeds 99%. Moreover, compared with other mainstream antivirus software, MalDeep achieves a higher average accuracy on variants from different families.
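The "code mapping" step turns a binary into an image whose texture can be analyzed without disassembly. A common way to do this, assumed here rather than taken from the paper, is to read each byte as one grayscale pixel (0 to 255) and wrap the byte stream into rows of fixed width. The sketch then cuts the image into square tiles as a stand-in for texture partitioning and computes one simple statistic per tile. The row width of 16, the zero padding, the tile size, and the mean-intensity statistic are illustrative assumptions, not MalDeep's parameters.

```python
def bytes_to_image(data: bytes, width: int = 16) -> list[list[int]]:
    """Map each byte to one grayscale pixel (0-255), `width` pixels per row.

    The last row is zero-padded (an assumption of this sketch).
    """
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def partition(image, tile: int):
    """Split the image into non-overlapping tile x tile blocks (ragged edges dropped)."""
    height = len(image)
    width = len(image[0]) if image else 0
    blocks = []
    for r in range(0, height - tile + 1, tile):
        for c in range(0, width - tile + 1, tile):
            blocks.append([row[c:c + tile] for row in image[r:r + tile]])
    return blocks


def mean_intensity(block) -> float:
    """A very simple per-block texture statistic (hypothetical stand-in)."""
    pixels = [p for row in block for p in row]
    return sum(pixels) / len(pixels)


image = bytes_to_image(bytes(range(64)), width=16)  # 4 rows of 16 pixels
tiles = partition(image, 4)                          # 4 tiles of 4 x 4
print(len(image), len(tiles), mean_intensity(tiles[0]))  # prints: 4 4 25.5
```

In a CNN-based framework, the tiles or the whole image would be fed to the network, which learns its own texture filters instead of relying on a hand-picked statistic.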

CloudA: A Ground-Based Cloud Classification Method with a Convolutional Neural Network

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-19-0189.1 ◽

2020 ◽

Vol 37 (9) ◽

pp. 1661-1668

Author(s):

Min Wang ◽

Shudao Zhou ◽

Zhong Yang ◽

Zhanhua Liu

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Image Recognition ◽

Learning Ability ◽

Visualization Method ◽

Classification Methods ◽

Recognition Method ◽

Cloud Classification ◽

Average Accuracy

Conventional classification methods rely on manual, experience-based feature extraction, and each stage of the pipeline is independent, which makes them a kind of “shallow learning.” As a result, the range of cloud categories these methods can handle is limited. In this paper, we propose CloudA, a new convolutional neural network (CNN) with deep learning ability, for ground-based cloud image recognition. We use the Singapore Whole-Sky Imaging Categories (SWIMCAT) sample library and a total-sky sample library to train and test CloudA. In particular, we visualize the cloud features captured by CloudA using TensorBoard, and these features help us understand the process of ground-based cloud classification. We compare CloudA with other commonly used methods to explore its feasibility for classifying ground-based cloud images, and a large number of experiments show that it achieves an average accuracy of nearly 98.63% for ground-based cloud classification.

Covariance Matrix Reconstruction for Direction Finding with Nested Arrays Using Iterative Reweighted Nuclear Norm Minimization

International Journal of Antennas and Propagation ◽

10.1155/2019/7657898 ◽

2019 ◽

Vol 2019 ◽

pp. 1-13 ◽

Cited By ~ 6

Author(s):

Weijie Tan ◽

Xi’an Feng

Keyword(s):

Covariance Matrix ◽

Regularization Parameter ◽

Estimation Method ◽

Estimation Algorithm ◽

DoA Estimation ◽

Direction Finding ◽

Nuclear Norm ◽

Low Rank ◽

Matrix Reconstruction ◽

Linear Transform

In this paper, we address the direction-finding problem with a nested array in the presence of unknown nonuniform noise. We propose a novel gridless direction-finding method based on low-rank covariance matrix approximation via reweighted nuclear norm optimization. In the proposed method, we first eliminate the noise variance variable by a linear transform and use a covariance fitting criterion to determine the regularization parameter, which ensures robustness. We then reconstruct the low-rank covariance matrix by iteratively reweighted nuclear norm optimization, which imposes a nonconvex penalty. Finally, we apply a search-free DoA estimation method to estimate the parameters. Numerical simulations verify the effectiveness of the proposed method, and the results indicate that it achieves more accurate DoA estimation than the state-of-the-art DoA estimation algorithm in nonuniform noise and off-grid cases.
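For readers unfamiliar with the reweighting step, one standard way to iteratively reweight a nuclear norm for a positive semidefinite covariance matrix is the log-det heuristic, written below. This is a generic formulation, not the paper's exact objective or constraint: $\mathbf{R}$ is the covariance matrix being reconstructed, $\hat{\mathbf{R}}$ the sample covariance (after the noise-eliminating transform), $\eta$ a fitting tolerance, $\varepsilon > 0$ a small smoothing constant, and $k$ the iteration index; all of these symbols are assumptions of this sketch.

```latex
\mathbf{R}^{(k+1)} = \arg\min_{\mathbf{R} \succeq \mathbf{0}}
  \operatorname{tr}\!\left( \mathbf{W}^{(k)} \mathbf{R} \right)
  \quad \text{s.t.} \quad \bigl\| \mathbf{R} - \hat{\mathbf{R}} \bigr\|_F \le \eta,
\qquad
\mathbf{W}^{(k)} = \left( \mathbf{R}^{(k)} + \varepsilon \mathbf{I} \right)^{-1}
```

Each subproblem is convex. Starting from $\mathbf{W}^{(0)} = \mathbf{I}$, the first step is plain trace minimization, which equals nuclear norm minimization for a positive semidefinite matrix. Later steps weight each eigen-direction of $\mathbf{R}^{(k)}$ with eigenvalue $\lambda_i$ by $1/(\lambda_i + \varepsilon)$, so weak directions are penalized more heavily and pushed toward zero; each step minimizes the linearization of the nonconvex penalty $\log\det(\mathbf{R} + \varepsilon\mathbf{I})$, a tighter surrogate for rank than the nuclear norm.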

An efficient Bayesian network for differential diagnosis using experts' knowledge

International Journal of Intelligent Computing and Cybernetics ◽

10.1108/ijicc-10-2019-0112 ◽

2020 ◽

Vol 13 (1) ◽

pp. 103-126 ◽

Cited By ~ 1

Author(s):

Mohammad Mahdi Ershadi ◽

Abbas Seifi

Keyword(s):

Differential Diagnosis ◽

Bayesian Networks ◽

Bayesian Network ◽

Computation Time ◽

Feature Reduction ◽

Classification Methods ◽

Clustering Methods ◽

Content Type ◽

Average Accuracy ◽

Improvement Method

Purpose: This study aims to perform differential diagnosis of several diseases using classification methods in order to support effective medical treatment. To this end, classification methods based on data, on experts' knowledge, or on both are considered in some cases. In addition, feature reduction and several clustering methods are used to improve their performance.

Design/methodology/approach: First, the performance of the classification methods is evaluated for the differential diagnosis of different diseases. Then, experts' knowledge is used to modify the structures of the Bayesian networks. Analysis of the results shows that using experts' knowledge is more effective than the other algorithms at increasing the accuracy of Bayesian network classification. A total of ten different diseases, taken from the University of California at Irvine (UCI) Machine Learning Repository datasets, are used for testing.

Findings: The proposed method improves both the computation time and the accuracy of the classification methods used in this paper. Among all classification methods, Bayesian networks based on experts' knowledge achieve the highest average accuracy, 87 percent, with the lowest average standard deviation, 0.04, over the sample datasets.

Practical implications: The proposed methodology can be applied to the differential diagnosis of diseases.

Originality/value: This study demonstrates the usefulness of experts' knowledge in diagnosis and proposes an improvement method adopted for classification. In addition, the Bayesian network based on experts' knowledge is useful for various diseases that previous papers have neglected.

A Hybrid Classification Approach Based on Decision Tree and Naïve Bayes Methods

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2014100104 ◽

2014 ◽

Vol 4 (4) ◽

pp. 61-72

Author(s):

Saed A. Muqasqas ◽

Qasem A. Al Radaideh ◽

Bilal A. Abul-Huda

Keyword(s):

Data Mining ◽

Decision Tree ◽

Classification Accuracy ◽

Classification Methods ◽

Hybrid Classifier ◽

Classification Approach ◽

Classification Technique ◽

Average Accuracy ◽

Proposed Model ◽

Hybrid Classification

Data classification, one of the main tasks of data mining, plays an important role in many fields. Classification techniques differ mainly in the accuracy of their models, which depends on the method adopted during the learning phase. Several researchers have attempted to improve classification accuracy by combining different classification methods in the same learning process, resulting in a hybrid classifier. In this paper, the authors propose and build a hybrid classifier technique based on the Naïve Bayes and C4.5 classifiers. The main goals of the proposed model are to reduce the complexity of the NBTree technique, a well-known hybrid classification technique, and to improve the overall classification accuracy. Thirty-six UCI datasets were used in the evaluation. The results show that the proposed technique significantly outperforms NBTree and several other classifiers proposed in the literature in terms of classification accuracy, yielding an overall average accuracy of 85.70% over the 36 datasets.
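The Naïve Bayes half of such a hybrid can be sketched in a few lines for categorical attributes. This is a generic categorical Naïve Bayes with Laplace (add-one) smoothing, not the authors' hybrid: how their model combines it with C4.5 is not described here, and the small weather-style dataset is invented purely for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Naive Bayes for categorical attributes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)  # (attribute index, class) -> value counts
        self.values = defaultdict(set)      # attribute index -> distinct values seen
        for row, y in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[(j, y)][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row, y):
        """Unnormalized log posterior: log P(y) + sum_j log P(x_j | y)."""
        score = math.log(self.class_counts[y] / self.n)
        for j, v in enumerate(row):
            k = len(self.values[j])  # number of distinct values of attribute j
            score += math.log((self.counts[(j, y)][v] + 1) / (self.class_counts[y] + k))
        return score

    def predict(self, row):
        """Class with the highest (log) posterior score."""
        return max(self.class_counts, key=lambda y: self.log_score(row, y))


# Hypothetical toy data: (outlook, temperature) -> label.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
        ("rain", "cool"), ("overcast", "hot")]
labels = ["no", "no", "yes", "yes", "yes"]
model = CategoricalNB().fit(rows, labels)
```

Working in log space avoids underflow when many attribute probabilities are multiplied, and the add-one smoothing keeps a value never seen with one class from zeroing out that class's score.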

A Review of Malware Classification Methods using Machine Learning

SSRN Electronic Journal ◽

10.2139/ssrn.3769906 ◽

2021 ◽

Author(s):

Nishidh Singh Shekhawat ◽

Rejo Mathew

Keyword(s):

Machine Learning ◽

Classification Methods ◽

Malware Classification

Comparison of Classification Algorithms on Household Electricity Consumption Data

10.31224/osf.io/vfmx3 ◽

2020 ◽

Author(s):

Brilian Putra Amiruddin ◽

Evanbill Antonio Kore ◽

Dhiya Aldifa Ulhaq ◽

Auzan Widhatama

Keyword(s):

Logistic Regression ◽

Daily Life ◽

Electricity Consumption ◽

Regression Method ◽

Classification Algorithms ◽

Classification Methods ◽

Average Accuracy ◽

Consumption Data ◽

Usage Patterns ◽

Logistic Regression Method

Knowing its pattern of electricity consumption is important for a household, so it is essential to identify the intensity of electricity usage in the household's daily life. This helps determine how much electricity its equipment consumes, so that consumption can be further optimized while saving costs. For this purpose, supervised classification algorithms are used. In this study, we compared several classification methods for determining electricity usage patterns in daily household life, using the Household Electric Power Consumption data obtained from Kaggle. The methods compared are KNN, SVM, Decision Tree, and Logistic Regression, and their accuracy is analyzed to find which is best at identifying the intensity of electricity usage. The results show that Logistic Regression was the most accurate method for classifying the intensity of electricity consumption, with an average accuracy of 99%.
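As a reference point for the best-performing method, binary logistic regression can be trained with plain batch gradient descent on the log-loss. The sketch below is generic: the single feature (a hypothetical mean hourly active power in kW), the two-class low/high intensity labels, the learning rate, and the iteration count are assumptions for illustration, not the study's preprocessing or its Kaggle data, and the study may well have used a multi-class setup and a library implementation.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y = 1 | x) = sigmoid(w . x + b) by batch gradient descent on the mean log-loss."""
    d, m = len(xs[0]), len(xs)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Derivative of the log-loss with respect to the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j] / m
            grad_b += err / m
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class 1 (high intensity) if the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Hypothetical data: one feature per sample; 0 = low, 1 = high intensity.
xs = [[0.3], [0.5], [0.8], [1.0], [2.5], [3.0], [3.4], [4.0]]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = train_logistic(xs, ys)
```

Because the feature is not standardized, convergence speed depends on the learning rate; in practice one would scale the features and use a tested library solver.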

Determinant of Covariance Matrix Model Coupled with AdaBoost Classification Algorithm for EEG Seizure Detection

Diagnostics ◽

10.3390/diagnostics12010074 ◽

2021 ◽

Vol 12 (1) ◽

pp. 74

Author(s):

Shahab Abdulla ◽

Mohammed Diykh ◽

Sarmad K. D. Alkhafaji ◽

Jonathan H. Greena ◽

Hanan Al-Hadeethi ◽

...

Keyword(s):

Covariance Matrix ◽

Back Propagation ◽

Epileptic Seizures ◽

Back Propagation Neural Network ◽

Machine Learning Techniques ◽

EEG Signals ◽

Average Accuracy ◽

Effective Selection ◽

Kolmogorov–Smirnov ◽

EEG Recordings

Experts usually inspect electroencephalogram (EEG) recordings page by page to identify epileptic seizures, which is time-consuming and leads to heavy workloads. Efficient extraction and effective selection of informative EEG features are therefore crucial in helping clinicians diagnose epilepsy accurately. In this paper, a determinant of covariance matrix (Cov–Det) model is proposed for reducing EEG dimensionality. First, EEG signals are segmented into intervals using a sliding window technique. Then, Cov–Det is applied to each interval. To construct a feature vector, a set of statistical features is extracted from each interval. To eliminate redundant features, the Kolmogorov–Smirnov (KST) and Mann–Whitney U (MWUT) tests are integrated: the extracted features are ranked based on the KST and MWUT metrics, and arithmetic operators are used to select the most pertinent features for each pair of EEG signal groups. The selected features are then fed into the proposed AdaBoost back-propagation neural network (AB_BP_NN) to classify EEG signals into seizure and seizure-free segments. Finally, AB_BP_NN is compared with several classical machine learning techniques; the results demonstrate that the proposed AB_BP_NN model offers negligible false positive rates, a simpler design, and robustness in classifying epileptic signals. Two datasets, the Bern–Barcelona and Bonn datasets, are used for performance evaluation. The proposed technique achieved average accuracies of 100% and 98.86% on the Bern–Barcelona and Bonn datasets, respectively, a noteworthy improvement over current state-of-the-art methods.
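The first two steps of this pipeline, sliding-window segmentation followed by the determinant of each window's covariance matrix, can be sketched as follows for a multichannel recording stored as a list of channels. This sketch assumes the covariance is taken across channels within each window; the abstract does not say how the paper forms the matrix, and the window length, step size, unbiased (n − 1) estimator, and two-channel toy signal are all illustrative assumptions.

```python
def sliding_windows(signal, width, step):
    """Start indices of the windows [s, s + width) that fit inside the signal."""
    return list(range(0, len(signal) - width + 1, step))


def covariance_matrix(channels):
    """Unbiased sample covariance between channels (each a list of length n >= 2)."""
    k, n = len(channels), len(channels[0])
    means = [sum(ch) / n for ch in channels]
    return [[sum((channels[a][t] - means[a]) * (channels[b][t] - means[b])
                 for t in range(n)) / (n - 1)
             for b in range(k)] for a in range(k)]


def determinant(mat):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in mat]
    size, det = len(m), 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0:
            return 0.0  # singular matrix
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return det


def cov_det_features(channels, width, step):
    """One Cov-Det value per sliding window."""
    feats = []
    for s in sliding_windows(channels[0], width, step):
        window = [ch[s:s + width] for ch in channels]
        feats.append(determinant(covariance_matrix(window)))
    return feats


# Hypothetical two-channel toy signal, windows of 3 samples with no overlap.
channels = [[1, 2, 3, 1, 2, 3], [2, 4, 6, 3, 1, 2]]
feats = cov_det_features(channels, width=3, step=3)  # [0.0, 0.75]
```

A determinant near zero indicates that the channels in that window are almost linearly dependent, as in the first window of the toy signal. The later stages (statistical features, KST/MWUT ranking, AB_BP_NN) are not sketched here.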

Computer systems that learn: an empirical study of the effect of noise on the performance of three classification methods

Expert Systems with Applications ◽

10.1016/s0957-4174(02)00026-x ◽

2002 ◽

Vol 23 (1) ◽

pp. 39-47 ◽

Cited By ~ 15

Author(s):

James R. Nolan

Keyword(s):

Empirical Study ◽

Computer Systems ◽

Classification Methods

Jatropha Curcas Disease Identification With Extreme Learning Machine

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v12.i2.pp883-888 ◽

2018 ◽

Vol 12 (2) ◽

pp. 883

Author(s):

Triando Hamonangan Saragih ◽

Diny Melsye Nurul Fajri ◽

Wayan Firdaus Mahmudy ◽

Abdul Latief Abadi ◽

Yusuf Priyo Anggodo

Keyword(s):

Neural Network ◽

Expert Systems ◽

Extreme Learning Machine ◽

Jatropha Curcas ◽

Comparison Method ◽

Disease Identification ◽

Maximum Accuracy ◽

Average Accuracy ◽

Learning Machine ◽

Better Than

Jatropha is a plant with many uses, but it can be attacked by various diseases. Expert systems can be applied to disease identification to help both farmers and agricultural extension workers identify these diseases. One method that can be used is the Extreme Learning Machine (ELM), a neural network learning method that trains in a single pass rather than through repeated iterations. In this study, ELM achieved a maximum accuracy of 66.67% and an average accuracy of 60.61%. These results show that identification using ELM performs better than the comparison method used in previous work.
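Extreme Learning Machines are usually described as single-hidden-layer networks whose input weights and biases are drawn at random and never updated; only the output weights are fitted, in one step, by (regularized) least squares. The pure-Python sketch below follows that textbook recipe with assumed choices: sigmoid hidden units, 10 hidden neurons, a small ridge term, and one-hot targets. It does not reproduce the study's symptom features or settings, and the two-cluster toy data is invented.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.

    `a` is n x n and `b` is n x c, both lists of rows; returns X (n x c).
    """
    n = len(a)
    aug = [a[i][:] + b[i][:] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ELM:
    """Random, fixed hidden layer plus a closed-form least-squares output layer."""

    def __init__(self, n_hidden=10, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # small L2 term keeps H^T H invertible (an assumption)
        self.rng = random.Random(seed)

    def _hidden(self, x):
        return [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, labels):
        d, hidden = len(xs[0]), self.n_hidden
        self.classes = sorted(set(labels))
        # Input weights and biases are drawn once and never trained.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(hidden)]
        h = [self._hidden(x) for x in xs]                                     # N x L
        t = [[1.0 if y == c else 0.0 for c in self.classes] for y in labels]  # N x C
        # Output weights in one step: beta = (H^T H + ridge * I)^-1 H^T T
        hth = [[sum(r[i] * r[j] for r in h) + (self.ridge if i == j else 0.0)
                for j in range(hidden)] for i in range(hidden)]
        htt = [[sum(r[i] * tr[c] for r, tr in zip(h, t)) for c in range(len(self.classes))]
               for i in range(hidden)]
        self.beta = solve(hth, htt)                                           # L x C
        return self

    def predict(self, x):
        hx = self._hidden(x)
        scores = [sum(hi * row[c] for hi, row in zip(hx, self.beta))
                  for c in range(len(self.classes))]
        return self.classes[max(range(len(scores)), key=scores.__getitem__)]


# Hypothetical toy data: two well-separated clusters.
xs = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
labels = ["A", "A", "A", "B", "B", "B"]
model = ELM().fit(xs, labels)
```

Because only the output weights are fitted, and in closed form, training reduces to a single linear solve, which is the "single pass" property described above.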
