Literature review and analysis on big data stream classification techniques

B. Srivani; N. Sandhya; B. Padmaja Rani

doi:10.3233/kes-200042

Literature review and analysis on big data stream classification techniques

International Journal of Knowledge-based and Intelligent Engineering Systems ◽

10.3233/kes-200042 ◽

2020 ◽

Vol 24 (3) ◽

pp. 205-215

Author(s):

B. Srivani ◽

N. Sandhya ◽

B. Padmaja Rani

Keyword(s):

Neural Network ◽

Big Data ◽

Deep Learning ◽

Data Classification ◽

Support Vector ◽

Complex Data ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Data Stream Classification ◽

Big Data Classification

Rapid growth in technology and information has led to a marked increase in the velocity, volume, and variety of data. The data held by business organizations demonstrate the development of big data applications. As demand for such applications grows, the analysis of sophisticated streaming big data is becoming a significant area of data mining. One significant research direction is the use of deep learning approaches for effective extraction of complex data representations. Accordingly, this survey provides a detailed review of big data classification methodologies, including deep learning based techniques, Convolutional Neural Network (CNN) based techniques, K-Nearest Neighbor (KNN) based techniques, Neural Network (NN) based techniques, fuzzy based techniques, and support vector based techniques. Moreover, a detailed study is made of parameters such as evaluation metrics, implementation tool, employed framework, datasets utilized, adopted classification methods, and the accuracy range obtained by various techniques. Finally, the research gaps and issues of various big data classification schemes are presented.


Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly becoming necessary for computer systems connected to the Internet. This software exploits the system’s vulnerabilities to steal valuable information without the user’s knowledge, and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures for detecting known malware. However, the signature-based method does not scale in detecting obfuscated and packed malware. The cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls. In this paper, we therefore investigate the relevance of these features in unpacked malicious and benign executables to identify the features that best classify an executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. On combining APIs/system calls with static features, a marginal performance improvement was attained compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that resulted in an F1-score of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
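
The ANOVA-based feature ranking mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the feature names (`api_call_count`, `mov_freq`), the toy values, and the helper names are hypothetical, and the mRMR step is not shown. The sketch computes the one-way ANOVA F-statistic per feature and ranks features by it, on the assumption that a higher F-value marks a feature that separates the classes better.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows, labels, names):
    """Return feature names sorted by decreasing F-statistic."""
    scores = {
        name: anova_f([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: one row per executable, two illustrative features.
names = ["api_call_count", "mov_freq"]
rows = [[12, 0.30], [15, 0.28], [14, 0.31], [3, 0.29], [2, 0.30], [4, 0.27]]
labels = ["malware"] * 3 + ["benign"] * 3

ranking = rank_features(rows, labels, names)
print(ranking)
```

In this toy data the first feature differs strongly between the two classes while the second barely changes, so the first feature ranks highest.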


Analysis of the Nosema Cells Identification for Microscopic Images

Sensors ◽

10.3390/s21093068 ◽

2021 ◽

Vol 21 (9) ◽

pp. 3068

Author(s):

Soumaya Dghim ◽

Carlos M. Travieso-González ◽

Radim Burget

Keyword(s):

Neural Network ◽

Machine Learning ◽

Image Processing ◽

Deep Learning ◽

Support Vector ◽

Learning Approaches ◽

Microscopic Images ◽

Trained Neural Network ◽

Nosema Disease

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work shows a solution for recognizing and identifying Nosema cells among the other objects in the microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied for detecting and classifying the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several methods of transfer learning (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.


Optimized Neural Network for Big Data Classification Using MapReduce Approach

Advances in Intelligent Systems and Computing - Applied Information Processing Systems ◽

10.1007/978-981-16-2008-9_41 ◽

2021 ◽

pp. 429-437

Author(s):

Sridhar Gujjeti ◽

Suresh Pabboju

Keyword(s):

Neural Network ◽

Big Data ◽

Data Classification ◽

Big Data Classification


The Impact of Emotional Music on Active ROI in Patients with Depression Based on Deep Learning: A Task-State fMRI Study

Computational Intelligence and Neuroscience ◽

10.1155/2019/5850830 ◽

2019 ◽

Vol 2019 ◽

pp. 1-14

Author(s):

Renzhou Gui ◽

Tongjie Chen ◽

Han Nie

Keyword(s):

Neural Network ◽

Deep Learning ◽

Correlation Matrix ◽

Nearest Neighbor ◽

Learning Model ◽

fMRI Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Deep Learning Model

With the continuous development of science, more and more research results have shown that machine learning is capable of diagnosing and studying major depressive disorder (MDD) in the brain. We propose a deep learning network with multibranch and local residual feedback for four different types of functional magnetic resonance imaging (fMRI) data, produced by depressed patients and control subjects while listening to positive- and negative-emotion music. We use a large convolution kernel of the same size as the correlation matrix to match the features and obtain the feature-matching results for 264 regions of interest (ROIs). First, four-dimensional fMRI data are used to generate a two-dimensional correlation matrix of one person’s brain based on the ROIs; this matrix is then processed with a threshold value selected according to the characteristics of complex networks and small-world networks. After that, the deep learning model in this paper is compared with support vector machine (SVM), logistic regression (LR), k-nearest neighbor (kNN), a common deep neural network (DNN), and a deep convolutional neural network (CNN) for classification. Finally, we further calculate the matched ROIs from the intermediate results of our deep learning model, which can help related fields further explore the pathogeny of depression.
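
The preprocessing step described in this abstract, building an ROI-by-ROI correlation matrix and then thresholding it, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the correlation measure or the thresholding rule, so Pearson correlation, zeroing of weak entries, the fixed threshold `0.5`, and the three toy signals are all hypothetical choices (the paper works with 264 ROIs).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length signals."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(signals):
    """Square matrix whose entry (i, j) is the correlation of ROI i and ROI j."""
    n = len(signals)
    return [
        [1.0 if i == j else pearson(signals[i], signals[j]) for j in range(n)]
        for i in range(n)
    ]


def apply_threshold(matrix, cutoff):
    """Keep entries whose magnitude reaches the cutoff; set the rest to zero."""
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in matrix]


# Hypothetical toy time series, one list per ROI.
roi_signals = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],    # perfectly correlated with the first ROI
    [1, -1, 1, -1, 1],   # uncorrelated with the first ROI
]

matrix = correlation_matrix(roi_signals)
sparse = apply_threshold(matrix, cutoff=0.5)  # cutoff is an assumed value
for row in sparse:
    print([round(v, 2) for v in row])
```

The thresholded matrix keeps only the strong connections, which is the kind of two-dimensional input a convolutional model could then consume.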


Quantum Support Vector Machine for Big Data Classification

Physical Review Letters ◽

10.1103/physrevlett.113.130503 ◽

2014 ◽

Vol 113 (13) ◽

Cited By ~ 323

Author(s):

Patrick Rebentrost ◽

Masoud Mohseni ◽

Seth Lloyd

Keyword(s):

Support Vector Machine ◽

Big Data ◽

Data Classification ◽

Support Vector ◽

Big Data Classification


A Deep Learning Approach for Classification and Diagnosis of Parkinson’s Disease

10.21203/rs.3.rs-254647/v1 ◽

2021 ◽

Author(s):

Monika Jyotiyana ◽

Nishtha Kesswani ◽

Munish Kumar

Keyword(s):

Parkinson’s Disease ◽

Deep Learning ◽

Nearest Neighbor ◽

Health Sector ◽

Classification Model ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Linear Discriminant

Deep learning techniques are playing an important role in the classification and prediction of diseases. Undoubtedly, deep learning has a promising future in the health sector, especially in medical imaging. Deep learning approaches are popular because they can handle a large amount of patient data accurately and reliably in a short span of time, whereas practitioners may take time to analyze the data and generate reports. In this paper, we propose a Deep Neural Network-based classification model for Parkinson’s disease. Our proposed method is one such example, giving faster and more accurate results for the classification of Parkinson’s disease patients, with an accuracy of 94.87%. Based on the attributes in the patient dataset, the model can be used for the identification of Parkinsonism. We have also compared the results with other existing approaches such as Linear Discriminant Analysis, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Classification and Regression Trees, Random Forest, Linear Regression, Logistic Regression, Multi-Layer Perceptron, and Naive Bayes.


Deep Learning Based Sentiment Classification on User-Generated Big Data

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190409152308 ◽

2020 ◽

Vol 13 (5) ◽

pp. 1047-1056

Author(s):

Akshi Kumar ◽

Arunima Jaiswal

Keyword(s):

Neural Network ◽

Big Data ◽

Deep Learning ◽

Soft Computing ◽

Sentiment Classification ◽

Convolution Neural Network ◽

Support Vector ◽

Computing Technique ◽

Proposed Model ◽

Soft Computing Techniques

Background: Sentiment analysis of big data such as Twitter primarily helps organizations survey public opinions or emotions about the products and events associated with them. Objective: In this paper, we propose the application of a deep learning architecture, namely the Convolution Neural Network. The proposed model is implemented on a benchmark Twitter corpus (SemEval 2016 and SemEval 2017) and empirically compared with other baseline supervised soft computing techniques. The pragmatics of the work includes modelling the behavior of a trained Convolution Neural Network on well-known Twitter datasets for sentiment classification. The performance of the proposed model has been compared and contrasted with existing soft computing techniques such as Naïve Bayesian, Support Vector Machines, k-Nearest Neighbor, Multilayer Perceptron and Decision Tree, using precision, accuracy, recall, and F-measure as key performance indicators. Methods: The majority of studies emphasize feature mining using lexical or syntactic features that are often unequivocally articulated through words, emoticons and exclamation marks. Subsequently, CNN, a deep learning based soft computing technique, is used to improve the sentiment classifier’s performance. Results: The empirical analysis validates that the proposed implementation of the CNN model outperforms the baseline supervised learning algorithms with an accuracy of around 87% to 88%. Conclusion: Statistical analysis validates that the proposed CNN model outperforms the existing techniques and thus can enhance the viability and coherency of sentiment classification.
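
The four performance indicators named in this abstract (precision, accuracy, recall, and F-measure) can be computed from true and predicted labels as in the sketch below. It is a simplified, hypothetical example: the labels are invented, the function name is illustrative, and only a single positive class is scored, whereas the SemEval data and the paper's exact averaging scheme are not reproduced here.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, precision, recall and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical sentiment labels for eight tweets.
y_true = ["pos", "pos", "pos", "neg", "neg", "neg", "neg", "pos"]
y_pred = ["pos", "pos", "neg", "neg", "neg", "pos", "neg", "pos"]

scores = binary_metrics(y_true, y_pred, positive="pos")
print(scores)
```

Reporting all four values together matters because accuracy alone can look good on imbalanced sentiment data while precision or recall for one class stays poor.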


A Map Reduce based Support Vector Machine for Big Data Classification

International Journal of Database Theory and Application ◽

10.14257/ijdta.2015.8.5.07 ◽

2015 ◽

Vol 8 (5) ◽

pp. 77-98 ◽

Cited By ~ 21

Author(s):

Anushree Priyadarshini ◽

Sonali Agarwal

Keyword(s):

Support Vector Machine ◽

Big Data ◽

Data Classification ◽

Map Reduce ◽

Support Vector ◽

Big Data Classification


A STUDY ON THE ERROR OF DISTRIBUTED ALGORITHMS FOR BIG DATA CLASSIFICATION WITH SVM

The ANZIAM Journal ◽

10.1017/s1446181116000390 ◽

2017 ◽

Vol 58 (3-4) ◽

pp. 231-237

Author(s):

CHENG WANG ◽

FEILONG CAO

Keyword(s):

Support Vector Machine ◽

Big Data ◽

Distributed Algorithm ◽

Data Classification ◽

Classification Error ◽

Support Vector ◽

Data Sets ◽

Gaussian Kernels ◽

Big Data Classification

The error of a distributed algorithm for big data classification with a support vector machine (SVM) is analysed in this paper. First, the given big data sets are divided into small subsets, on which the classical SVM with Gaussian kernels is used. Then, the classification error of the SVM for each subset is analysed based on the Tsybakov exponent, geometric noise, and width of the Gaussian kernels. Finally, the whole error of the distributed algorithm is estimated in terms of the error of each subset.
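
The divide-and-classify scheme in this abstract can be sketched roughly as below. Several parts are assumptions rather than the paper's method: a kernel perceptron stands in for the classical SVM to keep the example dependency-free, the subsets are formed by simple striding, the per-subset models are combined by majority vote (the abstract does not state the combination rule), and the data, kernel width, and function names are hypothetical.

```python
from math import exp


def gaussian_kernel(a, b, width):
    """Gaussian (RBF) kernel with the given width."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-sq_dist / (2 * width ** 2))


def train_kernel_perceptron(points, labels, width, epochs=20):
    """Stand-in for an SVM: a kernel perceptron trained on one subset."""
    alphas = [0] * len(points)

    def decide(x):
        return sum(
            a * y_j * gaussian_kernel(x_j, x, width)
            for a, x_j, y_j in zip(alphas, points, labels)
        )

    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(points, labels)):
            if y * decide(x) <= 0:  # misclassified: strengthen this sample
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return decide


def train_distributed(points, labels, n_subsets, width):
    """Train one model per subset and combine them by majority vote."""
    models = [
        train_kernel_perceptron(points[s::n_subsets], labels[s::n_subsets], width)
        for s in range(n_subsets)
    ]

    def predict(x):
        votes = sum(1 if model(x) > 0 else -1 for model in models)
        return 1 if votes > 0 else -1

    return predict


# Hypothetical toy data: class +1 near (2, 2), class -1 near (-2, -2).
points = [(2, 2), (-2, -2), (2.5, 1.5), (-1.5, -2.5), (1.5, 2.5), (-2.5, -1.5),
          (3, 2), (-3, -2), (2, 3), (-2, -3), (1, 2), (-1, -2)]
labels = [1, -1] * 6

predict = train_distributed(points, labels, n_subsets=3, width=1.0)
print(predict((1.8, 2.2)), predict((-1.8, -2.2)))
```

Each subset model sees only a third of the data, which is why the error of the combined classifier is naturally analysed in terms of the errors of the individual subsets.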


A Novel Big Data Classification Algorithm Based on Backpropagation Neural Network

Revue d'Intelligence Artificielle ◽

10.18280/ria.340211 ◽

2020 ◽

Vol 34 (2) ◽

pp. 203-208

Author(s):

Weixiang Jiang

Keyword(s):

Neural Network ◽

Big Data ◽

Data Classification ◽

Classification Algorithm ◽

Backpropagation Neural Network ◽

Big Data Classification
