Literature review and analysis on big data stream classification techniques

Author(s):  
B. Srivani ◽  
N. Sandhya ◽  
B. Padmaja Rani

Rapid growth in technology and information has led to marked increases in the velocity, volume, and variety of data. The data in business organizations demonstrates the development of big data applications. Because of the growing demand for such applications, the analysis of sophisticated streaming big data has become a significant area of data mining. One significant aspect of this research is the use of deep learning approaches for effective extraction of complex data representations. Accordingly, this survey provides a detailed review of big data classification methodologies, including deep learning based techniques, Convolutional Neural Network (CNN) based techniques, K-Nearest Neighbor (KNN) based techniques, Neural Network (NN) based techniques, fuzzy based techniques, and support vector based techniques. Moreover, a detailed study is made of parameters such as evaluation metrics, implementation tool, employed framework, datasets utilized, adopted classification methods, and the accuracy range obtained by the various techniques. Finally, the research gaps and issues of the various big data classification schemes are presented.

Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1694
Author(s):  
Mathew Ashik ◽  
A. Jyothish ◽  
S. Anandaram ◽  
P. Vinod ◽  
Francesco Mercaldo ◽  
...  

Malware is one of the most significant threats in today’s computing world, since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly necessary for computer systems connected to the Internet. Malware exploits a system’s vulnerabilities to steal valuable information without the user’s knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures to detect known malware. However, the signature-based method does not scale to detecting obfuscated and packed malware. The cause of a problem is often best understood by studying structural aspects of a program such as mnemonics, instruction opcodes, and API calls. In this paper, we therefore investigate the relevance of features of unpacked malicious and benign executables, such as mnemonics, instruction opcodes, and API calls, to identify a feature that classifies the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning classifiers such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, such as a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. Combining APIs/system calls with static features yielded a marginal performance improvement compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that achieved F1-scores of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
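The ANOVA-based feature ranking named in this abstract can be illustrated with a minimal sketch. It computes the one-way ANOVA F-statistic of each feature against the class label and sorts the features by that score. The feature names (`api_call_count`, `nop_opcode_count`), the toy values, and the helper functions are hypothetical and do not come from the paper; mRMR is not shown.

```python
from collections import defaultdict


def anova_f_score(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")

    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}

    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)

    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(table, labels):
    """Return feature names sorted from most to least discriminative."""
    scores = {name: anova_f_score(column, labels) for name, column in table.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: 0 = benign, 1 = malicious.
    labels = [0, 0, 0, 1, 1, 1]
    table = {
        "nop_opcode_count": [3, 4, 3, 4, 3, 4],
        "api_call_count": [1, 2, 3, 4, 5, 6],
    }
    print(rank_features(table, labels))
```

A higher F-statistic means the feature's class means differ more relative to the spread within each class, which is why such a score can serve as a simple relevance filter before training a classifier.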


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3068
Author(s):  
Soumaya Dghim ◽  
Carlos M. Travieso-González ◽  
Radim Burget

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of the Nosema disease, which is considered to be one of the most economically significant diseases today. This work presents a solution for recognizing and identifying Nosema cells among the other objects in a microscopic image. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images. Then, machine learning methods, such as an artificial neural network (ANN) and a support vector machine (SVM), are applied to detect and classify the Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several transfer learning models (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish the Nosema images from the other object images. The best accuracy, 96.25%, was reached by the pre-trained VGG-16 neural network.


2019 ◽  
Vol 2019 ◽  
pp. 1-14
Author(s):  
Renzhou Gui ◽  
Tongjie Chen ◽  
Han Nie

With the continuous development of science, more and more research results have shown that machine learning is capable of diagnosing and studying major depressive disorder (MDD) in the brain. We propose a deep learning network with multiple branches and local residual feedback for four different types of functional magnetic resonance imaging (fMRI) data produced by depressed patients and control subjects while listening to music with positive and negative emotions. We use a large convolution kernel of the same size as the correlation matrix to match the features and obtain the feature-matching results for 264 regions of interest (ROIs). First, four-dimensional fMRI data are used to generate a two-dimensional correlation matrix of one person’s brain based on the ROIs; this matrix is then processed with a threshold value selected according to the characteristics of complex networks and small-world networks. After that, the deep learning model in this paper is compared with support vector machine (SVM), logistic regression (LR), k-nearest neighbor (kNN), a common deep neural network (DNN), and a deep convolutional neural network (CNN) for classification. Finally, we calculate the matched ROIs from the intermediate results of our deep learning model, which can help related fields further explore the pathogenesis of depression.
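The step from ROI signals to a thresholded correlation matrix can be sketched as follows. This is only an illustration under stated assumptions: each ROI is assumed to be already reduced to one time series, the correlation is assumed to be Pearson correlation, and entries whose absolute value falls below the threshold are assumed to be set to zero. The abstract does not give the threshold value or the exact rule, so `threshold` and both function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long time series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant series carries no correlation information
    return cov / (norm_x * norm_y)


def thresholded_correlation_matrix(roi_series, threshold):
    """Symmetric ROI-by-ROI matrix; weak correlations are zeroed (assumed rule)."""
    n = len(roi_series)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            r = 1.0 if i == j else pearson(roi_series[i], roi_series[j])
            if abs(r) < threshold:
                r = 0.0
            matrix[i][j] = matrix[j][i] = r
    return matrix


if __name__ == "__main__":
    # Three hypothetical ROI time series with four time points each.
    series = [[1, 2, 3, 4], [2, 4, 6, 8], [1, -1, 1, -1]]
    for row in thresholded_correlation_matrix(series, threshold=0.5):
        print(row)
```

With 264 ROIs, the same function would return a 264 by 264 matrix, which matches the size of the convolution kernel described in the abstract.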


2021 ◽  
Author(s):  
Monika Jyotiyana ◽  
Nishtha Kesswani ◽  
Munish Kumar

Deep learning techniques are playing an important role in the classification and prediction of diseases. Deep learning undoubtedly has a promising future in the health sector, especially in medical imaging. The popularity of deep learning approaches stems from their ability to handle a large amount of patient data accurately and reliably in a short span of time, whereas practitioners may take time to analyze the data and generate reports. In this paper, we propose a Deep Neural Network-based classification model for Parkinson’s disease. Our proposed method is one such example, giving faster and more accurate results for the classification of Parkinson’s disease patients with an accuracy of 94.87%. Based on the attributes in the patient dataset, the model can be used for the identification of Parkinsonism. We have also compared the results with other existing approaches such as Linear Discriminant Analysis, Support Vector Machine, K-Nearest Neighbor, Decision Tree, Classification and Regression Trees, Random Forest, Linear Regression, Logistic Regression, Multi-Layer Perceptron, and Naive Bayes.


2020 ◽  
Vol 13 (5) ◽  
pp. 1047-1056
Author(s):  
Akshi Kumar ◽  
Arunima Jaiswal

Background: Sentiment analysis of big data such as Twitter primarily helps organizations survey public opinions or emotions about the products and events associated with them. Objective: In this paper, we propose the application of a deep learning architecture, namely the Convolutional Neural Network (CNN). The proposed model is implemented on benchmark Twitter corpora (SemEval 2016 and SemEval 2017) and empirically compared with other baseline supervised soft computing techniques. The work includes modelling the behavior of a trained CNN on well-known Twitter datasets for sentiment classification. The performance of the proposed model has been compared and contrasted with existing soft computing techniques such as Naïve Bayesian, Support Vector Machines, k-Nearest Neighbor, Multilayer Perceptron and Decision Tree, using precision, accuracy, recall, and F-measure as key performance indicators. Methods: The majority of studies emphasize feature mining using lexical or syntactic features that are often explicitly expressed through words, emoticons and exclamation marks. Subsequently, CNN, a deep learning based soft computing technique, is used to improve the sentiment classifier’s performance. Results: The empirical analysis validates that the proposed implementation of the CNN model outperforms the baseline supervised learning algorithms with an accuracy of around 87% to 88%. Conclusion: Statistical analysis validates that the proposed CNN model outperforms the existing techniques and thus can enhance the viability and coherency of sentiment classification.
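The four performance indicators named in this abstract (precision, accuracy, recall, and F-measure) can be computed as in the sketch below. It assumes a binary positive/negative labelling and the balanced F1 form of the F-measure; the abstract does not say how the SemEval classes were handled, and the function name and toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical labels: 1 = positive sentiment, 0 = negative sentiment.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
```

Reporting all four values side by side matters when classes are imbalanced, since accuracy alone can look high for a classifier that rarely predicts the minority class.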


2017 ◽  
Vol 58 (3-4) ◽  
pp. 231-237
Author(s):  
CHENG WANG ◽  
FEILONG CAO

The error of a distributed algorithm for big data classification with a support vector machine (SVM) is analysed in this paper. First, the given big data sets are divided into small subsets, on which the classical SVM with Gaussian kernels is used. Then, the classification error of the SVM for each subset is analysed based on the Tsybakov exponent, geometric noise, and width of the Gaussian kernels. Finally, the whole error of the distributed algorithm is estimated in terms of the error of each subset.
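The divide-and-train scheme described in this abstract can be sketched in a few lines. The sketch rests on several assumptions that are not taken from the paper: the local solver is kernelized Pegasos, a stochastic sub-gradient method for the hinge-loss SVM without a bias term, used here as a stand-in for the classical SVM; the Gaussian kernel is parametrized as exp(-||a - b||^2 / width^2); and the local classifiers are combined by averaging their decision values. All function names, parameter values, and the toy data are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, width):
    """Gaussian kernel exp(-||a - b||^2 / width^2) (assumed parametrization)."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-sq_dist / width ** 2)


def train_local_svm(X, y, width, lam=0.01, steps=500, seed=0):
    """Kernelized Pegasos on one subset; returns a decision function."""
    rng = random.Random(seed)
    alpha = [0] * len(X)  # how often each sample violated the margin
    for t in range(1, steps + 1):
        i = rng.randrange(len(X))
        score = sum(a * yj * gaussian_kernel(xj, X[i], width)
                    for a, xj, yj in zip(alpha, X, y) if a)
        if y[i] * score / (lam * t) < 1:
            alpha[i] += 1
    scale = 1.0 / (lam * steps)

    def decision(x):
        return scale * sum(a * yj * gaussian_kernel(xj, x, width)
                           for a, xj, yj in zip(alpha, X, y) if a)

    return decision


def train_distributed_svm(X, y, n_subsets, width):
    """Split the data into contiguous subsets and train one SVM per subset."""
    size = math.ceil(len(X) / n_subsets)
    local = [train_local_svm(X[s:s + size], y[s:s + size], width, seed=s)
             for s in range(0, len(X), size)]

    def predict(x):
        # Assumed combination rule: average the local decision values.
        average = sum(f(x) for f in local) / len(local)
        return 1 if average >= 0 else -1

    return predict


def make_toy_data(n_per_class=20, seed=1):
    """Two well-separated 2-D clusters with labels +1 and -1."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_per_class):
        for label in (1, -1):
            centre = 2.0 * label
            X.append((centre + rng.gauss(0, 0.3), centre + rng.gauss(0, 0.3)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    predict = train_distributed_svm(X, y, n_subsets=4, width=2.0)
    print(predict((2.0, 2.0)), predict((-2.0, -2.0)))
```

Each local model sees only its own subset, so training cost per model depends on the subset size rather than on the full data set; the price is that the overall error then depends on the error of every subset, which is the quantity the paper analyses.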

