Classification of the drifting data streams using heterogeneous diversified dynamic class-weighted ensemble

PeerJ Computer Science ◽

10.7717/peerj-cs.459 ◽

2021 ◽

Vol 7 ◽

pp. e459

Author(s):

Martin Sarnovsky ◽

Michal Kolarik

Keyword(s):

Data Streams ◽

Concept Drift ◽

Ensemble Methods ◽

Predictive Performance ◽

Streaming Data ◽

Underlying Structure ◽

Adaptive Models ◽

Resource Requirements ◽

Continuous Stream

Data streams can be defined as continuous streams of data coming from different sources and in different forms. Streams are often very dynamic, and their underlying structure usually changes over time, which may result in a phenomenon called concept drift. When solving predictive problems using streaming data, traditional machine learning models trained on historical data may become invalid when such changes occur. Adaptive models equipped with mechanisms to reflect the changes in the data have proved suitable for handling drifting streams. Adaptive ensemble models represent a popular group of these methods used in the classification of drifting data streams. In this paper, we present a heterogeneous adaptive ensemble model for data stream classification, which utilizes a dynamic class weighting scheme and a mechanism to maintain the diversity of the ensemble members. Our main objective was to design a model consisting of a heterogeneous group of base learners (Naive Bayes, k-NN, decision trees) with an adaptive mechanism that takes into account not only the performance of the members but also the diversity of the ensemble. The model was experimentally evaluated on both real-world and synthetic datasets. We compared the presented model with other existing adaptive ensemble methods from the perspective of both predictive performance and computational resource requirements.
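The abstract does not spell out the weighting scheme, so the following is only a minimal sketch of what class-weighted voting in an ensemble might look like. The class name `ClassWeightedEnsemble`, the decay factor `beta`, and the use of plain callables as base learners are all assumptions made for illustration. The diversity mechanism described in the abstract is not modelled here.

```python
from collections import defaultdict


class ClassWeightedEnsemble:
    """Toy ensemble in which each member keeps one weight per class label.

    This is an illustrative assumption, not the weighting scheme of the paper.
    """

    def __init__(self, members, beta=0.5):
        # members: callables mapping an input to a class label
        # (hypothetical stand-ins for Naive Bayes, k-NN, decision trees).
        self.members = members
        # beta: hypothetical factor by which a weight shrinks after a mistake.
        self.beta = beta
        # Every (member, class) weight starts at 1.0.
        self.weights = [defaultdict(lambda: 1.0) for _ in members]

    def predict(self, x):
        # Each member votes for its predicted class with its weight for that class.
        votes = defaultdict(float)
        for member, weight in zip(self.members, self.weights):
            label = member(x)
            votes[label] += weight[label]
        return max(votes, key=votes.get)

    def update(self, x, y_true):
        # Once the true label arrives, penalise the class weight of each
        # member that predicted the wrong class.
        for member, weight in zip(self.members, self.weights):
            label = member(x)
            if label != y_true:
                weight[label] *= self.beta
```

Keeping a weight per class rather than per member lets a learner that is reliable for one class but poor for another contribute only where it performs well.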

Decision Tree Classification Algorithm within Concept Similarity

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.235.9 ◽

2012 ◽

Vol 235 ◽

pp. 9-14

Author(s):

Chun Hua Ju ◽

Li Li Mao

Keyword(s):

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Classification Algorithm ◽

Streaming Data ◽

Decision Tree Classification ◽

The Cost ◽

Prediction Efficiency ◽

Concept Similarity

Data stream mining has been applied in many domains, but concept drift in data streams poses a great obstacle to data mining. Current research on classification algorithms for streaming data with concept drift has achieved many successes, but it pays little attention to the recurrence of data streams, namely, the situation in which a historical concept reappears. To address this characteristic, this paper proposes calculating concept similarity and reusing the classifier model of the historical concept, or of a highly similar concept, to classify and predict. In this way, no further training is needed. The approach also reduces the cost of updating the model, speeds up classification, and improves prediction efficiency.
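The idea of reusing a stored classifier when a similar concept reappears can be sketched as follows. The abstract does not say how concept similarity is computed, so this sketch assumes that a concept is summarised by the mean feature vector of a data window and that similarity is cosine similarity. The names `ConceptPool`, `window_summary`, and the `threshold` value are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def window_summary(window):
    """Summarise a window of feature vectors by its per-feature mean (an assumption)."""
    n = len(window)
    dims = len(window[0])
    return [sum(row[i] for row in window) / n for i in range(dims)]


class ConceptPool:
    """Stores (concept summary, model) pairs for later reuse."""

    def __init__(self, threshold=0.95):
        # threshold: hypothetical minimum similarity for reusing a stored model.
        self.threshold = threshold
        self.entries = []

    def store(self, window, model):
        self.entries.append((window_summary(window), model))

    def lookup(self, window):
        """Return the model of the most similar stored concept, or None."""
        summary = window_summary(window)
        best = max(self.entries, key=lambda e: cosine(e[0], summary), default=None)
        if best is not None and cosine(best[0], summary) >= self.threshold:
            return best[1]
        return None
```

When `lookup` returns a model, it can be applied to the new window without retraining; when it returns `None`, a new model would be trained and added with `store`.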

Deep learning framework for handling concept drift and class imbalanced complex decision-making on streaming data

Complex & Intelligent Systems ◽

10.1007/s40747-021-00456-0 ◽

2021 ◽

Author(s):

S. Priya ◽

R. Annie Uthra

Keyword(s):

Decision Making ◽

Deep Learning ◽

Concept Drift ◽

Class Imbalance ◽

Streaming Data ◽

Superior Performance ◽

Data Streaming ◽

Minority Class ◽

Concept Drift Detection

In present times, data science has become popular for supporting and improving the decision-making process. Owing to the wide range of data streaming applications, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models have proved useful for classification under concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) approach using Adadelta optimizer-based deep neural networks (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes: preprocessing, class imbalance handling, concept drift detection, and classification. The proposed model uses the adaptive synthetic (ADASYN) technique for handling class-imbalanced data, which utilizes a weighted distribution for diverse minority class examples based on their level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of concept drift. The ADODNN model is then used for classification. To increase the classification performance of the DNN model, ADO-based hyperparameter tuning is performed to determine the optimal parameters of the DNN model. The performance of the presented model is evaluated using three streaming datasets: the intrusion detection (NSL KDDCup) dataset, the Spam dataset, and the Chess dataset. A detailed comparative analysis of the results was carried out, and the simulation results verified the superior performance of the presented model, which obtained maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
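The drift detection step can be illustrated with a deliberately simplified detector. The sketch below is not ADWIN: ADWIN adaptively searches for a cut point in a variable-length window, whereas this version compares the means of two fixed-size halves of a sliding window against a Hoeffding-style threshold. The class name `TwoWindowDriftDetector` and the defaults for `window_size` and `delta` are assumptions.

```python
import math
from collections import deque


class TwoWindowDriftDetector:
    """Simplified stand-in for a window-based drift detector (not ADWIN itself)."""

    def __init__(self, window_size=50, delta=0.002):
        self.window_size = window_size
        self.delta = delta
        # Holds the older half followed by the newer half of recent values.
        self.window = deque(maxlen=2 * window_size)

    def update(self, value):
        """Add a value in [0, 1], e.g. a 0/1 prediction error.

        Returns True when the two halves of the window differ by more than
        the threshold, which is treated here as a drift signal.
        """
        self.window.append(value)
        if len(self.window) < 2 * self.window_size:
            return False
        items = list(self.window)
        old = items[: self.window_size]
        new = items[self.window_size:]
        gap = abs(sum(old) / len(old) - sum(new) / len(new))
        # Hoeffding-style bound on the difference of two means of bounded values.
        eps = math.sqrt(2.0 * math.log(2.0 / self.delta) / self.window_size)
        if gap > eps:
            # Forget the old concept so detection restarts on fresh data.
            self.window.clear()
            return True
        return False
```

In a pipeline like the one described above, a drift signal would typically trigger retraining or adaptation of the classifier; how CIDD-ADODNN reacts to a detected drift is not stated in the abstract.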

Scalable real-time classification of data streams with concept drift

Future Generation Computer Systems ◽

10.1016/j.future.2017.03.026 ◽

2017 ◽

Vol 75 ◽

pp. 187-199 ◽

Cited By ~ 35

Author(s):

Mark Tennant ◽

Frederic Stahl ◽

Omer Rana ◽

João Bártolo Gomes

Keyword(s):

Real Time ◽

Data Streams ◽

Concept Drift ◽

Real Time Classification

Diversity in Ensemble Model for Classification of Data Streams with Concept Drift

2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI) ◽

10.1109/sami50585.2021.9378625 ◽

2021 ◽

Author(s):

Michal Kolarik ◽

Martin Sarnovsky ◽

Jan Paralic

Keyword(s):

Data Streams ◽

Concept Drift ◽

Ensemble Model

Adaptive Bagging Methods for Classification of Data Streams with Concept Drift

Acta Polytechnica Hungarica ◽

10.12700/aph.18.3.2021.3.3 ◽

2021 ◽

Vol 18 (3) ◽

pp. 47-63

Author(s):

Martin Sarnovsky ◽

Jan Marcinko

Keyword(s):

Data Streams ◽

Concept Drift

The GC3 framework: grid density based clustering for classification of streaming data with concept drift

10.18297/etd/1300 ◽

2013 ◽

Cited By ~ 1

Author(s):

Tegjyot Sethi

Keyword(s):

Concept Drift ◽

Streaming Data ◽

Density Based Clustering

Classification of Imbalanced Data Stream: Techniques and Challenges

Transactions on Machine Learning and Artificial Intelligence ◽

10.14738/tmlai.92.9964 ◽

2021 ◽

Vol 9 (2) ◽

pp. 36-52

Author(s):

Mashaal A. Alfhaid ◽

Manal Abdullah

Keyword(s):

Data Mining ◽

Data Stream ◽

Concept Drift ◽

Class Imbalance ◽

Imbalanced Data ◽

Predictive Performance ◽

Knowledge Extraction ◽

Streaming Data ◽

Stream Data ◽

Stream Data Mining

As the amount of generated data increases every day, data mining and knowledge extraction have grown in importance. In traditional data mining, knowledge can be extracted offline. Stream data mining is different, however, because data arrive continuously and can be processed only in a single scan, and because concept drift may appear. As the pre-processing stage is critical in knowledge extraction, imbalanced stream data have gained significant attention among researchers in the last few years. Many real-world applications suffer from class imbalance, including medical, business, and fraud detection applications. Supervised learning involves classes, whether binary or multi-class. These classes are often imbalanced, divided into a majority (negative) class and a minority (positive) class, which can cause a bias toward the majority class that skews the predictive performance of models. Handling imbalanced streaming data is necessary for more accurate and reliable learning models. In this paper, we present an overview of data stream mining and its tools. We also summarize the problem of class imbalance and the different approaches to it. In addition, we present the popular evaluation metrics and the challenges arising from imbalanced streaming data.
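The bias toward the majority class can be made concrete with a small calculation. The sketch below computes accuracy, minority-class recall, and the geometric mean of recall and specificity from label lists. The function name `imbalance_metrics` is hypothetical, and the choice of these three metrics is an assumption, since the abstract does not list the metrics the survey covers.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, minority (positive) recall, and G-mean for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "g_mean": math.sqrt(recall * specificity),
    }


# A classifier that always predicts the majority class on a 95:5 stream
# scores high accuracy while never finding a minority example.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
majority_only = imbalance_metrics(y_true, y_pred)
```

Here `majority_only["accuracy"]` is 0.95 while recall and G-mean are both 0.0, which is why accuracy alone is a misleading measure on imbalanced streams.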

Concept Drift and Evolution Detection in Fusion Diagnosis With Evolving Data Streams

Volume 2A: 43rd Design Automation Conference ◽

10.1115/detc2017-68373 ◽

2017 ◽

Author(s):

Amirmahyar Abdolsamadi ◽

Pingfeng Wang

Keyword(s):

Data Streams ◽

Concept Drift ◽

Data Distribution ◽

Streaming Data ◽

Majority Voting ◽

Classification Model ◽

Engineering System ◽

Concept Evolution ◽

Adaptive Fusion

Health diagnosis interprets data streams acquired by smart sensors and makes inferences about the health condition of an engineering system, thereby supporting critical operational decisions. A data stream is a continuous flow of data that poses several challenges for data mining. This paper addresses concept drift and concept evolution as two major challenges in the classification of streaming data. Concept drift occurs as a result of changes in the data distribution. Concept evolution happens when new classes appear in the stream. These changes may degrade classification results over time. This paper presents an adaptive fusion learning approach to build a robust classification model. The proposed approach consists of three steps: (i) a fusion formulation using weighted majority voting; (ii) active learning that queries labels selectively instead of querying for all true labels; and (iii) a distance-based approach to monitoring the movement of the data distribution. A diagnosis case study is used to demonstrate the developed fusion diagnosis methodology.
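Steps (i) and (ii) can be sketched together under some assumptions. The abstract does not give the fusion formulation or the query criterion, so this example assumes fixed per-classifier weights and a simple margin rule: a true label is requested only when the weighted vote is close. The names `weighted_vote` and `should_query_label` and the `threshold` value are hypothetical.

```python
def weighted_vote(predictions, weights):
    """Fuse class predictions by weighted majority voting.

    Returns the winning label and the normalised margin between the two
    highest-scoring labels (1.0 when all classifiers agree).
    """
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    ranked = sorted(totals.values(), reverse=True)
    winner = max(totals, key=totals.get)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    margin = (ranked[0] - runner_up) / sum(weights)
    return winner, margin


def should_query_label(margin, threshold=0.2):
    """Assumed active-learning rule: ask for the true label only when the vote is close."""
    return margin < threshold
```

For example, `weighted_vote(["ok", "fault", "ok"], [0.5, 0.3, 0.2])` selects `"ok"` with a margin of about 0.4, so no label would be requested, whereas a near-tie between two classifiers would trigger a query.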

CONCEPT DRIFT IN STREAMING DATA: A SYSTEMATIC LITERATURE REVIEW

KIET Journal of Computing and Information Sciences ◽

10.51153/kjcis.v4i1.43 ◽

2021 ◽

Vol 4 (1) ◽

pp. 17

Author(s):

Tariq Mahmood ◽

Tatheer Fatima

Keyword(s):

Machine Learning ◽

Literature Review ◽

Systematic Literature Review ◽

Data Streams ◽

Concept Drift ◽

Streaming Data ◽

Machine Learning Techniques ◽

Underlying Distribution ◽

Learning Techniques ◽

Real World Datasets

The world generates an immeasurable amount of data every minute, which needs to be analyzed for better decision making. To meet this demand for faster analytics, businesses are adopting efficient stream processing and machine learning techniques. However, data streams are particularly challenging to handle. One of the prominent problems faced when dealing with streaming data is concept drift. Concept drift is described as an unexpected change in the underlying distribution of the streaming data that can be observed as time passes. In this work, we have conducted a systematic literature review to identify methods that deal with the problem of concept drift. The most frequently used supervised and unsupervised techniques have been reviewed, and we have also surveyed commonly used, publicly available artificial and real-world datasets for studying concept drift.

Classification of Concept Drift Data Streams

2014 International Conference on Information Science & Applications (ICISA) ◽

10.1109/icisa.2014.6847374 ◽

2014 ◽

Cited By ~ 1

Author(s):

E. Padmalatha ◽

C. R. K. Reddy ◽

B. Padmaja Rani

Keyword(s):

Data Streams ◽

Concept Drift
