Online Active Learning Paired Ensemble for Concept Drift and Class Imbalance

Deep learning framework for handling concept drift and class imbalanced complex decision-making on streaming data

Complex & Intelligent Systems ◽

10.1007/s40747-021-00456-0 ◽

2021 ◽

Author(s):

S. Priya ◽

R. Annie Uthra

Keyword(s):

Decision Making ◽

Deep Learning ◽

Concept Drift ◽

Class Imbalance ◽

Streaming Data ◽

Superior Performance ◽

Data Streaming ◽

Minority Class ◽

Concept Drift Detection

Abstract: In present times, data science has become popular for supporting and improving the decision-making process. Because data streaming has such a wide range of applications, class imbalance and concept drift have become crucial learning problems. Deep learning (DL) models have proven useful for classification under concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) approach using an Adadelta optimizer-based deep neural network (ADODNN), named the CIDD-ADODNN model, for the classification of highly imbalanced streaming data. The presented model involves four processes, namely preprocessing, class imbalance handling, concept drift detection, and classification. The proposed model uses the adaptive synthetic (ADASYN) technique to handle class imbalance; ADASYN uses a weighted distribution over minority class examples according to how difficult they are to learn. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the presence of concept drift. Finally, the ADODNN model performs the classification. To improve the classification performance of the DNN model, an ADO-based hyperparameter tuning process determines its optimal parameters. The performance of the presented model is evaluated on three streaming datasets, namely an intrusion detection dataset (NSL KDDCup), a Spam dataset, and a Chess dataset. A detailed comparative analysis was carried out, and the simulation results verify the superior performance of the presented model, which obtains maximum accuracies of 0.9592, 0.9320, and 0.7646 on the KDDCup, Spam, and Chess datasets, respectively.
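
ADASYN's difficulty-based weighting can be illustrated with a minimal, standard-library-only Python sketch of the original ADASYN procedure (He et al., 2008). It is not the CIDD-ADODNN implementation: the function name `adasyn`, the use of Euclidean distance, and the parameters `k`, `beta` and `seed` are assumptions chosen for illustration. Each minority point's difficulty r_i is the share of non-minority examples among its k nearest neighbours; harder points receive a proportionally larger share of the G synthetic samples, each interpolated toward a random minority neighbour.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, beta=1.0, seed=0):
    """Minimal ADASYN sketch for lists of numeric feature tuples.

    Returns only the synthetic minority samples (their label is `minority`).
    All non-minority labels are treated as "majority" (an assumption).
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    maj_count = len(y) - len(min_idx)
    # G: total number of synthetic samples; beta=1.0 aims at full balance
    G = int((maj_count - len(min_idx)) * beta)
    if G <= 0 or not min_idx:
        return []

    def neighbours(i, pool):
        # k nearest neighbours of X[i] among the indices in pool, excluding i
        others = [j for j in pool if j != i]
        others.sort(key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # r_i: fraction of non-minority examples among the k nearest neighbours
    r = []
    for i in min_idx:
        nn = neighbours(i, range(len(X)))
        r.append(sum(1 for j in nn if y[j] != minority) / len(nn))
    total = sum(r)
    if total == 0:
        # No minority point touches the majority: fall back to uniform weights
        r = [1.0] * len(min_idx)
        total = float(len(min_idx))

    synthetic = []
    for i, r_i in zip(min_idx, r):
        g_i = round(r_i / total * G)  # harder examples get more synthetic samples
        min_nn = neighbours(i, min_idx)
        if not min_nn:
            continue
        for _ in range(g_i):
            z = X[rng.choice(min_nn)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(X[i], z)))
    return synthetic
```

For example, with ten majority points near the origin and three minority points at (5, 5), (5, 6) and (6, 5), calling `adasyn(X, y, k=3)` gives each minority point the same difficulty (one majority neighbour out of three), so G = 7 is split evenly and rounding yields two synthetic points per minority example, all lying between existing minority points.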

A two-stage clustering-based cold-start method for active learning

Intelligent Data Analysis ◽

10.3233/ida-205393 ◽

2021 ◽

Vol 25 (5) ◽

pp. 1169-1185

Author(s):

Deniu He ◽

Hong Yu ◽

Guoyin Wang ◽

Jie Li

Keyword(s):

Active Learning ◽

Class Imbalance ◽

Imbalanced Data ◽

Cold Start ◽

Classification Performance ◽

The Novel ◽

Two Stage ◽

Minority Class ◽

Novel Method ◽

Multiple Clusters

This paper considers the problem of initializing active learning, specifically in an imbalanced data scenario, which it calls class-imbalance active learning cold-start. The proposed method is two-stage clustering-based active learning cold-start (ALCS). In the first stage, to separate minority class instances from majority class instances, a multi-center clustering is constructed based on a new inter-cluster tightness measure, which groups the data into multiple clusters. In the second stage, the initial training instances are selected from each cluster using an adaptive mechanism for determining candidate representative instances and a clusters-cyclic instance query mechanism. Comprehensive experiments demonstrate the effectiveness of the proposed method in terms of class coverage, classification performance, and impact on active learning.
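
The clusters-cyclic idea (query one instance from each cluster in turn, so that small clusters that may hold the minority class are covered early) can be sketched as below. This is only an illustration under stated assumptions: plain k-means with deterministic farthest-first seeding stands in for the paper's multi-center clustering and inter-cluster tightness measure, "closest to the cluster centre" stands in for its adaptive candidate-representative mechanism, and the function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the farthest point."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means (stand-in clustering); returns (assignments, centres)."""
    centres = farthest_first(points, k)
    assign = []
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centres[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assign, centres


def cyclic_cold_start(points, k, budget):
    """Return up to `budget` indices to label first, cycling over clusters."""
    assign, centres = kmeans(points, k)
    # Per cluster, rank members by distance to the centre (most representative first)
    queues = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: math.dist(points[i], centres[c]))
        queues.append(members)
    chosen = []
    # Round-robin over clusters until the budget is spent or all clusters are exhausted
    while len(chosen) < budget and any(queues):
        for q in queues:
            if q and len(chosen) < budget:
                chosen.append(q.pop(0))
    return chosen
```

With eight points near the origin and two near (10, 10), a budget of four already queries both points of the small cluster, whereas picking the four most central points overall would likely miss it entirely.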

Reducing class imbalance during active learning for named entity annotation

Proceedings of the fifth international conference on Knowledge capture - K-CAP '09 ◽

10.1145/1597735.1597754 ◽

2009 ◽

Cited By ~ 14

Author(s):

Katrin Tomanek ◽

Udo Hahn

Keyword(s):

Active Learning ◽

Class Imbalance ◽

Named Entity ◽

Entity Annotation

Learning in the presence of class imbalance and concept drift

Neurocomputing ◽

10.1016/j.neucom.2019.01.080 ◽

2019 ◽

Vol 343 ◽

pp. 1-2 ◽

Cited By ~ 1

Author(s):

Shuo Wang ◽

Leandro L. Minku ◽

Nitesh Chawla ◽

Xin Yao

Keyword(s):

Concept Drift ◽

Class Imbalance

Mining Massive E-Health Data Streams for IoMT Enabled Healthcare Systems

Sensors ◽

10.3390/s20072131 ◽

2020 ◽

Vol 20 (7) ◽

pp. 2131 ◽

Cited By ~ 3

Author(s):

Affan Ahmed Toor ◽

Muhammad Usman ◽

Farah Younas ◽

Alvis Cheuk M. Fong ◽

Sajid Ali Khan ◽

...

Keyword(s):

Data Streams ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Health Data ◽

Smart Devices ◽

Detection Delay ◽

Medical Sensors ◽

Synthetic Datasets ◽

Almost All

With the increasing popularity of the Internet of Medical Things (IoMT) and smart devices, huge volumes of data streams are being generated. This study addresses concept drift, a major challenge in processing voluminous data streams. Concept drift refers to a change in the data distribution over time. It can occur in the medical domain; for example, medical sensors used for general healthcare or rehabilitation may switch roles to support ICU emergency operations when required. Detecting concept drift becomes trickier when the class distribution is skewed, which is often true of e-health data from medical sensors. The Reactive Drift Detection Method (RDDM) is efficient at detecting long concepts; however, it has a high error rate and does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments comparing ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experiments were run in Massive Online Analysis (MOA) on 48 synthetic datasets customized to have the characteristics of data streams. ERDDM handles both abrupt and gradual drift and outperforms all benchmarks in almost all experiments.
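
RDDM builds on the classic Drift Detection Method (DDM, Gama et al., 2004), which monitors a classifier's running error rate p and its standard deviation s = sqrt(p(1 - p)/n). The sketch below implements that base DDM test only; it is not RDDM or ERDDM, and the class name and default thresholds (30 warm-up instances, 2 and 3 standard deviations for warning and drift) are the commonly used DDM settings, assumed here for illustration.

```python
import math


class DDM:
    """Minimal Drift Detection Method over a stream of 0/1 prediction errors."""

    def __init__(self, min_instances=30, warn_level=2.0, drift_level=3.0):
        self.min_instances = min_instances
        self.warn_level = warn_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0  # running error rate (overwritten by the first update)
        self.s = 0.0  # standard deviation of the error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """Feed 1 for a misclassified instance, 0 otherwise.

        Returns "drift", "warning" or "stable".
        """
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean
        self.s = math.sqrt(self.p * (1 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember the best (lowest) error level seen in the current concept
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s >= self.p_min + self.drift_level * self.s_min:
            self.reset()  # start learning statistics for the new concept
            return "drift"
        if self.p + self.s >= self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"
```

Fed a stream with a steady 10% error rate, the detector stays silent; once every prediction becomes wrong, the error level crosses the three-sigma threshold within a few instances and `update` returns `"drift"`. Note that the test compares error rates only, so on a heavily skewed stream a classifier that ignores the minority class can look "stable", which is the gap that imbalance-aware detectors such as ERDDM target.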

A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

The Scientific World JOURNAL ◽

10.1155/2014/497354 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Ge Song ◽

Yunming Ye

Keyword(s):

Large Scale ◽

State Of The Art ◽

Concept Drift ◽

Real Life ◽

Class Imbalance ◽

High Dimensional ◽

Adaptive Selection ◽

Stream Classification ◽

Rare Class

Textual stream classification has become a realistic and challenging problem, since large-scale, high-dimensional, non-stationary streams with class imbalance arise in many real-life applications. Given the characteristics of textual streams, classifying them is technically difficult, especially in an imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from imbalanced textual streams with concept drift (CFIM). CFIM is an ensemble that integrates a set of clustering trees (CTs), and it includes an adaptive selection method that flexibly chooses useful CTs according to the properties of the stream. In particular, to deal with class imbalance, we collect and reuse both rare-class instances and misclassified instances from historical chunks. Unlike most existing approaches, ours assumes that both the majority class and the rare class may suffer from concept drift, so the distribution of the resampled instances stays similar to the current concept. The effectiveness of CFIM is examined on five real-world textual streams in an imbalanced, non-stationary environment. Experimental results demonstrate that CFIM outperforms four state-of-the-art ensemble models.
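
The "collect and reuse" strategy for rare-class and misclassified instances can be sketched as a bounded buffer that is appended to each new training chunk. This is a hypothetical illustration, not CFIM's actual mechanism: the class name `HistoryBuffer`, the fixed `capacity`, and the oldest-first eviction (which keeps the buffer close to recent concepts) are assumptions.

```python
from collections import deque


class HistoryBuffer:
    """Hypothetical buffer of rare-class and misclassified instances from past chunks."""

    def __init__(self, rare_label, capacity=100):
        self.rare_label = rare_label
        # A deque with maxlen silently discards the oldest entries when full
        self.buffer = deque(maxlen=capacity)

    def absorb(self, chunk, predictions):
        """chunk: list of (x, y) pairs; predictions: model output for each x."""
        for (x, y), y_hat in zip(chunk, predictions):
            if y == self.rare_label or y != y_hat:
                self.buffer.append((x, y))

    def training_set(self, new_chunk):
        """Current chunk first, followed by the reused historical instances."""
        return list(new_chunk) + list(self.buffer)
```

After `absorb` is called on a processed chunk, only the rare-class examples and the instances the model got wrong are retained, and `training_set` returns the next chunk enriched with them.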

Sequential extreme learning machines for class imbalance and concept drift

10.32657/10356/65290 ◽

2015 ◽

Author(s):

Bilal Mirza

Keyword(s):

Concept Drift ◽

Class Imbalance ◽

Extreme Learning Machines ◽

Learning Machines

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience ◽

10.1155/2021/8813806 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yange Sun ◽

Meng Li ◽

Lei Li ◽

Han Shao ◽

Yi Sun

Keyword(s):

Data Streams ◽

Data Stream ◽

Learning Strategy ◽

Concept Drift ◽

Class Imbalance ◽

Data Preprocessing ◽

Cost Information ◽

Detection Mechanism ◽

Stream Classification ◽

Data Stream Classification

Class imbalance and concept drift are two primary problems that coexist in data stream classification. Although each issue has drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, class imbalance becomes even more complicated when the data stream also exhibits concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to address both issues simultaneously. CSDS takes cost information into account during both data preprocessing and classification. During preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. During classification, a cost-sensitive weighting scheme is devised to improve the overall performance of the ensemble. In addition, a change detection mechanism embedded in the algorithm ensures that the ensemble can capture and react to drift promptly. Experimental results show that the method obtains better classification results across different imbalanced, concept-drifting data stream scenarios.
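
One way to picture a cost-sensitive weighting of ensemble members is to weight each member by the inverse of its average misclassification cost on the most recent chunk, with a cost matrix that penalizes missed minority instances more heavily. The sketch below is hypothetical and is not the CSDS scheme: the cost values, the `1 / (cost + eps)` weighting rule, and all function names are assumptions for illustration.

```python
def misclassification_cost(y_true, y_pred, cost):
    """Average cost on a chunk; cost[(true, predicted)], correct predictions cost 0."""
    return sum(cost.get((t, p), 0.0) for t, p in zip(y_true, y_pred)) / len(y_true)


def member_weights(members, chunk_X, chunk_y, cost, eps=1e-6):
    """Lower cost on the latest chunk gives a larger voting weight."""
    weights = []
    for m in members:
        preds = [m(x) for x in chunk_X]
        weights.append(1.0 / (misclassification_cost(chunk_y, preds, cost) + eps))
    return weights


def weighted_vote(members, x, weights):
    """Sum the weights of the members voting for each label; return the top label."""
    scores = {}
    for m, w in zip(members, weights):
        label = m(x)
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
```

With `cost = {(1, 0): 10.0, (0, 1): 1.0}` (missing a minority instance of class 1 costs ten times a false alarm), a member that always predicts the majority class receives a lower weight than one that always predicts the minority class, even though both are equally naive, and an accurate member dominates the vote.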

Properties of a GP active learning framework for streaming data with class imbalance

Proceedings of the Genetic and Evolutionary Computation Conference on - GECCO '17 ◽

10.1145/3071178.3071213 ◽

2017 ◽

Cited By ~ 5

Author(s):

Sara Khanchi ◽

Malcolm I. Heywood ◽

A. Nur Zincir-Heywood

Keyword(s):

Active Learning ◽

Class Imbalance ◽

Streaming Data ◽

Learning Framework

Minority Resampling Based Ensemble Framework Using Enhanced Early Drift Detection Method For Imbalanced Data Streams

10.21203/rs.3.rs-141880/v1 ◽

2021 ◽

Author(s):

Priya S ◽

Annie Uthra

Keyword(s):

Data Streams ◽

Data Stream ◽

Detection Method ◽

Concept Drift ◽

Class Imbalance ◽

Current Data ◽

Classification Model ◽

Ensemble Classifiers ◽

K Nearest Neighbor ◽

Jaccard Similarity

Abstract: As data mining applications grow in popularity, large volumes of data streams are generated over time. The main problem with data streams is that they exhibit a high degree of class imbalance and their data distribution changes over time. In this paper, a Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbor and Jaccard similarity is proposed to handle class imbalance by tracking the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting concept drift, and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalanced and resamples the minority instances in the drifting data stream. The K-nearest neighbors technique is used to resample the minority classes, and the Jaccard similarity measure is applied to the resampled data so that the generated synthetic data are similar to the original data; the resulting data are then handled by ensemble classifiers. The proposed ensemble-based classification model outperforms existing oversampling and undersampling techniques, with an accuracy of 98.52%.
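
A rough picture of combining KNN-based resampling with a Jaccard similarity check: synthetic minority candidates are interpolated between a minority instance and one of its k nearest minority neighbours, and a candidate is kept only if its discretized feature set is Jaccard-similar to at least one real minority instance. This is a hypothetical sketch, not the KNN-JS method itself; the interpolation step, the `(feature_index, bin)` discretization, the `threshold`, and all function names are assumptions.

```python
import math
import random


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def as_token_set(x, bin_width=1.0):
    # Hypothetical discretization: each feature becomes a (feature_index, bin) token
    return {(i, math.floor(v / bin_width)) for i, v in enumerate(x)}


def knn_jaccard_oversample(minority, n_new, k=3, threshold=0.5, seed=0, max_tries=1000):
    """Generate up to n_new synthetic minority samples by KNN interpolation,
    keeping only candidates that resemble a real minority sample."""
    rng = random.Random(seed)
    originals = [as_token_set(x) for x in minority]
    synthetic = []
    for _ in range(max_tries):
        if len(synthetic) >= n_new or len(minority) < 2:
            break
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen base instance
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(base, minority[j]))[:k]
        other = minority[rng.choice(neighbours)]
        lam = rng.random()
        cand = tuple(a + lam * (b - a) for a, b in zip(base, other))
        # Jaccard filter against the original minority instances
        if max(jaccard(as_token_set(cand), o) for o in originals) >= threshold:
            synthetic.append(cand)
    return synthetic
```

When all minority instances fall in the same discretized region, every interpolated candidate passes the filter; when the minority instances are far apart, candidates landing in empty regions between them are rejected, which is the intended effect of keeping synthetic data similar to the original data.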
