scholarly journals Online Active Learning Paired Ensemble for Concept Drift and Class Imbalance

IEEE Access ◽  
2018 ◽  
Vol 6 ◽  
pp. 73815-73828 ◽  
Author(s):  
Hang Zhang ◽  
Weike Liu ◽  
Jicheng Shan ◽  
Qingbao Liu
Author(s):  
S. Priya ◽  
R. Annie Uthra

AbstractIn present times, data science become popular to support and improve decision-making process. Due to the accessibility of a wide application perspective of data streaming, class imbalance and concept drifting become crucial learning problems. The advent of deep learning (DL) models finds useful for the classification of concept drift in data streaming applications. This paper presents an effective class imbalance with concept drift detection (CIDD) using Adadelta optimizer-based deep neural networks (ADODNN), named CIDD-ADODNN model for the classification of highly imbalanced streaming data. The presented model involves four processes namely preprocessing, class imbalance handling, concept drift detection, and classification. The proposed model uses adaptive synthetic (ADASYN) technique for handling class imbalance data, which utilizes a weighted distribution for diverse minority class examples based on the level of difficulty in learning. Next, a drift detection technique called adaptive sliding window (ADWIN) is employed to detect the existence of the concept drift. Besides, ADODNN model is utilized for the classification processes. For increasing the classifier performance of the DNN model, ADO-based hyperparameter tuning process takes place to determine the optimal parameters of the DNN model. The performance of the presented model is evaluated using three streaming datasets namely intrusion detection (NSL KDDCup) dataset, Spam dataset, and Chess dataset. A detailed comparative results analysis takes place and the simulation results verified the superior performance of the presented model by obtaining a maximum accuracy of 0.9592, 0.9320, and 0.7646 on the applied KDDCup, Spam, and Chess dataset, respectively.


2021 ◽  
Vol 25 (5) ◽  
pp. 1169-1185
Author(s):  
Deniu He ◽  
Hong Yu ◽  
Guoyin Wang ◽  
Jie Li

The problem of initialization of active learning is considered in this paper. Especially, this paper studies the problem in an imbalanced data scenario, which is called as class-imbalance active learning cold-start. The novel method is two-stage clustering-based active learning cold-start (ALCS). In the first stage, to separate the instances of minority class from that of majority class, a multi-center clustering is constructed based on a new inter-cluster tightness measure, thus the data is grouped into multiple clusters. Then, in the second stage, the initial training instances are selected from each cluster based on an adaptive candidate representative instances determination mechanism and a clusters-cyclic instance query mechanism. The comprehensive experiments demonstrate the effectiveness of the proposed method from the aspects of class coverage, classification performance, and impact on active learning.


2019 ◽  
Vol 343 ◽  
pp. 1-2 ◽  
Author(s):  
Shuo Wang ◽  
Leandro L. Minku ◽  
Nitesh Chawla ◽  
Xin Yao

Sensors ◽  
2020 ◽  
Vol 20 (7) ◽  
pp. 2131 ◽  
Author(s):  
Affan Ahmed Toor ◽  
Muhammad Usman ◽  
Farah Younas ◽  
Alvis Cheuk M. Fong ◽  
Sajid Ali Khan ◽  
...  

With the increasing popularity of the Internet-of-Medical-Things (IoMT) and smart devices, huge volumes of data streams have been generated. This study aims to address the concept drift, which is a major challenge in the processing of voluminous data streams. Concept drift refers to overtime change in data distribution. It may occur in the medical domain, for example the medical sensors measuring for general healthcare or rehabilitation, which may switch their roles for ICU emergency operations when required. Detecting concept drifts becomes trickier when the class distributions in data are skewed, which is often true for medical sensors e-health data. Reactive Drift Detection Method (RDDM) is an efficient method for detecting long concepts. However, RDDM has a high error rate, and it does not handle class imbalance. We propose an Enhanced Reactive Drift Detection Method (ERDDM), which systematically generates strategies to handle concept drift with class imbalance in data streams. We conducted experiments to compare ERDDM with three contemporary techniques in terms of prediction error, drift detection delay, latency, and ability to handle data imbalance. The experimentation was done in Massive Online Analysis (MOA) on 48 synthetic datasets customized to possess the capabilities of data streams. ERDDM can handle abrupt and gradual drifts and performs better than all benchmarks in almost all experiments.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Ge Song ◽  
Yunming Ye

Textual stream classification has become a realistic and challenging issue since large-scale, high-dimensional, and non-stationary streams with class imbalance have been widely used in various real-life applications. According to the characters of textual streams, it is technically difficult to deal with the classification of textual stream, especially in imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from the textual imbalanced stream with concept drift (CFIM). The CFIM is based on ensemble learning by integrating a set of clustering trees (CTs). An adaptive selection method, which flexibly chooses the useful CTs by the property of the stream, is presented in CFIM. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Compared to most existing approaches, it is worth pointing out that our approach assumes that both majority class and rareclass may suffer from concept drift. Thus the distribution of resampled instances is similar to the current concept. The effectiveness of CFIM is examined in five real-world textual streams under an imbalanced nonstationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models.


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yange Sun ◽  
Meng Li ◽  
Lei Li ◽  
Han Shao ◽  
Yi Sun

Class imbalance and concept drift are two primary principles that exist concurrently in data stream classification. Although the two issues have drawn enough attention separately, the joint treatment largely remains unexplored. Moreover, the class imbalance issue is further complicated if data streams with concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification is introduced to overcome the two issues simultaneously. The CSDS considers cost information during the procedures of data preprocessing and classification. During the data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm for alleviating the class imbalance at the data level. In the classification process, a cost-sensitive weighting schema is devised to enhance the overall performance of the ensemble. Besides, a change detection mechanism is embedded in our algorithm, which guarantees that an ensemble can capture and react to drift promptly. Experimental results validate that our method can obtain better classification results under different imbalanced concept drifting data stream scenarios.


2021 ◽  
Author(s):  
Priya S ◽  
Annie Uthra

Abstract As the data mining applications are increasing popularly, large volumes of data streams are generated over the period of time. The main problem in data streams is that it exhibits a high degree of class imbalance and distribution of data changes over time. In this paper, Timely Drift Detection and Minority Resampling Technique (TDDMRT) based on K-nearest neighbor and Jaccard similarity is proposed to handle the class imbalance by finding the current ratio of class labels. The Enhanced Early Drift Detection Method (EEDDM) is proposed for detecting the concept drift and the Minority Resampling Method (KNN-JS) determines whether the current data stream should be regarded as imbalance and it resamples the minority instances in the drifting data stream. The K-Nearest Neighbors technique is used to resample the minority classes and the Jaccard similarity measure is established over the resampled data to generate the synthetic data similar to the original data and it is handled by ensemble classifiers. The proposed ensemble based classification model outperforms the existing over sampling and under sampling techniques with accuracy of 98.52%.


Sign in / Sign up

Export Citation Format

Share Document