Application of Imbalanced Data Classification Quality Metrics as Weighting Methods of the Ensemble Data Stream Classification Algorithms

Weronika Wegier; Pawel Ksieniewicz

doi:10.3390/e22080849

Entropy ◽

2020 ◽

Vol 22 (8) ◽

pp. 849

Keyword(s):

Concept Drift ◽

Binary Classification ◽

Imbalanced Data ◽

Accuracy Score ◽

Data Sampling ◽

Stream Classification ◽

Data Stream Classification ◽

Imbalanced Data Classification ◽

The Impact

In an era when a large number of tools and applications constantly produce massive amounts of data, processing and properly classifying that data is becoming both increasingly difficult and increasingly important. The task is hindered by changes in the data distribution over time, known as concept drift, and by disproportion between classes, as in network attack detection or fraud detection. In this work, we propose modifications to two existing stream processing methods, Accuracy Weighted Ensemble (AWE) and Accuracy Updated Ensemble (AUE), which have demonstrated their effectiveness in adapting to time-varying class distributions. The changes aim to improve their quality on binary classification of imbalanced data. The proposed modifications incorporate aggregate metrics, such as the F1-score, G-mean, and balanced accuracy score, into the calculation of the member classifiers' weights, which affects both the ensemble's composition and its final prediction. The impact of data sampling on the algorithms' effectiveness was also examined. Comprehensive experiments were conducted to identify the most promising type of modification and to compare the proposed methods with existing solutions. The experimental evaluation shows an improvement in classification quality over the underlying algorithms and other solutions for processing imbalanced data streams.
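The abstract names the F1-score, G-mean, and balanced accuracy score as weighting metrics but does not give the exact weighting formula. The sketch below, assuming binary labels with 1 as the minority (positive) class, computes the three metrics from a confusion matrix and, as a hypothetical simplification, uses each member's metric score on the newest data chunk directly as its voting weight. The names `member_weights` and `weighted_vote` are illustrative and are not the authors' AWE/AUE implementation.

```python
import math
from typing import List, Sequence, Tuple

def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN), treating label 1 as the minority/positive class."""
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1:
            if p == 1:
                tp += 1
            else:
                fn += 1
        elif p == 1:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def _ratio(num: float, den: float) -> float:
    return num / den if den else 0.0

def f1_score(y_true, y_pred) -> float:
    tp, fn, fp, _ = confusion(y_true, y_pred)
    return _ratio(2 * tp, 2 * tp + fp + fn)

def _tpr_tnr(y_true, y_pred) -> Tuple[float, float]:
    """Recall on the minority class (TPR) and on the majority class (TNR)."""
    tp, fn, fp, tn = confusion(y_true, y_pred)
    return _ratio(tp, tp + fn), _ratio(tn, tn + fp)

def g_mean(y_true, y_pred) -> float:
    tpr, tnr = _tpr_tnr(y_true, y_pred)
    return math.sqrt(tpr * tnr)

def balanced_accuracy(y_true, y_pred) -> float:
    tpr, tnr = _tpr_tnr(y_true, y_pred)
    return (tpr + tnr) / 2

def member_weights(member_preds: List[Sequence[int]], y_chunk, metric) -> List[float]:
    """Hypothetical scheme: a member's weight is its metric score on the newest chunk."""
    return [metric(y_chunk, preds) for preds in member_preds]

def weighted_vote(votes: Sequence[int], weights: Sequence[float]) -> int:
    """Predict 1 if the members voting 1 hold more than half of the total weight."""
    total = sum(weights)
    support = sum(w for v, w in zip(votes, weights) if v == 1)
    return 1 if total > 0 and support > total / 2 else 0
```

On a chunk with six majority and two minority samples, a member that always predicts the majority class reaches 0.75 accuracy but scores 0.0 on both F1 and G-mean, so under these metrics it contributes nothing to the vote.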

Analyzing and repairing concept drift adaptation in data stream classification

Machine Learning ◽

10.1007/s10994-021-05993-w ◽

2021 ◽

Author(s):

Ben Halstead ◽

Yun Sing Koh ◽

Patricia Riddle ◽

Russel Pears ◽

Mykola Pechenizkiy ◽

...

Keyword(s):

Data Stream ◽

Concept Drift ◽

Stream Classification ◽

Data Stream Classification

Data Preprocessing and Dynamic Ensemble Selection for Imbalanced Data Stream Classification

Machine Learning and Knowledge Discovery in Databases - Communications in Computer and Information Science ◽

10.1007/978-3-030-43887-6_30 ◽

2020 ◽

pp. 367-379

Author(s):

Paweł Zyblewski ◽

Robert Sabourin ◽

Michał Woźniak

Keyword(s):

Data Stream ◽

Imbalanced Data ◽

Data Preprocessing ◽

Stream Classification ◽

Data Stream Classification ◽

Ensemble Selection ◽

Selection For ◽

Dynamic Ensemble Selection

Heuristic ensemble for unsupervised detection of multiple types of concept drift in data stream classification

Intelligent Decision Technologies ◽

10.3233/idt-210115 ◽

2021 ◽

pp. 1-14

Author(s):

Hanqing Hu ◽

Mehmed Kantardzic

Keyword(s):

Data Stream ◽

Concept Drift ◽

False Alarms ◽

Detection Accuracy ◽

Real World Data ◽

Traditional Concept ◽

Stream Classification ◽

Data Stream Classification ◽

Detection Algorithms ◽

Concept Drift Detection

Real-world data stream classification often deals with multiple types of concept drift, categorized by change characteristics such as speed, distribution, and severity. When labels are unavailable, the traditional concept drift detection algorithms used in stream classification frameworks often focus on only one type of concept drift. To overcome this limitation, this study proposes a Heuristic Ensemble Framework for Drift Detection (HEFDD). HEFDD aims to detect all types of concept drift by employing an ensemble of selected concept drift detection algorithms, each capable of detecting at least one type of concept drift. Experimental results show that HEFDD achieves a significant improvement in detection accuracy over state-of-the-art individual algorithms, as measured by a z-score test. At the same time, HEFDD reduces the false alarms generated by individual concept drift detection algorithms.
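The abstract does not state how HEFDD combines its member detectors. The sketch below assumes the simplest option, a vote threshold over unsupervised detectors that compare a reference window with the current window of a single numeric feature. The three toy detectors (`mean_shift`, `spread_shift`, `range_shift`) and the `min_votes` parameter are hypothetical stand-ins, not the detectors used in the study; requiring several detectors to agree is one plausible way an ensemble can suppress the false alarms of any single detector.

```python
import statistics
from typing import Callable, List, Sequence

# A detector compares a reference window with the current window (no labels needed).
Detector = Callable[[Sequence[float], Sequence[float]], bool]

def mean_shift(ref: Sequence[float], cur: Sequence[float], threshold: float = 1.0) -> bool:
    """Flag drift when the mean moves by more than `threshold` reference std devs."""
    sd = statistics.pstdev(ref) or 1e-12
    return abs(statistics.fmean(cur) - statistics.fmean(ref)) / sd > threshold

def spread_shift(ref: Sequence[float], cur: Sequence[float], ratio: float = 2.0) -> bool:
    """Flag drift when the standard deviation grows or shrinks by more than `ratio`."""
    a = statistics.pstdev(ref) or 1e-12
    b = statistics.pstdev(cur) or 1e-12
    return max(a / b, b / a) > ratio

def range_shift(ref: Sequence[float], cur: Sequence[float]) -> bool:
    """Flag drift when over half of the current window lies outside the reference range."""
    lo, hi = min(ref), max(ref)
    outside = sum(1 for x in cur if x < lo or x > hi)
    return outside > len(cur) / 2

def ensemble_drift(ref: Sequence[float], cur: Sequence[float],
                   detectors: List[Detector], min_votes: int) -> bool:
    """Signal drift only when at least `min_votes` detectors agree."""
    votes = sum(1 for detect in detectors if detect(ref, cur))
    return votes >= min_votes
```

Lowering `min_votes` makes the ensemble more sensitive; raising it trades sensitivity for fewer false alarms.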

Dynamic Ensemble Selection for Imbalanced Data Stream Classification with Limited Label Access

10.1007/978-3-030-87897-9_20 ◽

2021 ◽

pp. 217-226

Author(s):

Paweł Zyblewski ◽

Michał Woźniak

Keyword(s):

Data Stream ◽

Imbalanced Data ◽

Stream Classification ◽

Data Stream Classification ◽

Ensemble Selection ◽

Selection For ◽

Dynamic Ensemble Selection

Data Stream Mining Using Ensemble Classifier

Collaborative Filtering Using Data Mining and Analysis - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0489-4.ch013 ◽

2017 ◽

pp. 236-249

Author(s):

Snehlata Sewakdas Dongre ◽

Latesh G. Malik

Keyword(s):

Collaborative Filtering ◽

Data Stream ◽

Concept Drift ◽

Ensemble Classifier ◽

Ensemble Classification ◽

Data Stream Mining ◽

Main Concern ◽

Stream Mining ◽

Stream Classification ◽

Data Stream Classification

A data stream is a massive amount of data generated uncontrollably and at a rapid rate by many applications, such as call detail records, log records, and sensor applications. Data stream mining has attracted the attention of many researchers. A growing problem in data streams is the handling of concept drift: a good algorithm should adapt to changes and handle concept drift properly. An ensemble classifier is a group of classifiers that work collaboratively; instead of a single classifier, a group of classifiers is used to improve classification accuracy. The main concern of this chapter is to familiarize the reader with the data stream domain and data stream mining, and overall it covers all aspects of data stream classification. Its mission is to discuss various techniques that use collaborative filtering for data stream mining. Collaborative filtering plays an important role here, describing how the different classifiers work together within the ensemble to achieve a common goal.
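To make the idea of a group of classifiers working together on a stream concrete, the sketch below keeps a fixed-size pool of models, each trained on one chunk of the stream. When the pool is full, the oldest model is discarded, which is one common way chunk-based ensembles follow concept drift, and predictions are combined by unweighted majority vote. The nearest-centroid base learner, the `ChunkEnsemble` class, and the pool size of three are illustrative assumptions, not taken from the chapter.

```python
from collections import Counter, deque
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

class NearestCentroid:
    """Tiny base learner: predicts the label of the closest class centroid."""

    def fit(self, X: Sequence[Point], y: Sequence[int]) -> "NearestCentroid":
        groups: Dict[int, List[Point]] = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        # Per-class mean of each coordinate.
        self.centroids = {
            label: tuple(sum(col) / len(pts) for col in zip(*pts))
            for label, pts in groups.items()
        }
        return self

    def predict_one(self, x: Point) -> int:
        return min(
            self.centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])),
        )

class ChunkEnsemble:
    """Keeps the `size` most recent chunk models; deque(maxlen=...) drops the oldest."""

    def __init__(self, size: int = 3):
        self.members = deque(maxlen=size)

    def partial_fit(self, X_chunk: Sequence[Point], y_chunk: Sequence[int]) -> None:
        self.members.append(NearestCentroid().fit(X_chunk, y_chunk))

    def predict_one(self, x: Point) -> int:
        votes = Counter(m.predict_one(x) for m in self.members)
        return votes.most_common(1)[0][0]
```

If the labels of two consecutive chunks are swapped relative to an earlier chunk, the two newer members outvote the model trained before the change, so the ensemble's prediction follows the new concept.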

Employing One-Class SVM Classifier Ensemble for Imbalanced Data Stream Classification

Lecture Notes in Computer Science - Computational Science – ICCS 2020 ◽

10.1007/978-3-030-50423-6_9 ◽

2020 ◽

pp. 117-127

Author(s):

Jakub Klikowski ◽

Michał Woźniak

Keyword(s):

Data Stream ◽

Imbalanced Data ◽

Classifier Ensemble ◽

Svm Classifier ◽

Stream Classification ◽

Data Stream Classification

Approbation of Approaches to Assessment of ESG Risks of Russian Companies at the Regional Level

Federalism ◽

10.21686/2073-1051-2021-2-25-42 ◽

2021 ◽

pp. 25-42

Author(s):

E. S. Emelyanova ◽

L. A. Vasiliev

Keyword(s):

Russian Federation ◽

Sustainability Assessment ◽

Corporate Sustainability ◽

Federal District ◽

Environmental Risks ◽

Regional Level ◽

Data Sampling ◽

The Russian Federation ◽

The Impact

Climate change is a serious, widespread threat that requires an urgent global response, including in the management of environmental risks. The primary aim of the study whose results are presented in this article was to assess the ESG risks of Russian companies at the regional level. Owing to the historical features of the Russian Federation's development, the methodology was based solely on the analysis of environmental risks: the S&P Global approach to corporate sustainability assessment was adopted, and the initial values of the companies' E-ratings were then adjusted using the impact map developed by the United Nations. To test the proposed approach to assessing environmental risks, we used a data sample on types of economic activity across the constituent entities of the Russian Federation. As a result of applying the proposed approach to this sample, companies were assigned to one of three categories of E-risk exposure: high, moderate, or low environmental risk. E-risk exposure was also assessed based on each company's regional affiliation with the relevant federal district. The main conclusion of the study confirms the need to improve the quality of analytics on companies' exposure to environmental risks and for companies to disclose more detailed information.

Cost-Sensitive Classification for Evolving Data Streams with Concept Drift and Class Imbalance

Computational Intelligence and Neuroscience ◽

10.1155/2021/8813806 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yange Sun ◽

Meng Li ◽

Lei Li ◽

Han Shao ◽

Yi Sun

Keyword(s):

Data Streams ◽

Data Stream ◽

Learning Strategy ◽

Concept Drift ◽

Class Imbalance ◽

Data Preprocessing ◽

Cost Information ◽

Detection Mechanism ◽

Stream Classification ◽

Data Stream Classification

Class imbalance and concept drift are two primary challenges that often coexist in data stream classification. Although each issue has drawn considerable attention separately, their joint treatment remains largely unexplored. Moreover, class imbalance is further complicated when the data stream also exhibits concept drift. A novel Cost-Sensitive based Data Stream (CSDS) classification method is introduced to overcome both issues simultaneously. CSDS considers cost information during both data preprocessing and classification. During data preprocessing, a cost-sensitive learning strategy is introduced into the ReliefF algorithm to alleviate class imbalance at the data level. In the classification process, a cost-sensitive weighting schema is devised to enhance the overall performance of the ensemble. In addition, a change detection mechanism embedded in our algorithm ensures that the ensemble can capture and react to drift promptly. Experimental results show that our method obtains better classification results under different imbalanced, concept-drifting data stream scenarios.
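The abstract does not define the CSDS cost-sensitive weighting schema. As a hedged illustration of the general idea only, the sketch below assumes a cost matrix in which missing a minority-class sample (a false negative, cost `c_fn`) is more expensive than a false alarm (cost `c_fp`), and gives each ensemble member a weight of one minus its cost on the newest chunk, normalized by `len(chunk) * max(c_fn, c_fp)`. The function names and the default costs of 5 and 1 are hypothetical.

```python
from typing import List, Sequence

def misclassification_cost(y_true: Sequence[int], y_pred: Sequence[int],
                           c_fn: float = 5.0, c_fp: float = 1.0) -> float:
    """Total cost on one chunk; label 1 is the minority (positive) class."""
    cost = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            cost += c_fn  # missed minority sample
        elif t == 0 and p == 1:
            cost += c_fp  # false alarm on the majority class
    return cost

def cost_sensitive_weights(member_preds: List[Sequence[int]], y_chunk: Sequence[int],
                           c_fn: float = 5.0, c_fp: float = 1.0) -> List[float]:
    """Weights in [0, 1]: 1 means zero cost on the chunk; lower cost gives higher weight."""
    worst = len(y_chunk) * max(c_fn, c_fp)
    if worst == 0:
        return [0.0 for _ in member_preds]
    return [1.0 - misclassification_cost(y_chunk, preds, c_fn, c_fp) / worst
            for preds in member_preds]
```

Because missed minority samples are charged five times more than false alarms here, a member that never predicts the minority class is penalized far more heavily than one that raises an occasional false alarm.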

Microcluster-Based Incremental Ensemble Learning for Noisy, Nonstationary Data Streams

Complexity ◽

10.1155/2020/6147378 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Sanmin Liu ◽

Shan Xue ◽

Fanzhen Liu ◽

Jieren Cheng ◽

Xiulai Li ◽

...

Keyword(s):

Ensemble Learning ◽

Data Streams ◽

Data Stream ◽

Concept Drift ◽

Majority Vote ◽

Stream Classification ◽

Model Stability ◽

Data Stream Classification ◽

Nonstationary Data ◽

Synthetic Datasets

Data stream classification has become a promising prediction task relevant to many practical environments. However, in the presence of concept drift and noise, data stream classification faces many challenges. Hence, a new incremental ensemble model is presented for classifying nonstationary data streams with noise. Our approach integrates three strategies: incremental learning to monitor and adapt to concept drift; ensemble learning to improve model stability; and a microclustering procedure that distinguishes drift from noise and predicts the labels of incoming instances via majority vote. Experiments on two synthetic datasets designed to test for both gradual and abrupt drift show that our method classifies nonstationary data streams with noise more accurately than two popular baselines.
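The abstract does not describe the micro-cluster structure. The sketch below borrows the cluster-feature summary (point count, linear sum, squared sum) familiar from stream clustering, keeps one class label per micro-cluster, and predicts by majority vote among the `k` nearest micro-cluster centroids. `MicroCluster`, `learn`, `predict`, and the `max_dist` absorption threshold are hypothetical, and the paper's step that distinguishes drift from noise is not modeled.

```python
import math
from collections import Counter
from typing import List, Sequence

class MicroCluster:
    """Cluster-feature summary (count, linear sum, squared sum) for one class label."""

    def __init__(self, x: Sequence[float], label: int):
        self.n = 1
        self.ls = [float(v) for v in x]
        self.ss = [float(v) * v for v in x]
        self.label = label

    def add(self, x: Sequence[float]) -> None:
        self.n += 1
        for i, v in enumerate(x):
            self.ls[i] += v
            self.ss[i] += v * v

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        var = sum(s / self.n - ci * ci for s, ci in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))

def learn(clusters: List[MicroCluster], x: Sequence[float], label: int,
          max_dist: float = 1.0) -> None:
    """Absorb a labeled point into the nearest same-label micro-cluster, or start a new one."""
    same = [mc for mc in clusters if mc.label == label]
    if same:
        best = min(same, key=lambda mc: math.dist(mc.centroid(), x))
        if math.dist(best.centroid(), x) <= max_dist:
            best.add(x)
            return
    clusters.append(MicroCluster(x, label))

def predict(clusters: List[MicroCluster], x: Sequence[float], k: int = 3) -> int:
    """Majority vote over the labels of the k nearest micro-cluster centroids."""
    nearest = sorted(clusters, key=lambda mc: math.dist(mc.centroid(), x))[:k]
    return Counter(mc.label for mc in nearest).most_common(1)[0][0]
```

Storing only the three summary statistics lets each micro-cluster absorb points in constant memory, which is why this representation is popular for streams.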

PGNBC: Pearson Gaussian Naïve Bayes classifier for data stream classification with recurring concept drift

Intelligent Data Analysis ◽

10.3233/ida-163020 ◽

2017 ◽

Vol 21 (5) ◽

pp. 1173-1191 ◽

Cited By ~ 3

Author(s):

D. Kishore Babu ◽

Y. Ramadevi ◽

K.V. Ramana

Keyword(s):

Data Stream ◽

Naive Bayes ◽

Concept Drift ◽

Naïve Bayes ◽

Naive Bayes Classifier ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Stream Classification ◽

Data Stream Classification
