Learning from class-imbalanced data: review of data driven methods and algorithm driven methods

Cui Yin Huang; Hong Liang Dai

doi:10.3934/dsfe.2021002

A Cost-Sensitive Diagnosis Method Based on the Operation and Maintenance Data of UAV

Applied Sciences ◽

10.3390/app112311116 ◽

2021 ◽

Vol 11 (23) ◽

pp. 11116

Author(s):

Ke Zheng ◽

Guozhu Jia ◽

Linchao Yang ◽

Chunting Liu

Keyword(s):

Fault Diagnosis ◽

Data Distribution ◽

Imbalanced Data ◽

Data Driven ◽

Total Cost ◽

Operation And Maintenance ◽

Diagnosis Method ◽

Overall Performance ◽

Diagnosis Model ◽

The Cost

In UAV fault diagnosis, extremely imbalanced data distributions and large differences in the consequences of different fault modes can drastically reduce the effectiveness of a data-driven fault diagnosis model when computing resources are limited. At present, there is still no credible approach for determining the misdiagnosis cost of different fault modes that accounts for the interference of the data distribution. The performance of the original cost-insensitive, flight-data-driven fault diagnosis models also needs to be improved. To address these needs, this paper proposes a two-step ensemble cost-sensitive diagnosis method based on UAV operation and maintenance data. Using the fault criticality from FMECA information, we define a misdiagnosis hazard value and calculate the misdiagnosis cost. From the misdiagnosis cost, a static cost matrix is set up, which is used both to modify the diagnosis model and to evaluate the diagnosis results. The two-step ensemble cost-sensitive method is based on the MetaCost framework: it uses stratified bootstrapping, chooses LightGBM as the meta-classifiers, and adjusts the ensemble form to enhance the overall performance of the diagnosis model and reduce the use of computing resources while optimizing the total misdiagnosis cost. Experimental results on the KPG component data of a large fixed-wing UAV show that the proposed cost-sensitive model effectively reduces the total cost incurred by misdiagnosis without placing excessive demands on the computing equipment, while maintaining a certain overall level of diagnosis performance.
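
As a rough illustration of how a static cost matrix can change a diagnosis decision, the sketch below picks the class with the lowest expected misdiagnosis cost, which is the kind of decision rule MetaCost-style methods rely on when relabelling training samples. It is not the authors' two-step ensemble; the function name, the three fault classes, and all cost and probability values are hypothetical.

```python
def min_expected_cost_class(probs, cost):
    """Return (best_class, expected_costs) for one sample.

    probs[j]   : estimated probability that the true class is j
    cost[j][i] : cost of predicting class i when the true class is j
    """
    n = len(probs)
    # Expected cost of predicting class i, averaged over the possible true classes.
    expected = [sum(probs[j] * cost[j][i] for j in range(n)) for i in range(n)]
    best = min(range(n), key=expected.__getitem__)
    return best, expected


# Hypothetical classes: 0 = normal, 1 = minor fault, 2 = critical fault.
# Missing a critical fault (row 2, column 0) is assumed to be the costliest error.
COST = [
    [0, 1, 1],
    [5, 0, 1],
    [50, 10, 0],
]

probs = [0.90, 0.07, 0.03]  # hypothetical classifier output for one sample
label, expected = min_expected_cost_class(probs, COST)
print(label, expected)
```

With these assumed numbers, the most probable class is "normal", yet the lowest expected cost is obtained by flagging a critical fault, because a missed critical fault is penalized so heavily.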

Data-driven fault diagnosis for residential variable refrigerant flow system on imbalanced data environments

International Journal of Refrigeration ◽

10.1016/j.ijrefrig.2021.01.009 ◽

2021 ◽

Vol 125 ◽

pp. 34-43

Author(s):

Zhenxin Zhou ◽

Huanxin Chen ◽

Guannan Li ◽

Hanlu Zhong ◽

Menghua Zhang ◽

...

Keyword(s):

Fault Diagnosis ◽

Flow System ◽

Imbalanced Data ◽

Data Driven

Data-Driven Interval Type-2 Fuzzy Modelling for the Classification of Imbalanced Data

Studies in Systems, Decision and Control - Practical Issues of Intelligent Innovations ◽

10.1007/978-3-319-78437-3_3 ◽

2018 ◽

pp. 37-51

Author(s):

Adrian Rubio-Solis ◽

Ali Baraka ◽

George Panoutsos ◽

Steve Thornton

Keyword(s):

Imbalanced Data ◽

Data Driven ◽

Fuzzy Modelling ◽

Interval Type

A Data-Driven Fault Diagnosis Method for Railway Turnouts

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/0361198119837222 ◽

2019 ◽

Vol 2673 (4) ◽

pp. 448-457 ◽

Cited By ~ 9

Author(s):

Dongxiu Ou ◽

Rui Xue ◽

Ke Cui

Keyword(s):

Fault Diagnosis ◽

Imbalanced Data ◽

Principal Component ◽

Real Data ◽

Feature Reduction ◽

Data Driven ◽

Support Vector ◽

Linear Discriminant ◽

Railway Line ◽

Diagnosis Method

Turnout systems on railways are crucial for safety protection and improvements in efficiency. Statistics show that the most common faults in railway systems are turnout system faults. Therefore, many railway systems have adopted the microcomputer monitoring system (MMS) to monitor turnout health and performance in real time. In practice, however, existing turnout fault diagnosis methods depend largely on human experience. In this paper, we propose a data-driven fault diagnosis method that uses data from point machines collected by the MMS. First, based on a derivative method, data features are extracted by segmenting the original sample. Then, we apply two methods for feature reduction: principal component analysis (PCA) and linear discriminant analysis (LDA). The results show that LDA performed better in the cases studied. A problem that cannot be overlooked is that the imbalance between rare fault samples and abundant normal samples reduces the accuracy of classic fault diagnosis models. To deal with this imbalance, we propose a modified support vector machine (SVM) method. Finally, an experiment using real data collected from the Guangzhou Railway Line is presented, which demonstrates that our method is reliable and feasible for fault diagnosis. It can further help engineers perform timely repairs and maintenance work in the future.
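
The abstract does not say how the SVM is modified. One common way to make a classifier less biased toward the majority class, assumed here purely for illustration, is to weight each class inversely to its frequency; in a soft-margin SVM such weights would typically scale the per-class misclassification penalty. The helper name and the sample counts below are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical data set: 95 normal samples and 5 fault samples.
labels = ["normal"] * 95 + ["fault"] * 5
weights = balanced_class_weights(labels)
print(weights)
```

Under this scheme, an error on a rare fault sample counts far more than an error on a normal sample, so the decision boundary is pushed away from the minority class.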

Generative Oversampling Method for Imbalanced Data on Bearing Fault Detection and Diagnosis

Applied Sciences ◽

10.3390/app9040746 ◽

2019 ◽

Vol 9 (4) ◽

pp. 746 ◽

Cited By ~ 11

Author(s):

Sungho Suh ◽

Haebom Lee ◽

Jun Jo ◽

Paul Lukowicz ◽

Yong Lee

Keyword(s):

Fault Detection ◽

Test Bench ◽

Imbalanced Data ◽

Fault Detection And Diagnosis ◽

Data Driven ◽

Image Transformation ◽

Vibration Signals ◽

Bearing Fault ◽

Transformation Procedure ◽

Detection And Diagnosis

In this study, we develop a novel data-driven fault detection and diagnosis (FDD) method for bearing faults in induction motors where the fault condition data are imbalanced. First, we propose a bearing fault detector based on convolutional neural networks (CNN), in which the vibration signals from a test bench are used as inputs after an image transformation procedure. Experimental results demonstrate that the proposed classifier for FDD performs well (accuracy of 88% to 99%) even when the volume of normal and fault condition data is imbalanced (imbalance ratio varying from 20:1 to 200:1). Additionally, our generative model reduces the level of data imbalance by oversampling, which improves the accuracy of FDD (by up to 99%) when a severe imbalance ratio (200:1) is assumed.
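
To make the idea of reducing imbalance by oversampling concrete, here is a minimal sketch that uses plain random oversampling (duplicating minority samples) as a stand-in. This is a deliberately simpler technique than the generative model described in the abstract, which synthesizes new samples instead of copying existing ones. The function name and toy data are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        # Draw with replacement from the class's own samples to fill the gap.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


# Toy data with a 20:1 imbalance between normal and fault samples.
X = [[float(i)] for i in range(21)]
y = ["normal"] * 20 + ["fault"]
Xb, yb = random_oversample(X, y)
print(Counter(yb))
```

Duplicating samples balances the class counts but adds no new information, which is one reason generative approaches are attractive at severe imbalance ratios.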
