Learning from class-imbalanced data: review of data driven methods and algorithm driven methods

2021 ◽  
Vol 1 (1) ◽  
pp. 21-36
Author(s):  
Cui Yin Huang ◽  
Hong Liang Dai
2021 ◽  
Vol 11 (23) ◽  
pp. 11116
Author(s):  
Ke Zheng ◽  
Guozhu Jia ◽  
Linchao Yang ◽  
Chunting Liu

In UAV fault diagnosis, an extremely imbalanced data distribution and large differences in the consequences of individual fault modes can severely degrade a data-driven fault diagnosis model, particularly when computing resources are limited. There is still no credible approach for determining the cost of misdiagnosing different fault modes that accounts for the interference of the data distribution, and the performance of the original cost-insensitive, flight-data-driven fault diagnosis models also needs to be improved. In response, this paper proposes a two-step ensemble cost-sensitive diagnosis method based on UAV operation and maintenance data. Using the fault criticality from FMECA information, we define a misdiagnosis hazard value and calculate the misdiagnosis cost. From the misdiagnosis cost, a static cost matrix is set, which is used both to modify the diagnosis model and to evaluate the diagnosis results. The two-step ensemble cost-sensitive method builds on the MetaCost framework: it uses stratified bootstrapping, chooses LightGBM as the meta-classifier, and adjusts the ensemble form to enhance overall diagnosis performance and reduce the computing resources occupied while optimizing the total misdiagnosis cost. Experimental results on KPG component data from a large fixed-wing UAV show that the proposed cost-sensitive model effectively reduces the total cost incurred by misdiagnosis without placing excessive demands on the computing equipment, while maintaining a certain overall level of diagnosis performance.
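
To illustrate how a static cost matrix can drive a diagnosis decision, the following is a minimal sketch of the minimum-expected-cost rule that MetaCost-style methods apply to class probability estimates. It is not the authors' two-step ensemble: the function name `min_expected_cost_class`, the two-class cost values, and the probabilities are all hypothetical and chosen only for illustration.

```python
def min_expected_cost_class(probs, cost):
    """Return the index of the predicted class with the lowest expected cost.

    probs[j]   : estimated probability that the true class is j
    cost[i][j] : cost of predicting class i when the true class is j
    """
    n = len(probs)
    # Expected cost of each candidate prediction i, averaged over true classes j.
    expected = [sum(cost[i][j] * probs[j] for j in range(n)) for i in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical two-class setup: class 0 = normal, class 1 = critical fault.
cost = [
    [0.0, 10.0],  # predict normal: missing a critical fault costs 10
    [1.0, 0.0],   # predict fault: a false alarm costs 1
]
probs = [0.8, 0.2]  # assumed classifier output for one sample

pred = min_expected_cost_class(probs, cost)
```

With these assumed numbers, predicting "normal" has expected cost 2.0 and predicting "fault" has expected cost 0.8, so the rule selects the fault class even though it is the less probable one. This is the sense in which a cost matrix shifts decisions toward rare but hazardous fault modes.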


2021 ◽  
Vol 125 ◽  
pp. 34-43
Author(s):  
Zhenxin Zhou ◽  
Huanxin Chen ◽  
Guannan Li ◽  
Hanlu Zhong ◽  
Menghua Zhang ◽  
...  

Author(s):  
Dongxiu Ou ◽  
Rui Xue ◽  
Ke Cui

Turnout systems on railways are crucial for safety protection and improvements in efficiency. Statistics show that turnout system faults are the most common faults in railway systems. Many railway systems have therefore adopted the microcomputer monitoring system (MMS) to monitor turnout health and performance in real time. In practice, however, existing turnout fault diagnosis methods depend largely on human experience. In this paper, we propose a data-driven fault diagnosis method that uses monitoring data from point machines collected by the MMS. First, based on a derivative method, data features are extracted by segmenting the original sample. Then, we apply two methods for feature reduction: principal component analysis (PCA) and linear discriminant analysis (LDA). The results show that LDA gave better performance in the cases studied. A problem that cannot be overlooked is that the imbalance between rare fault samples and abundant normal samples reduces the accuracy of classic fault diagnosis models. To deal with this imbalance, we propose a modified support vector machine (SVM) method. Finally, an experiment using real data collected from the Guangzhou Railway Line demonstrates that our method is reliable and feasible for fault diagnosis. It can further help engineers perform timely repairs and maintenance work.
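
The abstract does not say how the SVM is modified. One common way to make a classifier such as an SVM less biased toward the abundant class is to weight each class inversely to its frequency; the sketch below computes such weights. This is an assumed stand-in for illustration, not the authors' modification, and the function name `balanced_class_weights` and the 95:5 label split are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes receive large weights, so errors on them are
    penalised more heavily during training.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * cnt) for c, cnt in counts.items()}


# Hypothetical label set: 95 normal samples and 5 fault samples.
labels = ["normal"] * 95 + ["fault"] * 5
weights = balanced_class_weights(labels)
```

Here the fault class receives a weight of 10.0 and the normal class a weight of roughly 0.53. The same formula is the "balanced" heuristic offered by scikit-learn's `class_weight` option, so weights of this form can be passed to an SVM implementation that supports per-class penalties.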


2019 ◽  
Vol 9 (4) ◽  
pp. 746 ◽  
Author(s):  
Sungho Suh ◽  
Haebom Lee ◽  
Jun Jo ◽  
Paul Lukowicz ◽  
Yong Lee

In this study, we developed a novel data-driven fault detection and diagnosis (FDD) method for bearing faults in induction motors where the fault condition data are imbalanced. First, we propose a bearing fault detector based on convolutional neural networks (CNN), in which the vibration signals from a test bench are used as inputs after an image transformation procedure. Experimental results demonstrate that the proposed classifier for FDD performs well (accuracy of 88% to 99%) even when the volume of normal and fault condition data is imbalanced (imbalance ratio varies from 20:1 to 200:1). Additionally, a generative model reduces the level of data imbalance by oversampling. The results improve the accuracy of FDD (by up to 99%) when a severe imbalance ratio (200:1) is assumed.
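
To show what "reducing the level of data imbalance by oversampling" means in terms of sample counts, here is a minimal sketch using plain random duplication of minority samples. The study itself oversamples with a generative model, which synthesises new samples rather than copying existing ones; random duplication is swapped in here only because it is self-contained. The function name `random_oversample`, the seed, and the 200:1 toy data are hypothetical.

```python
import random


def random_oversample(samples, labels, minority, seed=0):
    """Duplicate randomly chosen minority samples until the minority class
    is as large as all other classes combined.

    Assumes at least one sample of the minority class is present.
    """
    rng = random.Random(seed)
    minority_items = [s for s, y in zip(samples, labels) if y == minority]
    majority_count = sum(1 for y in labels if y != minority)
    deficit = max(majority_count - len(minority_items), 0)
    extra = [rng.choice(minority_items) for _ in range(deficit)]
    return samples + extra, labels + [minority] * len(extra)


# Toy data at a 200:1 imbalance ratio: label 0 = normal, label 1 = fault.
samples = list(range(201))
labels = [0] * 200 + [1]

new_samples, new_labels = random_oversample(samples, labels, minority=1)
```

After the call, both classes contain 200 samples, so the imbalance ratio drops from 200:1 to 1:1. Duplicated samples add no new information, which is one motivation for using a generative model instead.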

