Fault Diagnosis for Wind Turbines Based on ReliefF and eXtreme Gradient Boosting

Zidong Wu; Xiaoli Wang; Baochen Jiang

doi:10.3390/app10093258

Applied Sciences ◽

10.3390/app10093258 ◽

2020 ◽

Vol 10 (9) ◽

pp. 3258 ◽

Cited By ~ 1

Author(s):

Zidong Wu ◽

Xiaoli Wang ◽

Baochen Jiang

Keyword(s):

Fault Diagnosis ◽

Wind Turbine ◽

Wind Turbines ◽

Fault Classification ◽

Gradient Boosting ◽

Support Vector ◽

Adaptive Boosting ◽

Scada System ◽

Extreme Gradient Boosting ◽

Multi Classification

To improve the accuracy of wind turbine fault diagnosis, this paper presents a fault diagnosis method based on the ReliefF algorithm and the eXtreme Gradient Boosting (XGBoost) algorithm, using data from the supervisory control and data acquisition (SCADA) system. The method consists of two parts. The first part is ReliefF multi-class feature selection: based on SCADA historical data and wind turbine fault records, ReliefF selects the feature parameters that are most strongly correlated with common faults. The second part is XGBoost fault recognition. Historical data records are used as input, ReliefF selects the SCADA observation features most correlated with the fault classes, and these features are used to build an XGBoost multi-class fault identification model. Finally, monitoring data generated by an operating wind turbine are fed into the XGBoost model to obtain the turbine's operating status. The proposed method was compared with other classifiers, such as the radial basis function support vector machine (rbf-SVM) and Adaptive Boosting (AdaBoost), and the results showed that the "ReliefF + XGBoost" method achieved higher classification accuracy than the other algorithms.
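The ReliefF feature-selection step can be illustrated with a minimal NumPy sketch of the standard multi-class ReliefF weighting (near hits penalise a feature, near misses from each other class reward it, weighted by class prior). This is not the authors' code: the neighbour count, the Manhattan distance on range-scaled features, and the `relieff` function name are assumptions. The highest-weighted features would then be passed to a gradient-boosting classifier such as XGBoost's `XGBClassifier`.

```python
import numpy as np


def relieff(X, y, n_neighbors=5, n_samples=None, seed=0):
    """Multi-class ReliefF feature weights; a larger weight means a more relevant feature.

    X: (n, d) numeric array (e.g. SCADA observations); y: (n,) fault class labels.
    Assumed settings: k nearest neighbours per class, Manhattan distance on scaled features.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    m = n if n_samples is None else min(n_samples, n)
    rng = np.random.default_rng(seed)
    # Scale per-feature differences by the feature range so each diff lies in [0, 1].
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes.tolist(), (counts / n).tolist()))
    w = np.zeros(d)
    for i in rng.choice(n, size=m, replace=False):
        diffs = np.abs(X - X[i]) / span  # per-feature diff to every sample
        dist = diffs.sum(axis=1)         # Manhattan distance on scaled features
        for c in classes.tolist():
            # Candidate neighbours from class c, excluding the sampled instance itself.
            idx = np.flatnonzero((y == c) & (np.arange(n) != i))
            if idx.size == 0:
                continue
            nearest = idx[np.argsort(dist[idx])[:n_neighbors]]
            contrib = diffs[nearest].mean(axis=0) / m
            if c == y[i]:
                w -= contrib  # near hits: penalise features that vary within a class
            else:
                # Near misses: reward features that separate classes, weighted by class prior.
                w += prior[c] / (1.0 - prior[y[i].item()]) * contrib
    return w


# Toy data: feature 0 tracks the fault class, feature 1 is pure noise.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 30)
features = np.column_stack([labels + 0.1 * rng.standard_normal(90), rng.standard_normal(90)])
print(relieff(features, labels))
```

On the toy data the class-tracking feature receives a clearly larger weight than the noise feature; a threshold or top-k cut on these weights is one common way to choose the retained features.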

Low-Pass Filtering Empirical Wavelet Transform Machine Learning Based Fault Diagnosis for Combined Fault of Wind Turbines

Entropy ◽

10.3390/e23080975 ◽

2021 ◽

Vol 23 (8) ◽

pp. 975

Author(s):

Yancai Xiao ◽

Jinyu Xue ◽

Mengdi Li ◽

Wei Yang

Keyword(s):

Machine Learning ◽

Wavelet Transform ◽

Fault Diagnosis ◽

Wind Turbine ◽

Wind Turbines ◽

Gear Tooth ◽

Support Vector ◽

Grey Wolf Optimizer ◽

Empirical Wavelet Transform ◽

Low Pass

Fault diagnosis of wind turbines is of great importance for reducing the operating and maintenance costs of wind farms. At present, most wind turbine fault diagnosis methods focus on single faults, and methods for combined faults usually depend on inefficient manual analysis. To fill this gap, this paper proposes a low-pass filtering empirical wavelet transform (LPFEWT) machine learning based fault diagnosis method for combined faults of wind turbines, which can identify the fault type simply and efficiently, without relying on human experience and at low computational cost. In this method, the LPFEWT is proposed to extract fault features from vibration signals; the LPFEWT energies are selected as the inputs of the fault diagnosis model; and a support vector machine (SVM) whose hyperparameters are tuned by a grey wolf optimizer is employed for fault diagnosis. The method is verified on a wind turbine test rig that can simulate shaft misalignment and broken gear tooth faults. Compared with the other models tested, the proposed model performs better on this classification problem.
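The hyperparameter-tuning step relies on the grey wolf optimizer (GWO). A minimal sketch of the standard GWO update (each wolf moves toward the average of positions suggested by the three best solutions, alpha, beta and delta, with an exploration coefficient decaying from 2 to 0) is shown below. It is not the paper's implementation: population size, iteration count and the `grey_wolf_optimize` name are assumptions, and a toy quadratic stands in for the real objective, which in this setting would be something like the cross-validated SVM error as a function of (C, gamma).

```python
import numpy as np


def grey_wolf_optimize(objective, lower, upper, n_wolves=20, n_iter=100, seed=0):
    """Minimise `objective` over a box [lower, upper] with the standard grey wolf optimizer."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    wolves = rng.uniform(lower, upper, size=(n_wolves, lower.size))
    fitness = np.array([objective(w) for w in wolves])
    # Alpha, beta, delta: the three best positions found so far.
    top = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[top].copy(), fitness[top].copy()
    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # exploration coefficient, decays linearly from 2 to 0
        new_pos = np.zeros_like(wolves)
        for leader in leaders:
            A = 2.0 * a * rng.random(wolves.shape) - a
            C = 2.0 * rng.random(wolves.shape)
            D = np.abs(C * leader - wolves)  # distance of every wolf to this leader
            new_pos += leader - A * D
        # Each wolf takes the average of the three leader-guided moves, kept inside the box.
        wolves = np.clip(new_pos / 3.0, lower, upper)
        fitness = np.array([objective(w) for w in wolves])
        # Keep the best three positions seen so far as the new leaders.
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        top = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[top], pool_fit[top]
    return leaders[0], leader_fit[0]


# Toy objective with its minimum at (1, -2); a real run would score an SVM instead.
best_x, best_f = grey_wolf_optimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [-5, -5], [5, 5])
print(best_x, best_f)
```

For SVM tuning, searching over log10(C) and log10(gamma) rather than the raw values is a common design choice, since useful values span several orders of magnitude.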

Development of an SVR Model for the Fault Diagnosis of Large-Scale Doubly-Fed Wind Turbines Using SCADA Data

Energies ◽

10.3390/en12173396 ◽

2019 ◽

Vol 12 (17) ◽

pp. 3396 ◽

Cited By ~ 2

Author(s):

Mingzhu Tang ◽

Wei Chen ◽

Qi Zhao ◽

Huawei Wu ◽

Wen Long ◽

...

Keyword(s):

Fault Diagnosis ◽

Wind Turbine ◽

Support Vector Regression ◽

Confidence Intervals ◽

Wind Turbines ◽

Large Scale ◽

Support Vector ◽

Effective Response ◽

Support Vector Regression Model ◽

Doubly Fed

Fault diagnosis and forecasting contribute significantly to reducing operating and maintenance costs, as well as to improving the resilience of wind turbine systems. Unlike existing fault diagnosis approaches that use vibration and acoustic data monitored from auxiliary equipment, this research presents a novel fault diagnosis and forecasting approach based on a support vector regression (SVR) model using data obtained from the supervisory control and data acquisition (SCADA) system of wind turbines (WT). First, fault diagnosis features are extracted from measured SCADA parameters. Confidence intervals are then set up to guide the fault diagnosis implemented by the SVR model, and, with these confidence intervals as performance indicators, an SVR-based fault detection approach is developed. Based on the WT SCADA data and the SVR model, a fault diagnosis strategy for large-scale doubly-fed wind turbine systems is investigated. A case study using one year of SCADA monitoring data collected from a wind farm in Southern China is employed to validate the proposed methodology and demonstrate how it works. Results indicate that the proposed strategy can support the troubleshooting of wind turbine systems with high precision and effective response.
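The confidence-interval idea can be sketched independently of the regression model: residuals (measured minus predicted SCADA value) from healthy operation define an interval, and later residuals falling outside it are flagged. This minimal sketch assumes Gaussian residuals and a mean ± z·σ interval; the paper's exact interval construction is not stated here, and the function names are hypothetical. The residuals would come from whatever predictor is used, such as an SVR.

```python
from statistics import NormalDist, fmean, stdev


def residual_interval(healthy_residuals, confidence=0.95):
    """Two-sided interval mean ± z*std of healthy-operation residuals (Gaussian assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    mu, sigma = fmean(healthy_residuals), stdev(healthy_residuals)
    return mu - z * sigma, mu + z * sigma


def flag_faults(residuals, interval):
    """Return True for each residual (measured - predicted) outside the interval."""
    low, high = interval
    return [not (low <= r <= high) for r in residuals]


interval = residual_interval([0.1, -0.1, 0.05, -0.05, 0.0])
print(interval)
print(flag_faults([0.0, 0.5, -0.3], interval))
```

In practice a single excursion is often treated as noise, so alarms are commonly raised only when several consecutive residuals leave the interval; that persistence rule is a further assumption, not something stated in the abstract.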

Modeling hydrogen solubility in hydrocarbons using extreme gradient boosting and equations of state

Scientific Reports ◽

10.1038/s41598-021-97131-8 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Mohammad-Reza Mohammadi ◽

Fahime Hadavimoghaddam ◽

Maryam Pourmahdi ◽

Saeid Atashrouz ◽

Muhammad Tajammal Munir ◽

...

Keyword(s):

Molecular Weight ◽

Equations Of State ◽

Optimal Operation ◽

Operating Conditions ◽

Hydrogen Solubility ◽

Gradient Boosting ◽

Accurate Estimation ◽

Support Vector ◽

Adaptive Boosting ◽

Extreme Gradient Boosting

Due to industrial development, the design and optimal operation of processes in chemical and petroleum processing plants require accurate estimation of hydrogen solubility in various hydrocarbons. Equations of state (EOSs) are limited in their ability to predict hydrogen solubility accurately, especially at high-pressure and/or high-temperature conditions, which may lead to energy waste and potential safety hazards in plants. In this paper, five robust machine learning models, namely extreme gradient boosting (XGBoost), adaptive boosting support vector regression (AdaBoost-SVR), gradient boosting with categorical features support (CatBoost), light gradient boosting machine (LightGBM), and a multi-layer perceptron (MLP) optimized by the Levenberg–Marquardt (LM) algorithm, were implemented to estimate hydrogen solubility in hydrocarbons. To this end, a databank of 919 experimental data points of hydrogen solubility in 26 different hydrocarbons was gathered from 48 different systems over a broad range of operating temperatures (213–623 K) and pressures (0.1–25.5 MPa). The hydrocarbons belong to six families: alkanes, alkenes, cycloalkanes, aromatics, polycyclic aromatics, and terpenes. Their carbon numbers range from 4 to 46, corresponding to a molecular weight range of 58.12–647.2 g/mol. The molecular weight, critical pressure, and critical temperature of the solvents, along with the operating pressure and temperature, were selected as input parameters to the models. The XGBoost model fits all the experimental solubility data best, with a root mean square error (RMSE) of 0.0007 and an average absolute percent relative error (AAPRE) of 1.81%. The proposed models were also compared with five EOSs: Soave–Redlich–Kwong (SRK), Peng–Robinson (PR), Redlich–Kwong (RK), Zudkevitch–Joffe (ZJ), and perturbed-chain statistical associating fluid theory (PC-SAFT).
The XGBoost model introduced in this study is a promising, efficient estimator of hydrogen solubility in various hydrocarbons and can be utilized in the chemical and petroleum industries.
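The two error measures quoted above are usually defined as RMSE = sqrt((1/N) Σ (y_exp − y_pred)²) and AAPRE = (100/N) Σ |(y_exp − y_pred) / y_exp|. A minimal sketch of both follows, assuming these standard definitions (the paper's exact normalisation is not restated in the abstract):

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between experimental and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def aapre(y_true, y_pred):
    """Average absolute percent relative error, in percent (experimental values must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


experimental = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.0]
print(rmse(experimental, predicted))   # about 0.129
print(aapre(experimental, predicted))  # about 6.67 (%)
```

RMSE carries the units of the target and is dominated by large absolute errors, while AAPRE is unit-free and weights small-solubility points more heavily, which is why the two are often reported together.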

A Study on Machine Vision Techniques for the Inspection of Health Personnels’ Protective Suits for the Treatment of Patients in Extreme Isolation

Electronics ◽

10.3390/electronics8070743 ◽

2019 ◽

Vol 8 (7) ◽

pp. 743 ◽

Cited By ~ 1

Author(s):

Alice Stazio ◽

Juan G. Victores ◽

David Estevez ◽

Carlos Balaguer

Keyword(s):

Logistic Regression ◽

Machine Vision ◽

Training Data ◽

Gradient Boosting ◽

Support Vector ◽

Classification Algorithms ◽

Adaptive Boosting ◽

Blood Stains ◽

Extreme Gradient Boosting ◽

Vector Machines

The examination of Personal Protective Equipment (PPE) to ensure the full protection of health personnel in contact with infected patients is one of the most necessary tasks when treating patients affected by infectious diseases such as Ebola. This work studies machine vision techniques for detecting possible defects on PPE that could arise after contact with such patients. A preliminary study on the use of image classification algorithms to identify blood stains on PPE after the treatment of an infected patient is presented. To produce training data for these algorithms, a synthetic dataset was generated from a simulated model of a PPE suit with blood stains. The study then used images of PPE with physically emulated blood stains, taken by a real prototype. The dataset shows a large imbalance between positive and negative samples; therefore, all the selected classification algorithms are able to handle this kind of data. The classifiers range from Logistic Regression and Support Vector Machines to bagging and boosting techniques such as Random Forest, Adaptive Boosting, Gradient Boosting, and eXtreme Gradient Boosting. All these algorithms were evaluated on accuracy, precision, recall, and F1 score, and their execution times were also considered. The results are promising for all the classifiers; in particular, Logistic Regression proved to be the most suitable classification algorithm in terms of F1 score and execution time on both datasets.
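Because the dataset is heavily imbalanced, the F1 score is more informative than accuracy. The sketch below computes precision, recall and F1 for a single positive class (for example, "blood stain present"). The function name and the toy labels are illustrative assumptions, not data from the study.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class; each returns 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# One positive in ten samples: predicting "no stain" everywhere is 90% accurate, yet F1 is 0.
y_true = [1] + [0] * 9
print(precision_recall_f1(y_true, [0] * 10))
print(precision_recall_f1([1, 0, 0, 0, 1, 0], [1, 0, 1, 0, 0, 0]))
```

The first call shows why accuracy alone can reward a classifier that never detects the minority class.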

Application of machine learning algorithms for flood susceptibility assessment and risk management

Journal of Water and Climate Change ◽

10.2166/wcc.2021.051 ◽

2021 ◽

Author(s):

R. Madhuri ◽

S. Sistla ◽

K. Srinivasa Raju

Keyword(s):

Climate Change ◽

Machine Learning ◽

Influencing Factors ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Climate Change Scenarios ◽

Adaptive Boosting ◽

Extreme Gradient Boosting ◽

Normalised Difference Vegetation Index

Assessing floods and their likely impact under climate change scenarios can facilitate sustainable management strategies. In this study, five machine learning (ML) algorithms, namely (i) Logistic Regression, (ii) Support Vector Machine, (iii) K-nearest neighbor, (iv) Adaptive Boosting (AdaBoost) and (v) Extreme Gradient Boosting (XGBoost), were tested for the Greater Hyderabad Municipal Corporation (GHMC), India, to evaluate their ability to classify locations as flooded or non-flooded under climate change scenarios. A geo-spatial database with eight flood influencing factors, namely rainfall, elevation, slope, distance from the nearest stream, evapotranspiration, land surface temperature, normalised difference vegetation index and curve number, was developed for 2000, 2006 and 2016. XGBoost performed best, with the highest mean area under the curve (AUC) score of 0.83. XGBoost was therefore adopted to simulate future flood locations corresponding to probable highest rainfall events under four Representative Concentration Pathways (RCPs), namely 2.6, 4.5, 6.0 and 8.5, along with the other flood influencing factors, for 2040, 2056, 2050 and 2064, respectively. The predicted flood risk probabilities range from 39–77%, 16–39%, 42–63% and 39–77% for the respective years.
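The area under the ROC curve used to rank these classifiers equals the probability that a randomly chosen positive (flooded) location receives a higher score than a randomly chosen negative one. A minimal pair-counting sketch of that Mann–Whitney formulation follows; the function name and toy scores are illustrative assumptions, and the quadratic pair loop is only meant for small examples.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count as half.

    labels: 1 for flooded, 0 for non-flooded; scores: predicted flood probability.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))  # 3 of 4 pairs ordered correctly -> 0.75
```

A "mean AUC" such as the 0.83 reported above would typically be this quantity averaged over several evaluation splits or years.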

Multi-Fault Classifier Based on Support Vector Machine and Its Application

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.293-294.483 ◽

2005 ◽

Vol 293-294 ◽

pp. 483-492 ◽

Cited By ~ 4

Author(s):

Zhou Suo Zhang ◽

Minghui Shen ◽

Wenzhi Lv ◽

Zheng Jia He

Keyword(s):

Support Vector Machine ◽

Fault Diagnosis ◽

Steam Turbine ◽

Time Domain ◽

Simple Algorithm ◽

Fault Classification ◽

Support Vector ◽

Turbine Generator ◽

Signal Features ◽

Multi Classification

Intelligent diagnosis of machinery faults has been limited by the need for large numbers of fault data samples. To address this problem, this paper improves a multi-class support vector machine algorithm and constructs a multi-fault classifier based on it. Training the multi-fault classifier requires only a small number of time-domain fault samples and no signal preprocessing to extract features. The multi-fault classifier has been applied to fault diagnosis of a steam turbine generator, and the results show that its advantages include a simple algorithm, online fault classification, and excellent fault classification capability.

Meta-XGBoost for Hyperspectral Image Classification Using Extended MSER-Guided Morphological Profiles

Remote Sensing ◽

10.3390/rs12121973 ◽

2020 ◽

Vol 12 (12) ◽

pp. 1973

Author(s):

Alim Samat ◽

Erzhu Li ◽

Wei Wang ◽

Sicong Liu ◽

Cong Lin ◽

...

Keyword(s):

Random Forest ◽

Image Classification ◽

Classification Accuracy ◽

Regression Tree ◽

Superior Performance ◽

Gradient Boosting ◽

Support Vector ◽

Adaptive Boosting ◽

Extreme Gradient Boosting

To investigate the performance of extreme gradient boosting (XGBoost) in remote sensing image classification tasks, XGBoost was first introduced and comparatively investigated for the spectral-spatial classification of hyperspectral imagery using the extended maximally stable extreme-region-guided morphological profiles (EMSER_MPs) proposed in this study. To overcome potential issues of XGBoost, meta-XGBoost was proposed as an ensemble XGBoost method with classification and regression tree (CART), dropout-introduced multiple additive regression tree (DART), elastic net regression and parallel coordinate descent-based linear regression (linear), and random forest (RaF) boosters. Moreover, to evaluate the performance of XGBoost with different boosters, meta-XGBoost, and EMSER_MPs, well-known and widely accepted classifiers, including support vector machine (SVM), bagging, adaptive boosting (AdaBoost), multi-class AdaBoost (MultiBoost), extremely randomized decision trees (ExtraTrees), RaF, classification via random forest regression (CVRFR), and ensemble of nested dichotomies with extremely randomized decision tree (END-ERDT) methods, were compared in terms of classification accuracy and computational efficiency. The experimental results on two benchmark hyperspectral data sets confirm the superior performance of EMSER_MPs and of EMSER_MPs with mean pixel values within region (EMSER_MPsM) compared with morphological profiles (MPs), morphological profiles with partial reconstruction (MPPR), extended MPs (EMPs), extended MPPR (EMPPR), maximally stable extreme-region-guided morphological profiles (MSER_MPs), and MSER_MPs with mean pixel values within region (MSER_MPsM) features.
The proposed meta-XGBoost algorithm is able to obtain better results than XGBoost with the CART, DART, linear, and RaF boosters, and it could be an alternative to the other classifiers considered for classifying hyperspectral images with advanced spectral-spatial features, especially in terms of generalized classification accuracy and model training efficiency.

SCADA Data Analysis Methods for Diagnosis of Electrical Faults to Wind Turbine Generators

Applied Sciences ◽

10.3390/app11083307 ◽

2021 ◽

Vol 11 (8) ◽

pp. 3307

Author(s):

Francesco Castellani ◽

Davide Astolfi ◽

Francesco Natili

Keyword(s):

Fault Diagnosis ◽

Wind Turbine ◽

Wind Turbines ◽

Principal Component ◽

Support Vector ◽

General Context ◽

Test Case ◽

Electric Generator ◽

Before And After ◽

Electrical Faults

The electric generator is estimated to be among the top three contributors to the failure rates and downtime of wind turbines. For this reason, in the general context of growing interest in effective wind turbine condition monitoring techniques, fault diagnosis of electric generators is particularly important. The objective of this study is to contribute to wind turbine generator fault diagnosis through a supervisory control and data acquisition (SCADA) analysis method. The work is organized as a real-world test-case discussion involving electrical damage to the generator of a Vestas V52 wind turbine sited in southern Italy. SCADA data from before and after the generator damage were analyzed for the target wind turbine and for reference healthy wind turbines from the same site. This made it possible to formulate a normal behavior model, based on principal component analysis and support vector regression, for the power and for the voltages and currents of the wind turbine. It is shown that the incipience of the fault can be identified as a change in the behavior of the residuals between model estimates and measurements. This change was clearly visible approximately two weeks before the fault. Considering the fast evolution of electrical damage, this result is promising for exploiting SCADA data to detect electrical damage early enough to be useful in wind energy practice.
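The principal component analysis stage of such a normal behavior model can be sketched with an SVD of the centred SCADA inputs, keeping enough components to explain a chosen fraction of the variance. This is a generic PCA sketch, not the study's pipeline: the 95% variance threshold and the `pca_fit`/`pca_transform` names are assumptions. The projected inputs would then feed a regression model such as SVR, whose residuals against the measurements are tracked over time.

```python
import numpy as np


def pca_fit(X, var_ratio=0.95):
    """Return the column means and the leading principal axes explaining >= var_ratio of variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Smallest number of components whose cumulative explained variance reaches var_ratio.
    k = int(np.searchsorted(np.cumsum(explained), var_ratio)) + 1
    return mean, vt[: min(k, vt.shape[0])]


def pca_transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (np.asarray(X, dtype=float) - mean) @ components.T


# Three perfectly correlated toy channels collapse onto a single component.
t = np.linspace(0.0, 1.0, 50)
X = np.column_stack([t, 2 * t, -t])
mean, components = pca_fit(X)
print(components.shape, pca_transform(X, mean, components).shape)
```

Fitting the PCA on healthy-turbine data only, as the abstract's use of reference turbines suggests, keeps the model a description of normal behavior against which faulty behavior shows up as residual drift.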

Question to Question Similarity Analysis Using Morphological, Syntactic, Semantic, and Lexical Features

JUCS - Journal of Universal Computer Science ◽

10.3897/jucs.2020.036 ◽

2020 ◽

Vol 26 (6) ◽

pp. 671-697

Author(s):

Mahmoud Hammad ◽

Mohammad Al-Smadi ◽

Qanita Baker ◽

Muntaha D ◽

Nour Al-Khdour ◽

...

Keyword(s):

Language Processing ◽

Text Processing ◽

Gradient Boosting ◽

Support Vector ◽

Arabic Text ◽

Feature Selection Technique ◽

Adaptive Boosting ◽

Machine Learning Classifiers ◽

Extreme Gradient Boosting ◽

Promising Solution

In today's digitally connected world, people expect to get answers to their questions immediately. This expectation has increased the burden on question/answer platforms such as Stack Overflow. A promising solution is to detect whether a newly asked question is similar to a question already in the database and, if so, present the answer to that question to the user. To address this challenge, we propose a novel Natural Language Processing (NLP) approach that detects whether two Arabic questions are similar using their extracted morphological, syntactic, semantic, lexical, overlapping, and semantic lexical features. Our approach involves several phases, including Arabic text processing, novel feature extraction, and text classification. Moreover, we compared seven different machine learning classifiers: Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR), Extreme Gradient Boosting (XGB), Random Forests (RF), Adaptive Boosting (AdaBoost), and Multilayer Perceptron (MLP). In our experiments, we used a real-world dataset of 19,136 questions (9,568 question pairs), on which our approach achieved 82.93% accuracy using the XGB model on the best features selected by the Random Forest feature selection technique. This accuracy shows that our approach can correctly detect similar Arabic questions and hence increase user satisfaction.
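One simple example of an "overlapping" feature for a question pair is the Jaccard overlap of their token sets. The sketch below is an assumption about what such a feature might look like, not the authors' feature definition; it uses plain whitespace tokenisation, whereas a real Arabic pipeline would first apply normalisation and stemming or lemmatisation. The function name is hypothetical.

```python
def token_overlap(q1, q2):
    """Jaccard overlap |A ∩ B| / |A ∪ B| of lower-cased whitespace tokens (0.0 if both are empty)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(token_overlap("how do I sort a list", "how to sort a list"))  # 4 shared of 7 distinct tokens
```

Several such pairwise scores (token overlap, shared lemmas, embedding similarity) would form one row of the feature matrix passed to the classifiers.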

Development and performance assessment of novel machine learning models to predict pneumonia after liver transplantation

Respiratory Research ◽

10.1186/s12931-021-01690-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Chaojin Chen ◽

Dong Yang ◽

Shilong Gao ◽

Yihan Zhang ◽

Liubing Chen ◽

...

Keyword(s):

Machine Learning ◽

Liver Transplantation ◽

Postoperative Pneumonia ◽

Gradient Boosting ◽

Support Vector ◽

High Morbidity ◽

Training Set ◽

Adaptive Boosting ◽

Extreme Gradient Boosting ◽

Testing Set

Background: Pneumonia is the most frequently encountered postoperative pulmonary complication (PPC) after orthotopic liver transplantation (OLT) and causes high morbidity and mortality rates. We aimed to develop a model to predict postoperative pneumonia in OLT patients using machine learning (ML) methods.

Methods: Data of 786 adult patients who underwent OLT at the Third Affiliated Hospital of Sun Yat-sen University from January 2015 to September 2019 were retrospectively extracted from electronic medical records and randomly divided into a training set and a testing set. Using the training set, six ML models, namely logistic regression (LR), support vector machine (SVM), random forest (RF), adaptive boosting (AdaBoost), extreme gradient boosting (XGBoost) and gradient boosting machine (GBM), were developed. These models were assessed by the area under the receiver operating characteristic curve (AUC) on the testing set. The risk factors and outcomes related to pneumonia were also explored based on the chosen model.

Results: 591 OLT patients were eventually included, and 253 (42.81%) were diagnosed with postoperative pneumonia, which was associated with longer postoperative hospitalization and higher mortality (P < 0.05). Among the six ML models, the XGBoost model performed best, with an AUC of 0.734 on the testing set (sensitivity: 52.6%; specificity: 77.5%). Pneumonia was notably associated with 14 features: INR, HCT, PLT, ALB, ALT, FIB, WBC, PT, serum Na+, TBIL, anesthesia time, preoperative length of stay, total fluid transfusion and operation time.

Conclusion: Our study is the first to demonstrate that an XGBoost model with 14 common variables might predict postoperative pneumonia in OLT patients.