An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring

Guo Li; Chensheng Wang; Di Zhang; Guang Yang

doi:10.3390/s21165654

An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring

Sensors ◽

10.3390/s21165654 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5654

Author(s):

Guo Li ◽

Chensheng Wang ◽

Di Zhang ◽

Guang Yang

Keyword(s):

Feature Selection ◽

Random Forest ◽

Wind Turbine ◽

Condition Monitoring ◽

Supervisory Control ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Algorithm ◽

Gated Recurrent Unit ◽

Euclidian Distances

Feature selection and dimensionality reduction are important for the performance of wind turbine condition monitoring models that use supervisory control and data acquisition (SCADA) data. In this paper, an improved random forest algorithm, namely Feature Simplification Random Forest (FS_RF), is proposed, which is capable of identifying features closely correlated with wind turbine working conditions. Euclidean distances are employed to distinguish the weight of the same feature among different samples, and the feature's importance is measured by means of the random forest algorithm. The selected features are finally verified by a two-layer gated recurrent unit (GRU) neural network used for condition monitoring. The experimental results demonstrate the capacity and effectiveness of the proposed method for wind turbine condition monitoring.

Modify Random Forest Algorithm Using Hybrid Feature Selection Method

International Journal on Perceptive and Cognitive Computing ◽

10.31436/ijpcc.v4i2.59 ◽

2018 ◽

Vol 4 (2) ◽

pp. 1-6

Author(s):

Ahmed T. Sadiq ◽

Karrar Shareef Musawi

Keyword(s):

Feature Selection ◽

Random Forest ◽

Gini Index ◽

Feature Selection Method ◽

Selection Method ◽

Random Selection ◽

Experimental Results ◽

Random Forest Algorithm ◽

Selection For

Random Forest (RF) is one of the most powerful decision-tree-based methods of machine learning. The proposed hybrid feature selection for Random Forest depends on two measures, Information Gain and Gini Index, combined in varying percentages based on a weight. In this paper, we propose a modified Random Forest algorithm, named Random Forest algorithm using hybrid feature selection, that uses a hybrid feature selection criterion instead of a single one. The main idea is to compute the Information Gain for all randomly selected features and then search for the best split point in the node, that is, the point that gives the best value of a hybrid equation with the Gini Index. The experimental results on the dataset showed that the proposed modification is better than the classic Random Forest: compared to the standard static Random Forest, the hybrid feature selection Random Forest shows a significant improvement in accuracy.
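The abstract does not give the hybrid equation itself, so the following is only a minimal sketch of how a weighted split criterion of this kind could look. It assumes a linear blend, `weight * information gain + (1 - weight) * Gini decrease`; the function names and the blend are illustrative assumptions, not the authors' formulation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(impurity, parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - children


def hybrid_score(parent, left, right, weight=0.5):
    """ASSUMED hybrid criterion: weight * info gain + (1 - weight) * Gini decrease."""
    info_gain = impurity_decrease(entropy, parent, left, right)
    gini_gain = impurity_decrease(gini, parent, left, right)
    return weight * info_gain + (1.0 - weight) * gini_gain


def best_split(values, labels, weight=0.5):
    """Return (threshold, score) maximizing the hybrid score for one numeric feature."""
    best = (None, float("-inf"))
    # Skip the largest value: splitting there would leave the right child empty.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = hybrid_score(labels, left, right, weight)
        if score > best[1]:
            best = (threshold, score)
    return best
```

With `weight=1.0` the criterion reduces to pure information gain and with `weight=0.0` to pure Gini decrease, which is one way to read "varying percentages based on weight".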

Model for predicting heart failure using Random Forest and Logistic Regression algorithms

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1208/1/012039 ◽

2021 ◽

Vol 1208 (1) ◽

pp. 012039

Author(s):

Vedran Grgić ◽

Denis Mušić ◽

Elmir Babović

Keyword(s):

Heart Failure ◽

Cardiovascular Disease ◽

Logistic Regression ◽

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Algorithm ◽

Regression Algorithms ◽

Significant Factors

The paper analyzes the cardiovascular parameters of patients with heart disease. The aim of this study was to predict death in patients with cardiovascular disease based on 12 parameters, using the Random Forest and Logistic Regression algorithms. Parameters were tuned for both algorithms to determine the best settings. The most significant factors in the prediction were identified using the feature selection method of both algorithms. In a comparative analysis of the results, the highest accuracy, 90%, was obtained with the Random Forest algorithm.

TMS Feature Selection Method for Truncate Based Random Forest Model

Computer Science and Application ◽

10.12677/csa.2020.102029 ◽

2020 ◽

Vol 10 (02) ◽

pp. 276-288

Author(s):

松王

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Model ◽

Forest Model

A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

Energies ◽

10.3390/en9100767 ◽

2016 ◽

Vol 9 (10) ◽

pp. 767 ◽

Cited By ~ 29

Author(s):

Nantian Huang ◽

Guobo Lu ◽

Dianguo Xu

Keyword(s):

Feature Selection ◽

Random Forest ◽

Load Forecasting ◽

Feature Selection Method ◽

Selection Method ◽

Short Term ◽

Electricity Load ◽

Electricity Load Forecasting

Efficient feature selection method using contribution ratio by random forest

2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV) ◽

10.1109/fcv.2015.7103746 ◽

2015 ◽

Cited By ~ 2

Author(s):

Ryuei Murata ◽

Yohei Mishina ◽

Yuji Yamauchi ◽

Takayoshi Yamashita ◽

Hironobu Fujiyoshi

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Selection Method ◽

Contribution Ratio

MRF-RFS: A Modified Random Forest Recursive Feature Selection Algorithm for Nasopharyngeal Carcinoma Segmentation

Methods of Information in Medicine ◽

10.1055/s-0040-1721791 ◽

2020 ◽

Vol 59 (04/05) ◽

pp. 151-161

Author(s):

Yuchen Fei ◽

Fengyu Zhang ◽

Chen Zu ◽

Mei Hong ◽

Xingchen Peng ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Nasopharyngeal Carcinoma ◽

Soft Tissues ◽

Feature Selection Method ◽

Selection Method ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Tumor Margins

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), due to limitations such as high variability, low contrast, and discontinuous boundaries in the presentation of soft tissues, the tumor margin can be extremely difficult to identify in magnetic resonance imaging (MRI), which makes the NPC segmentation task more challenging. Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods: In this paper, we propose a novel feature selection algorithm for identifying the margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., its feature importance measure). Results: To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.

TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w4-483-2017 ◽

2017 ◽

Vol XLII-4/W4 ◽

pp. 483-488 ◽

Cited By ~ 2

Author(s):

A. Shamsoddini ◽

M. R. Aboodi ◽

J. Karami

Keyword(s):

Neural Networks ◽

Air Pollution ◽

Feature Selection ◽

Artificial Neural Networks ◽

Random Forest ◽

Sulfur Dioxide ◽

Nitrogen Dioxide ◽

Feature Selection Method ◽

Selection Method ◽

The Other

Air pollution, as one of the most serious forms of environmental pollution, poses a huge threat to human life. Air pollution leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to obtain an efficient model for estimating the carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 contents of the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide emissions was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

Imbalanced Feature Selection with Random Forest for Ovarian Cancer Diagnosis and Survival Prediction

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5714 ◽

2016 ◽

Vol 13 (10) ◽

pp. 7303-7309 ◽

Cited By ~ 1

Author(s):

Xiaonan Fang ◽

Cheng Liang ◽

Huaxiang Zhang

Keyword(s):

Ovarian Cancer ◽

Feature Selection ◽

Random Forest ◽

Cancer Diagnosis ◽

Reproductive System ◽

Feature Selection Method ◽

Selection Method ◽

Survival Prediction ◽

Female Reproductive System ◽

Early Cancer Diagnosis

Ovarian cancer is the most lethal cancer of the female reproductive system. Although it ranks only tenth among female malignant tumors, its death rate is the highest among tumors of the female reproductive system. Therefore, there is a pressing and consistent need to better comprehend its pathogenesis. However, early diagnosis and survival prediction for ovarian cancer patients remain challenging problems. Microarray technology has been widely accepted in early cancer diagnosis and outcome prediction. Nevertheless, high dimensionality and imbalanced class distributions often degrade classification performance. In this paper, we propose a new imbalanced feature selection method based on Random Forest, called IFSRF, for ovarian cancer classification. Our method uses AUC as the evaluation criterion when performing feature selection, which can relieve the negative effect of imbalanced classes. We select three manually curated ovarian cancer datasets and five widely used classifiers to show the improvement obtained with IFSRF. Furthermore, to demonstrate the effectiveness of the proposed method, we compare IFSRF with another widely used feature selection method, ReliefF. Experimental results on three ovarian cancer diagnosis and survival prediction datasets show that our feature selection method can significantly improve the AUC performance of all classifiers, especially Random Forest. Meanwhile, the overall prediction accuracy is maintained as well.
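To illustrate why AUC is a natural criterion under class imbalance, the sketch below computes AUC directly from its pairwise definition and uses it to rank individual features. This is a simplified univariate stand-in, not the IFSRF procedure, which the abstract does not spell out; the function names are hypothetical.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are expected to be 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(columns, labels):
    """Rank features by single-feature AUC, best first.

    The AUC is folded with max(a, 1 - a) so that a feature that separates
    the classes in the opposite direction still counts as informative.
    """
    scored = []
    for name, values in columns.items():
        a = auc(values, labels)
        scored.append((name, max(a, 1.0 - a)))
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Because AUC is computed over positive-negative pairs, it does not change when the majority class grows, whereas plain accuracy can look high simply by predicting the majority class.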

Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/5529389 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yaoxin Wang ◽

Yingjie Xu ◽

Zhenyu Yang ◽

Xiaoqing Liu ◽

Qi Dai

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Structural Features ◽

Selection Method ◽

Prediction Methods ◽

Information Redundancy ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy is often ignored. In order to select the important features with strong classification ability, we propose recursive feature selection with random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction. Fewer than 5% of the features are used, yet the prediction accuracy is improved by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
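The general shape of a recursive feature selection loop can be sketched as follows. To stay dependency-free, the importance function here is an absolute Pearson correlation with the target, a deliberate stand-in for the random forest importance used in the paper; the feature names and toy data are hypothetical.

```python
import math


def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Repeatedly rescore the surviving features and drop the `step` weakest."""
    selected = list(features)
    while len(selected) > n_keep:
        scores = importance_fn(selected)  # must return {feature: importance}
        n_drop = min(step, len(selected) - n_keep)
        weakest = sorted(selected, key=lambda f: scores[f])[:n_drop]
        selected = [f for f in selected if f not in weakest]
    return selected


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_importance(data, target):
    """STAND-IN importance: |correlation with target| instead of RF importance."""
    def score(selected):
        return {f: abs(pearson(data[f], target)) for f in selected}
    return score


# Hypothetical toy data: three features, four samples, binary target.
target = [0, 0, 1, 1]
data = {
    "helix": [0.1, 0.2, 0.9, 0.8],
    "sheet": [0.5, 0.4, 0.5, 0.4],
    "coil": [0.3, 0.6, 0.5, 0.7],
}
kept = recursive_feature_elimination(
    data, correlation_importance(data, target), n_keep=2
)
```

With a model-based importance, the key point of the recursion is that the model is refit on the surviving features in every round, so importances are re-estimated after each removal rather than computed once.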

Feature selection to increase the random forest method performance on high dimensional data

International Journal of Advances in Intelligent Informatics ◽

10.26555/ijain.v6i3.471 ◽

2020 ◽

Vol 6 (3) ◽

pp. 303

Author(s):

Maria Irmina Prasetiyowati ◽

Nur Ulfa Maulidevi ◽

Kridanto Surendro

Keyword(s):

Feature Selection ◽

Random Forest ◽

Land Cover ◽

High Dimensional Data ◽

Feature Selection Method ◽

Urban Land ◽

Selection Method ◽

High Dimensional ◽

Urban Land Cover ◽

Random Forest Method

Random Forest is a supervised classification method based on Breiman's bagging (bootstrap aggregating) and random selection of features. Because the features given to each tree are chosen at random, a selected feature is not necessarily informative, so it is necessary to perform feature selection for the Random Forest. The purpose of this feature selection is to obtain an optimal subset of features that contain valuable information, in the hope of speeding up the Random Forest method, mainly on high-dimensional datasets such as the Parkinson, CNAE-9, and Urban Land Cover datasets. The feature selection is done using the Correlation-Based Feature Selection method with the BestFirst search method. Tests were carried out 30 times using 10-fold cross-validation and a split of the dataset into 70% training and 30% testing. The experiments on the Parkinson dataset ran 0.27 and 0.28 seconds faster than the Random Forest method without feature selection. Likewise, the trials on the Urban Land Cover dataset were 0.04 and 0.03 seconds faster, while for the CNAE-9 dataset the runs were 2.23 and 2.81 seconds faster than the Random Forest method without feature selection. These experiments showed that the Random Forest runs faster when feature selection is performed first. Accuracy also increased for the Parkinson and Urban Land Cover datasets; only the CNAE-9 experiment yielded lower accuracy. The benefit of this research is that first performing feature selection with the Correlation-Based Feature Selection method can increase the speed and accuracy of the Random Forest method on high-dimensional data.
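For orientation, Correlation-Based Feature Selection scores a candidate subset with a merit heuristic that rewards correlation with the class and penalizes correlation among the features themselves. The sketch below shows only that merit computation from precomputed average correlations; the function and parameter names are illustrative, and the search over subsets (BestFirst in the abstract) is not shown.

```python
import math


def cfs_merit(k, mean_feature_class_corr, mean_feature_feature_corr):
    """CFS merit of a k-feature subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff).

    r_cf is the mean feature-class correlation and r_ff the mean
    feature-feature correlation within the subset.
    """
    if k < 1:
        raise ValueError("subset must contain at least one feature")
    return (k * mean_feature_class_corr) / math.sqrt(
        k + k * (k - 1) * mean_feature_feature_corr
    )
```

For example, four features with a mean class correlation of 0.5 score 1.0 when they are mutually uncorrelated but only 0.5 when they are perfectly redundant, which is why the method tends to discard duplicated information before the forest is trained.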
