An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring

Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5654
Author(s):  
Guo Li ◽  
Chensheng Wang ◽  
Di Zhang ◽  
Guang Yang

Feature selection and dimensionality reduction are important for the performance of wind turbine condition monitoring models that use supervisory control and data acquisition (SCADA) data. In this paper, an improved random forest algorithm, namely Feature Simplification Random Forest (FS_RF), is proposed, which is capable of identifying features closely correlated with wind turbine working conditions. Euclidean distances are employed to distinguish the weight of the same feature among different samples, and the importance of each feature is measured by means of the random forest algorithm. The selected features are finally verified by a two-layer gated recurrent unit (GRU) neural network used for condition monitoring. The experimental results demonstrate the capacity and effectiveness of the proposed method for wind turbine condition monitoring.

2018 ◽  
Vol 4 (2) ◽  
pp. 1-6
Author(s):  
Ahmed T. Sadiq ◽  
Karrar Shareef Musawi

Random Forest (RF) is one of the most powerful decision-tree-based methods in machine learning. The proposed hybrid feature selection for Random Forest depends on two measures, Information Gain and Gini Index, combined in varying percentages according to a weight. In this paper, we propose a modified Random Forest algorithm, named Random Forest algorithm using hybrid feature selection, that uses a hybrid of the two measures instead of a single feature selection measure. The main idea is to compute the Information Gain for all randomly selected features and then search for the best split point in the node, that is, the point that gives the best value of a hybrid equation with the Gini Index. The experimental results on the dataset showed that the proposed modification is better than the classic Random Forest: compared with the standard static Random Forest, the hybrid feature selection Random Forest shows a significant improvement in accuracy.
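The abstract does not state the hybrid equation itself. As a rough illustration only, the sketch below assumes it is a convex combination of information gain and Gini gain controlled by a weight `w`. The function names, the weight, and the midpoint threshold scan are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def hybrid_split_score(parent, left, right, w=0.5):
    """Weighted mix of information gain and Gini gain for one candidate split.

    ASSUMPTION: `w` and the convex combination are illustrative; the
    paper's actual hybrid equation is not given in the abstract.
    """
    n = len(parent)
    wl, wr = len(left) / n, len(right) / n
    info_gain = entropy(parent) - (wl * entropy(left) + wr * entropy(right))
    gini_gain = gini(parent) - (wl * gini(left) + wr * gini(right))
    return w * info_gain + (1.0 - w) * gini_gain


def best_threshold(values, labels, w=0.5):
    """Scan midpoints between sorted unique values; return (threshold, score)."""
    pairs = list(zip(values, labels))
    uniq = sorted(set(values))
    best = (None, float("-inf"))
    for lo, hi in zip(uniq, uniq[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = hybrid_split_score(labels, left, right, w)
        if score > best[1]:
            best = (t, score)
    return best
```

For example, `best_threshold([1, 2, 3, 4], [0, 0, 1, 1])` picks the threshold 2.5, where both child nodes are pure. Setting `w=1.0` reduces the score to plain information gain and `w=0.0` to plain Gini gain.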


2021 ◽  
Vol 1208 (1) ◽  
pp. 012039
Author(s):  
Vedran Grgić ◽  
Denis Mušić ◽  
Elmir Babović

Abstract The paper analyzes the cardiovascular parameters of patients with heart disease. The aim of this study was to predict death in a patient with cardiovascular disease based on 12 parameters, using Random Forest and Logistic Regression algorithms. Parameters were tuned for both algorithms to determine the best settings. The most significant predictive factors were identified using the feature selection method of each algorithm. A comparative analysis of the results showed that the highest accuracy, 90%, was obtained using the Random Forest algorithm.


2020 ◽  
Vol 59 (04/05) ◽  
pp. 151-161
Author(s):  
Yuchen Fei ◽  
Fengyu Zhang ◽  
Chen Zu ◽  
Mei Hong ◽  
Xingchen Peng ◽  
...  

Abstract Background An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), due to limitations such as high variability, low contrast, and discontinuous boundaries in the presentation of soft tissues, the tumor margin can be extremely difficult to identify in magnetic resonance imaging (MRI), which makes the NPC segmentation task more challenging. Objectives The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods In this paper, we propose a novel feature selection algorithm for identifying the tumor margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., its feature importance measure). Results To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.


Author(s):  
A. Shamsoddini ◽  
M. R. Aboodi ◽  
J. Karami

Air pollution, one of the most serious forms of environmental pollution, poses a huge threat to human life. It leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to obtain an efficient model for estimating the carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 content of the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method were more accurate than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.


2016 ◽  
Vol 13 (10) ◽  
pp. 7303-7309 ◽  
Author(s):  
Xiaonan Fang ◽  
Cheng Liang ◽  
Huaxiang Zhang

Ovarian cancer is the most lethal cancer of the female reproductive system. Although it ranks only tenth among female malignant tumors, its death rate is the highest among tumors of the female reproductive system. Therefore, there is a pressing and persistent need to better understand its pathogenesis. However, early diagnosis and survival prediction for ovarian cancer patients remain challenging problems. Microarray technology has been widely accepted in early cancer diagnosis and outcome prediction. Nevertheless, high dimensionality and imbalanced class distributions degrade classification performance. In this paper, we propose a new imbalanced feature selection method based on Random Forest, called IFSRF, for ovarian cancer classification. Our method uses AUC as the evaluation criterion when performing feature selection, which can relieve the negative effect of imbalanced classes. We select three manually curated ovarian cancer datasets and five widely used classifiers to show the improvement obtained with IFSRF. Furthermore, to demonstrate the effectiveness of the proposed method, we compare IFSRF with another widely used feature selection method, ReliefF. Experimental results on three ovarian cancer diagnosis and survival prediction datasets show that our feature selection method can significantly improve the AUC of all classifiers, especially Random Forest. Meanwhile, the overall prediction accuracy is maintained.
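To make concrete why AUC is a reasonable criterion under class imbalance, the following minimal sketch scores each feature by the AUC it achieves on its own, computed as the Mann-Whitney statistic. It is a univariate stand-in and not the IFSRF procedure, which the abstract describes as Random Forest based; the function names and the toy data layout (a dict of feature columns, labels in {0, 1}) are assumptions.

```python
def auc(scores, labels):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def rank_features_by_auc(columns, labels):
    """Rank features by single-feature AUC, highest first.

    max(a, 1 - a) treats a feature that separates the classes in the
    opposite direction as equally informative.
    """
    ranked = []
    for name, column in columns.items():
        a = auc(column, labels)
        ranked.append((name, max(a, 1.0 - a)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)
```

With 8 negatives and 2 positives, a constant feature scores an AUC of 0.5 even though always predicting the majority class would be 80% accurate, which is the distortion an accuracy-based criterion suffers from.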


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yaoxin Wang ◽  
Yingjie Xu ◽  
Zhenyu Yang ◽  
Xiaoqing Liu ◽  
Qi Dai

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy is often ignored. In order to select the important features with strong classification ability, we propose a recursive feature selection method with random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction. Fewer than 5% of the features are used, yet the prediction accuracy is improved by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
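The general shape of recursive feature selection can be sketched as a loop that re-scores the surviving features and drops the weakest ones until a target count remains. The sketch below is a generic outline under stated assumptions: the importance scorer is pluggable, and absolute Pearson correlation is used here only as a stand-in for random forest importance so the example stays dependency-free. The names `recursive_elimination`, `abs_correlation`, and `drop_fraction` are hypothetical.

```python
def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def abs_correlation(columns, labels):
    """Stand-in importance scorer (NOT random forest importance)."""
    return {name: abs(pearson(col, labels)) for name, col in columns.items()}


def recursive_elimination(columns, labels, importance, keep, drop_fraction=0.5):
    """Repeatedly re-score the surviving features and drop the weakest."""
    active = dict(columns)
    while len(active) > keep:
        scores = importance(active, labels)  # re-scored on the current subset
        n_drop = max(1, int((len(active) - keep) * drop_fraction))
        for name in sorted(active, key=scores.get)[:n_drop]:
            del active[name]
    return list(active)
```

With a model-based scorer such as random forest importance, the scores change as features are removed, which is what makes the repeated re-scoring worthwhile; with the correlation stand-in the ranking is fixed, so the loop only illustrates the control flow.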


Author(s):  
Maria Irmina Prasetiyowati ◽  
Nur Ulfa Maulidevi ◽  
Kridanto Surendro

Random Forest is a supervised classification method based on bagging (bootstrap aggregating, introduced by Breiman) and random selection of features. Because the features given to the Random Forest are chosen at random, a selected feature is not necessarily informative, so feature selection is needed. The purpose of this feature selection is to choose an optimal subset of features that contain valuable information, in the hope of speeding up the Random Forest method, mainly on high-dimensional datasets such as the Parkinson, CNAE-9, and Urban Land Cover datasets. The feature selection is done using the Correlation-Based Feature Selection method with the BestFirst search method. Tests were carried out 30 times using 10-fold cross-validation and a split of the dataset into 70% training and 30% testing. The experiments on the Parkinson dataset were 0.27 and 0.28 seconds faster than the Random Forest method without feature selection. Likewise, the trials on the Urban Land Cover dataset were 0.04 and 0.03 seconds faster, while for the CNAE-9 dataset the time differences were 2.23 and 2.81, again faster than the Random Forest method without feature selection. These experiments showed that the Random Forest runs faster when feature selection is performed first. Accuracy also increased in the other two experiments; only the CNAE-9 experiment yielded a lower accuracy. The benefit of this research is that first performing feature selection with the Correlation-Based Feature Selection method can increase the speed and accuracy of the Random Forest method on high-dimensional data.
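Correlation-Based Feature Selection scores a subset of k features by the merit k·r_cf / sqrt(k + k(k−1)·r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation, so redundant features lower the score. The sketch below illustrates that merit under two labeled assumptions: absolute Pearson correlation stands in for the correlation measure, and a simple greedy forward search replaces the BestFirst search used in the study (BestFirst additionally allows backtracking). The function names are hypothetical.

```python
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def cfs_merit(subset, columns, labels):
    """CFS merit: rewards class correlation, penalizes inter-feature correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    if k == 1:
        return r_cf
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
    return k * r_cf / (k + k * (k - 1) * r_ff) ** 0.5


def greedy_cfs(columns, labels):
    """Greedy forward search (ASSUMPTION: simpler than BestFirst)."""
    selected, best = [], 0.0
    remaining = sorted(columns)
    while remaining:
        merit, cand = max(
            (cfs_merit(selected + [f], columns, labels), f) for f in remaining
        )
        if merit <= best:  # stop when no candidate improves the merit
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected, best
```

In a toy case with one informative feature, an exact duplicate of it, and an uncorrelated feature, the search keeps a single copy of the informative feature, because adding the duplicate does not raise the merit and adding the uncorrelated feature lowers it.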

