Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Yaoxin Wang ◽  
Yingjie Xu ◽  
Zhenyu Yang ◽  
Xiaoqing Liu ◽  
Qi Dai

Many combinations of protein features are used to improve protein structural class prediction, but information redundancy among them is often ignored. To select the important features with strong classification ability, we propose a recursive feature selection method based on random forest to improve protein structural class prediction. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
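The abstract above does not give the details of the authors' procedure. As a rough illustration of the general idea only, the following sketch uses scikit-learn's `RFE` wrapper around a `RandomForestClassifier`, which repeatedly fits the forest and discards the least important features. The synthetic data, the number of trees, the 20% elimination step, and the target of 5 features are all assumptions chosen for the example, not values from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Synthetic stand-in data (an assumption; not the protein datasets from the paper).
X, y = make_classification(
    n_samples=200, n_features=40, n_informative=5, n_redundant=10, random_state=0
)

# The forest supplies feature importances; RFE refits it after each elimination round.
forest = RandomForestClassifier(n_estimators=50, random_state=0)

# step=0.2 removes 20% of the original feature count per round (illustrative choice).
selector = RFE(estimator=forest, n_features_to_select=5, step=0.2)
selector.fit(X, y)

# Column indices of the features that survived every elimination round.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print(selected)
```

In practice, the number of features to keep would typically be chosen by cross-validation rather than fixed in advance.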

2020 ◽  
Vol 59 (04/05) ◽  
pp. 151-161
Author(s):  
Yuchen Fei ◽  
Fengyu Zhang ◽  
Chen Zu ◽  
Mei Hong ◽  
Xingchen Peng ◽  
...  

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), limitations such as high variability, low contrast, and discontinuous boundaries in the presentation of soft tissues can make the tumor margin extremely difficult to identify in magnetic resonance imaging (MRI), which makes the NPC segmentation task more challenging. Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods: We propose a novel feature selection algorithm for identifying the margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., the feature importance measure). Results: To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.


Author(s):  
A. Shamsoddini ◽  
M. R. Aboodi ◽  
J. Karami

Air pollution, one of the most serious forms of environmental pollution, poses a huge threat to human life. It leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and help provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Network methods, in order to obtain an efficient model for estimating the carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 content of the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.


2018 ◽  
Vol 4 (2) ◽  
pp. 1-6
Author(s):  
Ahmed T. Sadiq‎ ◽  
Karrar Shareef Musawi

Random Forest (RF) is one of the most powerful decision-tree-based methods in machine learning. The proposed hybrid feature selection for Random Forest depends on two measures, Information Gain and Gini Index, combined in varying percentages according to a weight. In this paper, we propose a modified Random Forest algorithm, named Random Forest algorithm using hybrid feature selection, that uses hybrid feature selection instead of a single feature selection measure. The main idea is to compute the Information Gain for all randomly selected features and then search for the split point in the node that gives the best value for a hybrid equation with the Gini Index. The experimental results on the dataset showed that the proposed modification is better than the classic Random Forest: compared to the standard static Random Forest, the hybrid feature selection Random Forest shows a significant improvement in accuracy.
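The abstract does not state the hybrid equation itself. The sketch below shows one plausible reading, assuming the split score is a weighted sum of the information gain and the Gini impurity decrease; the function names, the `weight` parameter, and the linear combination are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(impurity, labels, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(labels)
    children = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(labels) - children

def hybrid_score(labels, left, right, weight=0.5):
    """Assumed hybrid criterion: weight * information gain + (1 - weight) * Gini decrease."""
    info_gain = impurity_decrease(entropy, labels, left, right)
    gini_gain = impurity_decrease(gini, labels, left, right)
    return weight * info_gain + (1.0 - weight) * gini_gain

def best_split(values, labels, weight=0.5):
    """Scan midpoints between sorted distinct values; return (threshold, score)."""
    pairs = sorted(zip(values, labels))
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = hybrid_score(labels, left, right, weight)
        if score > best[1]:
            best = (threshold, score)
    return best

threshold, score = best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1], weight=0.5)
print(threshold, score)
```

Setting `weight` to 1.0 or 0.0 recovers a pure information-gain or pure Gini criterion, which makes it easy to compare the hybrid against either measure alone.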


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5654
Author(s):  
Guo Li ◽  
Chensheng Wang ◽  
Di Zhang ◽  
Guang Yang

Feature selection and dimensionality reduction are important for the performance of wind turbine condition monitoring models using supervisory control and data acquisition (SCADA) data. In this paper, an improved random forest algorithm, namely Feature Simplification Random Forest (FS_RF), is proposed, which is capable of identifying features closely correlated with wind turbine working conditions. Euclidean distances are employed to distinguish the weight of the same feature among different samples, and the importance of each feature is measured by means of the random forest algorithm. The selected features are finally verified by a two-layer gated recurrent unit (GRU) neural network used for condition monitoring. The experimental results demonstrate the capacity and effectiveness of the proposed method for wind turbine condition monitoring.


2016 ◽  
Vol 13 (10) ◽  
pp. 7303-7309
Author(s):  
Xiaonan Fang ◽  
Cheng Liang ◽  
Huaxiang Zhang

Ovarian cancer is the most lethal cancer of the female reproductive system. Although it ranks only tenth among female malignant tumors, its death rate is the highest among tumors of the female reproductive system. Therefore, there is a pressing and continuing need to better understand its pathogenesis. However, early diagnosis and survival prediction for ovarian cancer patients remain challenging problems. Microarray technology has been widely accepted in early cancer diagnosis and outcome prediction. Nevertheless, high dimensionality and imbalanced class distribution often degrade classification performance. In this paper, we propose a new imbalanced feature selection method based on Random Forest, called IFSRF, for ovarian cancer classification. Our method uses AUC as the evaluation criterion when performing feature selection, which can relieve the negative effect of imbalanced classes. We select three manually curated ovarian cancer datasets and five widely used classifiers to show the improvement after using IFSRF. Furthermore, to demonstrate the effectiveness of the proposed method, we compare IFSRF with another widely used feature selection method, ReliefF. Experimental results on three ovarian cancer diagnosis and survival prediction datasets show that our feature selection method can significantly improve the AUC performance of all classifiers, especially Random Forest. Meanwhile, the overall prediction accuracy is maintained.
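To illustrate why AUC can be a more informative criterion than accuracy under class imbalance, the following small sketch compares the two on a toy dataset with 5 positives and 95 negatives. It is not the IFSRF algorithm; the data, the 0.5 decision threshold, and the function names are assumptions made for the example.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(scores, labels, threshold=0.5):
    """Fraction of samples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Toy imbalanced data: 5 positives, 95 negatives (hypothetical).
labels = [1] * 5 + [0] * 95

# A model that gives every sample the same score, so it always predicts the majority class.
constant = [0.0] * 100

# A model that ranks every positive above every negative, but below the 0.5 threshold.
informative = [0.4] * 5 + [0.1] * 95

for name, scores in [("constant", constant), ("informative", informative)]:
    print(name, accuracy(scores, labels), auc(scores, labels))
```

Both score vectors reach the same accuracy because neither crosses the threshold, while AUC separates the uninformative model from the one that ranks the minority class correctly.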


Biologia ◽  
2009 ◽  
Vol 64 (4) ◽  
Author(s):  
Samad Jahandideh ◽  
Somayyeh Hoseini ◽  
Mina Jahandideh ◽  
Mohammad Davoodi

A genetic algorithm (GA) for feature selection, in conjunction with a neural network, was applied to predict protein structural classes based on single amino acid and all dipeptide composition frequencies. These sequence parameters were encoded as input features for the GA in the feature selection procedure and classified with a three-layered neural network to predict protein structural classes. The system was established by optimizing the classification performance of the neural network, which was used as the evaluation function. In this study, self-consistency and jackknife tests on a database containing 498 proteins were used to verify the performance of this hybrid method, and the results were compared with some prior works. The hybrid model, which combines genetic and neural technologies, proved to be a promising approach to the task of protein structural class prediction.

