Using Recursive Feature Selection with Random Forest to Improve Protein Structural Class Prediction for Low-Similarity Sequences

Computational and Mathematical Methods in Medicine ◽

10.1155/2021/5529389 ◽

2021 ◽

Vol 2021 ◽

pp. 1-9

Author(s):

Yaoxin Wang ◽

Yingjie Xu ◽

Zhenyu Yang ◽

Xiaoqing Liu ◽

Qi Dai

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Structural Features ◽

Selection Method ◽

Prediction Methods ◽

Information Redundancy ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class

Many combinations of protein features are used to improve protein structural class prediction, but the information redundancy among them is often ignored. To select the important features with strong classification ability, we proposed a recursive feature selection method based on random forest. We evaluated the proposed method in four experiments and compared it with the available competing prediction methods. The results indicate that the proposed feature selection method effectively improves the efficiency of protein structural class prediction: fewer than 5% of the features are used, yet the prediction accuracy improves by 4.6-13.3%. We further compared different protein features and found that the predicted secondary structural features achieve the best performance. This understanding can be used to design more powerful prediction methods for the protein structural class.
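The abstract does not spell out the selection loop, so the following is only a minimal sketch of generic recursive feature elimination, not the authors' procedure. The function name `recursive_feature_elimination`, the `drop_fraction` parameter, and the toy importance table are all hypothetical. In practice the `score_importances` callback would refit a random forest on the surviving features and return its importance scores (for example, scikit-learn's `RandomForestClassifier.feature_importances_`); a fixed table stands in for that step here so the example runs with the standard library alone.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    score_importances: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
    drop_fraction: float = 0.2,
) -> List[str]:
    """Re-score the surviving features each round and drop the weakest ones."""
    if n_keep < 1:
        raise ValueError("n_keep must be at least 1")
    remaining = list(features)
    while len(remaining) > n_keep:
        # Importances are recomputed on the current subset every round;
        # this is what makes the procedure recursive rather than one-shot.
        importances = score_importances(remaining)
        ranked = sorted(remaining, key=lambda name: importances[name])
        # Drop at least one feature, but never go below n_keep.
        n_drop = max(1, int(len(remaining) * drop_fraction))
        n_drop = min(n_drop, len(remaining) - n_keep)
        dropped = set(ranked[:n_drop])
        remaining = [name for name in remaining if name not in dropped]
    return remaining


# Hypothetical stand-in for a model-based importance (assumption: a real
# pipeline would train a random forest on `subset` here).
TOY_WEIGHTS = {"f1": 0.05, "f2": 0.40, "f3": 0.10, "f4": 0.30, "f5": 0.15}


def toy_importances(subset: List[str]) -> Dict[str, float]:
    total = sum(TOY_WEIGHTS[name] for name in subset)
    return {name: TOY_WEIGHTS[name] / total for name in subset}


selected = recursive_feature_elimination(list(TOY_WEIGHTS), toy_importances, n_keep=2)
print(selected)
```

Recomputing the importances after every round matters when features are redundant, because removing one correlated feature can change the score of the ones that remain.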

A novel feature selection method to predict protein structural class

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2018.06.007 ◽

2018 ◽

Vol 76 ◽

pp. 118-129 ◽

Cited By ~ 3

Author(s):

Mingshun Yuan ◽

Zijiang Yang ◽

Guangzao Huang ◽

Guoli Ji

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Structural Class ◽

Protein Structural Class

TMS Feature Selection Method for Truncate Based Random Forest Model

Computer Science and Application ◽

10.12677/csa.2020.102029 ◽

2020 ◽

Vol 10 (02) ◽

pp. 276-288

Author(s):

松王

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Model ◽

Forest Model

A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest

Energies ◽

10.3390/en9100767 ◽

2016 ◽

Vol 9 (10) ◽

pp. 767 ◽

Cited By ~ 29

Author(s):

Nantian Huang ◽

Guobo Lu ◽

Dianguo Xu

Keyword(s):

Feature Selection ◽

Random Forest ◽

Load Forecasting ◽

Feature Selection Method ◽

Selection Method ◽

Short Term ◽

Electricity Load ◽

Electricity Load Forecasting

Efficient feature selection method using contribution ratio by random forest

2015 21st Korea-Japan Joint Workshop on Frontiers of Computer Vision (FCV) ◽

10.1109/fcv.2015.7103746 ◽

2015 ◽

Cited By ~ 2

Author(s):

Ryuei Murata ◽

Yohei Mishina ◽

Yuji Yamauchi ◽

Takayoshi Yamashita ◽

Hironobu Fujiyoshi

Keyword(s):

Feature Selection ◽

Random Forest ◽

Feature Selection Method ◽

Selection Method ◽

Contribution Ratio

MRF-RFS: A Modified Random Forest Recursive Feature Selection Algorithm for Nasopharyngeal Carcinoma Segmentation

Methods of Information in Medicine ◽

10.1055/s-0040-1721791 ◽

2020 ◽

Vol 59 (04/05) ◽

pp. 151-161

Author(s):

Yuchen Fei ◽

Fengyu Zhang ◽

Chen Zu ◽

Mei Hong ◽

Xingchen Peng ◽

...

Keyword(s):

Feature Selection ◽

Random Forest ◽

Nasopharyngeal Carcinoma ◽

Soft Tissues ◽

Feature Selection Method ◽

Selection Method ◽

Feature Subset ◽

Selection Algorithm ◽

Feature Selection Algorithm ◽

Tumor Margins

Background: An accurate and reproducible method to delineate tumor margins is of great importance in clinical diagnosis and treatment. In nasopharyngeal carcinoma (NPC), due to limitations such as high variability, low contrast, and discontinuous boundaries in presenting soft tissues, tumor margins can be extremely difficult to identify in magnetic resonance imaging (MRI), increasing the challenge of the NPC segmentation task. Objectives: The purpose of this work is to develop a semiautomatic algorithm for NPC image segmentation that requires minimal human intervention while delineating tumor margins with high accuracy and reproducibility. Methods: In this paper, we propose a novel feature selection algorithm for identifying the margin in NPC images, named modified random forest recursive feature selection (MRF-RFS). Specifically, to obtain a more discriminative feature subset for segmentation, a modified recursive feature selection method is applied to the original handcrafted feature set. Moreover, we combine the proposed feature selection method with the classical random forest (RF) in the training stage to take full advantage of its intrinsic property (i.e., feature importance measure). Results: To evaluate the segmentation performance, we verify our method on the T1-weighted MRI images of 18 NPC patients. The experimental results demonstrate that the proposed MRF-RFS method outperforms the baseline methods and deep learning methods on the task of segmenting NPC images. Conclusion: The proposed method could be effective in NPC diagnosis and useful for guiding radiation therapy.

TEHRAN AIR POLLUTANTS PREDICTION BASED ON RANDOM FOREST FEATURE SELECTION METHOD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w4-483-2017 ◽

2017 ◽

Vol XLII-4/W4 ◽

pp. 483-488 ◽

Cited By ~ 2

Author(s):

A. Shamsoddini ◽

M. R. Aboodi ◽

J. Karami

Keyword(s):

Neural Networks ◽

Air Pollution ◽

Feature Selection ◽

Artificial Neural Networks ◽

Random Forest ◽

Sulfur Dioxide ◽

Nitrogen Dioxide ◽

Feature Selection Method ◽

Selection Method ◽

The Other

Air pollution, one of the most serious forms of environmental pollution, poses a huge threat to human life. It leads to environmental instability and has harmful and undesirable effects on the environment. Modern methods for predicting pollutant concentrations can improve decision making and provide appropriate solutions. This study examines the performance of Random Forest feature selection in combination with multiple linear regression and Multilayer Perceptron Artificial Neural Networks, in order to obtain an efficient model for estimating the carbon monoxide, nitrogen dioxide, sulfur dioxide, and PM2.5 contents of the air. The results indicated that Artificial Neural Networks fed with the attributes selected by the Random Forest feature selection method performed more accurately than the other models for all pollutants. The estimation accuracy for sulfur dioxide was lower than for the other air contaminants, whereas nitrogen dioxide was predicted more accurately than the other pollutants.

Modify Random Forest Algorithm Using Hybrid Feature Selection Method

International Journal on Perceptive and Cognitive Computing ◽

10.31436/ijpcc.v4i2.59 ◽

2018 ◽

Vol 4 (2) ◽

pp. 1-6

Author(s):

Ahmed T. Sadiq ◽

Karrar Shareef Musawi

Keyword(s):

Feature Selection ◽

Random Forest ◽

Gini Index ◽

Feature Selection Method ◽

Selection Method ◽

Random Selection ◽

Experimental Results ◽

Random Forest Algorithm ◽

Selection For

Random Forest (RF) is one of the most powerful decision-tree-based methods of machine learning. The proposed hybrid feature selection for Random Forest depends on two measures, Information Gain and Gini Index, combined in varying percentages according to a weight. In this paper, we propose a modified Random Forest algorithm, named Random Forest algorithm using hybrid feature selection, that uses hybrid feature selection instead of a single feature selection measure. The main idea is to compute the Information Gain for all randomly selected features and then search for the best split point in the node, that is, the one that gives the best value for a hybrid equation with the Gini Index. The experimental results on the dataset showed that the proposed modification is better than the classic Random Forest: compared to the standard static Random Forest, the hybrid feature selection Random Forest shows a significant improvement in accuracy.
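The abstract does not give the hybrid equation, so the sketch below simply assumes a weighted sum of the two impurity decreases, `weight * information_gain + (1 - weight) * gini_decrease`. That formula, the `weight` parameter, and the function names are illustrative assumptions, not the paper's definition.

```python
import math
from collections import Counter
from typing import Callable, List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a label list."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(
    impurity: Callable[[Sequence[str]], float],
    parent: Sequence[str],
    left: Sequence[str],
    right: Sequence[str],
) -> float:
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


def hybrid_score(parent, left, right, weight: float = 0.5) -> float:
    # Assumed hybrid equation: a convex combination of the two criteria.
    info_gain = impurity_decrease(entropy, parent, left, right)
    gini_gain = impurity_decrease(gini, parent, left, right)
    return weight * info_gain + (1.0 - weight) * gini_gain


def best_threshold(
    values: Sequence[float], labels: Sequence[str], weight: float = 0.5
) -> Tuple[Optional[float], float]:
    """Try every split 'value <= t' and keep the one with the best hybrid score."""
    best: Tuple[Optional[float], float] = (None, -1.0)
    # The largest value is skipped so the right child is never empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = hybrid_score(labels, left, right, weight)
        if score > best[1]:
            best = (t, score)
    return best


values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
threshold, score = best_threshold(values, labels)
print(threshold, score)
```

With `weight=1.0` the score reduces to plain information gain, and with `weight=0.0` to the plain Gini decrease, which makes it easy to compare the hybrid against either single criterion.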

An Improved Feature Selection Method Based on Random Forest Algorithm for Wind Turbine Condition Monitoring

Sensors ◽

10.3390/s21165654 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5654

Author(s):

Guo Li ◽

Chensheng Wang ◽

Di Zhang ◽

Guang Yang

Keyword(s):

Feature Selection ◽

Random Forest ◽

Wind Turbine ◽

Condition Monitoring ◽

Supervisory Control ◽

Feature Selection Method ◽

Selection Method ◽

Random Forest Algorithm ◽

Gated Recurrent Unit ◽

Euclidian Distances

Feature selection and dimensionality reduction are important for the performance of wind turbine condition monitoring models using supervisory control and data acquisition (SCADA) data. In this paper, an improved random forest algorithm, namely Feature Simplification Random Forest (FS_RF), is proposed, which is capable of identifying features closely correlated with wind turbine working conditions. The Euclidian distances are employed to distinguish the weight of the same feature among different samples, and its importance is measured by means of the random forest algorithm. The selected features are finally verified by a two-layer gated recurrent unit (GRU) neural network facilitating condition monitoring. The experimental results demonstrate the capacity and effectiveness of the proposed method for wind turbine condition monitoring.

Imbalanced Feature Selection with Random Forest for Ovarian Cancer Diagnosis and Survival Prediction

Journal of Computational and Theoretical Nanoscience ◽

10.1166/jctn.2016.5714 ◽

2016 ◽

Vol 13 (10) ◽

pp. 7303-7309 ◽

Cited By ~ 1

Author(s):

Xiaonan Fang ◽

Cheng Liang ◽

Huaxiang Zhang

Keyword(s):

Ovarian Cancer ◽

Feature Selection ◽

Random Forest ◽

Cancer Diagnosis ◽

Reproductive System ◽

Feature Selection Method ◽

Selection Method ◽

Survival Prediction ◽

Female Reproductive System ◽

Early Cancer Diagnosis

Ovarian cancer is the most lethal cancer of the female reproductive system. Although it ranks only tenth among female malignant tumors, its death rate is the highest among tumors of the female reproductive system. Therefore, there is a pressing and consistent need to better comprehend its pathogenesis. However, early diagnosis and survival prediction for ovarian cancer patients remain a challenging problem. Microarray technology has been widely accepted in early cancer diagnosis and prediction of outcome. Nevertheless, high dimensionality and imbalanced class distribution often degrade classification performance. In this paper, we proposed a new imbalanced feature selection method based on Random Forest, called IFSRF, for ovarian cancer classification. Our method uses AUC as the evaluation criterion when performing feature selection, which can relieve the negative effect of imbalanced classes. We select three manually curated ovarian cancer datasets and five widely used classifiers to show the improvement after using IFSRF. Furthermore, to demonstrate the effectiveness of the proposed method, we compare IFSRF with another widely used feature selection method, ReliefF. Experimental results on three ovarian cancer diagnosis and survival prediction data sets show that our feature selection method can significantly improve the AUC performance of all classifiers, especially Random Forest. Meanwhile, the overall prediction accuracy is maintained as well.
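To illustrate why AUC is a more informative criterion than accuracy under class imbalance, here is a small sketch that computes AUC as the fraction of positive-negative pairs ranked correctly. It is not the IFSRF algorithm itself, and the toy labels and scores are invented for illustration.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of samples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Invented toy data: nine negatives and one positive (imbalanced).
labels = [0] * 9 + [1]
constant_scores = [0.1] * 10  # carries no information about the class
informative_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.3, 0.2, 0.1, 0.4]

print(accuracy(constant_scores, labels), auc(constant_scores, labels))
print(accuracy(informative_scores, labels), auc(informative_scores, labels))
```

Both score vectors reach the same accuracy at the default threshold because every sample is assigned to the majority class, while AUC separates the uninformative scores from the ones that rank the single positive sample first.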

A hybrid genetic-neural model for predicting protein structural classes

Biologia ◽

10.2478/s11756-009-0125-4 ◽

2009 ◽

Vol 64 (4) ◽

Author(s):

Samad Jahandideh ◽

Somayyeh Hoseini ◽

Mina Jahandideh ◽

Mohammad Davoodi

Keyword(s):

Neural Network ◽

Feature Selection ◽

Selection Procedure ◽

Neural Model ◽

Classification Performance ◽

Single Amino Acid ◽

Class Prediction ◽

Structural Class ◽

Protein Structural Class ◽

Structural Classes

A genetic algorithm (GA) for feature selection, in conjunction with a neural network, was applied to predict protein structural classes based on single amino acid and all dipeptide composition frequencies. These sequence parameters were encoded as input features for a GA in the feature selection procedure and classified with a three-layered neural network to predict protein structural classes. The system was established by optimizing the classification performance of the neural network, which was used as the evaluation function. In this study, self-consistency and jackknife tests on a database containing 498 proteins were used to verify the performance of this hybrid method, and the results were compared with some prior works. The adoption of a hybrid model, which encompasses genetic and neural technologies, proved to be a promising approach to the task of protein structural class prediction.
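The input encoding described above (single amino acid plus all dipeptide composition frequencies) can be sketched as a 420-dimensional vector: 20 amino acid frequencies followed by 400 dipeptide frequencies. The function name, the residue ordering, and the handling of non-standard residues are assumptions made for this example, and the GA and neural network stages are not shown.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical by code
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def composition_features(sequence: str) -> List[float]:
    """Return 20 amino acid frequencies followed by 400 dipeptide frequencies."""
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two residues")
    # Single-residue composition: count / sequence length.
    aa_freqs = [seq.count(a) / len(seq) for a in AMINO_ACIDS]
    # Dipeptide composition over overlapping adjacent pairs.
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    pair_counts = Counter(pairs)
    dp_freqs = [pair_counts[d] / len(pairs) for d in DIPEPTIDES]
    # Assumption: non-standard residues (e.g. 'X') are not counted as features
    # but still contribute to the denominators.
    return aa_freqs + dp_freqs


features = composition_features("ACAC")
print(len(features))
```

A feature selection step such as a GA would then search over subsets of these 420 columns, which is why a fixed, documented ordering of the vector is useful.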
