Brain Cancer Prediction Based on Novel Interpretable Ensemble Gene Selection Algorithm and Classifier

Diagnostics, 2021, Vol 11 (10), pp. 1936
Author(s): Abdulqader M. Almars, Majed Alwateer, Mohammed Qaraad, Souad Amjad, Hanaa Fathi, ...

The growth of abnormal cells in the brain causes human brain tumors. Identifying the type of tumor is crucial for the prognosis and treatment of the patient. Cancer microarray data typically contain few samples but many gene expression levels as features, which reflects the curse of dimensionality and makes microarray data difficult to classify. Most of the examined studies report cancer classification (malignant versus benign) accuracy without disclosing biological information related to the classification process. A new approach is proposed to bridge the gap between cancer classification and the biological interpretation of the genes implicated in cancer. This study aims to develop a new hybrid model for cancer classification that uses mRMRe feature selection as a key step to improve the performance of classification methods, together with distributed hyperparameter optimization for gradient boosting ensemble methods. To evaluate the proposed method, NB, RF, and SVM classifiers were chosen. In terms of AUC, sensitivity, and specificity, the optimized CatBoost classifier performed better than the optimized XGBoost classifier under 5-, 6-, 8-, and 10-fold cross-validation. With an accuracy of 0.91±0.12, the optimized CatBoost classifier is more accurate than the CatBoost classifier without optimization (0.81±0.24). With the hybrid approach, SVM, RF, and NB also become more accurate. Furthermore, SVM and RF achieve the same accuracy (0.97±0.08), which is higher than that of NB (0.91±0.12). Findings from relevant biomedical studies support the selected genes.
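The feature selection step can be illustrated with a simplified greedy minimum-redundancy, maximum-relevance (mRMR) selection. This is a sketch only, not the mRMRe implementation used in the study: it assumes absolute Pearson correlation as the measure of both relevance and redundancy, and the function names, gene names, and expression values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mrmr_select(features, target, k):
    """Greedily pick k feature names, trading relevance against redundancy.

    features: dict mapping a feature name to its list of values (one per sample).
    target:   list of class labels encoded as numbers (one per sample).
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    remaining = list(features)  # a list keeps tie-breaking deterministic
    selected = []

    def score(name):
        if not selected:
            return relevance[name]
        # Mean absolute correlation with the features already chosen.
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical expression levels for six samples (three benign, three malignant).
target = [0, 0, 0, 1, 1, 1]
features = {
    "gene_a": [1, 2, 1, 5, 6, 5],      # strongly related to the class label
    "gene_b": [2, 4, 2, 10, 12, 10],   # exactly 2 * gene_a, so fully redundant
    "gene_c": [1, 3, 2, 3, 2, 4],      # weaker signal, but less redundant
}
print(mrmr_select(features, target, 2))
```

In this toy data, gene_b duplicates gene_a up to scaling, so the redundancy penalty leads the sketch to pick gene_c as the second feature even though gene_b is individually more relevant.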

Mathematics, 2021, Vol 9 (15), pp. 1820
Author(s): Ekaterina V. Orlova

This research deals with the challenge of reducing banks’ credit risks associated with the insolvency of individual borrowers. To address this challenge, we propose a new approach, methodology and models for assessing individual creditworthiness that use additional data about borrowers’ digital footprints to support comprehensive analysis and prediction of a borrower’s credit profile. We suggest a model for clustering borrowers, based on hierarchical clustering and the k-means method, which groups actual borrowers with similar creditworthiness and similar credit risks into homogeneous clusters. We also design a model for classifying borrowers, based on the stochastic gradient boosting (SGB) method, which reliably determines the cluster number and therefore the risk level for a new borrower. The developed models form the basis for decisions about lending value, interest rates and lending terms for each risk-homogeneous group of borrowers. A modified version of the methodology for assessing individual creditworthiness is presented, which aims to reduce credit risks and to increase the stability and profitability of financial organizations.
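The two-stage idea described above (cluster existing borrowers, then assign a new borrower to a cluster) can be sketched in a few lines. This is an illustration under loud assumptions, not the author's models: it uses plain k-means only (no hierarchical step), and a nearest-centroid rule stands in for the SGB classifier. The two features, the borrower values, and the function names are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members; keep it if empty.
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Hypothetical borrowers: (debt-to-income ratio, digital-footprint score).
borrowers = [
    (0.10, 0.90), (0.80, 0.20), (0.15, 0.85),
    (0.75, 0.30), (0.20, 0.95), (0.90, 0.10),
]

# Stage 1: group existing borrowers into two risk-homogeneous clusters.
centroids = kmeans(borrowers, initial_centroids=borrowers[:2])

# Stage 2: assign a new borrower to a cluster (stand-in for the SGB classifier).
new_borrower = (0.85, 0.25)
print("cluster for new borrower:", nearest(new_borrower, centroids))
```

A gradient boosting classifier trained on the cluster labels would replace the `nearest` call in stage 2; the nearest-centroid rule is used here only to keep the sketch dependency-free.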


2021, Vol 10 (1), pp. 42
Author(s): Kieu Anh Nguyen, Walter Chen, Bor-Shiun Lin, Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, stacking is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was split by stratified random sampling into 70% training and 30% test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three runs. GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations for GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
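The three evaluation metrics named above can be computed as follows. This is a minimal sketch that assumes d refers to Willmott's index of agreement in its original form; the observed and predicted values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the observations."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / (sum of squared deviations of observations from their mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread


def index_of_agreement(observed, predicted):
    """Willmott's d = 1 - SSE / sum((|p - mean_obs| + |o - mean_obs|)^2)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - sse / potential


# Hypothetical erosion depths in mm/year.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]

print("RMSE:", rmse(observed, predicted))
print("NSE: ", nash_sutcliffe(observed, predicted))
print("d:   ", index_of_agreement(observed, predicted))
```

Lower RMSE is better, while NSE and d are better when closer to 1; a perfect prediction gives RMSE = 0 and NSE = d = 1.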

