Brain Cancer Prediction Based on Novel Interpretable Ensemble Gene Selection Algorithm and Classifier

Abdulqader M. Almars; Majed Alwateer; Mohammed Qaraad; Souad Amjad; Hanaa Fathi; Ayda K. Kelany; Nazar K. Hussein; Mostafa Elhosseini

doi:10.3390/diagnostics11101936

Diagnostics ◽

10.3390/diagnostics11101936 ◽

2021 ◽

Vol 11 (10) ◽

pp. 1936

Author(s):

Abdulqader M. Almars ◽

Majed Alwateer ◽

Mohammed Qaraad ◽

Souad Amjad ◽

Hanaa Fathi ◽

...

Keyword(s):

Gene Selection ◽

Ensemble Methods ◽

Cancer Classification ◽

Biological Information ◽

Gradient Boosting ◽

New Approach ◽

Biological Studies ◽

Abnormal Cells ◽

Gene Selection Algorithm ◽

Gene Expression Levels

Human brain tumors are caused by the growth of abnormal cells in the brain. Identifying the tumor type is crucial for the patient's prognosis and treatment. Cancer microarray data typically comprise few samples but many gene expression levels as features, which reflects the curse of dimensionality and makes microarray classification challenging. Most of the studies examined report cancer classification (malignant versus benign) accuracy without disclosing biological information related to the classification process. This study proposes a new approach to bridge the gap between cancer classification and the biological interpretation of the genes implicated in cancer. It aims to develop a new hybrid model for cancer classification that uses mRMRe feature selection as a key step to improve classifier performance, together with distributed hyperparameter optimization for gradient boosting ensemble methods. The proposed method is evaluated with NB, RF, and SVM classifiers. In terms of AUC, sensitivity, and specificity, the optimized CatBoost classifier outperformed the optimized XGBoost classifier in 5-, 6-, 8-, and 10-fold cross-validation. With an accuracy of 0.91 ± 0.12, the optimized CatBoost classifier is more accurate than CatBoost without optimization (0.81 ± 0.24). With the hybrid algorithm, SVM, RF, and NB all become more accurate. SVM and RF achieve the same accuracy (0.97 ± 0.08), which is higher than that of NB (0.91 ± 0.12). Findings from relevant biomedical studies support the selected genes.
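The feature selection step named in the abstract can be illustrated with a minimal greedy minimum-redundancy, maximum-relevance (mRMR) sketch. This is not the authors' mRMRe implementation: absolute Pearson correlation stands in for the relevance and redundancy measures, and the function name `mrmr_select`, the gene names, and the toy expression values are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mrmr_select(features, target, k):
    """Greedily pick k features that are relevant to the target and
    not redundant with the features already picked.

    features: dict mapping a feature name to its list of values.
    target:   list of class labels encoded as numbers.
    """
    relevance = {name: abs(pearson(vals, target)) for name, vals in features.items()}
    selected = []
    remaining = set(features)

    def score(name):
        # First pick: relevance only. Later picks: relevance minus mean redundancy.
        if not selected:
            return relevance[name]
        redundancy = sum(
            abs(pearson(features[name], features[s])) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() keeps tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: six samples, three genes, binary tumor label.
target = [0, 0, 0, 1, 1, 1]
features = {
    "gene_a": [1, 2, 1, 5, 4, 5],   # strongly related to the label
    "gene_b": [1, 2, 1, 5, 4, 6],   # near-duplicate of gene_a
    "gene_c": [0, 0, 1, 0, 1, 1],   # weaker signal, but less redundant
}
selected = mrmr_select(features, target, 2)
print(selected)
```

In this toy case the near-duplicate `gene_b` is passed over in favor of `gene_c`, which is the behavior that motivates redundancy-aware selection on high-dimensional microarray data.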

A novel gene selection algorithm for cancer classification using microarray datasets

BMC Medical Genomics ◽

10.1186/s12920-018-0447-6 ◽

2019 ◽

Vol 12 (1) ◽

Cited By ~ 10

Author(s):

Russul Alanni ◽

Jingyu Hou ◽

Hasseeb Azzawi ◽

Yong Xiang

Keyword(s):

Gene Selection ◽

Cancer Classification ◽

Selection Algorithm ◽

Novel Gene ◽

Microarray Datasets ◽

Gene Selection Algorithm

A hybrid gene selection algorithm based on interaction information for microarray-based cancer classification

PLoS ONE ◽

10.1371/journal.pone.0212333 ◽

2019 ◽

Vol 14 (2) ◽

pp. e0212333 ◽

Cited By ~ 5

Author(s):

Songyot Nakariyakul

Keyword(s):

Gene Selection ◽

Cancer Classification ◽

Selection Algorithm ◽

Hybrid Gene ◽

Interaction Information ◽

Gene Selection Algorithm

FF-SVM: New FireFly-based Gene Selection Algorithm for Microarray Cancer Classification

2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB) ◽

10.1109/cibcb.2019.8791236 ◽

2019 ◽

Cited By ~ 1

Author(s):

Nada Almugren ◽

Hala Alshamlan

Keyword(s):

Gene Selection ◽

Cancer Classification ◽

Selection Algorithm ◽

Gene Selection Algorithm

Granular SVM-RFE Gene Selection Algorithm for Reliable Prostate Cancer Classification on Microarray Expression Data

Fifth IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05) ◽

10.1109/bibe.2005.34 ◽

2006 ◽

Cited By ~ 8

Author(s):

Yuchun Tang ◽

Yan-Qing Zhang ◽

Zhen Huang ◽

Xiaohua Hu

Keyword(s):

Prostate Cancer ◽

Gene Selection ◽

Cancer Classification ◽

Expression Data ◽

Selection Algorithm ◽

Microarray Expression Data ◽

Microarray Expression ◽

Gene Selection Algorithm ◽

Prostate Cancer Classification

A hybrid gene selection algorithm for microarray cancer classification using genetic algorithm and learning automata

Informatics in Medicine Unlocked ◽

10.1016/j.imu.2017.10.004 ◽

2017 ◽

Vol 9 ◽

pp. 246-254 ◽

Cited By ~ 42

Author(s):

Habib Motieghader ◽

Ali Najafi ◽

Balal Sadeghi ◽

Ali Masoudi-Nejad

Keyword(s):

Genetic Algorithm ◽

Gene Selection ◽

Learning Automata ◽

Cancer Classification ◽

Selection Algorithm ◽

Hybrid Gene ◽

Gene Selection Algorithm

An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data

Bioinformatics ◽

10.1093/bioinformatics/bts602 ◽

2012 ◽

Vol 28 (24) ◽

pp. 3306-3315 ◽

Cited By ~ 45

Author(s):

Y. Piao ◽

M. Piao ◽

K. Park ◽

K. H. Ryu

Keyword(s):

Gene Expression ◽

Gene Expression Data ◽

Gene Selection ◽

Cancer Classification ◽

Expression Data ◽

Selection Algorithm ◽

Gene Selection Algorithm ◽

Ensemble Correlation

An Improved Elastic Net for Cancer Classification and Gene Selection

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2010.00976 ◽

2010 ◽

Vol 36 (7) ◽

pp. 976-981 ◽

Cited By ~ 8

Author(s):

Jun-Tao LI ◽

Ying-Min JIA

Keyword(s):

Gene Selection ◽

Elastic Net ◽

Cancer Classification

Gene Selection for Cancer Classification using a New Hybrid of Binary Black Hole Algorithm

2020 28th Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu49456.2020.9302351 ◽

2020 ◽

Author(s):

Elnaz Pashaei ◽

Elham Pashaei

Keyword(s):

Black Hole ◽

Gene Selection ◽

Cancer Classification ◽

Binary Black Hole ◽

Selection For ◽

Black Hole Algorithm

Methodology and Models for Individuals’ Creditworthiness Management Using Digital Footprint Data and Machine Learning Methods

Mathematics ◽

10.3390/math9151820 ◽

2021 ◽

Vol 9 (15) ◽

pp. 1820

Author(s):

Ekaterina V. Orlova

Keyword(s):

Interest Rates ◽

Additional Data ◽

Risk Level ◽

Gradient Boosting ◽

Cluster Number ◽

New Approach ◽

Credit Risks ◽

Stochastic Gradient Boosting ◽

The Stability ◽

Digital Footprints

This research addresses the challenge of reducing banks’ credit risks associated with the insolvency of individual borrowers. To solve it, we propose a new approach, methodology, and models for assessing individual creditworthiness that use additional data about borrowers’ digital footprints to support comprehensive analysis and prediction of a borrower’s credit profile. We suggest a model for clustering borrowers, based on hierarchical clustering and the k-means method, which groups actual borrowers with similar creditworthiness and similar credit risks into homogeneous clusters. We also design a model for classifying borrowers, based on the stochastic gradient boosting (SGB) method, which reliably determines the cluster number, and therefore the risk level, for a new borrower. The developed models form the basis for decisions about lending value, interest rates, and lending terms for each risk-homogeneous group of borrowers. A modified version of the methodology for assessing individual creditworthiness is presented, which is intended to reduce credit risks and to increase the stability and profitability of financial organizations.
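The cluster-then-assign idea in this abstract can be sketched in a few lines. The sketch below is only an illustration under stated assumptions: it runs plain k-means with fixed starting centroids, and it uses nearest-centroid assignment as a simple stand-in for the stochastic gradient boosting classifier that the paper actually describes. The borrower features, their values, and the function names are hypothetical, and the features are left unscaled for brevity.

```python
import math


def assign_cluster(point, centroids):
    """Return the index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def kmeans(points, initial_centroids, iterations=20):
    """Plain k-means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[assign_cluster(p, centroids)].append(p)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                # Coordinate-wise mean of the cluster members.
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments no longer change
        centroids = new_centroids
    return centroids


# Hypothetical borrowers: (debt-to-income ratio, number of late payments).
borrowers = [
    (0.10, 0.0), (0.20, 1.0), (0.15, 0.0),   # low-risk looking
    (0.80, 5.0), (0.90, 6.0), (0.85, 4.0),   # high-risk looking
]
centroids = kmeans(borrowers, [borrowers[0], borrowers[3]])

# Stand-in for the classification step: place a new borrower in the nearest cluster.
new_borrower = (0.70, 4.0)
risk_cluster = assign_cluster(new_borrower, centroids)
print(centroids, risk_cluster)
```

In a fuller pipeline the cluster labels produced here would become the training targets for a supervised classifier, which is the role the abstract gives to SGB.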

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was sampled using stratified random sampling to achieve a 70/30 split between training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three repetitions. GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by a spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
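The three evaluation metrics reported in this abstract can be computed as in the sketch below. The RMSE and Nash-Sutcliffe efficiency follow their standard definitions. For the index of agreement, the sketch assumes Willmott's original formulation, which the abstract does not confirm. The observation and prediction values are hypothetical, and the functions do not guard against constant observations (zero denominators).

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean-of-observations baseline."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def index_of_agreement(observed, predicted):
    """Index of agreement d (assumed here to be Willmott's original form), bounded by 0 and 1."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - residual / potential


# Hypothetical erosion depths in mm/year.
observed = [1.0, 2.0, 3.0, 4.0]
model_pred = [1.5, 2.0, 2.5, 4.5]
mean_pred = [2.5, 2.5, 2.5, 2.5]  # baseline: always predict the observed mean

for name, pred in [("model", model_pred), ("mean baseline", mean_pred)]:
    print(name, rmse(observed, pred), nse(observed, pred), index_of_agreement(observed, pred))
```

Comparing a model against the mean baseline makes the scales easier to read: the baseline scores 0 on both NSE and d, so any positive value indicates skill beyond predicting the average.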
