Tumor Grade and Overall Survival Prediction of Gliomas Using Radiomics

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jianming Ye ◽  
He Huang ◽  
Weiwei Jiang ◽  
Xiaomei Xu ◽  
Chun Xie ◽  
...  

Glioma is one of the most common and deadly malignant brain tumors originating from glial cells. For personalized treatment, an accurate preoperative prognosis for glioma patients is highly desired. Recently, various machine learning-based approaches have been developed to predict prognosis from preoperative magnetic resonance imaging (MRI) radiomics, which extracts quantitative features from radiographic images. However, major methodologic challenges remain in optimizing feature extraction and providing rapid information flow in clinical settings. This study investigates two machine learning-based prognosis prediction tasks using radiomic features extracted from preoperative multimodal MRI brain data: (i) prediction of tumor grade (higher-grade vs. lower-grade gliomas) and (ii) prediction of patient overall survival (OS) in higher-grade gliomas (<12 months vs. >12 months). Both tasks use conventional machine learning models built with various classifiers, and feature selection methods are applied to increase model performance and decrease computational costs. In the experiments, models are evaluated in terms of their predictive performance and stability using a bootstrap approach. Experimental results show that classifier choice and feature selection technique play a significant role in model performance and stability for both tasks; a variability analysis indicates that the choice of classification method is the most dominant source of performance variation for both tasks.
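A minimal sketch of the evaluation strategy described above, assuming a hypothetical radiomic feature matrix and binary grade labels (both synthetic here); the univariate selector and random forest stand in for whichever feature selection methods and classifiers the study actually compared.

```python
# Sketch: bootstrap evaluation of a feature-selection + classifier pipeline
# on synthetic radiomic features (X) and tumor-grade labels (y).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 400))      # 120 patients x 400 radiomic features (synthetic)
y = rng.integers(0, 2, size=120)     # 1 = higher-grade, 0 = lower-grade (synthetic)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(f_classif, k=30)),  # univariate feature selection
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

aucs = []
for b in range(100):                 # bootstrap resamples of the cohort
    idx = resample(np.arange(len(y)), random_state=b)
    oob = np.setdiff1d(np.arange(len(y)), idx)   # out-of-bag patients for testing
    pipe.fit(X[idx], y[idx])
    aucs.append(roc_auc_score(y[oob], pipe.predict_proba(X[oob])[:, 1]))

print(f"bootstrap AUC: {np.mean(aucs):.2f} +/- {np.std(aucs):.2f}")
```

The spread of the bootstrap AUCs is one straightforward way to report stability alongside raw predictive performance.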

Author(s):  
Mohamad Ali Khalil ◽  
Khaled Hamad ◽  
Abdallah Shanableh

Accurate prediction of roadway traffic noise remains challenging. Many researchers continue to improve the performance of their models by adding more variables or improving their modeling algorithms. In this research, machine learning (ML) modeling techniques were developed to predict roadway traffic noise accurately. The ML techniques applied were regression decision trees, support vector machines, ensembles, and artificial neural networks. The parameters of each of these models were fine-tuned to achieve the best performance. In addition, a state-of-the-art hybrid feature-selection technique was employed to select a minimum set of input features (variables) while maintaining the accuracy of the developed models; optimizing the number of features reduces the resources needed to develop and apply a roadway noise prediction model, hence decreasing development cost. The proposed approach was applied to develop a free-field roadway traffic noise model for Sharjah City in the United Arab Emirates. The best developed ML model was compared with a conventional regression model developed earlier under the same conditions. The cross-validated results clearly indicate that the best ML model outperformed the regression model. The performance of the ML model was also assessed after reducing the number of its input features based on the outcome of the feature-selection algorithm; performance was only slightly affected. This result emphasizes the importance of considering only features that greatly influence roadway traffic noise.
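The paragraph above compares several regression learners before and after feature selection; the sketch below illustrates that workflow on synthetic placeholder data, using recursive feature elimination as a stand-in for the hybrid feature-selection technique the authors used.

```python
# Sketch: comparing regression learners for traffic-noise prediction, before and
# after feature selection. Feature values and the noise response are synthetic.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 12))   # e.g. traffic volume, speed, distance to road, ... (synthetic)
y = X[:, :4] @ rng.normal(size=4) + rng.normal(scale=0.5, size=500)   # noise level (synthetic)

models = {
    "tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    "svm": SVR(C=10.0),
    "ensemble": RandomForestRegressor(n_estimators=200, random_state=0),
    "ann": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:8s} all 12 features  R^2 = {r2:.3f}")

# Reduce the input set (recursive feature elimination here, as a stand-in for the
# hybrid selector) and re-check the best model with the smaller feature set.
selector = RFE(RandomForestRegressor(n_estimators=100, random_state=0), n_features_to_select=5)
X_small = selector.fit_transform(X, y)
r2_small = cross_val_score(models["ensemble"], X_small, y, cv=5, scoring="r2").mean()
print(f"ensemble, 5 selected features  R^2 = {r2_small:.3f}")
```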


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Nicholas Nuechterlein ◽  
Beibin Li ◽  
Abdullah Feroze ◽  
Eric C Holland ◽  
Linda Shapiro ◽  
...  

Abstract Background Combined whole-exome sequencing (WES) and somatic copy number alteration (SCNA) information can separate isocitrate dehydrogenase (IDH)1/2-wildtype glioblastoma into two prognostic molecular subtypes, which cannot be distinguished by epigenetic or clinical features. The potential for radiographic features to discriminate between these molecular subtypes has yet to be established. Methods Radiologic features (n = 35 340) were extracted from 46 multisequence, pre-operative magnetic resonance imaging (MRI) scans of IDH1/2-wildtype glioblastoma patients from The Cancer Imaging Archive (TCIA), all of whom have corresponding WES/SCNA data. We developed a novel feature selection method that leverages the structure of extracted MRI features to mitigate the dimensionality challenge posed by the disparity between the large number of features and the limited number of patients in our cohort. Six traditional machine learning classifiers were trained to distinguish molecular subtypes using our feature selection method, which was compared to least absolute shrinkage and selection operator (LASSO) feature selection, recursive feature elimination, and variance thresholding. Results We were able to classify glioblastomas into two prognostic subgroups with a cross-validated area under the curve score of 0.80 (±0.03) using ridge logistic regression on the 15-dimensional principal component analysis (PCA) embedding of the features selected by our novel feature selection method. An interrogation of the selected features suggested that features describing contours in the T2 signal abnormality region on the T2-weighted fluid-attenuated inversion recovery (FLAIR) MRI sequence may best distinguish these two groups from one another. Conclusions We successfully trained a machine learning model that allows for relevant targeted feature extraction from standard MRI to accurately predict molecularly defined risk-stratifying IDH1/2-wildtype glioblastoma patient groups.
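The best-performing classifier named in the results (ridge logistic regression on a 15-dimensional PCA embedding, scored by cross-validated AUC) can be sketched directly; the feature matrix below is a synthetic stand-in for the already-selected MRI features, and the cohort size mirrors the 46 patients described above.

```python
# Sketch: 15-dimensional PCA embedding + ridge (L2) logistic regression,
# evaluated with cross-validated AUC on synthetic stand-in features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)
X_selected = rng.normal(size=(46, 200))   # 46 patients x selected features (synthetic)
y = rng.integers(0, 2, size=46)           # molecular subtype label (synthetic)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=15)),                         # 15-dimensional embedding
    ("ridge_lr", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = cross_val_score(pipe, X_selected, y, cv=cv, scoring="roc_auc")
print(f"cross-validated AUC: {auc.mean():.2f} +/- {auc.std():.2f}")
```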


2021 ◽  
pp. 153537022199201
Author(s):  
Runmin Li ◽  
Guosheng Wang ◽  
ZhouJie Wu ◽  
HuaGuang Lu ◽  
Gen Li ◽  
...  

High-throughput multi-omics sequencing data have laid a solid foundation for identifying genes associated with cancer prognosis, and multi-omics analysis can reveal the mechanisms of cancer occurrence and development from several perspectives. Because the prognosis of osteosarcoma remains poor, a genetic marker is needed to predict the clinically relevant overall survival outcome. First, RNA-Seq data, copy number variation data, and clinical follow-up data were obtained from the Office of Cancer Genomics (OCG TARGET). Genes associated with prognosis and genes exhibiting copy number differences were screened in the training set, and these genes were integrated for feature selection with the least absolute shrinkage and selection operator (Lasso) to screen effective biomarkers. Finally, a gene-based prognostic model was built and validated on the test set and a Gene Expression Omnibus validation set. In total, 512 prognosis-related genes (P < 0.01), 336 copy-number-amplified genes (P < 0.05), and 36 copy-number-deleted genes (P < 0.05) were obtained, and the genes carrying these genomic variants are closely associated with the mechanisms of tumor occurrence and development. Ten candidate genes were generated by integrating the genomic-variant genes with the prognosis-related genes. Six representative genes (i.e., MYC, CHIC2, CCDC152, LYL1, GPR142, and MMP27) were obtained by Lasso feature selection and stepwise multivariate regression analysis, many of which have been reported to be related to tumor progression. Cox regression was used to build a six-gene signature as an independent prognostic factor for osteosarcoma cases. The signature stratified samples into risk groups in the training set, test set, and external validation set. The AUC for five-year survival in the training and validation sets exceeded 0.85, a better predictive performance than existing studies. The six-gene signature therefore constitutes a new prognostic marker for assessing the survival status of osteosarcoma patients.
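As a rough illustration of the Lasso-plus-Cox step described above, the sketch below fits an L1-penalized Cox model and stratifies samples by median risk score. The use of the lifelines library is an assumption (the abstract does not name its software), and the expression, time, and event values are synthetic placeholders.

```python
# Sketch: L1-penalized Cox regression for a multi-gene prognostic signature.
# lifelines is an assumed choice of library; all data below are synthetic.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(3)
genes = ["MYC", "CHIC2", "CCDC152", "LYL1", "GPR142", "MMP27"]
df = pd.DataFrame(rng.normal(size=(100, 6)), columns=genes)   # expression (synthetic)
df["time"] = rng.exponential(scale=36, size=100)              # months to event/censoring
df["event"] = rng.integers(0, 2, size=100)                    # 1 = death observed

# l1_ratio=1.0 gives a pure Lasso penalty; penalizer controls its strength.
cph = CoxPHFitter(penalizer=0.1, l1_ratio=1.0)
cph.fit(df, duration_col="time", event_col="event")

# Risk score = partial hazard; split at the median to stratify patients.
risk = cph.predict_partial_hazard(df)
df["risk_group"] = np.where(risk > risk.median(), "high", "low")
print(cph.summary[["coef", "p"]])
print(df["risk_group"].value_counts())
```

In this pattern the non-zero Lasso coefficients define the signature, and the weighted sum of expression values gives each patient's risk score used for stratification.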


2021 ◽  
Author(s):  
Hyeyoung Koh ◽  
Hannah Beth Blum

This study presents a machine learning-based approach for sensitivity analysis to examine how parameters affect a given structural response while accounting for uncertainty. Reliability-based sensitivity analysis involves repeated evaluations of the performance function incorporating uncertainties to estimate the influence of a model parameter, which can lead to prohibitive computational costs. This challenge is exacerbated for large-scale engineering problems, which often carry a large number of uncertain parameters. The proposed approach is based on feature selection algorithms that rank feature importance and remove redundant predictors during model development, which improves model generality and training performance by focusing only on the significant features. The approach allows sensitivity analysis of structural systems by providing feature rankings with reduced computational effort. The proposed approach is demonstrated with two designs of a two-bay, two-story planar steel frame with different failure modes: inelastic instability of a single member and progressive yielding. The feature variables in the data are uncertainties including material yield strength, Young's modulus, frame sway imperfection, and residual stress. The Monte Carlo sampling method is used to generate random realizations of the frames from published distributions of the feature parameters, and the response variable is the frame ultimate strength obtained from finite element analyses. Decision trees are trained to identify important features. Feature rankings are derived by four feature selection techniques: impurity-based importance, permutation importance, SHAP, and Spearman's correlation. Predictive performance of the model including the important features is discussed using the Matthews correlation coefficient, an evaluation metric suited to imbalanced datasets. Finally, the results are compared with those from reliability-based sensitivity analysis on the same example frames to show the validity of the feature selection approach. As the proposed machine learning-based approach produces the same results as the reliability-based sensitivity analysis with improved computational efficiency and accuracy, it could be extended to other structural systems.
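A compact sketch of the feature-ranking step: a decision tree is trained on synthetic Monte Carlo draws of the four uncertain parameters, and impurity-based, permutation, and Spearman rankings are compared, with the Matthews correlation coefficient as the evaluation metric. The data-generating model here is illustrative, not the published frame response or distributions.

```python
# Sketch: ranking uncertain parameters by importance with a decision tree.
# Parameter names mirror the study; values and the response are synthetic.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr
from sklearn.tree import DecisionTreeClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
features = ["yield_strength", "youngs_modulus", "sway_imperfection", "residual_stress"]
X = pd.DataFrame(rng.normal(size=(2000, 4)), columns=features)
strength = 2.0 * X["yield_strength"] + 0.5 * X["youngs_modulus"] + rng.normal(scale=0.3, size=2000)
y = (strength < np.quantile(strength, 0.1)).astype(int)   # 1 = low ultimate strength (rare class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

impurity = pd.Series(tree.feature_importances_, index=features)      # impurity-based ranking
perm = permutation_importance(tree, X_te, y_te, n_repeats=20, random_state=0)
perm_rank = pd.Series(perm.importances_mean, index=features)         # permutation ranking
spear = pd.Series([abs(spearmanr(X[f], strength)[0]) for f in features], index=features)

print(impurity.sort_values(ascending=False))
print(perm_rank.sort_values(ascending=False))
print(spear.sort_values(ascending=False))
print("MCC on test set:", matthews_corrcoef(y_te, tree.predict(X_te)))
```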


Circulation ◽  
2020 ◽  
Vol 142 (Suppl_3) ◽  
Author(s):  
Brian Ayers ◽  
Toumas Sandhold ◽  
Igor Gosev ◽  
Sunil Prasad ◽  
Arman Kilic

Introduction: Prior risk models for predicting survival after orthotopic heart transplantation (OHT) have displayed only modest discriminatory capability. With increasing interest in the application of machine learning (ML) to predictive analytics in clinical medicine, this study aimed to evaluate whether modern ML techniques could improve risk prediction in OHT. Methods: Data from the United Network for Organ Sharing registry were collected for all adult patients who underwent OHT from 2000 through 2019. The primary outcome was one-year post-transplant mortality. Dimensionality reduction and data re-sampling were employed during training. The final ensemble model was created from 100 models of each algorithm: deep neural network, logistic regression, AdaBoost, and random forest. Discriminatory capability was assessed using the area under the receiver-operating-characteristic curve (AUROC), net reclassification index (NRI), and decision curve analysis (DCA). Results: Of the 33,657 study patients, 26,926 (80%) were randomly selected for the training set and 6,731 (20%) formed a separate testing set. One-year mortality was balanced between cohorts (11.0% vs. 11.3%). The best performance was achieved by the final ensemble ML model, which demonstrated an improved AUROC of 0.764 (95% CI, 0.745-0.782) in the testing set compared with the other models (Figure). Additionally, the final model demonstrated an improvement of 72.9% ±3.8% (p<0.001) in predictive performance as assessed by NRI compared with logistic regression. The DCA showed that the final ensemble method improved risk prediction across the entire spectrum of predicted risk compared with all other models (p<0.001). Conclusions: An ensemble ML model achieved greater predictive performance than individual ML models and logistic regression for predicting survival after OHT. This analysis demonstrates the promise of ML techniques for risk prediction in OHT.
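The sketch below shows the basic pattern of the final model: averaging predicted probabilities from logistic regression, AdaBoost, a random forest, and a neural network into a soft ensemble and scoring it with AUROC. The registry data are not reproduced; synthetic features with roughly 11% positives stand in, and one model per algorithm is a simplification of the 100-models-per-algorithm design described above.

```python
# Sketch: soft ensemble of several learners scored by AUROC (synthetic data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(5)
X = rng.normal(size=(5000, 30))                 # recipient/donor features (synthetic)
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 1.84).astype(int)  # ~11% positives

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

members = [
    LogisticRegression(max_iter=1000),
    AdaBoostClassifier(n_estimators=200, random_state=0),
    RandomForestClassifier(n_estimators=300, random_state=0),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
]
probs = []
for model in members:
    model.fit(X_tr, y_tr)
    p = model.predict_proba(X_te)[:, 1]
    probs.append(p)
    print(type(model).__name__, "AUROC:", round(roc_auc_score(y_te, p), 3))

ensemble_p = np.mean(probs, axis=0)             # simple probability-averaging ensemble
print("ensemble AUROC:", round(roc_auc_score(y_te, ensemble_p), 3))
```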


2019 ◽  
Vol 27 (3) ◽  
pp. 396-406 ◽  
Author(s):  
Kushan De Silva ◽  
Daniel Jönsson ◽  
Ryan T Demmer

Abstract Objective To identify predictors of prediabetes using feature selection and machine learning on a nationally representative sample of the US population. Materials and Methods We analyzed n = 6346 men and women enrolled in the National Health and Nutrition Examination Survey 2013–2014. Prediabetes was defined using American Diabetes Association guidelines. The sample was randomly partitioned to training (n = 3174) and internal validation (n = 3172) sets. Feature selection algorithms were run on training data containing 156 preselected exposure variables. Four machine learning algorithms were applied on 46 exposure variables in original and resampled training datasets built using 4 resampling methods. Predictive models were tested on internal validation data (n = 3172) and external validation data (n = 3000) prepared from National Health and Nutrition Examination Survey 2011–2012. Model performance was evaluated using area under the receiver operating characteristic curve (AUROC). Predictors were assessed by odds ratios in logistic models and variable importance in others. The Centers for Disease Control (CDC) prediabetes screening tool was the benchmark to compare model performance. Results Prediabetes prevalence was 23.43%. The CDC prediabetes screening tool produced 64.40% AUROC. Seven optimal (≥70% AUROC) models identified 25 predictors including 4 potentially novel associations; 20 by both logistic and other nonlinear/ensemble models and 5 solely by the latter. All optimal models outperformed the CDC prediabetes screening tool (P < 0.05). Discussion Combined use of feature selection and machine learning increased predictive performance, outperforming the recommended screening tool. A range of predictors of prediabetes was identified. Conclusion This work demonstrated the value of combining feature selection with machine learning to identify a wide range of predictors that could enhance prediabetes prediction and clinical decision-making.
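A minimal sketch of the pipeline above, assuming synthetic stand-ins for the NHANES exposures: univariate selection down to 46 variables, random oversampling of the minority class as one possible resampling method, a logistic model, and AUROC on an internal validation split.

```python
# Sketch: feature selection + resampled training set for prediabetes prediction
# (synthetic stand-in data; ~23% positive prevalence, matching the paper roughly).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.utils import resample

rng = np.random.default_rng(6)
X = rng.normal(size=(6346, 156))                 # 156 preselected exposure variables (synthetic)
y = (X[:, 0] + 0.8 * X[:, 3] + rng.normal(size=6346) > 1.18).astype(int)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

# Keep the 46 strongest exposures, mirroring the 46 variables used above.
selector = SelectKBest(f_classif, k=46).fit(X_tr, y_tr)
X_tr_s, X_va_s = selector.transform(X_tr), selector.transform(X_va)

# Random oversampling of the minority class, one of several possible resampling methods.
pos, neg = np.where(y_tr == 1)[0], np.where(y_tr == 0)[0]
pos_up = resample(pos, n_samples=len(neg), random_state=0)
idx = np.concatenate([neg, pos_up])
clf = LogisticRegression(max_iter=1000).fit(X_tr_s[idx], y_tr[idx])

auc = roc_auc_score(y_va, clf.predict_proba(X_va_s)[:, 1])
print(f"internal-validation AUROC: {auc:.3f}")
```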


2019 ◽  
Vol 8 (6) ◽  
pp. 799 ◽  
Author(s):  
Cheng-Shyuan Rau ◽  
Shao-Chun Wu ◽  
Jung-Fang Chuang ◽  
Chun-Ying Huang ◽  
Hang-Tsung Liu ◽  
...  

Background: We aimed to build a machine learning model for the prediction of survival in trauma patients and compared its predictions with those of the most commonly used algorithm, the Trauma and Injury Severity Score (TRISS). Methods: Enrolled hospitalized trauma patients from 2009 to 2016 were divided into a training dataset (70% of the original data set), used to generate a plausible model under supervised classification, and a test dataset (30% of the original data set), used to test the performance of the model. The training and test datasets comprised 13,208 (12,871 survival and 337 mortality) and 5,603 (5,473 survival and 130 mortality) patients, respectively. With the provision of additional information such as pre-existing comorbidity status or laboratory data, logistic regression (LR), support vector machine (SVM), and neural network (NN) models (with the Stuttgart Neural Network Simulator (RSNNS)) were built for survival prediction and compared with the predictive performance of TRISS. Predictive performance was evaluated by accuracy, sensitivity, and specificity, as well as by the area under the curve (AUC) of receiver operating characteristic curves. Results: In the validation dataset, NN and TRISS presented the highest balanced accuracy (82.0%), followed by the SVM (75.2%) and LR (71.8%) models. In the test dataset, NN had the highest balanced accuracy (75.1%), followed by the TRISS (70.2%), SVM (70.6%), and LR (68.9%) models. All four models (LR, SVM, NN, and TRISS) exhibited a high accuracy of more than 97.5% and a sensitivity of more than 98.6%. However, NN exhibited the highest specificity (51.5%), followed by the TRISS (41.5%), SVM (40.8%), and LR (38.5%) models. Conclusions: The four models (LR, SVM, NN, and TRISS) exhibited similarly high accuracy and sensitivity in predicting the survival of trauma patients. In the test dataset, the NN model had the highest balanced accuracy and predictive specificity.
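The sketch below computes the reported evaluation metrics (accuracy, sensitivity, specificity, balanced accuracy, AUC) from a confusion matrix and predicted probabilities. The labels and scores are synthetic stand-ins, with a survival rate close to the test cohort's, so the printed numbers are illustrative only.

```python
# Sketch: survival-prediction metrics from test-set labels and predicted
# probabilities (all values synthetic; 1 = survival, the majority class).
import numpy as np
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             confusion_matrix, roc_auc_score)

rng = np.random.default_rng(7)
y_true = rng.binomial(1, 0.977, size=5603)                       # ~97.7% survivors
p_hat = np.clip(0.4 * y_true + rng.normal(0.35, 0.15, size=5603), 0, 1)  # predicted survival prob.
y_pred = (p_hat >= 0.5).astype(int)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)      # survivors correctly predicted
specificity = tn / (tn + fp)      # deaths correctly predicted
print("accuracy:", round(accuracy_score(y_true, y_pred), 3))
print("sensitivity:", round(sensitivity, 3))
print("specificity:", round(specificity, 3))
print("balanced accuracy:", round(balanced_accuracy_score(y_true, y_pred), 3))
print("AUC:", round(roc_auc_score(y_true, p_hat), 3))
```

With such a skewed outcome, accuracy and sensitivity are high for every model, which is why the abstract leans on balanced accuracy and specificity to separate them.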


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Tarun Dhar Diwan ◽  
Siddartha Choubey ◽  
H. S. Hota ◽  
S. B Goyal ◽  
Sajjad Shaukat Jamal ◽  
...  

Identification of anomalous and malicious traffic in the Internet of Things (IoT) network is essential for IoT security, and tracking and blocking unwanted traffic flows requires a framework that identifies attacks accurately, quickly, and with low complexity. Many machine learning (ML) algorithms have proved their efficiency in detecting intrusions in IoT networks, but these algorithms suffer from misclassification problems when the feature set is inappropriate or contains irrelevant features. In this paper, an in-depth study is presented to address such issues. We present lightweight, low-cost feature selection techniques for IoT intrusion detection that achieve high accuracy with low complexity and low computational time. A novel feature selection technique is proposed that integrates rank-based chi-square, Pearson correlation, and score correlation to extract relevant features from all available features in the dataset. Feature entropy estimation is then applied to validate the relationships among the extracted features for identifying malicious traffic in IoT networks. Finally, an extreme gradient boosting ensemble approach is used to classify the selected features into the relevant attack types. The simulation is performed on three datasets, i.e., NSL-KDD, UNSW-NB15, and CICIDS2017, and results are presented on different test sets. On the NSL-KDD dataset, accuracy was approximately 97.48%; on UNSW-NB15 and CICIDS2017, accuracy was approximately 99.96% and 99.93%, respectively. A state-of-the-art comparison with existing techniques is also presented.
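A sketch of the selection-then-classification pattern described above: chi-square ranking plus a Pearson-correlation filter, followed by gradient boosting. The flow features and labels are synthetic, and xgboost is used here as one common implementation of extreme gradient boosting rather than the authors' exact setup.

```python
# Sketch: chi-square + correlation-based feature selection, then gradient boosting,
# on synthetic stand-in traffic features with one normal and three attack classes.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

rng = np.random.default_rng(8)
X = rng.random(size=(5000, 40))            # flow features (synthetic)
y = rng.integers(0, 4, size=5000)          # 0 = normal, 1..3 = attack types (synthetic)

# Chi-square needs non-negative inputs; scale, score, and keep the top-ranked features.
X_scaled = MinMaxScaler().fit_transform(X)
selector = SelectKBest(chi2, k=15).fit(X_scaled, y)
X_sel = selector.transform(X_scaled)

# Drop one of each pair of highly Pearson-correlated features among the survivors.
corr = np.corrcoef(X_sel, rowvar=False)
drop = {j for i in range(corr.shape[0]) for j in range(i + 1, corr.shape[0]) if abs(corr[i, j]) > 0.9}
keep = [i for i in range(X_sel.shape[1]) if i not in drop]
X_final = X_sel[:, keep]

X_tr, X_te, y_tr, y_te = train_test_split(X_final, y, test_size=0.3, stratify=y, random_state=0)
clf = XGBClassifier(n_estimators=300, max_depth=6, eval_metric="mlogloss")
clf.fit(X_tr, y_tr)
print("test accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 4))
```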


2021 ◽  
Vol 13 (3) ◽  
pp. 901-913
Author(s):  
S. Gupta ◽  
R. R. Sedamkar

Enhancing the diagnostic ability of machine learning models for acceptable prediction in the healthcare community is still a concern. Critical care disease datasets are available online on which researchers have experimented with different numbers of instances and features for similar disease prediction tasks, and different Machine Learning (ML) models have different preprocessing requirements. The Framingham heart disease data are multicollinear and contain missing values. Thus, the proposed model explores the differential preprocessing needs of ML models, followed by feature selection in consensus with domain experts and feature extraction to resolve multicollinearity issues. Missing values are imputed differently for each feature. The work also identifies the optimal training-set size by plotting a learning curve that yields a minimum generalization gap. When testing is done on this hyperparameter-tuned model, performance is enhanced with respect to the support-weighted F score, with stratification applied because the data are imbalanced. Experimental results demonstrate improvements in performance metrics, i.e., weighted F score, precision, recall, and accuracy by up to 3%, and F1 score by 8%, for the Logistic Regression classifier with the proposed model. Further, the time required for hyperparameter tuning is reduced by 50% for tree-based models, particularly Classification and Regression Trees (CART).
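A sketch of the preprocessing and tuning steps described above, on synthetic stand-ins for the Framingham variables: imputation of missing values, PCA to ease multicollinearity, a learning curve to gauge training-set size, and stratified hyperparameter tuning scored by the support-weighted F score.

```python
# Sketch: imputation + PCA + learning curve + stratified tuning with weighted F score
# (synthetic stand-in data; a single median imputer stands in for per-feature strategies).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, GridSearchCV, learning_curve

rng = np.random.default_rng(9)
X = rng.normal(size=(4000, 15))
X[rng.random(X.shape) < 0.05] = np.nan               # ~5% missing values
y = (np.nan_to_num(X[:, 0]) + rng.normal(size=4000) > 1.2).astype(int)   # imbalanced outcome

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=8)),                    # decorrelates multicollinear inputs
    ("clf", LogisticRegression(max_iter=1000, class_weight="balanced")),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Learning curve: the training size where the train/validation gap flattens
# suggests an adequate training-set size.
sizes, train_sc, val_sc = learning_curve(pipe, X, y, cv=cv, scoring="f1_weighted",
                                         train_sizes=np.linspace(0.1, 1.0, 5))
print(dict(zip(sizes, val_sc.mean(axis=1).round(3))))

# Hyperparameter tuning with the support-weighted F score on stratified folds.
grid = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1, 10]}, cv=cv, scoring="f1_weighted")
grid.fit(X, y)
print("best params:", grid.best_params_, "weighted F:", round(grid.best_score_, 3))
```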

