Mixture Network Regularized Generalized Linear Model with Feature Selection

Mapping Intimacies ◽

10.1101/678029 ◽

2019 ◽

Cited By ~ 1

Author(s):

Kaiqiao Li ◽

Xuefeng Wang ◽

Pei Fen Kuan

Keyword(s):

Network Structure ◽

Survival Data ◽

Statistical Models ◽

Gene Networks ◽

Prediction Models ◽

Statistical Prediction ◽

High Dimensional ◽

Operating Characteristics ◽

Proposed Model ◽

Response To Chemotherapy

Abstract: High-dimensional genomics data in the biomedical sciences are an invaluable resource for constructing statistical prediction models. With the increasing knowledge of gene networks and pathways, this information can be utilized in statistical models to improve prediction accuracy and enhance model interpretability. However, in some scenarios the network structure may be only partially known or inaccurately specified, so the performance of statistical models incorporating such network structure may be compromised. To address this issue, we proposed a weighted sparse network learning method that optimally combines a sparse, data-driven network with a known or partially known prior network. We showed that the proposed model attained the oracle property, which aims to improve the accuracy of parameter estimation, and achieved a parsimonious model in high-dimensional settings for different outcomes, including continuous, binary, and survival data, in extensive simulation studies. Case studies on ovarian cancer proteomics and melanoma gene expression further demonstrated that the proposed model achieved good operating characteristics in predicting response to chemotherapy and survival risk. An R package, glmaag, implementing our method is available on the Comprehensive R Archive Network (CRAN).

Deep Learning-Based Survival Analysis for High-Dimensional Survival Data

Mathematics ◽

10.3390/math9111244 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1244

Author(s):

Lin Hao ◽

Juncheol Kim ◽

Sookhee Kwon ◽

Il Do Ha

Keyword(s):

Survival Data ◽

Prediction Models ◽

Prediction Performance ◽

Time Dependent ◽

Tuning Parameter ◽

High Dimensional ◽

Brier Score ◽

Survival Prediction ◽

Optimal Setting ◽

Selection Of

With the development of high-throughput technologies, more and more high-dimensional or ultra-high-dimensional genomic data are being generated. Therefore, effectively analyzing such data has become a significant challenge. Machine learning (ML) algorithms have been widely applied for modeling nonlinear and complicated interactions in a variety of practical fields such as high-dimensional survival data. Recently, multilayer deep neural network (DNN) models have made remarkable achievements. Thus, a Cox-based DNN prediction survival model (DNNSurv model), which was built with Keras and TensorFlow, was developed. However, its results were only evaluated on the survival datasets with high-dimensional or large sample sizes. In this paper, we evaluated the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets and compared it with three popular ML survival prediction models (i.e., random survival forest and the Cox-based LASSO and Ridge models). For this purpose, we also present the optimal setting of several hyperparameters, including the selection of a tuning parameter. The proposed method demonstrated via data analysis that the DNNSurv model performed well overall as compared with the ML models, in terms of the three main evaluation measures (i.e., concordance index, time-dependent Brier score, and the time-dependent AUC) for survival prediction performance.
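
The concordance index named in the abstract above can be illustrated with a short sketch. This is a plain-Python version of Harrell's C-index written for illustration only, not the evaluation code used in the paper; the function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly by risk.

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- predicted risk (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i has an observed event
            # strictly before subject j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.7, 0.2, 0.4]
c_index = concordance_index(times, events, risk)
```

A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to uninformative predictions, and censored subjects contribute only as the later member of a pair.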

Deep Learning-based Survival Analysis for High-dimensional Survival Data

10.20944/preprints202104.0529.v1 ◽

2021 ◽

Author(s):

Il Do Ha ◽

Lin Hao ◽

Juncheol Kim ◽

Sookhee Kwon

Keyword(s):

Survival Data ◽

Prediction Models ◽

Prediction Performance ◽

Time Dependent ◽

Tuning Parameter ◽

High Dimensional ◽

Brier Score ◽

Survival Prediction ◽

Optimal Setting ◽

Selection Of

With the development of high-throughput technologies, more and more high-dimensional or ultra-high-dimensional genomic data are being generated. Therefore, how to analyze such data effectively has become a challenge. Machine learning (ML) algorithms have been widely applied for modelling nonlinear and complicated interactions in a variety of practical fields such as high-dimensional survival data. Recently, multilayer deep neural network (DNN) models have made remarkable achievements. Thus, a Cox-based DNN prediction survival model (DNNSurv model), which was built with Keras and TensorFlow, was developed. However, its results were only evaluated on survival datasets with high-dimensional or large sample sizes. In this paper, we evaluate the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets, and compare it with three popular ML survival prediction models (i.e., random survival forest and Cox-based LASSO and Ridge models). For this purpose, we also present the optimal setting of several hyper-parameters, including the selection of a tuning parameter. The proposed method demonstrates via data analysis that the DNNSurv model performs well overall as compared with the ML models, in terms of three main evaluation measures (i.e., concordance index, time-dependent Brier score, and time-dependent AUC) for survival prediction performance.

Feature Screening for High-Dimensional Survival Data via Censored Quantile Correlation

Journal of Systems Science and Complexity ◽

10.1007/s11424-020-9295-5 ◽

2020 ◽

Author(s):

Kai Xu ◽

Xudong Huang

Keyword(s):

Survival Data ◽

High Dimensional ◽

Feature Screening

Models for predicting treatment efficacy of antiepileptic drugs and prognosis of treatment withdrawal in epilepsy patients

Acta Epileptologica ◽

10.1186/s42494-020-00035-9 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Shijun Yang ◽

Bin Wang ◽

Xiong Han

Keyword(s):

Antiepileptic Drugs ◽

Statistical Models ◽

Prediction Models ◽

External Validation ◽

Machine Learning Algorithms ◽

Treatment Withdrawal ◽

Patient Treatment ◽

Recent Developments ◽

Progression Of Disease ◽

Patients With Epilepsy

Abstract: Although antiepileptic drugs (AEDs) are the most effective treatment for epilepsy, 30–40% of patients with epilepsy develop drug-refractory epilepsy. An accurate, preliminary prediction of the efficacy of AEDs has great clinical significance for patient treatment and prognosis. Some studies have developed statistical models and machine-learning algorithms (MLAs) to predict the efficacy of AED treatment and the progression of disease after treatment withdrawal, in order to support clinical decisions with the aim of precise, personalized treatment. The field of prediction models based on statistical models and MLAs is attracting growing interest and is developing rapidly. Moreover, a growing number of studies focus on the external validation of existing models. In this review, we give a brief overview of recent developments in this discipline.

Performance of the Surprise Question Compared to Prediction Models in Hemodialysis Patients: A Prospective Study

American Journal of Nephrology ◽

10.1159/000481920 ◽

2017 ◽

Vol 46 (5) ◽

pp. 390-396 ◽

Cited By ~ 4

Author(s):

Rakesh Malhotra ◽

Xia Tao ◽

Yuedong Wang ◽

Yuqi Chen ◽

Rebecca H. Apruzzese ◽

...

Keyword(s):

Linear Models ◽

Prediction Models ◽

Clinical Laboratory ◽

Mortality Prediction ◽

Operating Characteristics ◽

Dialysis Treatment ◽

Mortality Probability ◽

A Prospective Study ◽

Surprise Question

Background: The surprise question (SQ) (“Would you be surprised if this patient were still alive in 6 or 12 months?”) is used as a mortality prognostication tool in hemodialysis (HD) patients. We compared the performance of the SQ with that of prediction models (PMs) for 6- and 12-month mortality prediction. Methods: Demographic, clinical, laboratory, and dialysis treatment indicators were used to model 6- and 12-month mortality probability in an HD patient training cohort (n = 6,633) using generalized linear models (GLMs). A total of 10 nephrologists from 5 HD clinics responded to the SQ in 215 patients followed prospectively for 12 months. The performance of the PMs was evaluated in the validation (n = 6,634) and SQ cohorts (n = 215) using the areas under receiver operating characteristic curves. We compared the sensitivities and specificities of the PMs and the SQ. Results: The PM and SQ cohorts comprised 13,267 (mean age 61 years, 55% men, 54% whites) and 215 (mean age 62 years, 59% men, 50% whites) patients, respectively. During the 12-month follow-up, 1,313 patients died in the prediction model cohort and 22 in the SQ cohort. For 6-month mortality prediction, the GLM had areas under the curve of 0.77 in the validation cohort and 0.77 in the SQ cohort. As for 12-month mortality, areas under the curve were 0.77 and 0.80 in the validation and SQ cohorts, respectively. The 6- and 12-month PMs had sensitivities of 0.62 (95% CI 0.35–0.88) and 0.75 (95% CI 0.56–0.94), respectively. The 6- and 12-month SQ sensitivities were 0.23 (95% CI 0.002–0.46) and 0.35 (95% CI 0.14–0.56), respectively. Conclusion: PMs exhibit superior sensitivity compared to the SQ for mortality prognostication in HD patients.
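
The sensitivity and specificity figures compared in the abstract above come from a standard confusion-matrix calculation, sketched below. The function name and the labels are hypothetical and are not taken from the study data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    y_true -- 1 if the patient died within the horizon, else 0
    y_pred -- 1 if death was predicted, else 0
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # deaths flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # deaths missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # survivors cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # survivors flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 3 deaths and 5 survivors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Sensitivity is the fraction of patients who died that the tool flagged, which is the quantity on which the abstract reports the PMs outperforming the SQ.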

Incorporating pathway information into boosting estimation of high-dimensional risk prediction models

BMC Bioinformatics ◽

10.1186/1471-2105-10-18 ◽

2009 ◽

Vol 10 (1) ◽

Cited By ~ 45

Author(s):

Harald Binder ◽

Martin Schumacher

Keyword(s):

Risk Prediction ◽

Prediction Models ◽

High Dimensional ◽

Risk Prediction Models ◽

Pathway Information

Estimating Housing Mortality with Standard Loss Curves

Environment and Planning A Economy and Space ◽

10.1068/a181521 ◽

1986 ◽

Vol 18 (11) ◽

pp. 1521-1530 ◽

Cited By ~ 4

Author(s):

M E Gleeson

Keyword(s):

Survival Data ◽

Statistical Models ◽

Life Tables ◽

Time To Failure ◽

Truncated Data ◽

Mobile Homes ◽

Analytical Work ◽

Tests Of Fit

Tests of fit using one set of data on mobile homes and another on conventional housing indicate that standard loss curves, such as the Pearl-Reed and Weibull curves, can be used to approximate housing survivorship functions. This finding opens up the possibility of analytical work using standard curves and the application of time-to-failure statistical models that are based on such curves. Tests of fit of standard curves to the two housing survivorship functions using truncated data are also encouraging, suggesting means of estimating housing mortality and computing life tables with incomplete cohort survival data.
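
As a rough illustration of approximating a survivorship function with a Weibull curve, the sketch below fits S(t) = exp(-(t/scale)^shape) by least squares on the linearized form log(-log S(t)) = shape·log t − shape·log scale. The fitting procedure, function names, and data are assumptions made for illustration; the abstract does not say how the author fitted the curves.

```python
import math


def weibull_survival(t, shape, scale):
    """Fraction of units still surviving at age t under a Weibull curve."""
    return math.exp(-((t / scale) ** shape))


def fit_weibull(ages, surviving_fraction):
    """Least-squares fit on the linearized Weibull survivorship function.

    Each surviving fraction must lie strictly between 0 and 1.
    Returns (shape, scale).
    """
    xs = [math.log(t) for t in ages]
    ys = [math.log(-math.log(s)) for s in surviving_fraction]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    shape = slope                          # slope equals the shape parameter
    scale = math.exp(-intercept / slope)   # intercept = -shape * log(scale)
    return shape, scale


# Hypothetical life-table points generated from a known curve.
ages = [10, 20, 40, 80]
observed = [weibull_survival(t, shape=2.0, scale=50.0) for t in ages]
shape_hat, scale_hat = fit_weibull(ages, observed)
```

Because the fit uses only the ages that are actually observed, the same routine can be applied to a truncated cohort, which is the situation with incomplete survival data that the abstract describes.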

Machine Learning-based Prediction Models for Diagnosis and Prognosis in Inflammatory Bowel Diseases: A Systematic Review

Journal of Crohn's and Colitis ◽

10.1093/ecco-jcc/jjab155 ◽

2021 ◽

Author(s):

Nghia H Nguyen ◽

Dominic Picetti ◽

Parambir S Dulai ◽

Vipul Jairath ◽

William J Sandborn ◽

...

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Risk Prediction ◽

Statistical Models ◽

Prediction Models ◽

Risk Of Bias ◽

Learning Models ◽

Bowel Diseases ◽

Inflammatory Bowel ◽

Machine Learning Models

Abstract: Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods: Through a systematic review through January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results: We included 13 studies on machine learning-based prediction models in IBD, encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions: Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.

Predicting Corporate Financial Sustainability Using Novel Business Analytics

Sustainability ◽

10.3390/su11010064 ◽

2018 ◽

Vol 11 (1) ◽

pp. 64 ◽

Cited By ~ 5

Author(s):

Kyoung-jae Kim ◽

Kichun Lee ◽

Hyunchul Ahn

Keyword(s):

Financial Distress ◽

Prediction Accuracy ◽

Prediction Models ◽

Support Vector ◽

Model Parameters ◽

Financial Sustainability ◽

Business Analytics ◽

Financial Distress Prediction ◽

Proposed Model ◽

Distress Prediction

Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for their risk management. As a result, building an effective corporate financial distress prediction model has long been an important research topic. Recently, researchers have been working to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little possibility of overfitting if model parameters are properly tuned. Moreover, SVMs generally show high prediction accuracy since they can deal with complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors, such as the parameters of a kernel function and the subsets of appropriate features and instances, are determined by heuristics. In this study, we propose globally optimized SVMs, denoted by GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters simultaneously. This study introduces a genetic algorithm (GA) in order to simultaneously optimize multiple heterogeneous design factors of SVMs. Our study applies the proposed model to a real-world case of predicting financial distress. Experiments show that the proposed model significantly improves the prediction accuracy of conventional SVMs.

Dynamics in high-dimensional model gene networks

Signal Processing ◽

10.1016/s0165-1684(02)00479-6 ◽

2003 ◽

Vol 83 (4) ◽

pp. 789-798 ◽

Cited By ~ 28

Author(s):

K. Kappler ◽

R. Edwards ◽

L. Glass

Keyword(s):

Gene Networks ◽

High Dimensional ◽

Dimensional Model
