Mixture Network Regularized Generalized Linear Model with Feature Selection

2019 ◽  
Author(s):  
Kaiqiao Li ◽  
Xuefeng Wang ◽  
Pei Fen Kuan

Abstract
High-dimensional genomics data in the biomedical sciences are an invaluable resource for constructing statistical prediction models. With increasing knowledge of gene networks and pathways, this information can be incorporated into statistical models to improve prediction accuracy and enhance model interpretability. In some scenarios, however, the network structure may be only partially known or inaccurately specified, and the performance of models incorporating it may be compromised. In this paper, we propose a weighted sparse network learning method that addresses this issue by optimally combining a sparse, data-driven network with a known or partially known prior network. We show that the proposed model attains the oracle property, which improves the accuracy of parameter estimation, and yields a parsimonious model in high-dimensional settings for continuous, binary, and survival outcomes in extensive simulation studies. Case studies on ovarian cancer proteomics and melanoma gene expression further demonstrate that the proposed model achieves good operating characteristics in predicting response to chemotherapy and survival risk. An R package, glmaag, implementing our method is available on the Comprehensive R Archive Network (CRAN).
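The core idea of blending a prior network with a data-driven one can be sketched with graph-Laplacian penalties on the coefficients. This is a minimal numpy illustration of the penalty structure, not the glmaag implementation; the blending weight, penalty strength, and toy networks are illustrative assumptions:

```python
import numpy as np

def laplacian(adj):
    """Graph Laplacian L = D - A from a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def network_ridge(X, y, L_prior, L_data, alpha=0.5, lam=1.0):
    """Ridge-type estimate with a blended network penalty:
    minimize ||y - X b||^2 + lam * b' (alpha*L_prior + (1-alpha)*L_data) b."""
    L = alpha * L_prior + (1 - alpha) * L_data
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * L + 1e-8 * np.eye(p), X.T @ y)

# Toy example: 3 features, the prior network links features 0 and 1,
# and the true signal lives on exactly those two features.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([1.0, 1.0, 0.0]) + 0.1 * rng.standard_normal(50)
A_prior = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], float)
A_data = np.abs(np.corrcoef(X, rowvar=False)) - np.eye(3)  # data-driven network
beta = network_ridge(X, y, laplacian(A_prior), laplacian(A_data))
```

The penalty shrinks coefficients of linked features toward each other, so a trustworthy prior network sharpens the estimate while the data-driven term hedges against misspecification.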

Mathematics ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 1244
Author(s):  
Lin Hao ◽  
Juncheol Kim ◽  
Sookhee Kwon ◽  
Il Do Ha

With the development of high-throughput technologies, more and more high-dimensional or ultra-high-dimensional genomic data are being generated. Therefore, effectively analyzing such data has become a significant challenge. Machine learning (ML) algorithms have been widely applied for modeling nonlinear and complicated interactions in a variety of practical fields such as high-dimensional survival data. Recently, multilayer deep neural network (DNN) models have made remarkable achievements. Thus, a Cox-based DNN prediction survival model (DNNSurv model), which was built with Keras and TensorFlow, was developed. However, its results were only evaluated on survival datasets with high dimensionality or large sample sizes. In this paper, we evaluated the prediction performance of the DNNSurv model using ultra-high-dimensional and high-dimensional survival datasets and compared it with three popular ML survival prediction models (i.e., random survival forest and the Cox-based LASSO and Ridge models). For this purpose, we also present the optimal setting of several hyperparameters, including the selection of a tuning parameter. Our data analyses demonstrated that the DNNSurv model performed well overall compared with the ML models, in terms of the three main evaluation measures (i.e., concordance index, time-dependent Brier score, and the time-dependent AUC) for survival prediction performance.
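The concordance index used as an evaluation measure above can be computed from pairwise comparisons of predicted risks and observed times. A minimal numpy sketch of Harrell's C on toy data (not the DNNSurv code; the example values are made up):

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell's C-index: among comparable pairs (the earlier time is an
    observed event), the fraction where the higher predicted risk goes
    with the shorter survival time; ties in risk count as 0.5."""
    num, den = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if event[i] and time[i] < time[j]:  # comparable pair
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    return num / den

time = np.array([2.0, 4.0, 6.0, 8.0])
event = np.array([1, 1, 0, 1])           # 0 = censored observation
risk = np.array([0.9, 0.7, 0.4, 0.1])    # perfectly anti-ordered with time
c = concordance_index(time, event, risk)  # → 1.0
```

A value of 1.0 means perfect risk discrimination, 0.5 is no better than chance; censored observations contribute only as the later member of a pair.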




2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Shijun Yang ◽  
Bin Wang ◽  
Xiong Han

Abstract
Although antiepileptic drugs (AEDs) are the most effective treatment for epilepsy, 30–40% of patients with epilepsy develop drug-refractory epilepsy. An accurate, preliminary prediction of the efficacy of AEDs has great clinical significance for patient treatment and prognosis. Some studies have developed statistical models and machine-learning algorithms (MLAs) to predict the efficacy of AED treatment and the progression of disease after treatment withdrawal, in order to support clinical decision-making with the aim of precise, personalized treatment. The field of prediction models built with statistical models and MLAs is attracting growing interest and is developing rapidly. Moreover, a growing number of studies focus on the external validation of existing models. In this review, we give a brief overview of recent developments in this discipline.


2017 ◽  
Vol 46 (5) ◽  
pp. 390-396 ◽  
Author(s):  
Rakesh Malhotra ◽  
Xia Tao ◽  
Yuedong Wang ◽  
Yuqi Chen ◽  
Rebecca H. Apruzzese ◽  
...  

Background: The surprise question (SQ) ("Would you be surprised if this patient were still alive in 6 or 12 months?") is used as a mortality prognostication tool in hemodialysis (HD) patients. We compared the performance of the SQ with that of prediction models (PMs) for 6- and 12-month mortality prediction. Methods: Demographic, clinical, laboratory, and dialysis treatment indicators were used to model 6- and 12-month mortality probability in a training cohort of HD patients (n = 6,633) using generalized linear models (GLMs). A total of 10 nephrologists from 5 HD clinics responded to the SQ for 215 patients followed prospectively for 12 months. The performance of the PMs was evaluated in the validation (n = 6,634) and SQ cohorts (n = 215) using the areas under the receiver operating characteristic curves. We compared the sensitivities and specificities of the PMs and the SQ. Results: The PM and SQ cohorts comprised 13,267 (mean age 61 years, 55% men, 54% whites) and 215 (mean age 62 years, 59% men, 50% whites) patients, respectively. During the 12-month follow-up, 1,313 patients died in the prediction model cohort and 22 in the SQ cohort. For 6-month mortality prediction, the GLM had areas under the curve of 0.77 in the validation cohort and 0.77 in the SQ cohort. For 12-month mortality, the areas under the curve were 0.77 and 0.80 in the validation and SQ cohorts, respectively. The 6- and 12-month PMs had sensitivities of 0.62 (95% CI 0.35–0.88) and 0.75 (95% CI 0.56–0.94), respectively. The 6- and 12-month SQ sensitivities were 0.23 (95% CI 0.002–0.46) and 0.35 (95% CI 0.14–0.56), respectively. Conclusion: PMs exhibit superior sensitivity compared with the SQ for mortality prognostication in HD patients.
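The sensitivity and specificity compared above follow directly from the confusion matrix at a chosen risk cutoff. A minimal numpy sketch on made-up data (the cutoff and toy values are illustrative assumptions, not the study's cohort):

```python
import numpy as np

def sens_spec(y_true, y_score, threshold=0.5):
    """Sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) at a cutoff."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Toy cohort: 4 deaths, 6 survivors; the model flags 3 deaths and 1 survivor.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p = [0.9, 0.8, 0.6, 0.2, 0.7, 0.3, 0.2, 0.1, 0.1, 0.0]
sens, spec = sens_spec(y, p)  # → (0.75, 0.8333...)
```

Sweeping the threshold over all values and plotting sensitivity against 1 − specificity traces the ROC curve whose area the study reports.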


1986 ◽  
Vol 18 (11) ◽  
pp. 1521-1530 ◽  
Author(s):  
M E Gleeson

Tests of fit using one set of data on mobile homes and another on conventional housing indicate that standard loss curves, such as the Pearl-Reed and Weibull curves, can be used to approximate housing survivorship functions. This finding opens up the possibility of analytical work using standard curves and the application of time-to-failure statistical models that are based on such curves. Tests of fit of standard curves to the two housing survivorship functions using truncated data are also encouraging, suggesting means of estimating housing mortality and computing life tables with incomplete cohort survival data.
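Fitting a Weibull survivorship curve of the kind tested above is convenient because the function linearizes under a log-log transform. A minimal numpy sketch on synthetic data (the shape and scale values are made up, not estimates from the housing cohorts):

```python
import numpy as np

def weibull_survival(t, scale, shape):
    """Weibull survivorship function S(t) = exp(-(t/scale)**shape)."""
    return np.exp(-(np.asarray(t) / scale) ** shape)

# log(-log S) = shape*log(t) - shape*log(scale), so shape and scale
# can be recovered by a straight-line fit on transformed data.
ages = np.array([5.0, 10.0, 20.0, 30.0, 40.0])          # years in service
S_obs = weibull_survival(ages, scale=25.0, shape=1.5)   # synthetic "cohort"
x, y = np.log(ages), np.log(-np.log(S_obs))
shape_hat, intercept = np.polyfit(x, y, 1)
scale_hat = np.exp(-intercept / shape_hat)               # recovers 25.0
```

With truncated data the same transform applies to whatever portion of the curve is observed, which is why the incomplete-cohort fits remain tractable.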


Author(s):  
Nghia H Nguyen ◽  
Dominic Picetti ◽  
Parambir S Dulai ◽  
Vipul Jairath ◽  
William J Sandborn ◽  
...  

Abstract
Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models, using routinely available clinical data for risk prediction in IBD. Methods: Through a systematic review up to January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and that reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool. Results: We included 13 studies on machine learning-based prediction models in IBD, encompassing themes of predicting treatment response to biologics and thiopurines, predicting longitudinal disease activity and complications, and predicting outcomes in patients with acute severe ulcerative colitis. The most common machine learning models used were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated. Conclusions: Machine learning-based prediction models built on routinely collected data generally perform better than traditional statistical models in risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.


2018 ◽  
Vol 11 (1) ◽  
pp. 64 ◽  
Author(s):  
Kyoung-jae Kim ◽  
Kichun Lee ◽  
Hyunchul Ahn

Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for risk management. As a result, building an effective corporate financial distress prediction model has long been an important research topic. Recently, researchers have been striving to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little risk of overfitting if model parameters are properly tuned; they also generally show high prediction accuracy, since they can capture complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors are determined by heuristics, such as the parameters of a kernel function and the subsets of appropriate features and instances. In this study, we propose globally optimized SVMs, denoted GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters simultaneously. This study introduces a genetic algorithm (GA) in order to simultaneously optimize multiple heterogeneous design factors of SVMs. We apply the proposed model to a real-world case of predicting financial distress. Experiments show that the proposed model significantly improves the prediction accuracy of conventional SVMs.
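The GA loop for tuning SVM design factors can be sketched as selection, crossover, and mutation over candidate parameter vectors. Below is a minimal numpy sketch in which a smooth surrogate stands in for cross-validated SVM accuracy; the surrogate, its optimum at (C = 1, gamma = 0.1), and all GA settings are illustrative assumptions, not GOSVM itself:

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(c, gamma):
    """Surrogate for cross-validated SVM accuracy, peaking at C=1, gamma=0.1.
    (A real run would wrap an SVM cross-validation loop here.)"""
    return -((np.log10(c) - 0.0) ** 2 + (np.log10(gamma) + 1.0) ** 2)

def genetic_search(pop_size=30, generations=40):
    # Chromosome = (log10 C, log10 gamma), initialized uniformly in [-3, 3].
    pop = rng.uniform(-3, 3, size=(pop_size, 2))
    for _ in range(generations):
        scores = np.array([fitness(10 ** p[0], 10 ** p[1]) for p in pop])
        elite = pop[np.argsort(scores)[-pop_size // 2 :]]      # selection
        parents = elite[rng.integers(0, len(elite), (pop_size, 2))]
        children = parents.mean(axis=1)                        # crossover
        children += rng.normal(0, 0.1, children.shape)         # mutation
        pop = children
    best = pop[np.argmax([fitness(10 ** p[0], 10 ** p[1]) for p in pop])]
    return 10 ** best[0], 10 ** best[1]

C_opt, gamma_opt = genetic_search()
```

Encoding feature and instance subsets as additional binary genes on the same chromosome, as GOSVM does, lets the same loop optimize all heterogeneous design factors at once.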


2003 ◽  
Vol 83 (4) ◽  
pp. 789-798 ◽  
Author(s):  
K. Kappler ◽  
R. Edwards ◽  
L. Glass
