An Update on Statistical Boosting in Biomedicine

2017 ◽  
Vol 2017 ◽  
pp. 1-12 ◽  
Author(s):  
Andreas Mayr ◽  
Benjamin Hofner ◽  
Elisabeth Waldmann ◽  
Tobias Hepp ◽  
Sebastian Meyer ◽  
...  

Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression, and advanced time-to-event modelling. Additionally, we provide a short overview on relevant applications of statistical boosting in biomedicine.
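The interplay of base-learners and loss function described above can be illustrated with a minimal sketch of component-wise gradient boosting. It assumes the simplest possible setting: squared-error loss and one linear base-learner per centred covariate. The function names (`componentwise_l2_boost`, `simulate`), the step length `nu`, and the simulated data are illustrative choices, not taken from the article. In each iteration only the best-fitting base-learner is updated by a small step, which is what produces variable selection (never-selected covariates keep a coefficient of zero) and shrinkage (stopping early leaves the estimates below their least-squares values).

```python
import random


def componentwise_l2_boost(X, y, n_iter=100, nu=0.1):
    """Component-wise gradient boosting with squared-error loss.

    X is a list of rows, y a list of responses.
    Returns (intercept, coefs) on the original covariate scale.
    """
    n, p = len(y), len(X[0])
    # Centre each covariate so that no-intercept linear base-learners suffice.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]
    ss = [sum(v * v for v in col) for col in cols]

    offset = sum(y) / n  # start from the mean response
    fitted = [offset] * n
    coefs = [0.0] * p

    for _ in range(n_iter):
        # Negative gradient of the squared-error loss = current residuals.
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        best = None  # (residual sum of squares, covariate index, slope)
        for j in range(p):
            if ss[j] == 0.0:
                continue  # constant covariate carries no information
            beta = sum(x * r for x, r in zip(cols[j], resid)) / ss[j]
            rss = sum((r - beta * x) ** 2 for x, r in zip(cols[j], resid))
            if best is None or rss < best[0]:
                best = (rss, j, beta)
        if best is None:
            break
        _, j, beta = best
        # Update only the selected component, shrunk by the step length nu.
        coefs[j] += nu * beta
        fitted = [f + nu * beta * x for f, x in zip(fitted, cols[j])]

    intercept = offset - sum(c * m for c, m in zip(coefs, means))
    return intercept, coefs


def simulate(n=200, seed=1):
    """Toy data: only the first of three covariates affects the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
    y = [1.0 + 2.0 * row[0] + rng.gauss(0, 0.5) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = simulate()
    print(componentwise_l2_boost(X, y, n_iter=5))    # early stopping
    print(componentwise_l2_boost(X, y, n_iter=200))  # close to least squares
```

Swapping the residual computation for the negative gradient of another loss would change the regression setting while leaving the selection loop untouched; in practice the number of iterations is usually tuned by cross-validation or resampling rather than fixed in advance.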

2015 ◽  
Vol 34 (21) ◽  
pp. 2941-2957 ◽  
Author(s):  
Julian Wolfson ◽  
Sunayan Bandyopadhyay ◽  
Mohamed Elidrisi ◽  
Gabriela Vazquez-Benitez ◽  
David M. Vock ◽  
...  

2020 ◽  
Vol 25 (4) ◽  
pp. 433-448 ◽  
Author(s):  
Alex Ingrams

In this paper, the author argues that the conflict between the copious amount of digital data processed by public organisations and the need for policy-relevant insights to aid public participation constitutes a ‘public information paradox’. Machine learning (ML) approaches may offer one solution to this paradox through algorithms that transparently collect and use statistical modelling to provide insights for policymakers. Such an approach is tested in this paper. The test involves applying an unsupervised machine learning approach with latent Dirichlet allocation (LDA) analysis of thousands of public comments submitted to the United States Transportation Security Administration (TSA) on a 2013 proposed regulation for the use of new full body imaging scanners in airport security terminals. The analysis results in salient topic clusters that could be used by policymakers to understand large amounts of text such as in an open public comments process. The results are compared with the actual final proposed TSA rule, and the author reflects on new questions raised for transparency by the implementation of ML in open rule-making processes.


2014 ◽  
Vol 53 (06) ◽  
pp. 428-435 ◽  
Author(s):  
H. Binder ◽  
O. Gefeller ◽  
M. Schmid ◽  
A. Mayr

Summary Background: Boosting algorithms to simultaneously estimate and select predictor effects in statistical models have gained substantial interest during the last decade. Objectives: This review highlights recent methodological developments regarding boosting algorithms for statistical modelling, especially focusing on topics relevant for biomedical research. Methods: We suggest a unified framework for gradient boosting and likelihood-based boosting (statistical boosting), which have been addressed separately in the literature up to now. Results: The methodological developments on statistical boosting during the last ten years can be grouped into three different lines of research: i) efforts to ensure variable selection leading to sparser models, ii) developments regarding different types of predictor effects and how to choose them, iii) approaches to extend the statistical boosting framework to new regression settings. Conclusions: Statistical boosting algorithms have been adapted to carry out unbiased variable selection and automated model choice during the fitting process and can nowadays be applied in almost any regression setting in combination with a large number of different types of predictor effects.


2020 ◽  
Author(s):  
Mohammad Asghari Jafarabadi ◽  
Zeynab Iraji ◽  
Roya Dolatkhah ◽  
Tohid Jafari Koshki

Abstract Background: Breast cancer (BC) was the fifth leading cause of death worldwide in 2015 and the second leading cause of death in Iran in 2012. This study aimed to model the factors associated with mortality in patients with BC using a machine learning approach. Methods: We used data from patients with primary BC during 2007-2016 in Tabriz, Iran. The data were analyzed using decision tree (DT), boosted tree (BT), random forest (RF), k-nearest neighbors (KNN) and generalized additive model (GAM) with the inverse probability of censoring weighting (IPCW) technique to assess the risk factors of mortality. The models were compared using diagnostic accuracy measures. Results: Accuracy of the models ranged from 76.0 to 93.0%, with sensitivity of 82.5-98.8% and specificity of 72.2-99.4%. The GAM fit the data best, with accuracy of 93.0% (95% CI: [90.5, 95.0]), sensitivity of 98.8% (95% CI: [96.9, 99.7]) and specificity of 84.3% (95% CI: [78.8, 88.9]); the non-linear effects of age (p-value = 0.006), grade (p-value = 0.024) and time to event (p-value < 0.001) on mortality were significant. Conclusion: The GAM seems to be an optimal model for classifying mortality in patients with BC. Considering time to event, age and grade as the prognostic factors obtained by the GAM, more accurate prevention planning may be designed.
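The diagnostic accuracy measures used to compare the models can be computed from a binary confusion matrix, as in the minimal sketch below. The abstract does not state how its confidence intervals were obtained; the Wilson score interval is an assumption chosen here for illustration, and the function names and toy labels are hypothetical.

```python
import math


def diagnostic_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels (1 = event)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
        # Sensitivity: share of true events that were predicted as events.
        "sensitivity": tp / (tp + fn) if (tp + fn) else nan,
        # Specificity: share of true non-events predicted as non-events.
        "specificity": tn / (tn + fp) if (tn + fp) else nan,
    }


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed method, see lead-in)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # Hypothetical labels, not data from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
    print(diagnostic_metrics(y_true, y_pred))
    print(wilson_interval(6, 8))  # interval for the accuracy of 6/8
```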


2020 ◽  
Vol 11 ◽  
Author(s):  
Julianne Duhazé ◽  
Signe Hässler ◽  
Delphine Bachelet ◽  
Aude Gleizes ◽  
Salima Hacein-Bey-Abina ◽  
...  
