Automated Landslide-Risk Prediction Using Web GIS and Machine Learning Models

Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4620
Author(s):  
Naruephorn Tengtrairat ◽  
Wai Lok Woo ◽  
Phetcharat Parathai ◽  
Chuchoke Aryupong ◽  
Peerapong Jitsangiam ◽  
...  

Spatially susceptible landslide prediction is one of the most challenging research areas, as it directly concerns the safety of inhabitants. A novel geographic information web (GIW) application is proposed for dynamically predicting landslide risk in Chiang Rai, Thailand. The automated GIW system coordinates machine learning technologies, web technologies, and application programming interfaces (APIs). A new bidirectional long short-term memory (Bi-LSTM) algorithm is presented to forecast landslides. The proposed algorithm consists of three major steps, the first of which is the construction of a landslide dataset using Quantum GIS (QGIS). The second step is to generate the landslide-risk model based on machine learning approaches. Finally, the automated landslide-risk visualization illustrates the likelihood of a landslide via Google Maps on the website. Four static factors are considered for landslide-risk prediction, namely land cover, soil properties, elevation, and slope, together with a single dynamic factor, precipitation. Data are collected to construct a geospatial landslide database comprising three historical landslide locations: Phu Chifa in Thoeng District, Ban Pha Duea in Mae Salong Nai, and Mai Salong Nok in Mae Fa Luang District, Chiang Rai, Thailand. Data collection uses QGIS software to interpolate contours, elevation, slope degree, and land cover from Google satellite images, aerial photographs, and site-survey photographs, while physiography and rock type are surveyed on site by experts. State-of-the-art machine learning models have been trained, namely linear regression (LR), artificial neural networks (ANN), LSTM, and Bi-LSTM. Ablation studies have been conducted to determine the optimal parameter settings for each model. An enhancement method based on two-stage classification is presented to improve the landslide prediction of the LSTM and Bi-LSTM models.
The landslide-risk prediction performances of these models are subsequently evaluated using a real-time dataset, and it is shown that Bi-LSTM with random forest (Bi-LSTM-RF) yields the best prediction performance. The Bi-LSTM-RF model improves the landslide-risk prediction performance over LR, ANN, LSTM, and Bi-LSTM in terms of the area under the receiver operating characteristic curve (AUC) by 0.42, 0.27, 0.46, and 0.47, respectively. Finally, an automated web GIS has been developed; it consists of software components including the trained models, a rainfall API, the Google API, and a geodatabase. All components are interfaced via JavaScript and Node.js.
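The two-stage classification idea described in the abstract can be sketched as follows. This is a minimal illustration only: the scoring function stands in for the Bi-LSTM, the second stage stands in for the random forest, and all weights, factors, and cut-offs are hypothetical rather than the authors' trained parameters.

```python
# Hypothetical sketch of a two-stage landslide-risk classifier.
# Stage 1: a sequence model scores each grid cell from its recent
# precipitation series; Stage 2: a second classifier refines only the
# cells that Stage 1 flags. All thresholds and weights are illustrative.

def stage1_score(precip_series, weight=0.02):
    # Stand-in for the Bi-LSTM stage: weight recent rainfall more heavily.
    n = len(precip_series)
    score = sum(p * weight * (i + 1) / n for i, p in enumerate(precip_series))
    return min(score, 1.0)

def stage2_classify(static_features, score, risk_cutoff=0.5):
    # Stand-in for the random-forest stage: combine the Stage-1 score
    # with static factors (elevation, slope, soil index, land-cover index).
    elevation, slope, soil, land_cover = static_features
    slope_factor = slope / 90.0            # steeper slopes raise risk
    combined = 0.6 * score + 0.4 * slope_factor
    return "high" if combined >= risk_cutoff else "low"

def predict_cell(precip_series, static_features, screen_cutoff=0.3):
    score = stage1_score(precip_series)
    if score < screen_cutoff:              # Stage 1 screens out low-risk cells
        return "low"
    return stage2_classify(static_features, score)
```

The design point the abstract makes is that the second-stage classifier only has to separate the harder cases that survive the first screen, which is where the reported AUC gains of Bi-LSTM-RF over plain Bi-LSTM come from.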

Author(s):  
Nghia H Nguyen ◽  
Dominic Picetti ◽  
Parambir S Dulai ◽  
Vipul Jairath ◽  
William J Sandborn ◽  
...  

Abstract
Background and Aims: There is increasing interest in machine learning-based prediction models in inflammatory bowel diseases (IBD). We synthesized and critically appraised studies comparing machine learning vs. traditional statistical models using routinely available clinical data for risk prediction in IBD.
Methods: Through a systematic review up to January 1, 2021, we identified cohort studies that derived and/or validated machine learning models, based on routinely collected clinical data in patients with IBD, to predict the risk of harboring or developing adverse clinical outcomes, and that reported their predictive performance against a traditional statistical model for the same outcome. We appraised the risk of bias in these studies using the Prediction model Risk of Bias ASsessment (PROBAST) tool.
Results: We included 13 studies on machine learning-based prediction models in IBD, encompassing prediction of treatment response to biologics and thiopurines, prediction of longitudinal disease activity and complications, and outcomes in patients with acute severe ulcerative colitis. The most common machine learning models were tree-based algorithms, which are classification approaches achieved through supervised learning. Machine learning models outperformed traditional statistical models in risk prediction. However, most models were at high risk of bias, and only one was externally validated.
Conclusions: Machine learning-based prediction models based on routinely collected data generally perform better than traditional statistical models for risk prediction in IBD, though they frequently have a high risk of bias. Future studies examining these approaches are warranted, with special focus on external validation and clinical applicability.
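The tree-based, supervised-learning approach the review identifies as most common can be illustrated at its smallest scale: a single split chosen by Gini impurity. The feature values and labels below are synthetic, not drawn from any IBD cohort.

```python
# Minimal sketch of the tree-based supervised-learning idea: a one-split
# decision stump whose threshold is chosen to minimize Gini impurity.

def gini(labels):
    # Gini impurity of a binary label set: 2 * p * (1 - p).
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def best_split(xs, ys):
    # Try each observed feature value as a threshold; keep the split
    # with the lowest size-weighted Gini impurity.
    best_score, best_t = float("inf"), None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best_score:
            best_score, best_t = score, t
    return best_t

# Synthetic example: a marker value that separates non-responders (0)
# from responders (1) around a value of 3.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 0, 1, 1, 1]
threshold = best_split(xs, ys)
```

Full tree-based models (random forests, gradient-boosted trees) stack many such splits, which is what lets them pick up the nonlinear interactions that traditional regression models in these studies tended to miss.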


Author(s):  
Chenxi Huang ◽  
Shu-Xia Li ◽  
César Caraballo ◽  
Frederick A. Masoudi ◽  
John S. Rumsfeld ◽  
...  

Background: New methods such as machine learning techniques have been increasingly used to enhance the performance of risk predictions for clinical decision-making. However, commonly reported performance metrics may not be sufficient to capture the advantages of these newly proposed models for their adoption by health care professionals to improve care. Machine learning models often improve risk estimation for certain subpopulations that may be missed by these metrics. Methods and Results: This article addresses the limitations of commonly reported metrics for performance comparison and proposes additional metrics. Our discussions cover metrics related to overall performance, discrimination, calibration, resolution, reclassification, and model implementation. Models for predicting acute kidney injury after percutaneous coronary intervention are used to illustrate the use of these metrics. Conclusions: We demonstrate that commonly reported metrics may not have sufficient sensitivity to identify improvement of machine learning models and propose the use of a comprehensive list of performance metrics for reporting and comparing clinical risk prediction models.
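Two of the metric families this article discusses, discrimination (AUC) and overall performance (Brier score), can be computed directly from predicted risks. The sketch below uses a tiny synthetic example, not the acute-kidney-injury models from the paper.

```python
# Sketch of two reported metric families: discrimination (AUC) and
# overall performance (Brier score), computed from predicted risks.

def auc(y_true, y_score):
    # Probability that a randomly chosen event outranks a randomly
    # chosen non-event (ties count half): the area under the ROC curve.
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_score):
    # Mean squared difference between predicted risk and outcome;
    # sensitive to calibration as well as discrimination.
    return sum((s - y) ** 2 for y, s in zip(y_true, y_score)) / len(y_true)

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
```

The article's point is that AUC alone can mask subgroup-level gains; pairing it with calibration-sensitive measures such as the Brier score, and with reclassification metrics, gives a fuller picture.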


2020 ◽  
Vol 34 (08) ◽  
pp. 13396-13401
Author(s):  
Wei Wang ◽  
Christopher Lesner ◽  
Alexander Ran ◽  
Marko Rukonic ◽  
Jason Xue ◽  
...  

Machine learning applied to financial transaction records can predict how likely a small business is to repay a loan. For this purpose we compared a traditional scorecard credit-risk model against various machine learning models and found that XGBoost with monotonic constraints outperformed the scorecard model by 7% in the K-S statistic. To deploy such a machine learning model in production for loan-application risk scoring, it must comply with lending-industry regulations that require lenders to provide understandable and specific reasons for credit decisions. Thus we also developed a loan-decision explanation technique based on the ideas of WoE and SHAP. Our research was carried out using a historical dataset of tens of thousands of loans and millions of associated financial transactions. The credit-risk scoring model based on XGBoost with monotonic constraints and SHAP explanations described in this paper has been deployed by QuickBooks Capital to assess incoming loan applications since July 2019.
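The WoE (weight of evidence) half of the explanation technique is simple enough to sketch: per feature bin, WoE is the log ratio of the bin's share of good loans to its share of bad loans, so positive values signal a bin associated with repayment. The bins and counts below are illustrative, not QuickBooks data, and this is not the paper's actual pipeline.

```python
# Sketch of the weight-of-evidence (WoE) idea behind per-feature loan
# explanations: WoE = ln(share of goods in bin / share of bads in bin).
import math

def woe(goods_in_bin, bads_in_bin, total_goods, total_bads):
    return math.log((goods_in_bin / total_goods) / (bads_in_bin / total_bads))

# Illustrative example: average monthly cash inflow, binned.
# Higher inflow bins contain relatively more repaid ("good") loans.
bins = {"low": (10, 40), "mid": (30, 30), "high": (60, 10)}  # (goods, bads)
total_g = sum(g for g, _ in bins.values())
total_b = sum(b for _, b in bins.values())
woe_by_bin = {k: woe(g, b, total_g, total_b) for k, (g, b) in bins.items()}
```

A monotonic WoE profile across bins (as in this example, where WoE rises with inflow) is also what motivates the monotonic constraints on the XGBoost model: the score should never decrease as a clearly favorable feature increases.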


2019 ◽  
Vol 40 (Supplement_1) ◽  
Author(s):  
I Korsakov ◽  
A Gusev ◽  
T Kuznetsova ◽  
D Gavrilov ◽  
R Novitskiy

Abstract
Background: Advances in precision medicine will require increasingly individualized prognostic evaluation of patients in order to provide each patient with appropriate therapy. The traditional statistical methods of predictive modeling, such as SCORE, PROCAM, and Framingham, recommended by the European guidelines for the prevention of cardiovascular disease, are not adapted to all patients and require significant human involvement in the selection, transformation, and imputation of predictive variables. In ROC analysis for prediction of significant cardiovascular disease (CVD), the areas under the curve are 0.62–0.72 for Framingham, 0.66–0.73 for SCORE, and 0.60–0.69 for PROCAM. To improve on this, we apply machine learning and deep learning models to 10-year CVD event prediction, relying on conventional risk factors and longitudinal electronic health records (EHR).
Methods: For machine learning, we applied logistic regression (LR); as a deep learning algorithm, we used recurrent neural networks with long short-term memory (LSTM) units. From the longitudinal EHR we extracted the following features: demographics, vital signs, diagnoses (ICD-10-cm: I21-I22.9: I61-I63.9), and medication. One problem at this step is that nearly 80 percent of the clinical information in the EHR is "unstructured" and contains errors and typos. Handling missing data correctly is also important for training machine learning and deep learning algorithms. The study cohort included patients between the ages of 21 and 75 with a dynamic observation window. In total, the dataset contained 31,517 individuals, but only 3,652 individuals had all features present or missing feature values that could easily be imputed. Among these 3,652 individuals, 29.4% had CVD; the mean age was 49.4 years and 68.2% were female.
Evaluation: We randomly divided the dataset into a training and a test set with an 80/20 split.
The LR was implemented with Python Scikit-Learn, and the LSTM model was implemented with Keras using TensorFlow as the backend.
Results: We applied the machine learning and deep learning models for CVD prediction using the same features as the traditional risk scales and using longitudinal EHR features, respectively. The machine learning model (LR) achieved an AUROC of 0.74–0.76 and the deep learning model (LSTM) 0.75–0.76. Using features from the EHR, the logistic regression and deep learning models improved the AUROC to 0.78–0.79.
Conclusion: The machine learning models outperformed traditional clinically used predictive models for CVD risk prediction (i.e., the SCORE, PROCAM, and Framingham equations). This approach was used to create a clinical decision support system (CDSS) that uses both traditional risk scales and models based on neural networks. Especially important is that the system can calculate cardiovascular-disease risk automatically and recalculate it immediately after new information is added to the EHR. The results are delivered to the user's personal account.
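The logistic-regression baseline and the 80/20 evaluation split can be sketched in a few lines. This is a toy stand-in on one synthetic feature (a composite risk-factor score), not the study's Scikit-Learn/Keras pipeline or its EHR features.

```python
# Minimal sketch of a logistic-regression baseline with an 80/20
# train/test split, on synthetic one-dimensional data. Illustrative
# only; the study used Scikit-Learn LR and a Keras LSTM on EHR data.
import math
import random

def train_lr(xs, ys, lr=0.1, epochs=500):
    # Plain stochastic gradient descent on the log-loss.
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

def predict(w, b, x):
    return 1 / (1 + math.exp(-(w * x + b))) >= 0.5

random.seed(0)
# Synthetic cohort: low-risk scores cluster near -1, high-risk near +1.
xs = [random.gauss(-1, 0.3) for _ in range(40)] + \
     [random.gauss(1, 0.3) for _ in range(40)]
ys = [0] * 40 + [1] * 40
pairs = list(zip(xs, ys))
random.shuffle(pairs)
train, test = pairs[:64], pairs[64:]          # 80/20 split, as in the study
w, b = train_lr([x for x, _ in train], [y for _, y in train])
accuracy = sum(predict(w, b, x) == y for x, y in test) / len(test)
```

The LSTM variant replaces the single score with the patient's longitudinal EHR sequence, which is where the reported AUROC gain from 0.74–0.76 to 0.78–0.79 comes from.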


2020 ◽  
Author(s):  
Guangyao Wu ◽  
Pei Yang ◽  
Henry C. Woodruff ◽  
Xiangang Rao ◽  
Julien Guiot ◽  
...  

Key Points
Question: How do nomograms and machine-learning algorithms for severity risk prediction and triage of COVID-19 patients at hospital admission perform?
Findings: This model was prospectively validated on six test datasets comprising 426 patients and yielded AUCs ranging from 0.816 to 0.976, accuracies from 70.8% to 93.8%, sensitivities from 83.7% to 100%, and specificities from 41.0% to 95.7%. The cut-off probability values for the low-, medium-, and high-risk groups were 0.072 and 0.244.
Meaning: The findings of this study suggest that our models perform well for the diagnosis and prediction of progression to severe or critical illness in COVID-19 patients and could be used for triage at hospital admission.
IMPORTANCE The outbreak of coronavirus disease 2019 (COVID-19) has globally strained medical resources and caused significant mortality among severely and critically ill patients. However, the availability of validated nomograms and machine-learning models to predict severity risk and triage affected patients is limited.
OBJECTIVE To develop and validate nomograms and machine-learning models for severity risk assessment and triage of COVID-19 patients at hospital admission.
DESIGN, SETTING, AND PARTICIPANTS A retrospective cohort of 299 consecutively hospitalized COVID-19 patients at The Central Hospital of Wuhan, China, from December 23, 2019, to February 13, 2020, was used to train and validate the models. Six cohorts with 426 patients from eight centers in China, Italy, and Belgium, from February 20, 2020, to March 21, 2020, were used to prospectively validate the models.
MAIN OUTCOMES AND MEASURES The main outcome was the onset of severe or critical illness during hospitalization.
Model performances were quantified using the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity.
RESULTS Of the 299 hospitalized COVID-19 patients in the retrospective cohort, the median age was 50 years (interquartile range, 35.5–63.0; range, 20–94 years) and 137 (45.8%) were men. Of the 426 hospitalized COVID-19 patients in the prospective cohorts, the median age was 62.0 years (interquartile range, 50.0–72.0; range, 19–94 years) and 236 (55.4%) were men. The model was prospectively validated on six cohorts, yielding AUCs ranging from 0.816 to 0.976, accuracies from 70.8% to 93.8%, sensitivities from 83.7% to 100%, and specificities from 41.0% to 95.7%. The cut-off values for the low-, medium-, and high-risk probability groups were 0.072 and 0.244. The developed online calculators can be found at https://covid19risk.ai/.
CONCLUSIONS AND RELEVANCE The machine learning models, nomograms, and online calculators might be useful for predicting the onset of severe and critical illness among COVID-19 patients and for triage at hospital admission. Further prospective research and clinical feedback are necessary to evaluate the clinical usefulness of these models and to determine whether they can help optimize medical resources and reduce mortality rates compared with current clinical practices.
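The reported cut-off probabilities (0.072 and 0.244) define the final triage step directly; the sketch below shows that step, assuming (as the abstract does not specify) that each boundary value falls into the higher-risk group.

```python
# Triage step implied by the reported cut-offs: map a predicted
# probability of severe/critical illness to a risk group. Boundary
# handling (>= vs >) is an assumption, not stated in the abstract.

def triage(probability, low_cut=0.072, high_cut=0.244):
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "medium"
    return "high"
```

In the deployed online calculators, the probability fed into such a step would come from the validated nomogram or machine-learning model.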


2019 ◽  
Author(s):  
Ji Hwan Park ◽  
Han Eol Cho ◽  
Jong Hun Kim ◽  
Melanie Wall ◽  
Yaakov Stern ◽  
...  

Abstract
A nationwide population-based cohort provides a new opportunity to build a completely automated risk prediction model based on individuals' history of health and healthcare, beyond existing risk prediction models. We tested the ability of machine learning models to predict future incidence of Alzheimer's disease (AD) using large-scale administrative health data. From the Korean National Health Insurance Service database between 2002 and 2010, we obtained de-identified health data on adults over 65 years of age (N = 40,736) containing 4,894 unique clinical features, including ICD-10 codes, medication codes, laboratory values, history of personal and family illness, and socio-demographics. To define incident AD, two operational definitions were considered: "definite AD", with diagnostic codes and dementia medication (n = 614), and "probable AD", with diagnosis only (n = 2,026). We trained and validated a random forest, a support vector machine, and logistic regression to predict incident AD in 1, 2, 3, and 4 subsequent years. For predicting future incidence of AD in balanced samples (bootstrapping), the machine learning models showed reasonable performance in 1-year prediction, with AUCs of 0.775 and 0.759 based on the "definite AD" and "probable AD" outcomes, respectively; in 2-year prediction, 0.730 and 0.693; in 3-year, 0.677 and 0.644; and in 4-year, 0.725 and 0.683. The results were similar when the entire (unbalanced) samples were used. Important clinical features selected by logistic regression included hemoglobin level, age, and urine protein level. This study may shed light on the utility of data-driven machine learning models based on large-scale administrative health data for AD risk prediction, which may enable better selection of individuals at risk for AD in clinical trials or earlier detection in clinical settings.
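Because incident AD is rare in the cohort (614 definite cases among 40,736 elders), the abstract's "balanced samples (bootstrapping)" step matters. One common way to realize it, sketched below with synthetic labels rather than the Korean NHIS data, is to resample the minority class with replacement until both classes are the same size; the exact balancing scheme used in the study is not detailed in the abstract.

```python
# Sketch of class balancing by bootstrap: upsample the minority class
# with replacement until it matches the majority class. Synthetic data.
import random

def balance_by_bootstrap(records, labels, seed=0):
    rng = random.Random(seed)
    pos = [(r, 1) for r, y in zip(records, labels) if y == 1]
    neg = [(r, 0) for r, y in zip(records, labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    upsampled = [rng.choice(minority) for _ in range(len(majority))]
    return majority + upsampled

data = list(range(100))
labels = [1] * 10 + [0] * 90      # 10% incident cases: a rare outcome
balanced = balance_by_bootstrap(data, labels)
```

Training the random forest, SVM, and logistic regression on such balanced resamples keeps the rare AD class from being ignored; the abstract notes that results on the unbalanced samples were nonetheless similar.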

