Groundwater Prediction Using Machine-Learning Tools

Eslam A. Hussein; Christopher Thron; Mehrdad Ghaziasgar; Antoine Bagula; Mattia Vaccari

doi:10.3390/a13110300

Groundwater Prediction Using Machine-Learning Tools

Algorithms ◽

10.3390/a13110300 ◽

2020 ◽

Vol 13 (11) ◽

pp. 300

Author(s):

Eslam A. Hussein ◽

Christopher Thron ◽

Mehrdad Ghaziasgar ◽

Antoine Bagula ◽

Mattia Vaccari

Keyword(s):

Machine Learning ◽

Support Vector Regression ◽

Gaussian Mixture ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Learning Tools ◽

Global Features ◽

Groundwater Availability ◽

Extreme Gradient Boosting

Predicting groundwater availability is important to water sustainability and drought mitigation. Machine-learning tools have the potential to improve groundwater prediction, thus enabling resource planners to: (1) anticipate water quality in unsampled areas or depth zones; (2) design targeted monitoring programs; (3) inform groundwater protection strategies; and (4) evaluate the sustainability of groundwater sources of drinking water. This paper proposes a machine-learning approach to groundwater prediction with the following characteristics: (i) the use of a regression-based approach to predict full groundwater images based on sequences of monthly groundwater maps; (ii) strategic automatic feature selection (both local and global features) using extreme gradient boosting; and (iii) the use of a multiplicity of machine-learning techniques (extreme gradient boosting, multivariate linear regression, random forests, multilayer perceptron and support vector regression). Of these techniques, support vector regression consistently performed best in terms of minimizing root mean square error and mean absolute error. Furthermore, including a global feature obtained from a Gaussian Mixture Model produced models with lower error than the best which could be obtained with local geographical features.
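The two error measures named in this abstract, root mean square error (RMSE) and mean absolute error (MAE), can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names and the sample values are assumptions made for the example and are not taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(observed, predicted):
    """Mean absolute error between two equal-length sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical pixel values from an observed and a predicted map.
observed_pixels = [1.0, 2.0, 3.0]
predicted_pixels = [1.0, 2.0, 5.0]
print(rmse(observed_pixels, predicted_pixels))
print(mae(observed_pixels, predicted_pixels))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of how a model's errors are distributed.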

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Background: Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially during the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination. Methods: Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results: The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared. For the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions: The machine learning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Application of Artificial Intelligence and Machine Learning Techniques in Classifying Extent of Dementia Across Alzheimer's Image Data

International Journal of Quantitative Structure-Property Relationships ◽

10.4018/ijqspr.2021040103 ◽

2021 ◽

Vol 6 (2) ◽

pp. 29-46

Author(s):

Robin Ghosh ◽

Anirudh Reddy Cingreddy ◽

Venkata Melapu ◽

Sravanthi Joginipelli ◽

Supratik Kar

Keyword(s):

Neural Network ◽

Machine Learning ◽

Nearest Neighbor ◽

Image Data ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

K Nearest Neighbor ◽

Mild Dementia ◽

Extreme Gradient Boosting

Alzheimer's disease (AD) is one of the most common forms of dementia and the sixth-leading cause of death in older adults. The presented study illustrates applications of deep learning (DL) and associated methods for multiclass image detection, which could have a broader impact on identifying dementia stages and may guide therapy in the future. The studied datasets contain around 6,400 magnetic resonance imaging (MRI) images, each assigned to one of four Alzheimer's severity classes: mild dementia, very mild dementia, non-dementia, and moderate dementia. These four image classes were used to classify the dementia stage of each patient by applying the convolutional neural network (CNN) algorithm. Employing the CNN-based in silico model, the authors classified and predicted the different AD stages with around 97.19% accuracy. In addition, machine learning (ML) techniques such as extreme gradient boosting (XGB), support vector machine (SVM), k-nearest neighbor (KNN), and artificial neural network (ANN) offered accuracies of 96.62%, 96.56%, 94.62%, and 89.88%, respectively.

A Comparative Study of Machine Learning Techniques for Predicting Sepsis for MIMIC-III Patients

10.21203/rs.3.rs-697902/v1 ◽

2021 ◽

Author(s):

Xuze Zhao ◽

Bo Qu

Keyword(s):

Machine Learning ◽

Intensive Care ◽

Intensive Care Units ◽

Medical Information ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Mortality And Morbidity ◽

Extreme Gradient Boosting ◽

Mimic Iii

Background: Sepsis is one of the leading causes of in-hospital mortality and morbidity, especially among intensive care unit (ICU) patients. Therefore, a reliable decision-making model for predicting sepsis is of great importance. The purpose of this study was to develop an eXtreme Gradient Boosting (XGBoost) based model and explore whether it performs better than other machine learning (ML) methods in predicting sepsis from the time of ICU admission. Methods: The source data used for model establishment in this study were from the retrospective Medical Information Mart for Intensive Care (MIMIC) III dataset, restricted to ICU patients aged between 18 and 89. The performance of the XGBoost model was compared to logistic regression (LR), recursive neural network (RNN), and support vector machine (SVM). The performances of the models were then evaluated and compared by the area under the curve (AUC) of the receiver operating characteristic (ROC) curves. Results: A total of 6430 MIMIC-III cases are included in this article, of which 3021 cases encountered sepsis while 3409 cases did not. The AUC values were 0.808 (95% CI: 0.767-0.848, DT), 0.802 (95% CI: 0.762-0.842, RNN), 0.790 (95% CI: 0.751-0.830, SVM), and 0.775 (95% CI: 0.736-0.813, LR); according to these results, XGBoost performs best in predicting sepsis. Conclusions: By using the DT algorithm, a more accurate prediction model can be established. Among the ML methods compared, the XGBoost model demonstrated the best ability in detecting sepsis in ICU patients.
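The AUC used here to compare models can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; it is an illustration only, with a hypothetical function name and toy data, and it is not the study's evaluation code (confidence intervals are not computed).

```python
def roc_auc(labels, scores):
    """AUC of the ROC curve via pairwise comparison.

    labels: sequence of 0/1 class labels (1 = positive, e.g. sepsis).
    scores: sequence of model scores, higher meaning "more likely positive".
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, invented for illustration.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 pairs ordered correctly -> 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; rank-based formulations give the same value more efficiently on large datasets.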

Bankruptcy Prediction Using Machine Learning Techniques

Journal of Risk and Financial Management ◽

10.3390/jrfm15010035 ◽

2022 ◽

Vol 15 (1) ◽

pp. 35

Author(s):

Shekar Shetty ◽

Mohamed Musa ◽

Xavier Brédart

Keyword(s):

Machine Learning ◽

Small And Medium Enterprises ◽

Bankruptcy Prediction ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Global Accuracy ◽

Medium Enterprises

In this study, we apply several advanced machine learning techniques, including extreme gradient boosting (XGBoost), support vector machine (SVM), and a deep neural network, to predict bankruptcy using easily obtainable financial data of 3728 Belgian small and medium enterprises (SMEs) during the period 2002–2012. Using these techniques, we predict bankruptcies with a global accuracy of 82–83% using only three easily obtainable financial ratios: the return on assets, the current ratio, and the solvency ratio. While the prediction accuracy is similar to several previous models in the literature, our model is very simple to implement and represents an accurate and user-friendly tool to discriminate between bankrupt and non-bankrupt firms.

Machine Learning Methods to Predict Social Media Disaster Rumor Refuters

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph16081452 ◽

2019 ◽

Vol 16 (8) ◽

pp. 1452 ◽

Cited By ~ 4

Author(s):

Shihang Wang ◽

Zongmin Li ◽

Yuhong Wang ◽

Qi Zhang

Keyword(s):

Machine Learning ◽

Language Processing ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Short Text ◽

Learning Techniques ◽

Extreme Gradient Boosting ◽

Effective Decision Support ◽

Short Text Similarity

This research provides a general methodology for distinguishing disaster-related anti-rumor spreaders from a non-ignorant population base, with strong connections in their social circle. Several important influencing factors are examined and illustrated. User information from the most recent posted microblog content of 3793 Sina Weibo users was collected. Natural language processing (NLP) was used for the sentiment and short text similarity analyses, and four machine learning techniques, i.e., logistic regression (LR), support vector machines (SVM), random forest (RF), and extreme gradient boosting (XGBoost), were compared on different rumor refuting microblogs, after which a valid and robust distinguishing XGBoost model was trained and validated to predict who would retweet disaster-related rumor refuting microblogs. Compared with traditional prediction variables that only access user information, the similarity and sentiment analyses of the most recent user microblog contents were found to significantly improve prediction precision and robustness. The number of user microblogs also proved to be a valuable reference for all samples during the prediction process. This prediction methodology could possibly be more useful for WeChat or Facebook, as these have relatively stable closed-loop communication channels, which means that rumors are more likely to be refuted by acquaintances. Therefore, the methodology will be further optimized and validated on WeChat-like channels in the future. The novel rumor refuting approach presented in this research harnessed NLP for the user microblog content analysis and then used the analysis results of NLP as additional prediction variables to identify the anti-rumor spreaders. Therefore, compared to previous studies, this study presents a new and effective decision support for rumor countermeasures.

Aerosol optical depth prediction using machine learning techniques

10.5194/ems2021-22 ◽

2021 ◽

Author(s):

Stavros Andreas Logothetis ◽

Vasileios Salamalikis ◽

Andreas Kazantzidis

Keyword(s):

Machine Learning ◽

Aerosol Optical Depth ◽

Optical Depth ◽

Multivariate Adaptive Regression Splines ◽

Surface Radiation ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Climate Projections ◽

Extreme Gradient Boosting

Aerosol optical depth (AOD) adequately describes the aerosols' burden and extinction within an atmospheric column. AOD can be retrieved using remote sensing instruments such as ground-based sun photometers. Despite the very good quality of ground-based AOD measurements, their spatiotemporal coverage is restricted. In this study, an alternative approach to AOD estimation is proposed that combines ground-based measurements with machine learning (ML) techniques, in order to expand and complement the existing spatiotemporal capabilities of AOD data. The ML algorithms implemented are: Random Forests, Gradient Boosting Machines, Extreme Gradient Boosting Machines, Support Vector Regression, K-nearest Neighbors Regression, and Multivariate Adaptive Regression Splines. Each model receives as input the Global Horizontal Irradiance (GHI) and the water vapor (WV) content on an hourly basis and under clear skies. A randomized cross-validation search scheme is implemented to obtain the optimal hyperparameters and avoid overfitting for each ML algorithm. GHI and WV are retrieved from the Baseline Surface Radiation Network (BSRN) and NASA's Modern-Era Retrospective Analysis for Research and Applications-2 (MERRA-2) reanalysis product, respectively. AOD estimations are evaluated against AOD from the AErosol RObotic NETwork (AERONET) inversion product, using the Level 2.0 Version 3 (L2V3) data, which provide cloud-screened and quality-assured measurements. In total, 29 collocated AERONET-BSRN stations are used, spanning from 2000 to 2019. Since the aerosol pattern is different at each site, the effect of various aerosol types is further investigated. ML-based AOD predictions are adequately good, highlighting the feasibility of ML algorithms for producing AOD data. The results of this study could be useful for direct normal irradiance estimations as well as aerosol radiative effect calculations and climate projections.

Explainable machine learning can outperform Cox regression predictions and provide insights in breast cancer survival

Scientific Reports ◽

10.1038/s41598-021-86327-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Arturo Moncada-Torres ◽

Marissa C. van Maaren ◽

Mathijs P. Hendriks ◽

Sabine Siesling ◽

Gijs Geleijnse

Keyword(s):

Breast Cancer ◽

Machine Learning ◽

Explicit Knowledge ◽

Cox Regression ◽

Metastatic Breast ◽

Gradient Boosting ◽

Support Vector ◽

Netherlands Cancer Registry ◽

Extreme Gradient Boosting ◽

The Impact

Cox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have been shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $c$-index. We demonstrated that in our dataset, ML-based models can perform at least as well as the classical CPH regression ($c$-index $\sim 0.63$), and in the case of XGB even better ($c$-index $\sim 0.73$). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.
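The c-index (concordance index) used to compare the survival models can be illustrated with a simplified version of Harrell's estimator for right-censored data. This sketch is an assumption-laden illustration rather than the paper's code: the function name and toy data are hypothetical, pairs with tied event times are simply skipped, and a higher risk score is taken to mean a shorter expected survival.

```python
def concordance_index(times, events, risk_scores):
    """Simplified Harrell's c-index for right-censored survival data.

    times: observed follow-up times.
    events: 1 if the event (e.g. death) was observed, 0 if censored.
    risk_scores: model output, higher = higher predicted risk.
    A pair (i, j) is comparable when subject i had an observed event
    strictly before time j. Ties in risk score count as half concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data, invented for illustration: the third subject is censored.
print(concordance_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))  # 4 of 5 pairs -> 0.8
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which gives context for the reported values of roughly 0.63 and 0.73.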

Predictive models for stage and risk classification in head and neck squamous cell carcinoma (HNSCC)

PeerJ ◽

10.7717/peerj.9656 ◽

2020 ◽

Vol 8 ◽

pp. e9656

Author(s):

Sugandh Kumar ◽

Srinivas Patnaik ◽

Anshuman Dixit

Keyword(s):

Machine Learning ◽

Expression Profiles ◽

Disease Process ◽

Penalized Regression ◽

Functional Enrichment ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Sequencing Data ◽

Therapeutic Modalities

Machine learning techniques are increasingly used in the analysis of high-throughput genome sequencing data to better understand the disease process and to design therapeutic modalities. In the current study, we have applied state-of-the-art machine learning (ML) algorithms (Random Forest (RF), Support Vector Machine Radial Kernel (svmR), Adaptive Boost (AdaBoost), averaged Neural Network (avNNet), and Gradient Boosting Machine (GBM)) to stratify HNSCC patients into early and late clinical stages (TNM) and to predict risk using miRNA expression profiles. A six-miRNA signature was identified that can stratify patients into the early and late stages. The mean accuracy, sensitivity, specificity, and area under the curve (AUC) were found to be 0.84, 0.87, 0.78, and 0.82, respectively, indicating the robust performance of the generated model. A prognostic signature of eight miRNAs was identified using LASSO (least absolute shrinkage and selection operator) penalized regression. These miRNAs were found to be significantly associated with overall survival of the patients. The pathway and functional enrichment analysis of the identified biomarkers revealed their involvement in important cancer pathways such as GP6 signalling, Wnt signalling, p53 signalling, granulocyte adhesion, and diapedesis. To the best of our knowledge, this is the first such study, and we hope that these signature miRNAs will be useful for the risk stratification of patients and the design of therapeutic modalities.

Exploring the Mechanism of Crashes with Autonomous Vehicles Using Machine Learning

Mathematical Problems in Engineering ◽

10.1155/2021/5524356 ◽

2021 ◽

Vol 2021 ◽

pp. 1-10

Author(s):

Hengrui Chen ◽

Hong Chen ◽

Ruiyu Zhou ◽

Zhizhen Liu ◽

Xiaoke Sun

Keyword(s):

Machine Learning ◽

Autonomous Vehicles ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Support Vector ◽

Crash Severity ◽

Apriori Algorithm ◽

Driving Mode ◽

Extreme Gradient Boosting ◽

The Impact

Safety has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to examine the causal relationships among multiple factors and thereby explore the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze crash severity. In addition, we apply Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and the Apriori algorithm provided meaningful insights about AV-involved crash characteristics and their relationships. Among all these features, vehicle damage, weather conditions, accident location, and driving mode are the most critical. We found that most rear-end crashes involve conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Drivers should also be careful when driving near intersections, especially in the autonomous driving mode.
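The recall and G-mean figures quoted above can be illustrated for the two-class case, where G-mean is commonly defined as the geometric mean of recall (sensitivity) and specificity. This sketch assumes that binary definition; the study may use a multiclass variant, and the function name and labels below are hypothetical.

```python
import math


def recall_and_g_mean(y_true, y_pred, positive=1):
    """Return (recall, G-mean) for a binary classification result.

    G-mean = sqrt(recall * specificity), which stays low unless the
    classifier does well on both the positive and the negative class.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return recall, math.sqrt(recall * specificity)


# Invented labels: 1 = severe crash, 0 = non-severe crash.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(recall_and_g_mean(y_true, y_pred))
```

G-mean is often preferred over plain accuracy on imbalanced data, because a model that ignores the minority class scores a G-mean of zero.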

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap). We also performed extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains that have previously been used to characterize the oceanographic habitats of seabirds were derived from open–source datasets, quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R²). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
