Gully Head-Cut Distribution Modeling Using Machine Learning Methods—A Case Study of N.W. Iran

Water ◽  
2019 ◽  
Vol 12 (1) ◽  
pp. 16 ◽  
Author(s):  
Alireza Arabameri ◽  
Wei Chen ◽  
Thomas Blaschke ◽  
John P. Tiefenbacher ◽  
Biswajeet Pradhan ◽  
...  

To more effectively prevent and manage the scourge of gully erosion in arid and semi-arid regions, we present a novel ensemble intelligence approach—a bagging-based alternating decision-tree classifier (bagging-ADTree)—and use it to model a landscape’s susceptibility to gully erosion based on 18 gully-erosion conditioning factors. The model’s goodness-of-fit and prediction performance are compared to three other machine learning algorithms: a single alternating decision tree (ADTree), a rotation-forest-based alternating decision tree (RF-ADTree), and benchmark logistic regression (LR). To achieve this, a gully-erosion inventory was created for the study area, the Chah Mousi watershed, Iran, by combining archival records containing reports of gully erosion, remotely sensed data from Google Earth, and geolocated sites of gully head-cuts gathered in a field survey. A total of 119 gully head-cuts were identified and mapped. To train the models, 83 head-cuts (70% of the total) and the corresponding measures of the conditioning factors were input into each model. The results were validated using the data for the remaining 36 gully locations (30%). Next, the frequency ratio was used to identify which conditioning-factor classes have the strongest correlation with gully erosion. Using random-forest modeling, the relative importance of each conditioning factor was determined; the top eight factors in this study area are distance-to-road, drainage density, distance-to-stream, LU/LC, annual precipitation, topographic wetness index, NDVI, and elevation. Finally, based on goodness-of-fit and the AUROC of the success rate curve (SRC) and prediction rate curve (PRC), the results indicate that the bagging-ADTree ensemble model performed best (SRC = 0.964, PRC = 0.978), followed by RF-ADTree (SRC = 0.952, PRC = 0.971), ADTree (SRC = 0.926, PRC = 0.965), and LR (SRC = 0.867, PRC = 0.870). The results also indicate that bagging and RF, as meta-classifiers, improved the performance of the ADTree base classifier. The bagging-ADTree model classifies 24.28% of the study area as having high or very high susceptibility to gully erosion. The new ensemble model not only accurately identified the areas that are susceptible to gully erosion based on past patterns of formation, but also provides highly accurate predictions of future gully development. The novel ensemble method introduced in this research is recommended for evaluating patterns of gullying in arid and semi-arid environments and can effectively identify the most salient conditioning factors that promote the development and expansion of gullies in erosion-susceptible environments.
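The core idea of the abstract—a bagging meta-classifier wrapped around a tree base learner, evaluated by AUROC on a 70/30 split—can be sketched as follows. This is a minimal illustration, not the authors' pipeline: scikit-learn has no alternating decision tree, so a standard `DecisionTreeClassifier` stands in for ADTree, and synthetic data stand in for the Chah Mousi inventory.

```python
# Bagging as a meta-classifier over a tree base learner (ADTree stand-in).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Stand-in for 119 samples described by 18 conditioning factors.
X, y = make_classification(n_samples=119, n_features=18, random_state=0)
# 70/30 split mirrors the 83 training / 36 validation locations.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

single = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
bagged = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                           n_estimators=100, random_state=0).fit(X_tr, y_tr)

auc_single = roc_auc_score(y_te, single.predict_proba(X_te)[:, 1])
auc_bagged = roc_auc_score(y_te, bagged.predict_proba(X_te)[:, 1])
print(auc_single, auc_bagged)
```

Bagging trains each tree on a bootstrap resample and averages the predictions, which is why it tends to stabilize a high-variance base learner such as a single decision tree.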

2020 ◽  
Vol 4 (Supplement_1) ◽  
pp. 268-269
Author(s):  
Jaime Speiser ◽  
Kathryn Callahan ◽  
Jason Fanning ◽  
Thomas Gill ◽  
Anne Newman ◽  
...  

Abstract Advances in computational algorithms and the availability of large datasets with clinically relevant characteristics provide an opportunity to develop machine learning prediction models to aid in the diagnosis, prognosis, and treatment of older adults. Some studies have employed machine learning methods for prediction modeling, but skepticism of these methods remains due to a lack of reproducibility and the difficulty of understanding the complex algorithms behind the models. We aim to provide an overview of two common machine learning methods: decision tree and random forest. We focus on these methods because they provide a high degree of interpretability. We discuss the underlying algorithms of the decision tree and random forest methods and present a tutorial for developing prediction models for serious fall injury using data from the Lifestyle Interventions and Independence for Elders (LIFE) study. A decision tree is a machine learning method that produces a model resembling a flow chart. A random forest consists of a collection of many decision trees whose results are aggregated. In the tutorial example, we discuss evaluation metrics and interpretation for these models. Illustrated with data from the LIFE study, prediction models for serious fall injury were moderate at best (area under the receiver operating characteristic curve of 0.54 for the decision tree and 0.66 for the random forest). Machine learning methods may offer improved performance compared to traditional models for modeling outcomes in aging, but their use should be justified and their output carefully described. Models should be assessed by clinical experts to ensure compatibility with clinical practice.
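A minimal sketch of the two methods the tutorial covers, run on synthetic data rather than the LIFE study (whose data are not reproduced here): fit a single decision tree, print its fitted rules as the flow chart the abstract describes, then score both the tree and a random forest by area under the ROC curve.

```python
# Decision tree (interpretable flow chart) vs. random forest (aggregated trees).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)

print(export_text(tree))  # human-readable flow chart of the fitted tree
print("tree AUC:", roc_auc_score(y_te, tree.predict_proba(X_te)[:, 1]))
print("forest AUC:", roc_auc_score(y_te, forest.predict_proba(X_te)[:, 1]))
```

Capping `max_depth` keeps the printed tree small enough to read, which is the interpretability trade-off the abstract highlights.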


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Faizan Ullah ◽  
Qaisar Javaid ◽  
Abdu Salam ◽  
Masood Ahmad ◽  
Nadeem Sarwar ◽  
...  

Ransomware (RW) is a distinctive variety of malware that encrypts files or locks the user’s system, taking the user’s files hostage and leading to huge financial losses. In this article, we propose a new model that extracts novel features from an RW dataset and classifies RW and benign files. The proposed model can detect a large number of RW samples from various families at runtime, scanning network activity, registry activity, and the file system throughout execution. API-call series were used to represent the behavior-based features of RW. The technique extracts a fourteen-feature vector at runtime and analyzes it with online machine learning algorithms to predict RW. To validate effectiveness and scalability, we tested 78,550 recent malicious and benign samples and compared the model with random forest and AdaBoost; testing accuracy reached 99.56%.
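The "online machine learning" step—updating a classifier incrementally as fourteen-feature behaviour vectors arrive at runtime—can be sketched with scikit-learn's `partial_fit` interface. The abstract does not name its online algorithm, so an `SGDClassifier` is used here purely as an illustration, and the feature stream and labelling rule are synthetic, not real API-call traces.

```python
# Incremental (online) learning over a stream of 14-feature behaviour vectors.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # 0 = benign, 1 = ransomware

for _ in range(200):                            # incoming batches at runtime
    Xb = rng.normal(size=(32, 14))              # fourteen behaviour features
    yb = (Xb[:, 0] + Xb[:, 1] > 0).astype(int)  # toy labelling rule
    clf.partial_fit(Xb, yb, classes=classes)    # update without full retrain

X_new = rng.normal(size=(5, 14))
print(clf.predict(X_new))
```

Passing `classes` on every `partial_fit` call lets the model accept batches that happen to contain only one class, a common situation in live malware streams.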


Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Flooding is one of the most damaging natural hazards in urban areas in many places around the world, including the city of Fredericton, New Brunswick, Canada. Recently, Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its bank, estimating the flood extent is challenging. The issue gets even more challenging when several different factors affect the water flow, such as the land texture or the surface flatness, with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many research studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues researchers face is the complexity and number of features required as input to a machine learning algorithm to produce acceptable results. In this research, we used random forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image acquired around the flood peak day. The highest accuracy was obtained using only five factors, namely altitude, slope, aspect, distance from the river, and land use/cover, with 97.57% overall accuracy and a 95.14% kappa coefficient.
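Searching over combinations of conditioning factors with a random forest, as the paper does across its 12 factors, can be sketched as an exhaustive subset search scored by cross-validation. The factor names, data, and labelling rule below are illustrative stand-ins, not the Fredericton dataset.

```python
# Exhaustive search for the best 5-factor subset, scored by 3-fold CV accuracy.
from itertools import combinations
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
factors = ["altitude", "slope", "aspect", "dist_river", "landuse", "twi"]
X = rng.normal(size=(300, len(factors)))
y = (X[:, 0] - X[:, 3] > 0).astype(int)  # toy flood / no-flood label

def subset_score(idx):
    rf = RandomForestClassifier(n_estimators=50, random_state=2)
    return cross_val_score(rf, X[:, list(idx)], y, cv=3).mean()

best = max(combinations(range(len(factors)), 5), key=subset_score)
print([factors[i] for i in best], subset_score(best))
```

With 12 real factors the search space is still small (C(12, 5) = 792 subsets), so exhaustive evaluation remains feasible; for larger spaces a greedy forward selection would be the usual substitute.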


2020 ◽  
Vol 0 (0) ◽  
pp. 0-0
Author(s):  
D-K. Kim ◽  
H-S. Lim ◽  
K.M. Eun ◽  
Y. Seo ◽  
J.K. Kim ◽  
...  

BACKGROUND: Neutrophils present as major inflammatory cells in refractory chronic rhinosinusitis with nasal polyps (CRSwNP), regardless of the endotype. However, their role in the pathophysiology of CRSwNP remains poorly understood. We investigated factors predicting the surgical outcomes of CRSwNP patients with a focus on neutrophilic localization. METHODS: We employed machine-learning methods such as the decision tree and random forest models to predict the surgical outcomes of CRSwNP. Immunofluorescence analysis was conducted to detect human neutrophil elastase (HNE), Bcl-2, and Ki-67 in NP tissues. We counted the immunofluorescence-positive cells and divided them into three groups based on the infiltrated area, namely, epithelial, subepithelial, and perivascular groups. RESULTS: In the machine learning analysis, the decision tree algorithm demonstrated that the number of subepithelial HNE-positive cells, Lund-Mackay (LM) scores, and endotype (eosinophilic or non-eosinophilic) were the most important predictors of surgical outcomes in CRSwNP patients. Additionally, the random forest algorithm showed that, after ranking each factor by mean decrease in the Gini index or in accuracy, the top three factors associated with surgical outcomes were the LM score, age, and the number of subepithelial HNE-positive cells. In terms of cellular proliferation, immunofluorescence analysis revealed that Ki-67/HNE-double-positive and Bcl-2/HNE-double-positive cells were significantly increased in the subepithelial area in refractory CRSwNP. CONCLUSION: Our machine-learning approach and immunofluorescence analysis demonstrated that subepithelial neutrophils in NP tissues had a high expression of Ki-67 and could serve as a cellular biomarker for predicting surgical outcomes in CRSwNP patients.
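Ranking predictors by mean decrease in the Gini index, as the random forest step above does, corresponds to scikit-learn's `feature_importances_`. The feature names below are illustrative stand-ins for the clinical variables, and the data and outcome rule are synthetic.

```python
# Rank predictors by random-forest mean decrease in Gini impurity.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
names = ["LM_score", "age", "subepithelial_HNE", "endotype", "eosinophil_pct"]
X = rng.normal(size=(200, len(names)))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # toy surgical-outcome label

rf = RandomForestClassifier(n_estimators=300, random_state=3).fit(X, y)
ranked = sorted(zip(names, rf.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, imp in ranked:
    print(f"{name}: {imp:.3f}")
```

The importances sum to 1, so each value reads as a share of the total impurity reduction attributable to that predictor.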


2019 ◽  
Vol 11 (24) ◽  
pp. 2995 ◽  
Author(s):  
Omid Rahmati ◽  
Omid Ghorbanzadeh ◽  
Teimur Teimurian ◽  
Farnoush Mohammadi ◽  
John P. Tiefenbacher ◽  
...  

Although snow avalanches are among the most destructive natural disasters and result in losses of life and economic damage in mountainous regions, far too little attention has been paid to predicting snow avalanche hazard using advanced machine learning (ML) models. In this study, the applicability and efficiency of four ML models for snow avalanche hazard mapping were evaluated: support vector machine (SVM), random forest (RF), naïve Bayes (NB), and generalized additive model (GAM). Fourteen geomorphometric, topographic and hydrologic factors were selected as predictor variables in the modeling. This study was conducted in the Darvan and Zarrinehroud watersheds of Iran. The goodness-of-fit and predictive performance of the models were evaluated using two statistical measures: the area under the receiver operating characteristic curve (AUROC) and the true skill statistic (TSS). Finally, an ensemble model was developed based upon the results of the individual models. Results show that, among the individual models, RF was best, performing well in both the Darvan (AUROC = 0.964, TSS = 0.862) and Zarrinehroud (AUROC = 0.956, TSS = 0.881) watersheds. The accuracy of the ensemble model was slightly better than all individual models for generating the snow avalanche hazard map, as validation analyses showed an AUROC of 0.966 and a TSS of 0.865 in the Darvan watershed, and an AUROC of 0.958 and a TSS of 0.877 in the Zarrinehroud watershed. The results indicate that slope length, lithology and relative slope position (RSP) are the most important factors controlling snow avalanche distribution. The methodology developed in this study can improve risk-based decision making, increase the credibility and reliability of snow avalanche hazard predictions, and provide critical information for hazard managers.
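The true skill statistic used alongside AUROC above is defined as sensitivity + specificity − 1; a small helper makes the computation explicit from a confusion matrix.

```python
# True skill statistic (TSS) = sensitivity + specificity - 1.
from sklearn.metrics import confusion_matrix

def true_skill_statistic(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)   # hit rate on avalanche cells
    specificity = tn / (tn + fp)   # hit rate on non-avalanche cells
    return sensitivity + specificity - 1

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(true_skill_statistic(y_true, y_pred))  # 0.5
```

Unlike raw accuracy, TSS is insensitive to the prevalence of avalanche versus non-avalanche cells, which is why it is favoured for hazard maps where positives are rare.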


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Peter Appiahene ◽  
Yaw Marfo Missah ◽  
Ussiph Najim

The financial crisis that hit Ghana from 2015 to 2018 raised various issues with respect to the efficiency of banks and the safety of depositors in the banking industry. As part of measures to improve the banking sector and restore customers’ confidence, efficiency and performance analysis in the banking industry has become a hot issue, because stakeholders need to detect the underlying causes of inefficiencies within the industry. Nonparametric methods such as data envelopment analysis (DEA) have been suggested in the literature as a good measure of banks’ efficiency and performance. Machine learning algorithms have also been viewed as good tools for estimating various nonparametric and nonlinear problems. This paper combines DEA with three machine learning approaches to evaluate bank efficiency and performance using 444 Ghanaian bank branches as decision-making units (DMUs). The results were compared with the corresponding efficiency ratings obtained from the DEA. Finally, the prediction accuracies of the three machine learning models were compared. The results suggested that the decision tree (DT) and its C5.0 algorithm provided the best predictive model: it had 100% accuracy in predicting the 134-branch holdout sample dataset (30% of banks) with a P value of 0.00. The DT was followed closely by the random forest algorithm, with a predictive accuracy of 98.5% and a P value of 0.00, and finally the neural network (86.6% accuracy) with a P value of 0.66. The study concluded that banks in Ghana can use the results of this study to predict their respective efficiencies. All experiments were performed within a simulation environment conducted in RStudio using R code.


2020 ◽  
Vol 10 (15) ◽  
pp. 5047 ◽  
Author(s):  
Viet-Ha Nhu ◽  
Danesh Zandi ◽  
Himan Shahabi ◽  
Kamran Chapi ◽  
Ataollah Shirzadi ◽  
...  

This paper aims to apply and compare the performance of three machine learning algorithms–support vector machine (SVM), Bayesian logistic regression (BLR), and alternating decision tree (ADTree)–to map landslide susceptibility along the mountainous road of the Salavat Abad saddle, Kurdistan province, Iran. We identified 66 shallow landslide locations based on field surveys, recording the locations of the landslides with a global positioning system (GPS), Google Earth imagery, and black-and-white aerial photographs (scale 1:20,000), and selected 19 landslide conditioning factors, which we tested using the information gain ratio (IGR) technique. We checked the validity of the models using statistical metrics, including sensitivity, specificity, accuracy, kappa, root mean square error (RMSE), and area under the receiver operating characteristic curve (AUC). We found that, although all three machine learning algorithms yielded excellent performance, the SVM algorithm (AUC = 0.984) slightly outperformed the BLR (AUC = 0.980) and ADTree (AUC = 0.977) algorithms. We observed that not only are all three algorithms useful and effective tools for identifying shallow landslide-prone areas, but the BLR algorithm, like the SVM algorithm, can also serve as a soft-computing benchmark to check the performance of models in the future.
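The validation metrics listed in the abstract can be computed in a few lines for an SVM; the Salavat Abad inventory is not reproduced here, so synthetic data with the same shape (19 conditioning factors) stand in.

```python
# Accuracy, kappa, RMSE, and AUC for an SVM, as in the abstract's validation.
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             mean_squared_error, roc_auc_score)

X, y = make_classification(n_samples=132, n_features=19, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=4)

svm = SVC(probability=True, random_state=4).fit(X_tr, y_tr)
prob = svm.predict_proba(X_te)[:, 1]   # susceptibility scores for AUC / RMSE
pred = svm.predict(X_te)               # hard labels for accuracy / kappa

print("accuracy:", accuracy_score(y_te, pred))
print("kappa:", cohen_kappa_score(y_te, pred))
print("RMSE:", mean_squared_error(y_te, prob) ** 0.5)
print("AUC:", roc_auc_score(y_te, prob))
```

Note that AUC and RMSE are computed on the continuous susceptibility scores, while accuracy and kappa require a classification threshold (here the SVM's default).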


Author(s):  
Cecilia Wawira Ireri ◽  
George Krhoda ◽  
Mukhovi Stellah

Gullies occur in semi-arid regions characterized by rainfall variability and seasonality and increased overland flow, affecting the ecological fragility of an area. In most gully-prone areas, the extent of land affected by gullies is increasing. Predicting susceptibility to gully erosion in semi-arid environments is therefore an important step toward effective rehabilitation and prevention. Proneness to gully occurrence was assessed against land cover/land use, slope, soil characteristics, rainfall variability, and elevation, and modelled using a geographical information system (GIS)-based bivariate statistical approach. The objectives of the study were to: a) assess the influence of geomorphological factors on gully erosion; b) analyze and develop a gully erosion susceptibility map; and c) verify the gully susceptibility images using an error matrix of class labels in the classified map against ground-truth reference data. A total of 66 gullied areas (width and depth ≥ 0.5) were mapped using 15 m resolution Landsat images for 2018 and field surveys to estimate susceptibility to gully erosion with Global Mapper software in a GIS. The images were verified using 120 pixels of known gully presence or absence to produce an error matrix comparing actual outcomes to predicted outcomes. The analysis showed a significant positive relationship between gully susceptibility and the gully conditioning factors, with a consistency ratio CR = 0.097 (< 0.1), indicating that the individual conditioning factors were important in influencing gully erosion. Slope (43%) and soil lithotype (25%) most influenced gully susceptibility, while land cover/land use (12%) and rainfall (12%) had the least impact. Verification showed satisfactory agreement between the susceptibility map and field data on gullied areas at approximately 76.2%, with a positive error of 4% and a negative error of 7%. Thus, a susceptibility map produced by the bivariate statistical method represents a useful tool for addressing long- and short-term gully emergencies when planning conservation of semi-arid regions.
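The verification step described above—building an error (confusion) matrix from predicted versus ground-truth pixel labels and deriving overall accuracy—can be sketched as follows. The labels below are synthetic stand-ins for the 120 verification pixels, with roughly 80% agreement built in for illustration.

```python
# Error matrix and overall accuracy from verification pixels.
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score

rng = np.random.default_rng(5)
truth = rng.integers(0, 2, size=120)                           # 1 = gully present
predicted = np.where(rng.random(120) < 0.8, truth, 1 - truth)  # ~80% agreement

cm = confusion_matrix(truth, predicted, labels=[0, 1])
print(cm)  # rows = ground truth, columns = classified map
print("overall accuracy:", accuracy_score(truth, predicted))
```

The off-diagonal cells of the matrix correspond to the positive (commission) and negative (omission) errors the abstract reports as percentages of the verified pixels.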


2020 ◽  
Vol 3 (1) ◽  
pp. 481-498
Author(s):  
G. Sireesha Naidu ◽  
M. Pratik ◽  
S. Rehana

Abstract Catchment-scale conceptual hydrological models apply calibration parameters based entirely on observed historical data in climate change impact assessment. This study used advanced machine learning algorithms, based on ensemble regression and random forest models, to develop dynamically calibrated factors that can form a basis for the analysis of hydrological responses under climate change. The random forest algorithm was identified as a robust method for modelling the calibration factors with limited data for training and testing, using precipitation, evapotranspiration, and uncalibrated runoff as inputs, based on various performance measures. The developed model was further used to study the runoff response under climate change variability of precipitation and temperature. A statistical downscaling model based on K-means clustering, classification and regression trees, and support vector regression was used to develop the precipitation and temperature projections from MIROC GCM outputs under the RCP 4.5 scenario. The proposed modelling framework has been demonstrated on a semi-arid river basin of peninsular India, the Krishna River Basin (KRB). The basin outlet runoff was predicted to decrease (13.26%) under future climate change scenarios due to an increase in temperature (0.6 °C), even as precipitation increases (13.12%), resulting in an overall reduction in water availability over the KRB.
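The calibration-factor emulator can be sketched as a random forest regressor mapping precipitation, evapotranspiration, and uncalibrated runoff to a calibration factor. The variable names, units, and generating equations below are invented for illustration, not the KRB record or the study's actual model.

```python
# Random-forest regression of a calibration factor from hydrological inputs.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(6)
n = 240  # e.g. monthly records
precip = rng.gamma(2.0, 50.0, n)                              # mm/month
evap = rng.normal(120.0, 20.0, n)                             # mm/month
raw_runoff = 0.4 * precip - 0.1 * evap + rng.normal(0, 5, n)  # uncalibrated
calib_factor = 1.0 + 0.002 * precip - 0.001 * evap            # toy target

X = np.column_stack([precip, evap, raw_runoff])
X_tr, X_te, y_tr, y_te = train_test_split(X, calib_factor, random_state=6)

rf = RandomForestRegressor(n_estimators=200, random_state=6).fit(X_tr, y_tr)
print("test R2:", r2_score(y_te, rf.predict(X_te)))
```

Once trained, such an emulator can be driven with downscaled GCM precipitation and temperature series to produce dynamically adjusted runoff under future scenarios, which is the role it plays in the study's framework.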


Chronic kidney disease (CKD) is a worldwide concern that affects roughly 10% of the adult population. For most people, early diagnosis of CKD is often not possible; therefore, modern computer-aided strategies are important to make the conventional CKD diagnosis framework more effective and precise. In this project, six modern machine learning techniques, namely multilayer perceptron neural network, support vector machine, naïve Bayes, k-nearest neighbor, decision tree, and logistic regression, were used; then, to enhance model performance, ensemble algorithms such as AdaBoost, gradient boosting, random forest, majority voting, bagging, and weighted average were applied to the chronic kidney disease dataset from the UCI repository. The models were finely tuned to find the best hyperparameters. Performance was evaluated using accuracy, precision, recall, F1-score, Matthews correlation coefficient, and the ROC-AUC curve. The experiment was first performed on the individual classifiers and then on the ensemble classifiers. Ensemble classifiers such as random forest and AdaBoost performed better, with 100% accuracy, precision, and recall, compared to the best individual classifier, the decision tree algorithm, with 99.16% accuracy, 98.8% precision, and 100% recall.
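Majority voting over a pool of individual classifiers, one of the ensemble schemes listed above, can be sketched as follows. Synthetic data stand in for the UCI chronic kidney disease dataset, and the member classifiers are a representative subset of those the study uses.

```python
# Hard (majority) voting over several individual classifiers.
from sklearn.datasets import make_classification
from sklearn.ensemble import (VotingClassifier, RandomForestClassifier,
                              AdaBoostClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

X, y = make_classification(n_samples=400, n_features=24, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=7)

vote = VotingClassifier([
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(random_state=7)),
    ("rf", RandomForestClassifier(n_estimators=100, random_state=7)),
    ("ada", AdaBoostClassifier(random_state=7)),
], voting="hard").fit(X_tr, y_tr)

pred = vote.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred))
print("recall:", recall_score(y_te, pred))
```

Switching `voting="hard"` to `"soft"` averages predicted probabilities instead of counting votes, which corresponds to the study's weighted-average scheme when per-model weights are supplied.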

