Comparisons of Diverse Machine Learning Approaches for Wildfire Susceptibility Mapping

Khalil Gholamnia; Thimmaiah Gudiyangada Nachappa; Omid Ghorbanzadeh; Thomas Blaschke

doi:10.3390/sym12040604

Symmetry ◽

10.3390/sym12040604 ◽

2020 ◽

Vol 12 (4) ◽

pp. 604 ◽

Cited By ~ 6

Author(s):

Khalil Gholamnia ◽

Thimmaiah Gudiyangada Nachappa ◽

Omid Ghorbanzadeh ◽

Thomas Blaschke

Keyword(s):

Machine Learning ◽

The United States ◽

Susceptibility Mapping ◽

Support Vector ◽

Learning Approaches ◽

Inventory Data ◽

Operating Characteristics ◽

Remotely Sensed Data ◽

Positioning Systems ◽

Wildfire Susceptibility

Climate change has increased the probability of catastrophes such as wildfires, floods, and storms across the globe in recent years. Weather conditions continue to grow more extreme, and wildfires are occurring frequently and spreading with greater intensity. Wildfires ravage forest areas, as seen recently in the Amazon, the United States, and, more recently still, Australia. The availability of remotely sensed data has vastly improved, enabling wildfires to be located precisely for monitoring purposes. Wildfire inventory data was created by integrating polygons collected through field surveys using the global positioning system (GPS) with data from the moderate resolution imaging spectroradiometer (MODIS) thermal anomalies product for the study area between 2012 and 2017. The inventory data, along with sixteen conditioning factors selected for the study area, was used to appraise the potential of various machine learning (ML) methods for wildfire susceptibility mapping in Amol County. The ML methods chosen for this study are artificial neural network (ANN), Dmine regression (DR), DM neural, least angle regression (LARS), multi-layer perceptron (MLP), random forest (RF), radial basis function (RBF), self-organizing maps (SOM), support vector machine (SVM), and decision tree (DT), along with the statistical approach of logistic regression (LR), which is well suited to wildfire susceptibility studies. The wildfire inventory data was divided into three folds; in each round of the three-fold cross-validation (CV), two folds (about 66% of the data) were used for training the models and the remaining fold (about 33%) for accuracy assessment. The receiver operating characteristic (ROC) curve was used to assess the accuracy of the ML approaches. RF had the highest accuracy, at 88%, followed by SVM at almost 79%, while LR had the lowest accuracy, at 65%. This shows that RF is better suited for wildfire susceptibility assessment in our case study area.
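
The three-fold split described above can be sketched in Python. This is a minimal illustration, not the authors' code: the function name `k_fold_splits`, the fixed seed, and the 300-point inventory size are hypothetical. With k = 3, each round trains on two folds (about two-thirds of the points) and tests on the held-out fold (about one-third).

```python
import random
from typing import List, Tuple


def k_fold_splits(n_samples: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of the k cross-validation rounds."""
    if k < 2 or k > n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    # Deal the shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[f::k] for f in range(k)]
    splits = []
    for held_out in range(k):
        test = sorted(folds[held_out])
        train = sorted(i for f, fold in enumerate(folds) if f != held_out for i in fold)
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical inventory of 300 wildfire/non-wildfire points.
    for round_no, (train, test) in enumerate(k_fold_splits(300, k=3), start=1):
        print(f"round {round_no}: {len(train)} train, {len(test)} test")
```

Every point is tested exactly once across the three rounds, which is what lets CV average out the effect of any single random split.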

Spatial Prediction of Wildfire Susceptibility Using Field Survey GPS Data and Machine Learning Approaches

Fire ◽

10.3390/fire2030043 ◽

2019 ◽

Vol 2 (3) ◽

pp. 43 ◽

Cited By ~ 21

Author(s):

Omid Ghorbanzadeh ◽

Khalil Valizadeh Kamran ◽

Thomas Blaschke ◽

Jagannath Aryal ◽

Amin Naboureh ◽

...

Keyword(s):

Climate Change ◽

Machine Learning ◽

Global Climate ◽

Spatial Prediction ◽

Support Vector ◽

Learning Approaches ◽

Operating Characteristics ◽

Remotely Sensed Data ◽

Northern Iran ◽

Wildfire Susceptibility

Recently, discussions of global climate change have become more prominent, and forests are considered the ecosystems most at risk from the consequences of climate change. Wildfires are among the main drivers of losses in forested areas. The increasing availability of free remotely sensed data has enabled the locations of wildfires to be monitored precisely and reliably. A wildfire data inventory was created by integrating global positioning system (GPS) polygons with data from the moderate resolution imaging spectroradiometer (MODIS) thermal anomalies product between 2012 and 2017 for Amol County, northern Iran. The GPS polygon dataset from the state wildlife organization was gathered through extensive field surveys. The integrated inventory dataset, along with sixteen conditioning factors (topographic, meteorological, vegetation, anthropological, and hydrological factors), was used to evaluate the potential of different machine learning (ML) approaches for the spatial prediction of wildfire susceptibility. The applied ML approaches included an artificial neural network (ANN), support vector machines (SVM), and random forest (RF). All ML approaches were trained using 75% of the wildfire inventory dataset and tested using the remaining 25% in a four-fold cross-validation (CV) procedure. The CV procedure accounts for the effect that the random selection of training and testing data has on the performance of the applied ML approaches. To validate the wildfire susceptibility maps produced by the three ML approaches on the four folds of the inventory dataset, the true positive and false positive rates were calculated. The accuracy of each of the twelve resulting maps was then assessed with the receiver operating characteristic (ROC) curve. The resulting CV accuracies were 74%, 79%, and 88% for the ANN, SVM, and RF, respectively.
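
The ROC assessment mentioned above can be sketched as follows, assuming each map yields a susceptibility score per inventory point and a binary label (1 = wildfire, 0 = non-wildfire). The helper names `roc_points` and `roc_auc` are hypothetical, and this does not reproduce the study's tooling: the sketch sweeps a threshold from high to low scores, records the false positive and true positive rates at each distinct score, and integrates the curve with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs from sweeping a decision threshold from high to low scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve using the trapezoidal rule."""
    pts = roc_points(labels, scores)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


if __name__ == "__main__":
    observed = [1, 0, 1, 0]  # toy labels: 1 = wildfire, 0 = non-wildfire
    susceptibility = [0.9, 0.8, 0.7, 0.1]
    print(roc_auc(observed, susceptibility))  # 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of wildfire and non-wildfire points.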

Factors Affecting Landslide Susceptibility Mapping: Assessing the Influence of Different Machine Learning Approaches, Sampling Strategies and Data Splitting

Land ◽

10.3390/land10090989 ◽

2021 ◽

Vol 10 (9) ◽

pp. 989

Author(s):

Minu Treesa Abraham ◽

Neelima Satyam ◽

Revuri Lokesh ◽

Biswajeet Pradhan ◽

Abdullah Alamri

Keyword(s):

Machine Learning ◽

Landslide Susceptibility ◽

Sampling Strategy ◽

Susceptibility Mapping ◽

Landslide Susceptibility Mapping ◽

Support Vector ◽

Susceptibility Map ◽

Learning Approaches ◽

Sampling Strategies ◽

Data Splitting

Data-driven methods are widely used for Landslide Susceptibility Mapping (LSM). The results of these methods are sensitive to several factors, such as the quality of the input data, the choice of algorithm, the sampling strategy, and the data splitting ratio. In this study, five Machine Learning (ML) algorithms are used for LSM of the Wayanad district in Kerala, India, using two sampling strategies and nine train-to-test ratios in cross-validation. The results show that Random Forest (RF), K-Nearest Neighbors (KNN), and Support Vector Machine (SVM) algorithms provide better results than Naïve Bayes (NB) and Logistic Regression (LR) for the study area. NB and LR are less sensitive to the sampling strategy and data splitting, while the performance of the other three algorithms is considerably influenced by the sampling strategy. The results indicate that both the choice of algorithm and the sampling strategy are critical for obtaining the best-suited landslide susceptibility map for a region. The accuracies of the KNN, RF, and SVM algorithms increased by 10.51%, 10.02%, and 4.98%, respectively, with the use of polygon landslide inventory data, while the performance of NB and LR was slightly reduced with polygon data. Thus, the sampling strategy and data splitting ratio are less consequential for NB and LR, while more data points provide better results for KNN, RF, and SVM.
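
Random partitioning at a given train-to-test ratio can be sketched as below. The abstract does not list the nine ratios, so the 10:90 to 90:10 sweep in the usage block is an assumption, as are the function name `ratio_split` and the 200-point sample size. The polygon-based sampling strategy compared in the study would need more than this simple shuffle.

```python
import random
from typing import List, Tuple


def ratio_split(n_samples: int, train_fraction: float, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle sample indices and cut them at the requested training fraction."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n_samples * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


if __name__ == "__main__":
    # Assumed sweep of nine ratios, 10:90 through 90:10, on 200 hypothetical points.
    for pct in range(10, 100, 10):
        train, test = ratio_split(200, pct / 100)
        print(f"{pct}:{100 - pct} -> {len(train)} train / {len(test)} test")
```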

Novel Machine Learning Approaches for Modelling the Gully Erosion Susceptibility

Remote Sensing ◽

10.3390/rs12172833 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2833 ◽

Cited By ~ 3

Author(s):

Alireza Arabameri ◽

Omid Asadi Nalivan ◽

Subodh Chandra Pal ◽

Rabin Chakrabortty ◽

Asish Saha ◽

...

Keyword(s):

Machine Learning ◽

Water Conservation ◽

Learning Algorithm ◽

Gully Erosion ◽

Support Vector ◽

Learning Approaches ◽

Operating Characteristics ◽

Validation Data ◽

Data Set ◽

Jackknife Test

The extreme form of land degradation caused by the formation of gullies is a major challenge for the sustainability of land resources. The problem is more severe in arid and semi-arid environments, where it damages agriculture and allied economic activities. Appropriate modeling of such erosion, with optimum accuracy, is therefore needed to identify vulnerable regions and take appropriate initiatives. The Golestan Dam has faced an acute problem of gully erosion over the last decade, with adverse effects on society. Here, artificial neural network (ANN), general linear model (GLM), maximum entropy (MaxEnt), and support vector machine (SVM) machine learning algorithms, with 90/10, 80/20, 70/30, 60/40, and 50/50 random partitioning of training and validation samples, were purposively selected for estimating gully erosion susceptibility. The main objective of this work was to predict the susceptible zone with the maximum possible accuracy, for which random partitioning approaches were implemented. Twenty gully erosion conditioning factors were considered for predicting the susceptible areas after a multicollinearity test, in which the variance inflation factor (VIF) and tolerance (TOL) limits were used to reduce model error and increase the efficiency of the outcome. The ANN with 50/50 random partitioning of the samples is the optimal model in this analysis. The area under the curve (AUC) values of the receiver operating characteristic (ROC) for ANN (50/50) on the training and validation data are 0.918 and 0.868, respectively. The importance of the causative factors was estimated with the Jackknife test, which reveals that the most important factor is the topographic position index (TPI).
In addition, all predicted models were prioritized on the basis of the training and validation datasets, which should help future researchers select models from this perspective. This type of outcome should help planners and local stakeholders implement appropriate land and water conservation measures.
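
The VIF and TOL screening mentioned above can be made concrete. For factor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary least squares regression of factor j on all other factors, and TOL_j = 1 / VIF_j. The sketch below assumes plain Python lists of numeric factor values; the function names and toy values are hypothetical, and the commonly cited rule of thumb (VIF above 10 or TOL below 0.1 signals multicollinearity) is not necessarily the limit used in the study.

```python
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: some predictors are exactly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(y: Sequence[float], predictors: Sequence[Sequence[float]]) -> float:
    """Coefficient of determination of an OLS fit of y on the predictors plus an intercept."""
    n = len(y)
    design = [[1.0] + [float(x[i]) for x in predictors] for i in range(n)]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("a constant factor has no variance")
    return 1.0 - ss_res / ss_tot


def vif_and_tol(factors: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """(VIF, TOL) for each factor, regressing it on all the other factors."""
    results = []
    for j, target in enumerate(factors):
        others = [f for k, f in enumerate(factors) if k != j]
        r2 = r_squared(target, others)
        vif = float("inf") if r2 >= 1.0 - 1e-12 else 1.0 / (1.0 - r2)
        results.append((vif, 1.0 / vif))
    return results


if __name__ == "__main__":
    slope = [1, 2, 3, 4, 5]  # hypothetical factor values at five cells
    elevation = [2, 1, 4, 3, 5]
    for name, (vif, tol) in zip(["slope", "elevation"], vif_and_tol([slope, elevation])):
        print(f"{name}: VIF={vif:.3f}, TOL={tol:.3f}")
```

A dedicated linear algebra library would normally replace the hand-written solver; it is inlined here only to keep the sketch dependency-free.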

Machine learning approaches to classify melon landraces based on phenotypic traits

Genetika ◽

10.2298/gensr2003021n ◽

2020 ◽

Vol 52 (3) ◽

pp. 1021-1029

Author(s):

Rad Naroui ◽

Gholamali Keykha ◽

Jahangir Abbaskoohpayegani ◽

Ramin Rafezi

Keyword(s):

Machine Learning ◽

Regression Trees ◽

Classification And Regression Trees ◽

Partial Least Square ◽

Support Vector ◽

Phenotypic Traits ◽

Learning Approaches ◽

Operating Characteristics ◽

Kappa Value ◽

Classification And Regression

Phenotyping of native cultivars is becoming more essential, as they are an important genetic resource for breeders. The variability of morphological properties plays a critical role in melon breeding. In this paper, various machine learning approaches were implemented to identify melon accession classes. A field experiment was conducted at the Zahak Agriculture Station to differentiate 144 melon accessions based on 14 traits. Partial Least Squares Discriminant Analysis (PLS-DA), Support Vector Machine (SVM), Random Forest (RF), K-Nearest Neighbor (KNN), and Classification And Regression Trees (CART) were compared. The most commonly used performance measures, namely overall accuracy, kappa value, Receiver Operating Characteristic (ROC), and Area Under Curve (AUC), were used to assess the accuracy of the models. CART performed better than the other models. The AUC and kappa values were 0.85 and 0.80, respectively, and fruit weight was the most important trait affecting diversity among the melon accessions. These results indicate that Classification And Regression Trees (CART) are reliable for identifying melon accession classes.
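
The kappa value reported above is, in classification studies, usually Cohen's kappa, which corrects the observed agreement p_o for the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). Assuming that is the statistic meant, a minimal sketch follows; the function name and the class labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(actual: Sequence[Hashable], predicted: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement beyond what the class frequencies alone would give."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_counts, predicted_counts = Counter(actual), Counter(predicted)
    # Chance agreement: product of the marginal class proportions, summed over classes.
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both label sets use a single class")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    true_class = ["A", "A", "B", "B"]  # hypothetical accession classes
    cart_class = ["A", "B", "B", "B"]
    print(cohens_kappa(true_class, cart_class))
```

Here 75% raw agreement shrinks to a kappa of 0.5 because half of that agreement would be expected from the class proportions alone.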

Wildfire susceptibility assessment: evaluation of the performance of different machine learning algorithms

10.5194/egusphere-egu21-7162 ◽

2021 ◽

Author(s):

Andrea Trucchia ◽

Sara Isnardi ◽

Mirko D'Andrea ◽

Guido Biondi ◽

Paolo Fiorucci ◽

...

Keyword(s):

Machine Learning ◽

Risk Assessment ◽

Expert Knowledge ◽

Susceptibility Mapping ◽

Machine Learning Algorithms ◽

Validation Dataset ◽

Burned Area ◽

Support Vector ◽

Testing Dataset ◽

Wildfire Susceptibility

Wildfires constitute a complex environmental disaster triggered by several interacting natural and human factors that can affect biodiversity, species composition, and ecosystems, but also human lives, regional economies, and environmental health. Therefore, wildfires have become a focus of forestry and ecological research and are receiving considerable attention in forest management. Current advances in automated learning and simulation methods, such as machine learning (ML) algorithms, have recently aroused great interest in wildfire risk assessment and mapping. This quantitative evaluation is carried out by taking two factors into account: the location and spatial extent of past wildfire events, and the geo-environmental and anthropogenic predisposing factors that favored their ignition and spread. When dealing with risk assessment and predictive mapping of natural phenomena, it is crucial to ascertain the reliability and validity of the collected data, as well as the predictive capability of the obtained results. In a previous study (Tonini et al. 2020), the authors applied Random Forest (RF) to produce a wildfire susceptibility map for the Liguria region (Italy). In the present study, we address the following outstanding issues, which are still unsolved: (1) the vegetation map included a class labeled “burned area” that masked the true burned vegetation; (2) the implemented RF-based model gave good results, but it needs to be compared with other ML-based approaches; (3) to test the predictive capabilities of the model, the last three years of observations were used, but these are not fully representative of the different wildfire regimes that characterize non-consecutive years. By improving the analyses, the following results were achieved. (1) The class “burned areas” has been reclassified based on expert knowledge, and the type of vegetation correctly assigned.
This allowed the relative importance of each vegetation class belonging to this variable to be correctly estimated. (2) Two additional ML-based approaches, namely Multi-Layer Perceptron (MLP) and Support Vector Machine (SVM), were tested besides RF, and the performance of each model was assessed, as well as the resulting variable ranking and the predicted outputs. This allowed the three ML-based approaches to be compared and the pros and cons of each to be evaluated. (3) The training and testing datasets were selected by extracting yearly observations based on a clustering procedure, which accounts for the temporal variability of the burning seasons. As a result, our models can, on average, make better predictions in different situations, taking into consideration years experiencing more or fewer wildfires than usual. The three ML-based models (RF, SVM, and MLP) were finally validated by means of two metrics: (i) the Area Under the ROC Curve, with the validation dataset selected using a 5-fold cross-validation procedure; (ii) the RMS errors, computed from the difference between the predicted probability outputs and the presence/absence of an observed event in the testing dataset. Bibliography: Tonini, M.; D’Andrea, M.; Biondi, G.; Degli Esposti, S.; Trucchia, A.; Fiorucci, P. A Machine Learning-Based Approach for Wildfire Susceptibility Mapping. The Case Study of the Liguria Region in Italy. Geosciences 2020, 10, 105. https://doi.org/10.3390/geosciences10030105
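
The second validation metric, the RMS error between predicted probabilities and observed presence/absence, can be sketched as follows. The function name and sample values are hypothetical; the squared form of this quantity (before the square root) is the Brier score.

```python
import math
from typing import Sequence


def rms_error(probabilities: Sequence[float], observed: Sequence[int]) -> float:
    """Root mean square difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(observed) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(probabilities, observed)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    predicted = [0.9, 0.2, 0.6, 0.1]  # hypothetical model outputs for four test cells
    burned = [1, 0, 1, 0]  # 1 = wildfire observed, 0 = not observed
    print(round(rms_error(predicted, burned), 3))
```

Unlike the AUC, which only checks how well cells are ranked, this metric also penalizes probabilities that are poorly calibrated.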

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the numbers of cancer patients and cancer deaths continue to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches, such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes substantially to anticancer drug design, helping researchers save time and reduce costs. However, it can serve only as an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many successful examples of identification and prediction in the area of anticancer drug activity prediction are discussed, and anticancer drug research remains in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Multilayer Soil Moisture Mapping at a Regional Scale from Multisource Data via a Machine Learning Method

Remote Sensing ◽

10.3390/rs11030284 ◽

2019 ◽

Vol 11 (3) ◽

pp. 284 ◽

Cited By ~ 1

Author(s):

Linglin Zeng ◽

Shun Hu ◽

Daxiang Xiang ◽

Xiang Zhang ◽

Deren Li ◽

...

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Regional Scale ◽

Remotely Sensed ◽

Temporal Variations ◽

Training Data ◽

Estimation Accuracy ◽

Learning Approaches ◽

Remotely Sensed Data ◽

Deep Soil

Soil moisture mapping at a regional scale is commonplace, since these data are required in many applications, such as hydrological and agricultural analyses. However, the use of remotely sensed data to estimate deep soil moisture at a regional scale has received far less emphasis. The objective of this study was to map 500-m, 8-day average and daily soil moisture at different soil depths in Oklahoma from remotely sensed and ground-measured data using the random forest (RF) method, a machine-learning approach. To investigate the estimation accuracy of the RF method at both spatial and temporal scales, two independent soil moisture estimation experiments were conducted using data from 2010 to 2014: a year-to-year experiment (with a root mean square error (RMSE) ranging from 0.038 to 0.050 m³/m³) and a station-to-station experiment (with an RMSE ranging from 0.044 to 0.057 m³/m³). The data requirements, factor importance, and spatial and temporal variations in estimation accuracy were then discussed, based on results obtained with training data selected by iterated random sampling. The highly accurate estimates of both surface and deep soil moisture for the study area reveal the potential of RF methods for mapping soil moisture at a regional scale, especially given the high heterogeneity of land-cover types and topography in the study area.
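
One plausible reading of the year-to-year and station-to-station experiments is a leave-one-group-out scheme: hold out all samples from one year (or one station), train on the rest, and compute the RMSE on the held-out group. The sketch below shows only the splitting step; the function name and example years are hypothetical, and the study's exact design may differ.

```python
from typing import Hashable, Iterator, List, Sequence, Tuple


def leave_one_group_out(groups: Sequence[Hashable]) -> Iterator[Tuple[Hashable, List[int], List[int]]]:
    """Yield (held_out_group, train_indices, test_indices), one round per distinct group."""
    for held_out in sorted(set(groups), key=str):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train, test


if __name__ == "__main__":
    sample_years = [2010, 2010, 2011, 2012, 2012, 2013, 2014]  # hypothetical
    for year, train, test in leave_one_group_out(sample_years):
        print(year, "test:", test, "train size:", len(train))
```

Passing station IDs instead of years gives the station-to-station variant, which tests how well the model transfers to unseen locations rather than unseen years.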

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better predictions than SVMs and NNs, with improved performance metrics and a better balance between sensitivity and specificity. The classical machine learning approaches show higher sensitivity, but at the cost of lower specificity and a higher percentage of false alarms (lower precision). These results suggest that the newer algorithms (RF and AdaBoost) adapt better to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
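
The trade-off between sensitivity, specificity, and precision described above follows directly from the confusion-matrix counts. A minimal sketch, assuming binary labels (1 = bloom, 0 = no bloom); the function name `binary_metrics` and the example values are hypothetical.

```python
from typing import Dict, Sequence


def _ratio(num: int, den: int) -> float:
    """Safe division that returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def binary_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and precision from 0/1 labels (1 = bloom)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of real blooms that were predicted
        "specificity": _ratio(tn, tn + fp),  # share of non-bloom cases correctly predicted
        "precision": _ratio(tp, tp + fp),  # share of bloom alerts that were real
    }


if __name__ == "__main__":
    observed = [1, 1, 1, 0, 0, 0, 0, 0]  # hypothetical weekly observations
    forecast = [1, 1, 0, 1, 1, 0, 0, 0]
    print(binary_metrics(observed, forecast))
```

On an unbalanced dataset with few blooms, a model can raise sensitivity by alerting often, but each extra false positive lowers both specificity and precision.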

Practical CO2—WAG Field Operational Designs Using Hybrid Numerical-Machine-Learning Approaches

Energies ◽

10.3390/en14041055 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1055

Author(s):

Qian Sun ◽

William Ampomah ◽

Junyu You ◽

Martha Cather ◽

Robert Balch

Keyword(s):

Machine Learning ◽

Oil Recovery ◽

History Matching ◽

Optimization Problems ◽

Learning Technologies ◽

Petroleum Engineering ◽

Support Vector ◽

Learning Approaches ◽

Field Development ◽

Proxy Models

Machine-learning technologies have proven robust in solving many petroleum engineering problems. Their accurate predictions and fast computation make it feasible to run large volumes of time-consuming engineering processes, such as history matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project requires rigorous history-matching and multi-objective optimization processes, which suit the strengths of machine-learning approaches. Although machine-learning proxy models are trained and validated before being applied to practical problems, their error margins inevitably introduce uncertainties into the results. In this paper, a hybrid numerical machine-learning workflow for solving various optimization problems is presented. By coupling expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water-alternating-gas (WAG) design problems with low computational overhead. The history matching considers the heterogeneities of multiphase relative characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into account. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to learn the high-dimensional nonlinear data structure effectively. The proposed workflow recommends revisiting the high-fidelity numerical simulator for validation purposes. The experience gained from this work should provide valuable guidance for similar CO2 enhanced oil recovery (EOR) projects.

Detection of Malicious Software by Analyzing Distinct Artifacts Using Machine Learning and Deep Learning Algorithms

Electronics ◽

10.3390/electronics10141694 ◽

2021 ◽

Vol 10 (14) ◽

pp. 1694

Author(s):

Mathew Ashik ◽

A. Jyothish ◽

S. Anandaram ◽

P. Vinod ◽

Francesco Mercaldo ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Deep Learning ◽

Support Vector ◽

Malware Analysis ◽

Learning Approaches ◽

Dynamic Features ◽

System Calls ◽

Prevention Methods ◽

Structural Aspects

Malware is one of the most significant threats in today’s computing world, since the number of websites distributing malware is increasing rapidly. Malware analysis and prevention methods are increasingly necessary for computer systems connected to the Internet. Malware exploits a system’s vulnerabilities to steal valuable information without the user’s knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures to detect known malware. However, the signature-based method does not scale to detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, in this paper we investigate the relevance of features of unpacked malicious and benign executables, such as mnemonics, instruction opcodes, and APIs, to identify features that classify the executable. Prominent features are extracted using Minimum Redundancy and Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a Deep Dense network, a One-Dimensional Convolutional Neural Network (1D-CNN), and CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. Combining APIs/system calls with static features yielded a marginal performance improvement compared with models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that achieved F1-scores of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
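
The ANOVA-based feature ranking mentioned above can be sketched with the one-way F statistic: the ratio of between-class to within-class variance for each feature, where higher scores mark features that separate malicious from benign samples more clearly. This is a minimal sketch, not the authors' pipeline; mRMR is not shown, and the function names and toy feature matrix are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple


def anova_f(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """One-way ANOVA F statistic of a single feature across the class labels."""
    classes = sorted(set(labels), key=str)
    n, k = len(values), len(classes)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand_mean = sum(values) / n
    ss_between = ss_within = 0.0
    for c in classes:
        group = [v for v, lab in zip(values, labels) if lab == c]
        mean = sum(group) / len(group)
        ss_between += len(group) * (mean - grand_mean) ** 2
        ss_within += sum((v - mean) ** 2 for v in group)
    if ss_within == 0:
        # No spread inside any class: perfectly separating unless all means coincide.
        return 0.0 if ss_between == 0 else float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(rows: Sequence[Sequence[float]], labels: Sequence[Hashable]) -> List[Tuple[int, float]]:
    """(feature_index, F) pairs sorted from most to least discriminative."""
    scores = [(j, anova_f([row[j] for row in rows], labels)) for j in range(len(rows[0]))]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical counts of two opcodes in six executables.
    counts = [[1, 5], [2, 3], [3, 4], [7, 4], [8, 5], [9, 3]]
    kind = ["benign", "benign", "benign", "malicious", "malicious", "malicious"]
    print(rank_features(counts, kind))
```

ANOVA scores each feature independently, so two highly correlated features can both rank near the top; mRMR addresses exactly that redundancy.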
