Forest Fire Susceptibility Prediction Based on Machine Learning Models with Resampling Algorithms on Remote Sensing Data

2020 ◽  
Vol 12 (22) ◽  
pp. 3682
Author(s):  
Bahareh Kalantar ◽  
Naonori Ueda ◽  
Mohammed O. Idrees ◽  
Saeid Janizadeh ◽  
Kourosh Ahmadi ◽  
...  

This study predicts forest fire susceptibility in the Chaloos Rood watershed in Iran using three machine learning (ML) models: multivariate adaptive regression splines (MARS), support vector machine (SVM), and boosted regression tree (BRT). The study uses a set of 14 fire predictors derived from vegetation indices, climatic variables, environmental factors, and topographical features. To assess the suitability of the models and estimate the variance and bias of estimation, the training dataset obtained from the Natural Resources Directorate of Mazandaran province was resampled using cross-validation (CV), bootstrap, and optimism bootstrap techniques. Using the variance inflation factor (VIF), a weight indicating the strength of the spatial relationship of each predictor to fire occurrence was assigned to each contributing variable. Subsequently, the models were trained and validated using the area under the receiver operating characteristic (ROC) curve (AUC). Model validation based on the resampling techniques (none, 5-fold CV, 10-fold CV, bootstrap, and optimism bootstrap) produced AUC values of 0.78, 0.88, 0.90, 0.86 and 0.83 for the MARS model; 0.82, 0.82, 0.89, 0.87 and 0.84 for the SVM model; and 0.87, 0.90, 0.90, 0.90 and 0.91 for the BRT model. Within the individual models, 10-fold CV performed best for MARS and SVM, with AUC values of 0.90 and 0.89. Overall, the BRT outperformed the other models in all configurations, with the highest AUC value of 0.91 using the optimism bootstrap resampling algorithm. Generally, the resampling process enhanced the prediction performance of all the models.
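The AUC values quoted above can be read as the probability that a randomly chosen fire location receives a higher susceptibility score than a randomly chosen non-fire location. The sketch below illustrates that definition only; it is not the authors' code, and the function name `auc`, the labels, and the scores are hypothetical values chosen for illustration.

```python
from itertools import product


def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = fire observed, 0 = no fire; scores are model outputs.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
example_auc = auc(labels, scores)  # 8 of 9 pairs are ordered correctly
```

In a k-fold setting, a function like this would be applied to the held-out fold of each split and the results averaged.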

2020 ◽  
Vol 12 (22) ◽  
pp. 3675
Author(s):  
Subodh Chandra Pal ◽  
Alireza Arabameri ◽  
Thomas Blaschke ◽  
Indrajit Chowdhuri ◽  
Asish Saha ◽  
...  

Gully formation through water-induced soil erosion, and the devastating land degradation related to it, is often a quasi-normal threat to human life, as it is responsible for a huge loss of surface soil. Therefore, gully erosion susceptibility (GES) mapping is necessary in order to reduce the adverse effects of land degradation and diminish these harmful consequences. The principal goal of the present study is to develop GES maps for the Garhbeta I Community Development (C.D.) Block, West Bengal, India, using the machine learning algorithms (MLAs) boosted regression tree (BRT), bagging, and the BRT-bagging ensemble, with K-fold cross-validation (CV) resampling. The combination of these MLAs with resampling approaches is state-of-the-art soft computing that is not often used in GES evaluation. We used a total of 20 gully erosion conditioning factors (GECFs) and 199 gully head cut points for modelling GES. The importance of the individual GECFs for gully erosion was determined with the random forest (RF) algorithm. Model performance was validated through the receiver operating characteristic area under the curve (ROC-AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). The results show that the BRT-bagging ensemble fits GES best: on the training dataset, its AUC in the K-3 fold is 0.972, and its sensitivity, specificity, PPV and NPV are 0.94, 0.93, 0.96 and 0.93, respectively. It is followed by the bagging and BRT models. Thus, from the predictive performance found in this study, it is concluded that the BRT-bagging ensemble can be applied as a new approach for further studies in the spatial prediction of GES. The outcome of this work can be helpful to policy makers in implementing remedial measures to minimize damage caused by gully erosion.
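Sensitivity, specificity, PPV and NPV all derive from the four cells of a binary confusion matrix. The following minimal sketch shows the standard definitions; the function name `confusion_metrics` and the counts are hypothetical and are not taken from the study.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return the four standard ratios of a 2x2 confusion matrix."""
    return {
        "sensitivity": tp / (tp + fn),  # share of true gullies detected
        "specificity": tn / (tn + fp),  # share of non-gullies rejected
        "ppv": tp / (tp + fp),          # share of predicted gullies that are real
        "npv": tn / (tn + fn),          # share of predicted non-gullies that are real
    }


# Hypothetical counts for illustration only.
metrics = confusion_metrics(tp=45, fp=10, tn=40, fn=5)
```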


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Toktam Khatibi ◽  
Elham Hanifi ◽  
Mohammad Mehdi Sepehri ◽  
Leila Allahqoli

Background: Stillbirth is defined by the WHO as fetal loss in pregnancy beyond 28 weeks. In this study, a machine-learning based method is proposed to distinguish stillbirth from livebirth, to discriminate stillbirth before delivery from stillbirth during delivery, and to rank the features. Method: A two-step stacked ensemble (SE) classifier is proposed that classifies the instances into stillbirth and livebirth at the first step and then distinguishes stillbirth before delivery from stillbirth during labor at the second step. The proposed SE has two consecutive layers that include the same classifiers. The base classifiers in each layer are decision tree, gradient boosting classifier, logistic regression, random forest and support vector machines, which are trained independently and aggregated based on the Vote boosting method. Moreover, a new feature ranking method based on mean decrease accuracy, Gini index and model coefficients is proposed to find high-ranked features. Results: The IMAN registry dataset is used in this study, considering all births at or beyond the 28th gestational week from 2016/04/01 to 2017/01/01, including 1,415,623 live birth and 5,502 stillbirth cases. A combination of maternal demographic features, clinical history, fetal properties, delivery descriptors, environmental features, healthcare service provider descriptors and socio-demographic features is considered. The experimental results show that the proposed SE outperforms the compared classifiers, with an average accuracy of 90%, sensitivity of 91% and specificity of 88%. The discrimination of the proposed SE is assessed, and an average AUC (± 95% CI) of 90.51% ± 1.08 and 90% ± 1.12 is obtained on the training dataset for model development and the test dataset for external validation, respectively. The proposed SE is calibrated using the isotonic nonparametric calibration method, with a score of 0.07. The process is repeated 10,000 times, and the AUCs of SE classifiers trained on different random training datasets are used as the null distribution. The p-value obtained to assess the specificity of the proposed SE is 0.0126, which shows the significance of the proposed SE. Conclusions: Gestational age and fetal height are the two most important features for discriminating livebirth from stillbirth. Moreover, hospital, province, delivery main cause, perinatal abnormality, miscarriage number and maternal age are the most important features for classifying stillbirth before and during delivery.


2021 ◽  
Vol 10 (1) ◽  
pp. 42
Author(s):  
Kieu Anh Nguyen ◽  
Walter Chen ◽  
Bor-Shiun Lin ◽  
Uma Seeboonruang

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging methods in this study are bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting methods are Cubist and gradient boosting machine (GBM). Finally, stacking is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset was sampled using stratified random sampling to achieve a 70/30 split between training and test data, and the process was repeated three times. The performance of the six ensemble methods in the three categories was analyzed based on the average of the three attempts. GBM performed the best among the ensemble models, with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that, as groups, the bagging and boosting methods performed equally well, and the stacking method ranked third for the erosion pin dataset considered in this study.
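The three goodness-of-fit measures reported above have simple closed forms. The sketch below uses the standard textbook definitions of RMSE, the Nash-Sutcliffe efficiency, and Willmott's index of agreement; it is an assumption that the study used exactly these variants, and the observation and prediction values are hypothetical.

```python
import math


def rmse(obs, pred):
    """Root-mean-square error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def index_of_agreement(obs, pred):
    """Willmott's index of agreement d, bounded between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    return 1.0 - sse / potential


# Hypothetical erosion depths in mm/year, for illustration only.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.5, 1.5, 3.5, 3.5]
fit = (rmse(obs, pred), nse(obs, pred), index_of_agreement(obs, pred))
```

An NSE of 1 indicates a perfect fit, while a value of 0 means the model predicts no better than the mean of the observations.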


Author(s):  
Jonas Marx ◽  
Stefan Gantner ◽  
Jörn Städing ◽  
Jens Friedrichs

In recent years, the demands of Maintenance, Repair and Overhaul (MRO) customers for resource-efficient aftermarket services have increased. One way to meet these requirements is to use predictive maintenance methods, which derive workscoping guidance by assessing and processing previously unused or undocumented service data. In this context, a novel approach to predictive maintenance is presented in the form of a performance-based classification method for high pressure compressor (HPC) airfoils. The procedure features machine learning algorithms that establish a relation between the airfoil geometry and the associated aerodynamic behavior, and it is thereby able to divide individual operating characteristics into a finite number of distinct aero-classes. By this means, the introduced method not only provides a fast and simple way to assess piece part performance through geometrical data, but also facilitates the consideration of stage matching (axial as well as circumferential) in a simplified manner. It thus serves as a prerequisite for an improved customary HPC performance workscope as well as for an automated optimization process for compressor buildup with used or repaired material that would be applicable in an MRO environment. The machine learning methods used in the present work form distinct groups of similar aero-performance by unsupervised learning (step 1) and supervised learning (step 2). The overall classification procedure is demonstrated on an artificially generated dataset, based on real characteristics of a front and a rear rotor of a 10-stage axial compressor, that contains both geometry and aerodynamic information. In step 1 of the investigation, only the aerodynamic quantities, in terms of multivariate functional data, are used in order to benchmark different clustering algorithms and generate a foundation for a geometry-based aero-classification. Corresponding classifiers are created in step 2 by means of both the k-nearest-neighbor and the linear support vector machine algorithms. The methods' fidelity is put to the test by attempting to recover the aero-based similarity classes solely from normalized and reduced geometry data. This results in high classification probabilities of up to 96%, as verified using stratified k-fold cross-validation.
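Stratified k-fold cross-validation, mentioned above, splits the data so that every fold keeps roughly the same class proportions as the full dataset. The sketch below is one simple way to build such folds, offered as an illustration under stated assumptions: the function name `stratified_folds` and the labels are hypothetical, and unlike most library implementations it does not shuffle the samples.

```python
from collections import defaultdict


def stratified_folds(labels, k):
    """Return k lists of sample indices with near-equal class proportions.

    Each class is dealt round-robin across the folds; the running offset
    keeps small classes from always landing in the first folds.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for members in by_class.values():
        for i, idx in enumerate(members):
            folds[(offset + i) % k].append(idx)
        offset += len(members)
    return folds


# Hypothetical aero-class labels: six samples of class "a", three of class "b".
labels = ["a"] * 6 + ["b"] * 3
folds = stratified_folds(labels, k=3)
```

Each fold then serves once as the validation set while the remaining folds are used for training.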


2020 ◽  
Author(s):  
Wanjun Zhao ◽  
Yong Zhang ◽  
Xinming Li ◽  
Yonghong Mao ◽  
Changwei Wu ◽  
...  

Background: By extracting the spectrum features from urinary proteomics based on an advanced mass spectrometer and machine learning algorithms, more accurate reporting results can be achieved for disease classification. We attempted to establish a novel diagnosis model of kidney diseases by combining machine learning with an extreme gradient boosting (XGBoost) algorithm with complete mass spectrum information from the urinary proteomics. Methods: We enrolled 134 patients (including those with IgA nephropathy, membranous nephropathy, and diabetic kidney disease) and 68 healthy participants as controls, and used a total of 610,102 mass spectra from their urinary proteomics, produced using high-resolution mass spectrometry, for training and validation of the diagnostic model. We divided the mass spectrum data into a training dataset (80%) and a validation dataset (20%). The training dataset was directly used to create a diagnosis model using XGBoost, random forest (RF), a support vector machine (SVM), and artificial neural networks (ANNs). The diagnostic accuracy was evaluated using a confusion matrix. We also constructed the receiver operating characteristic, Lorenz, and gain curves to evaluate the diagnosis model. Results: Compared with RF, the SVM, and ANNs, the modified XGBoost model, called the Kidney Disease Classifier (KDClassifier), showed the best performance. The accuracy of the diagnostic XGBoost model was 96.03% (CI = 95.17%-96.77%; Kappa = 0.943; McNemar’s test, P value = 0.00027). The area under the curve of the XGBoost model was 0.952 (CI = 0.9307-0.9733). The Kolmogorov-Smirnov (KS) value of the Lorenz curve was 0.8514. The Lorenz and gain curves showed the strong robustness of the developed model. Conclusions: This study presents the first XGBoost diagnosis model, i.e., the KDClassifier, combined with complete mass spectrum information from the urinary proteomics for distinguishing different kidney diseases. The KDClassifier achieves high accuracy and robustness, providing a potential tool for the classification of all types of kidney diseases.


2020 ◽  
Author(s):  
Eunjeong Park ◽  
Kijeong Lee ◽  
Taehwa Han ◽  
Hyo Suk Nam

BACKGROUND Subtle abnormal motor signs are indications of serious neurological diseases. Although neurological deficits require fast initiation of treatment in a restricted time, it is difficult for nonspecialists to detect and objectively assess the symptoms. In the clinical environment, diagnoses and decisions are based on clinical grading methods, including the National Institutes of Health Stroke Scale (NIHSS) score or the Medical Research Council (MRC) score, which have been used to measure motor weakness. Objective grading in various environments is needed for consistent agreement among patients, caregivers, paramedics, and medical staff to facilitate rapid diagnoses and dispatches to appropriate medical centers. OBJECTIVE In this study, we aimed to develop an autonomous grading system for stroke patients. We investigated the feasibility of our new system to assess motor weakness and grade NIHSS and MRC scores of 4 limbs, similar to the clinical examinations performed by medical staff. METHODS We implemented an automatic grading system composed of a measuring unit with wearable sensors and a grading unit with optimized machine learning. Inertial sensors were attached to measure subtle weaknesses caused by paralysis of upper and lower limbs. We collected 60 instances of data with kinematic features of motor disorders from neurological examination and demographic information of stroke patients with NIHSS 0 or 1 and MRC 7, 8, or 9 grades in a stroke unit. Training data with 240 instances were generated using a synthetic minority oversampling technique to compensate for the imbalanced number of data between classes and the low number of training data. We trained 2 representative machine learning algorithms, an ensemble and a support vector machine (SVM), to implement auto-NIHSS and auto-MRC grading. The algorithms were optimized by Bayesian optimization over 30 trials, using 5-fold cross-validation. The trained model was tested with the 60 original hold-out instances for performance evaluation in accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve (AUC). RESULTS The proposed system can grade NIHSS scores with an accuracy of 83.3% and an AUC of 0.912 using an optimized ensemble algorithm, and it can grade with an accuracy of 80.0% and an AUC of 0.860 using an optimized SVM algorithm. The auto-MRC grading achieved an accuracy of 76.7% and a mean AUC of 0.870 in SVM classification and an accuracy of 78.3% and a mean AUC of 0.877 in ensemble classification. CONCLUSIONS The automatic grading system quantifies proximal weakness in real time and assesses symptoms through automatic grading. The pilot outcomes demonstrated the feasibility of remote monitoring of motor weakness caused by stroke. The system can facilitate consistent grading with instant assessment and expedite dispatches to appropriate hospitals and treatment initiation by sharing auto-MRC and auto-NIHSS scores between prehospital and hospital responses as an objective observation.


2020 ◽  
Vol 7 (7) ◽  
pp. 2103
Author(s):  
Yoshihisa Matsunaga ◽  
Ryoichi Nakamura

Background: Abdominal cavity irrigation is a less invasive surgery than that using gas. Minimally invasive surgery improves the quality of life of patients; however, it demands higher skills from the doctors. Therefore, the study aimed to reduce this burden by assisting and automating the hemostatic procedure, a highly frequent procedure, by taking advantage of the clearness of the endoscopic images and continuous bleeding point observations in the liquid. We aimed to construct a method for detecting organs, bleeding sites, and hemostasis regions. Methods: We developed a method to perform real-time detection based on machine learning using laparoscopic videos. Our training dataset was prepared from three experiments in pigs. A linear support vector machine was applied using new color feature descriptors. To verify the accuracy of the classifier, we performed five-fold cross-validation. Classification processing time was measured to verify the real-time property. Furthermore, we visualized the time series class change of the surgical field during the hemostatic procedure. Results: The accuracy of our classifier was 98.3%, and the processing time was short enough for real-time operation. Furthermore, it appeared feasible to quantitatively indicate the completion of the hemostatic procedure based on the changes in the bleeding region by ablation and the hemostasis regions by tissue coagulation. Conclusions: The classification of organs, bleeding sites, and hemostasis regions was useful for assisting and automating the hemostatic procedure in the liquid. Our method can be adapted to more hemostatic procedures.


Online discussion forums and blogs are very vibrant platforms for cancer patients to express their views in the form of stories. These stories sometimes become a source of inspiration for patients who are anxiously searching for similar cases. This paper proposes a method using natural language processing and machine learning to analyze unstructured texts accumulated from patients’ reviews and stories. The proposed methodology aims to identify behavior, emotions, side-effects, decisions and demographics associated with the cancer victims. The pre-processing phase of our work involves extraction of web text, followed by text cleaning in which some special characters and symbols are omitted, and finally tagging the texts using NLTK’s (Natural Language Toolkit) POS (Parts of Speech) Tagger. The post-processing phase trains seven machine learning classifiers (see Table 6). The Decision Tree classifier shows the highest precision (0.83) among the classifiers, while the area under the receiver operating characteristic curve (AUC) is highest for the Support Vector Machine (SVM) classifier (0.98).


2021 ◽  
Author(s):  
Myeong Gyu Kim ◽  
Jae Hyun Kim ◽  
Kyungim Kim

BACKGROUND Garlic-related misinformation is prevalent whenever a virus outbreak occurs. Again, with the outbreak of coronavirus disease 2019 (COVID-19), garlic-related misinformation is spreading through social media sites, including Twitter. Machine learning-based approaches can be used to detect misinformation in large volumes of tweets. OBJECTIVE This study aimed to develop machine learning algorithms for detecting misinformation on garlic and COVID-19 on Twitter. METHODS This study used 5,929 original tweets mentioning garlic and COVID-19. Tweets were manually labeled as misinformation, accurate information, and others. We tested the following algorithms: k-nearest neighbors; random forest; support vector machine (SVM) with linear, radial, and polynomial kernels; and neural network. Features for machine learning included user-based features (verified account, user type, number of followers, and follower rate) and text-based features (uniform resource locator, negation, sentiment score, Latent Dirichlet Allocation topic probability, number of retweets, and number of favorites). The model with the highest accuracy on the training dataset (70% of the overall dataset) was tested using a test dataset (30% of the overall dataset). Predictive performance was measured using overall accuracy, sensitivity, specificity, and balanced accuracy. RESULTS The SVM with the polynomial kernel showed the highest accuracy of 0.670. The model also showed a balanced accuracy of 0.757, sensitivity of 0.819, and specificity of 0.696 for misinformation. Important features in the misinformation and accurate information classes included topic 4 (common myths), topic 13 (garlic-specific myths), number of followers, topic 11 (misinformation on social media), and follower rate. Topic 3 (cooking recipes) was the most important feature in the others class. CONCLUSIONS Our SVM model showed good performance in detecting misinformation. The results of our study will help detect misinformation related to garlic and COVID-19. It could also be applied to prevent misinformation related to dietary supplements in the event of a future outbreak of a disease other than COVID-19.
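For a single class treated one-versus-rest, balanced accuracy is commonly defined as the mean of sensitivity and specificity. Assuming that definition applies here, the sketch below checks the figures reported for the misinformation class; the function name `balanced_accuracy` is hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity for one class (one-vs-rest)."""
    return (sensitivity + specificity) / 2.0


# Values reported in the abstract for the misinformation class.
reported = balanced_accuracy(0.819, 0.696)  # about 0.7575
```

The result agrees with the reported 0.757 to within rounding of the inputs.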


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Patricio Wolff ◽  
Manuel Graña ◽  
Sebastián A. Ríos ◽  
Maria Begoña Yarza

Background. Hospital readmission prediction in pediatric hospitals has received little attention. Studies have focused on readmission frequency analysis stratified by disease and demographic/geographic characteristics, but there are no predictive modeling approaches, which could help identify the preventable readmissions that constitute a major portion of the cost attributed to readmissions. Objective. To assess the all-cause readmission predictive performance achieved by machine learning techniques in the emergency department of a pediatric hospital in Santiago, Chile. Materials. An all-cause admissions dataset was collected over six consecutive years in a pediatric hospital in Santiago, Chile. The variables collected are the same as those used to determine the administrative cost of the child’s treatment. Methods. Retrospective predictive analysis of 30-day readmission was formulated as a binary classification problem. We report classification results achieved with various model building approaches after data curation and preprocessing for correction of class imbalance. We compute repeated cross-validation (RCV) with a decreasing number of folds to assess performance and sensitivity to the effect of imbalance in the test set and to training set size. Results. The increase in recall due to SMOTE class imbalance correction is large and statistically significant. The Naive Bayes (NB) approach achieves the best AUC (0.65); however, the shallow multilayer perceptron has the best PPV and f-score (5.6 and 10.2, resp.). The NB and support vector machines (SVM) give comparable results if we consider AUC, PPV, and f-score ranking for all RCV experiments. The high recall of the deep multilayer perceptron is due to a high false positive ratio. There is no detectable effect of the number of folds in the RCV on the predictive performance of the algorithms. Conclusions. We recommend the use of Naive Bayes (NB) with a Gaussian distribution model as the most robust modeling approach for pediatric readmission prediction, achieving the best results across all training dataset sizes. The results show that the approach could be applied to detect preventable readmissions.

