Ensemble of Machine-Learning Methods for Predicting Gully Erosion Susceptibility

Subodh Chandra Pal; Alireza Arabameri; Thomas Blaschke; Indrajit Chowdhuri; Asish Saha; Rabin Chakrabortty; Saro Lee; Shahab. S. Band

doi:10.3390/rs12223675


Remote Sensing ◽

10.3390/rs12223675 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3675

Author(s):

Subodh Chandra Pal ◽

Alireza Arabameri ◽

Thomas Blaschke ◽

Indrajit Chowdhuri ◽

Asish Saha ◽

...

Keyword(s):

Machine Learning ◽

Land Degradation ◽

Predictive Value ◽

Research Study ◽

Research Work ◽

Gully Erosion ◽

Training Dataset ◽

Operating Characteristics ◽

Boosted Regression Tree ◽

Sensitivity Specificity

Gully formation through water-induced soil erosion, and the devastating land degradation related to it, is often a quasi-normal threat to human life, as it is responsible for a huge loss of surface soil. Gully erosion susceptibility (GES) mapping is therefore necessary to reduce the adverse effects of land degradation and to diminish these harmful consequences. The principal goal of the present research study is to develop GES maps for the Garhbeta I Community Development (C.D.) Block, West Bengal, India, using the machine learning algorithms (MLAs) boosted regression tree (BRT), bagging, and the ensemble of BRT-bagging, with K-fold cross-validation (CV) resampling techniques. The combination of these MLAs with resampling approaches is state-of-the-art soft computing that is not often used in GES evaluation. We used a total of 20 gully erosion conditioning factors (GECFs) and a total of 199 gully head-cut points for modelling GES. The importance of the GECFs used in this study as drivers of gully erosion was determined with the random forest (RF) algorithm. Model performance was validated through the receiver operating characteristic area under the curve (ROC-AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). The results show that the ensemble of BRT-bagging is the best fitted model for GES: its AUC value in K-3 fold is 0.972, and its sensitivity, specificity, PPV and NPV are 0.94, 0.93, 0.96 and 0.93, respectively, in the training dataset; it is followed by the bagging and BRT models. From this predictive performance, it is concluded that the ensemble of BRT-bagging can be applied as a new approach for further studies in the spatial prediction of GES. The outcome of this work can be helpful to policy makers in implementing remedial measures to minimize damages caused by gully erosion.
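
The K-fold cross-validation resampling named in this abstract can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `k_fold_indices`, the seed, and the shuffled index split are assumptions. Only the 199 gully head-cut points and the 3-fold setting come from the abstract.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation.

    Hypothetical helper for illustration; not taken from the paper.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 199 points and k = 3 are taken from the abstract; the seed is arbitrary.
folds = list(k_fold_indices(199, 3))
```

Each sample lands in exactly one test fold, so every gully point is used once for validation and twice for training when k = 3.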


Forest Fire Susceptibility Prediction Based on Machine Learning Models with Resampling Algorithms on Remote Sensing Data

Remote Sensing ◽

10.3390/rs12223682 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3682

Author(s):

Bahareh Kalantar ◽

Naonori Ueda ◽

Mohammed O. Idrees ◽

Saeid Janizadeh ◽

Kourosh Ahmadi ◽

...

Keyword(s):

Machine Learning ◽

Forest Fire ◽

Spatial Relationship ◽

Remote Sensing Data ◽

Multivariate Adaptive Regression Splines ◽

Training Dataset ◽

Support Vector ◽

Operating Characteristics ◽

Mazandaran Province ◽

Boosted Regression Tree

This study predicts forest fire susceptibility in the Chaloos Rood watershed in Iran using three machine learning (ML) models: multivariate adaptive regression splines (MARS), support vector machine (SVM), and boosted regression tree (BRT). The study uses a set of 14 fire predictors derived from vegetation indices, climatic variables, environmental factors, and topographical features. To assess the suitability of the models and to estimate the variance and bias of estimation, the training dataset obtained from the Natural Resources Directorate of Mazandaran province was subjected to resampling using cross validation (CV), bootstrap, and optimism bootstrap techniques. Using the variance inflation factor (VIF), a weight indicating the strength of the spatial relationship of the predictors to fire occurrence was assigned to each contributing variable. Subsequently, the models were trained and validated using the area under the receiver operating characteristic (ROC) curve (AUC). Model validation based on the resampling techniques (none, 5-fold CV, 10-fold CV, bootstrap and optimism bootstrap) produced AUC values of 0.78, 0.88, 0.90, 0.86 and 0.83 for the MARS model; 0.82, 0.82, 0.89, 0.87 and 0.84 for the SVM model; and 0.87, 0.90, 0.90, 0.90 and 0.91 for the BRT model. Among the individual models, the 10-fold CV performed best for MARS and SVM, with AUC values of 0.90 and 0.89. Overall, the BRT outperformed the other models, with the highest AUC value of 0.91 obtained using the optimism bootstrap resampling algorithm. Generally, the resampling process enhanced the prediction performance of all the models.
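
The AUC values reported here summarize how well a model ranks fire locations above non-fire locations. A minimal sketch of that computation follows, using the pairwise-ranking definition of AUC. The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical values): 1 = fire observed, 0 = no fire.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)  # 8 of 9 pairs are ranked correctly
```

The quadratic pair loop is chosen for readability; a rank-based formula gives the same value more efficiently on large datasets.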


New screening approach to detecting congenital syphilis in China: a retrospective cohort study

Archives of Disease in Childhood ◽

10.1136/archdischild-2020-320549 ◽

2020 ◽

pp. archdischild-2020-320549

Author(s):

Fang Hu ◽

Shuai-Jun Guo ◽

Jian-Jun Lu ◽

Ning-Xuan Hua ◽

Yan-Yan Song ◽

...

Keyword(s):

Predictive Value ◽

Congenital Syphilis ◽

Area Under The Curve ◽

Diagnostic Tools ◽

Operating Characteristics ◽

Loss To Follow Up ◽

Sensitivity Specificity ◽

Mother To Child ◽

Rule Out

Background: Diagnosis of congenital syphilis (CS) is not straightforward and can be challenging. This study aimed to evaluate the validity of an algorithm using timing of maternal antisyphilis treatment and titres of non-treponemal antibody as predictors of CS. Methods: Confirmed CS cases and those where CS was excluded were obtained from the Guangzhou Prevention of Mother-to-Child Transmission of syphilis programme between 2011 and 2019. We calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) using receiver operating characteristics (ROC) in two situations: (1) receiving antisyphilis treatment or no treatment during pregnancy and (2) initiating treatment before 28 gestational weeks (GWs), initiating after 28 GWs or receiving no treatment for syphilis seropositive women. Results: Among 1558 syphilis-exposed children, 39 had confirmed CS. Area under the curve, sensitivity and specificity of maternal non-treponemal titres before treatment and treatment during pregnancy were 0.80, 76.9%, 78.7% and 0.79, 69.2%, 88.7%, respectively, for children with CS. For the algorithm, ROC results showed that PPV and NPV for predicting CS were 37.3% and 96.4% (non-treponemal titres cut-off value 1:8 and no antisyphilis treatment), 9.4% and 100% (non-treponemal titres cut-off value 1:16 and treatment after 28 GWs), 4.2% and 99.5% (non-treponemal titres cut-off value 1:32 and treatment before 28 GWs), respectively. Conclusions: An algorithm using maternal non-treponemal titres and timing of treatment during pregnancy could be an effective strategy to diagnose or rule out CS, especially when the rate of loss to follow-up is high or there are no straightforward diagnostic tools.


Machine Learning-Based Gully Erosion Susceptibility Mapping: A Case Study of Eastern India

Sensors ◽

10.3390/s20051313 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1313 ◽

Cited By ~ 15

Author(s):

Sunil Saha ◽

Jagabandhu Roy ◽

Alireza Arabameri ◽

Thomas Blaschke ◽

Dieu Tien Bui

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Absolute Error ◽

Gully Erosion ◽

Machine Learning Techniques ◽

Weight Of Evidence ◽

Validation Dataset ◽

Boosted Regression Tree ◽

Area Index ◽

Statistical Measures

Gully erosion is a form of natural disaster and one of the land loss mechanisms causing severe problems worldwide. This study aims to delineate the areas with the most severe gully erosion susceptibility (GES) using the machine learning techniques Random Forest (RF), Gradient Boosted Regression Tree (GBRT), Naïve Bayes Tree (NBT), and Tree Ensemble (TE). The gully inventory map (GIM) consists of 120 gullies. Of the 120 gullies, 84 (70%) were used for training and 36 (30%) were used to validate the models. Fourteen gully conditioning factors (GCFs) were used for GES modeling, and the relationships between the GCFs and gully erosion were assessed using the weight-of-evidence (WofE) model. The GES maps were prepared using RF, GBRT, NBT, and TE and were validated using the area under the receiver operating characteristic (AUROC) curve, the seed cell area index (SCAI) and five statistical measures: precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE). Nearly 7% of the basin has high to very high susceptibility to gully erosion. Validation results proved the excellent ability of these models to predict the GES. Of the analyzed models, the RF (AUROC = 0.96, PPV = 1.00, FDR = 0.00, accuracy = 0.87, MAE = 0.11, RMSE = 0.19 for the validation dataset) is accurate enough for modeling and better suited for GES modeling than the other models. Therefore, the RF model can be used to model the GES areas not only in this river basin but also in other areas with the same geo-environmental conditions.
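
The 70/30 partition of the gully inventory can be sketched as a seeded random split. The abstract does not say how the 84 training and 36 validation gullies were chosen, so the random shuffle, the seed, and the name `train_test_split_ids` are assumptions made for illustration.

```python
import random

def train_test_split_ids(ids, train_fraction=0.7, seed=42):
    """Shuffle sample identifiers and split them into training and validation lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 120 gullies and the 70/30 ratio come from the abstract; the IDs are placeholders.
gully_ids = range(120)
train_ids, valid_ids = train_test_split_ids(gully_ids)
```

Fixing the seed keeps the split reproducible, which matters when several models are compared on the same validation gullies.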


The role of red cell distribution width in the differential diagnosis of iron deficiency anemia and non-transfusion dependent thalassemia patients

Hematology Reports ◽

10.4081/hr.2018.7605 ◽

2018 ◽

Vol 10 (3) ◽

Cited By ~ 3

Author(s):

Pokpong Piriyakhuntorn ◽

Adisak Tantiworawit ◽

Thanawat Rattanathammethee ◽

Chatree Chai-Adisaksopha ◽

Ekarat Rattarittamrong ◽

...

Keyword(s):

Diagnostic Accuracy ◽

Predictive Value ◽

Area Under The Curve ◽

Microcytic Anemia ◽

Deficiency Anemia ◽

Operating Characteristics ◽

Training Set ◽

Distribution Width ◽

Validation Set ◽

Sensitivity Specificity

This study aims to find the cut-off value and diagnostic accuracy of RDW as an initial investigation for differentiating between IDA and NTDT patients. Patients with microcytic anemia were enrolled in the training set and used to plot a receiver operating characteristic (ROC) curve to obtain the cut-off value of RDW. A second set of patients was included in the validation set and used to analyze the diagnostic accuracy. We recruited 94 IDA and 64 NTDT patients into the training set. The area under the ROC curve in the training set was 0.803. The best cut-off value of RDW in the diagnosis of NTDT was 21.0%, with a sensitivity and specificity of 81.3% and 55.3%, respectively. The validation set comprised 34 IDA and 58 NTDT patients, and the cut-off value of >21.0% was applied to it. The sensitivity, specificity, positive predictive value and negative predictive value were 84.5%, 70.6%, 83.1% and 72.7%, respectively. We can therefore conclude that RDW >21.0% is useful in differentiating between IDA and NTDT patients with high diagnostic accuracy.
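
The four validation-set figures follow from a single 2x2 confusion matrix. The sketch below assumes NTDT is the positive class. The cell counts are not stated in the abstract: they are back-calculated here from the 58 NTDT and 34 IDA patients and the reported percentages, so treat them as an inference. The function name `diagnostic_metrics` is hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all non-diseased
        "ppv": tp / (tp + fp),          # true positives among all test-positive
        "npv": tn / (tn + fn),          # true negatives among all test-negative
    }

# Assumed counts (inferred, not reported): 49 of 58 NTDT patients above the
# cut-off, 24 of 34 IDA patients at or below it.
metrics = diagnostic_metrics(tp=49, fp=10, fn=9, tn=24)
percentages = {name: round(value * 100, 1) for name, value in metrics.items()}
```

With these assumed counts the rounded percentages agree with the values reported for the validation set.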


Attended With and Head-Turning Sign can be clinical markers of cognitive impairment in older adults

International Psychogeriatrics ◽

10.1017/s1041610217001181 ◽

2017 ◽

Vol 29 (11) ◽

pp. 1763-1769 ◽

Cited By ~ 16

Author(s):

Pinar Soysal ◽

Cansu Usarel ◽

Gul Ispirli ◽

Ahmet Turan Isik

Keyword(s):

Older Adults ◽

Cognitive Impairment ◽

Positive Predictive Value ◽

Negative Predictive Value ◽

Predictive Value ◽

Cognitive Assessment ◽

Screening Methods ◽

Operating Characteristics ◽

Head Turning ◽

Sensitivity Specificity

Background: Comprehensive neurocognitive assessment may not be performed in clinical practice, as it takes too much time and requires special training. Development of easily applicable, time-saving, and cost-effective screening methods has allowed identifying the individuals that require further evaluation. The aim of the present study was to assess the diagnostic value of the Attended With (AW) and Head-Turning Sign (HTS) for screening cognitive impairment (CI). Methods: Comprehensive geriatric assessment was performed in 529 elderly outpatients, and the presence or absence of AW and HTS was investigated in all of them. Results: Of the 529 patients, whose mean age was 75.67 ± 8.29 years, 126 were considered to have CI (102 dementia, 24 mild CI). The patients with positive AW had significantly lower scores on the Mini-Mental State Examination, Cognitive State Test, Montreal Cognitive Assessment, and activities of daily living compared to AW (−) patients (p < 0.001). Similar significant findings were obtained in the patients with positive and negative HTS (p < 0.001). The sensitivity, specificity, positive predictive value, and negative predictive value of AW in detecting CI were 92%, 37%, 31.4%, and 93.7%, respectively. The sensitivity, specificity, positive predictive value, and negative predictive value of HTS were 80%, 64%, 41.8%, and 91.5%, respectively. The area under the receiver operating characteristic curve was 0.90 for AW and 0.82 for HTS. Conclusion: AW and HTS are fast, simple, effective, and sensitive methods for detecting CI. Therefore, they can be used for older adults attending primary care settings with memory loss. Those with positive AW or HTS can be referred to the relevant centers for detailed cognitive assessment.
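
The low PPV and high NPV reported for AW follow from its sensitivity, its specificity and the prevalence of CI in the sample (126 of 529). The sketch below applies Bayes' rule to those three numbers. The function name `predictive_values` is hypothetical, and because the inputs are the rounded percentages from the abstract, the outputs match the reported values only approximately.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Compute (PPV, NPV) from sensitivity, specificity and prevalence via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv

prevalence = 126 / 529  # CI cases among all outpatients, from the abstract
ppv_aw, npv_aw = predictive_values(0.92, 0.37, prevalence)
```

A highly sensitive but weakly specific sign, used where roughly a quarter of patients have the condition, yields many false positives (low PPV) but few missed cases (high NPV), which is the pattern expected of a screening tool.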


Predicting Alert Source Device using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1526.079920 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1-10

Keyword(s):

Neural Network ◽

Machine Learning ◽

Model Building ◽

Learning Algorithm ◽

Learning Algorithms ◽

Research Work ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Imbalanced Dataset ◽

Daunting Task

In a large distributed virtualized environment, predicting the alerting source from its alert text is a daunting task. This paper explores the option of using machine learning algorithms to solve this problem. Unfortunately, our training dataset is highly imbalanced: 96% of the alerting data is reported by 24% of the alerting sources. This is the expected dataset in any live distributed virtualized environment, where a new version of a device will have relatively fewer alerts than older devices. Any classification effort with such an imbalanced dataset presents a different set of challenges compared to binary classification. This type of skewed data distribution makes conventional machine learning less effective, especially when predicting the minority device type alerts. Our challenge is to build a robust model that can cope with this imbalanced dataset and achieve a relatively high level of prediction accuracy. This research work started with traditional regression and classification algorithms using a bag-of-words model. Then word2vec and doc2vec models were used to represent the words in vector formats, which preserve the semantic meaning of the sentence. With this, alerting texts with similar messages have the same vector form representation. The vectorized alerting text was used with logistic regression for model building. This yields better accuracy, but the model is relatively complex and demands more computational resources. Finally, a simple neural network was used for this multi-class text classification problem, built with the keras and tensorflow libraries. A simple two-layered neural network yielded 99% accuracy, even though our training dataset was not balanced. This paper goes through the qualitative evaluation of the different machine learning algorithms and their respective results. Finally, the two-layered deep learning algorithm is selected as the final solution, since it takes relatively fewer resources and less time, with better accuracy values.
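
The bag-of-words representation that this work started from can be sketched in a few lines. This is only an illustration of the general technique: the alert strings, the whitespace tokenizer, and the function names `build_vocabulary` and `bag_of_words` are hypothetical and do not come from the paper.

```python
from collections import Counter

def build_vocabulary(texts):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(text, vocab):
    """Return the token-count vector of `text` over the vocabulary columns."""
    counts = Counter(text.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

# Hypothetical alert messages, for illustration only.
alerts = [
    "disk usage high on host",
    "link down on switch port",
    "disk failure on host",
]
vocab = build_vocabulary(alerts)
vectors = [bag_of_words(alert, vocab) for alert in alerts]
```

Bag-of-words vectors ignore word order and meaning, which is the limitation that embedding models such as word2vec and doc2vec are meant to address.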


Machine learning for identification of frailty in Canadian primary care practices

International Journal for Population Data Science ◽

10.23889/ijpds.v6i1.1650 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Sylvia Aponte-Hao ◽

Sabrina T. Wong ◽

Manpreet Thandi ◽

Paul Ronksley ◽

Kerry McBrien ◽

...

Keyword(s):

Machine Learning ◽

Primary Care ◽

Predictive Value ◽

Case Definition ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Sensitivity Analyses ◽

Screening Tests ◽

Training Dataset ◽

Total N

Introduction: Frailty is a medical syndrome, commonly affecting people aged 65 years and over, and is characterized by a greater risk of adverse outcomes following illness or injury. Electronic medical records contain a large amount of longitudinal data that can be used for primary care research. Machine learning can fully utilize this wide breadth of data for the detection of diseases and syndromes. The creation of a frailty case definition using machine learning may facilitate early intervention, inform advanced screening tests, and allow for surveillance. Objectives: The objective of this study was to develop a validated case definition of frailty for the primary care context, using machine learning. Methods: Physicians participating in the Canadian Primary Care Sentinel Surveillance Network across Canada were asked to retrospectively identify the level of frailty present in a sample of their own patients (total n = 5,466), collected from 2015-2019. Frailty levels were dichotomized using a cut-off of 5. Extracted features included previously prescribed medications, billing codes, and other routinely collected primary care data. We used eight supervised machine learning algorithms, with performance assessed using a hold-out test set. A balanced training dataset was also created by oversampling. Sensitivity analyses considered two alternative dichotomization cut-offs. Model performance was evaluated using area under the receiver-operating characteristic curve, F1, accuracy, sensitivity, specificity, negative predictive value and positive predictive value. Results: The prevalence of frailty within our sample was 18.4%. Of the eight models developed to identify frail patients, an XGBoost model achieved the highest sensitivity (78.14%) and specificity (74.41%). The balanced training dataset did not improve classification performance. Sensitivity analyses did not show improved performance for cut-offs other than 5. Conclusion: Supervised machine learning was able to create well-performing classification models for frailty. Future research is needed to assess frailty inter-rater reliability, and link multiple data sources for frailty identification.
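
Creating a balanced training dataset by oversampling can be sketched as below. The abstract does not say which oversampling method was used, so plain random oversampling with replacement is an assumption, as are the function name `random_oversample` and the toy data. The 18/82 class mix only echoes the reported prevalence and is not study data.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # sample with replacement
            out_labels.append(cls)
    return out_rows, out_labels

# Toy training set (hypothetical): 18 frail (1) and 82 non-frail (0) patients.
labels = [1] * 18 + [0] * 82
rows = list(range(100))  # stand-ins for feature vectors
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling belongs on the training portion only. Applying it before a hold-out split would leak duplicated rows into the test set and inflate the measured performance.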


Flash Flood Susceptibility Modeling and Magnitude Index Using Machine Learning and Geohydrological Models: A Modified Hybrid Approach

Remote Sensing ◽

10.3390/rs12172695 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2695

Author(s):

Samy Elmahdy ◽

Tarig Ali ◽

Mohamed Mohamed

Keyword(s):

Machine Learning ◽

United Arab Emirates ◽

Flash Flood ◽

Hybrid Approach ◽

Regression Tree ◽

Susceptibility Mapping ◽

Operating Characteristics ◽

Boosted Regression Tree ◽

Susceptibility Modeling ◽

Ensemble Machine Learning

In arid regions, flash floods (FF), as a response to climate change, are the most hazardous events, causing massive destruction and losses to farms, human lives and infrastructure. A first step towards securing lives and infrastructure is susceptibility mapping and the prediction of FF occurrence sites. Several studies have used an ensemble machine learning model (EMLM), but measuring FF magnitude using a hybrid approach that integrates machine learning (MCL) and geohydrological models has not been widely applied. This study aims to modify a hybrid approach by testing three machine learning models, namely boosted regression tree (BRT), classification and regression trees (CART), and naive Bayes tree (NBT), for FF susceptibility mapping in the northern part of the United Arab Emirates (NUAE). This is followed by applying a group of accuracy metrics (precision, recall and F1 score) and the receiver operating characteristic (ROC) curve. The results demonstrated that the BRT has the highest performance for FF susceptibility mapping, followed by the CART and NBT. The FF map produced using the BRT was then modified by dividing it into seven basins, and a set of new FF conditioning parameters, namely alluvial plain width, basin gradient and mean slope, was calculated for each basin to measure FF magnitude. The results showed that the mountainous and narrower basins (e.g., RAK, Masafi, Fujairah, and Rol Dadnah) have the highest probability of FF occurrence and the highest FF magnitude, while the wider alluvial plains (e.g., Al Dhaid) have the lowest probability of FF occurrence and the lowest FF magnitude. The proposed approach is an effective way to improve the susceptibility mapping of FF, landslides, land subsidence, and groundwater potentiality obtained using ensemble machine learning, which is used widely in the literature.
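
The accuracy metrics named in this abstract (precision, recall and F1 score) can be computed from binary predictions as in the sketch below. The function name `precision_recall_f1` and the toy labels are hypothetical and are not data from the study.

```python
def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy example (hypothetical): 1 = flood-prone cell, 0 = not flood-prone.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

Unlike the ROC curve, these three metrics depend on the probability threshold used to turn a susceptibility score into a class label.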


Gastric cancer detection using the serum pepsinogen test method

Tumori Journal ◽

10.1177/03008916211014961 ◽

2021 ◽

pp. 030089162110149

Author(s):

Dragan Trivanovic ◽

Stjepko Plestina ◽

Lorena Honovic ◽

Renata Dobrila-Dintinjana ◽

Jelena Vlasic Tanaskovic ◽

...

Keyword(s):

Gastric Cancer ◽

Early Detection ◽

Predictive Value ◽

Area Under The Curve ◽

Test Method ◽

Operating Characteristics ◽

Roc Curve Analysis ◽

Laboratory Assessment ◽

Pepsinogen I ◽

Sensitivity Specificity

Background: Gastric cancer (GC) is the eighth most common cause of cancer deaths in Croatia and one of the most common causes of cancer deaths worldwide. A reliable diagnostic tool for the early detection of GC is essential. Objective: We previously suggested a pepsinogen test method to reduce the mortality from GC by allowing early detection. Here, we report an updated analysis from a prospective single-center clinical study to evaluate the sensitivity and specificity of the pepsinogen test method and to determine whether this test can be used as a part of routine laboratory assessment of high-risk patients. Methods: We present mature data of the pepsinogen test method in the Croatian population after a median follow-up of 36 months. Statistical analyses were performed using a Mann-Whitney U test, multiple logistic regression, and receiver operating characteristics (ROC) to evaluate the predictive power of the assayed biomarkers. Results: Of the 116 patients, 25 patients had GC and 91 demonstrated a nonmalignant pathology based on tissue biopsy. Cutoff values were pepsinogen I ⩽70 and pepsinogen I/II ratio ⩽3.0. Using ROC curve analysis, the accuracy, sensitivity, specificity, positive predictive value, and negative predictive value were determined to be 87.22%, 78.12%, 90.10%, 71.43%, and 92.86%, respectively, for the diagnosis of GC. The area under the curve was 0.700 (95% confidence interval 0.57–0.83). Conclusion: Pepsinogen tests are valuable for screening a population in need of further diagnosis and could help to avoid unnecessary invasive endoscopic procedures.


Pancreatic cystic lesions: comparison of preoperative cytological results by eus-fna with surgical pathologist diagnosis

Health & Research Journal ◽

10.12681/healthresj.19813 ◽

2016 ◽

Vol 2 (2) ◽

pp. 140

Author(s):

Maria Tsimperleniou ◽

Ioannis Karoumpalis ◽

Christina Marvaki ◽

Olga Kadda ◽

Dimitrios Exarchos ◽

...

Keyword(s):

Fine Needle Aspiration ◽

Predictive Value ◽

Needle Aspiration ◽

Pancreatic Cysts ◽

Cystic Lesions ◽

Operating Characteristics ◽

Fine Needle ◽

Duration Of Hospitalization ◽

Sensitivity Specificity ◽

Cystic Pancreatic Lesions

Introduction: The preoperative cytological examination of pancreatic cystic lesions with endoscopic ultrasonography (EUS) and fine needle aspiration (FNA) biopsy (EUS-FNA) is of great importance for avoiding unnecessary surgery. Aim: The aim of the present study was to show the importance of EUS-FNA in patients with cystic pancreatic lesions by comparing its results with the surgical pathology diagnosis. The intention was, through the selection of appropriate patients for surgery, to reduce preoperative morbidity and mortality and the long duration of hospitalization, conditions which are responsible for hospital infections as well as public health costs. Material and Methods: This was a prospective observational study. The studied sample consisted of 40 patients with pancreatic cysts. A specific registration form was used for data collection; the demographic characteristics, imaging methods and their results, the symptoms, any previous episodes of pancreatitis, the visualization with EUS, the cytological analysis of pancreatic cyst fluid, CEA levels and fluid amylase (whenever possible), and the pathology results of the resected lesions were recorded. Data analysis was performed with the Statistical Package for Social Sciences (SPSS). Results: The sample included 40 patients, 17 men (42.5%) and 23 women (57.5%). The overall operating characteristics of EUS-FNA for the pancreatic lesions that were resected were as follows: sensitivity 81.8%, specificity 100.0%, positive predictive value 100%, negative predictive value 66.0% and diagnostic accuracy 86.7%. Conclusions: The present study confirmed that EUS-FNA is a highly accurate method for selecting the appropriate patients with pancreatic cystic lesions for therapeutic pancreatectomy.
