Soil Mapping Based on Globally Optimal Decision Trees and Digital Imitations of Traditional Approaches

Arseniy Zhogolev; Igor Savin

doi:10.3390/ijgi9110664

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9110664 ◽

2020 ◽

Vol 9 (11) ◽

pp. 664

Author(s):

Arseniy Zhogolev ◽

Igor Savin

Keyword(s):

Random Forest ◽

Decision Trees ◽

Regional Scale ◽

Soil Formation ◽

Test Site ◽

Soil Mapping ◽

Optimal Decision ◽

Formation Factor ◽

Trend Prediction ◽

Soil Geography

Most digital soil mapping (DSM) approaches aim at complete statistical model extraction. The value of the explicit rules of soil delineation formulated by soil-mapping experts is often underestimated. These rules can be used for expert testing of the notional consistency of soil maps, soil trend prediction, soil geography investigations, and other applications. We propose an approach that imitates traditional soil mapping by constructing compact globally optimal decision trees (EVTREE) for the covariates of traditionally used soil formation factor maps. We evaluated our approach by regional-scale soil mapping at a test site in the Belgorod region of Russia. The notional consistency and compactness of the decision trees created by EVTREE were found to be suitable for expert-based analysis and improvement. With a large sample set, the accuracy of the predictions was slightly lower for EVTREE (59%) than for CART (67%) and much lower than for Random Forest (87%). With smaller sample sets of 1785 and 1000 points, EVTREE produced comparable or more accurate predictions and much more accurate models of soil geography than CART or Random Forest.

Regional characterisation of soil properties by combining soil science and hyperspectral and thermal remote sensing: A technical overview of lab and field remote sensing methods

10.5194/egusphere-egu21-4818 ◽

2021 ◽

Author(s):

Richard Mommertz ◽

Lars Konen ◽

Martin Schodlok

Keyword(s):

Remote Sensing ◽

Soil Properties ◽

Regional Scale ◽

Arable Land ◽

Soil Mapping ◽

Soil Science ◽

Near Surface ◽

Soil Parameter ◽

The Impact ◽

Parameter Retrieval

Soil is one of the world’s most important natural resources for human livelihood, as it provides food and clean water. Therefore, its preservation is of huge importance. For this purpose, a proficient regional database on soil properties is needed. The project “ReCharBo” (Regional Characterisation of Soil Properties) has the objective to combine remote sensing, geophysical and pedological methods to determine soil characteristics on a regional scale. Its aim is to characterise soils non-invasively, time- and cost-efficiently, and with a minimal number of soil samples to calibrate the measurements. Konen et al. (2021) give detailed information on the research concept and first field results in a presentation in the session “SSS10.3 Digital Soil Mapping and Assessment”. Hyperspectral remote sensing is a powerful and well-known technique to characterise near-surface soil properties. Depending on the sensor technology and the data quality, a wide variety of soil properties can be derived with remotely sensed data (Chabrillat et al. 2019, Stenberg et al. 2010). The project aims to investigate the effects of up- and downscaling, namely which detail of information is preserved on a regional scale and how a change in scales affects the analysis algorithms and the possibility to retrieve valid soil parameter information. Thus, laboratory and field spectroscopy are applied to gain information on samples and field spots, respectively. Various UAV-based sensors, e.g. thermal and hyperspectral sensors, are applied to study soil properties of arable land in different study areas at field scale. Finally, airborne (helicopter) hyperspectral data will cover the regional scale. Additionally, forthcoming spaceborne hyperspectral satellite data (e.g. Prisma, EnMAP, Sentinel-CHIME) are a promising outlook to gain detailed regional soil information. In this context it will be discussed how the multisensor data acquisition is best managed to optimise soil parameter retrieval. Sensor-specific properties regarding time and date of acquisition as well as weather/atmospheric conditions are outlined. The presentation addresses and discusses the impact of a multisensor and multiscale remote sensing data collection on the results of soil parameter retrieval.

References

Chabrillat, S., Ben-Dor, E., Cierniewski, J., Gomez, C., Schmid, T. & van Wesemael, B. (2019): Imaging Spectroscopy for Soil Mapping and Monitoring. Surveys in Geophysics 40:361–399. https://doi.org/10.1007/s10712-019-09524-0

Stenberg, B., Viscarra Rossel, R. A., Mounem Mouazen, A. & Wetterlind, J. (2010): Visible and Near Infrared Spectroscopy in Soil Science. In: Donald L. Sparks (editor): Advances in Agronomy. Vol. 107. Academic Press:163-215. http://dx.doi.org/10.1016/S0065-2113(10)07005-7

Does EO NDVI seasonal metrics capture variations in species composition and biomass due to grazing in semi-arid grassland savannas?

Biogeosciences ◽

10.5194/bg-12-4407-2015 ◽

2015 ◽

Vol 12 (14) ◽

pp. 4407-4419 ◽

Cited By ~ 13

Author(s):

J. L. Olsen ◽

S. Miehe ◽

P. Ceccato ◽

R. Fensholt

Keyword(s):

Time Series ◽

Species Composition ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Regional Scale ◽

Grazing Intensity ◽

Test Site ◽

Grazing Pressure ◽

Modis Ndvi ◽

Coarse Resolution

Abstract. Most regional scale studies of vegetation in the Sahel have been based on Earth observation (EO) imagery due to the limited number of sites providing continuous and long term in situ meteorological and vegetation measurements. From a long time series of coarse resolution normalized difference vegetation index (NDVI) data a greening of the Sahel since the 1980s has been identified. However, it is poorly understood how commonly applied remote sensing techniques reflect the influence of extensive grazing (and changes in grazing pressure) on natural rangeland vegetation. This paper analyses the time series of Moderate Resolution Imaging Spectroradiometer (MODIS) NDVI metrics by comparing them with data from the Widou Thiengoly test site in northern Senegal. Field data include grazing intensity, end of season standing biomass (ESSB) and species composition from sizeable areas suitable for comparison with moderate to coarse resolution satellite imagery. It is shown that sampling plots excluded from grazing have a different species composition characterized by a longer growth cycle as compared to plots under controlled grazing or communal grazing. Also, substantially higher ESSB is observed for grazing exclosures as compared to grazed areas, substantially exceeding the amount of biomass expected to be ingested by livestock for this area. The seasonal integrated NDVI (NDVI small integral; capturing only the signal inherent to the growing season recurrent vegetation), derived using absolute thresholds to estimate start and end of growing seasons, is identified as the metric most strongly related to ESSB for all grazing regimes. However, plot-pixel comparisons demonstrate how the NDVI/ESSB relationship changes due to grazing-induced variation in annual plant species composition, and the NDVI values for grazed plots are only slightly lower than the values observed for the ungrazed plots. Hence, average ESSB in ungrazed plots since 2000 was 0.93 t ha−1, compared to 0.51 t ha−1 for plots subjected to controlled grazing and 0.49 t ha−1 for communally grazed plots, but the average integrated NDVI values for the same period were 1.56, 1.49, and 1.45 for ungrazed, controlled and communal, respectively, i.e. a much smaller difference. This indicates that a grazing-induced development towards less ESSB and shorter-cycled annual plants with reduced ability to turn additional water in wet years into biomass is not adequately captured by seasonal NDVI metrics.
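
The size of the mismatch between biomass and NDVI can be made concrete with a little arithmetic. The sketch below uses only the six averages quoted in the abstract; the dictionary keys and the helper name `relative_reduction` are illustrative choices, not part of the study.

```python
# Averages since 2000 as quoted in the abstract.
essb_t_per_ha = {"ungrazed": 0.93, "controlled": 0.51, "communal": 0.49}
integrated_ndvi = {"ungrazed": 1.56, "controlled": 1.49, "communal": 1.45}


def relative_reduction(values, baseline="ungrazed"):
    """Return the fractional drop of each regime relative to the baseline regime."""
    base = values[baseline]
    return {k: (base - v) / base for k, v in values.items() if k != baseline}


essb_drop = relative_reduction(essb_t_per_ha)
ndvi_drop = relative_reduction(integrated_ndvi)

for regime in essb_drop:
    print(f"{regime}: ESSB {essb_drop[regime]:.1%} lower, "
          f"integrated NDVI {ndvi_drop[regime]:.1%} lower")
```

Under these figures, ESSB on grazed plots is roughly 45 to 47% below the ungrazed value, while integrated NDVI is only about 4 to 7% lower.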

Fault diagnosis method of submersible screw pump based on random forest

PLoS ONE ◽

10.1371/journal.pone.0242458 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0242458

Author(s):

Minzheng Jiang ◽

Tiancai Cheng ◽

Kangxing Dong ◽

Shufan Xu ◽

Yulong Geng

Keyword(s):

Fault Diagnosis ◽

Random Forest ◽

Decision Trees ◽

Processing System ◽

Random Forest Model ◽

Categorical Variables ◽

Oil Well ◽

Screw Pump ◽

Forest Model ◽

Diagnosis Method

The difficulty in directly determining the failure mode of the submersible screw pump shortens the life of the system and disrupts the normal production of the oil well. This paper aims to identify the fault forms of the submersible screw pump accurately and efficiently, and proposes a fault diagnosis method for the submersible screw pump based on random forest. An HDFS storage system and a MapReduce processing system are established on the Hadoop big data processing platform. Furthermore, the Bagging algorithm is used to collect the training set data. The paper also adopts the CART method to establish the sample library and the decision trees for a random forest model. Six continuous variables, four categorical variables and the fault categories of the submersible screw pump oil production system are used for training the decision trees. As several decision trees constitute a random forest model, the parameters to be tested are input into the random forest model, and the decision trees are used to determine the failure category of the submersible screw pump. It has been verified that the accuracy rate of fault diagnosis is 92.86%. This paper can provide some meaningful guidance for timely detection of the causes of downhole unit failures, reducing oil well production losses, and accelerating the promotion and application of submersible screw pumps in oil fields.
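
The bagging and voting steps described above can be sketched in a few lines. This is a minimal illustration under strong assumptions, not the authors' implementation: each "tree" is a one-split stump rather than a full CART tree, the two features (`torque`, `current`) and the labels are hypothetical, and the Hadoop/MapReduce layer is left out entirely.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the resampling step of bagging)."""
    return [rng.choice(rows) for _ in rows]


def majority_label(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, feature):
    """Fit a one-split tree on `feature`, choosing the threshold with fewest training errors."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = majority_label(left)
        right_label = majority_label(right) if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x[feature] <= threshold else right_label


def fit_forest(rows, features, n_trees, seed=0):
    """Train each stump on its own bootstrap sample and a randomly chosen feature."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(rows, rng), rng.choice(features))
            for _ in range(n_trees)]


def predict(forest, x):
    """Classify x by majority vote over all trees in the forest."""
    return majority_label([tree(x) for tree in forest])


# Hypothetical, cleanly separable readings: (features, fault label).
rows = [
    ({"torque": 0.20, "current": 0.30}, "normal"),
    ({"torque": 0.30, "current": 0.20}, "normal"),
    ({"torque": 0.25, "current": 0.35}, "normal"),
    ({"torque": 0.35, "current": 0.25}, "normal"),
    ({"torque": 0.80, "current": 0.90}, "fault"),
    ({"torque": 0.90, "current": 0.80}, "fault"),
    ({"torque": 0.85, "current": 0.70}, "fault"),
    ({"torque": 0.75, "current": 0.85}, "fault"),
]

forest = fit_forest(rows, ["torque", "current"], n_trees=25)
print(predict(forest, {"torque": 0.9, "current": 0.9}))
print(predict(forest, {"torque": 0.1, "current": 0.1}))
```

An odd number of trees avoids tied votes in the two-class case; a real diagnosis model would use deeper trees, more variables and several fault classes.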

Optimal Decision Trees on Simplicial Complexes

The Electronic Journal of Combinatorics ◽

10.37236/1900 ◽

2005 ◽

Vol 12 (1) ◽

Cited By ~ 1

Author(s):

Jakob Jonsson

Keyword(s):

Decision Tree ◽

Decision Trees ◽

Simplicial Complex ◽

Elementary Theory ◽

Simplicial Complexes ◽

Optimal Decision ◽

Property A ◽

Recursive Definition ◽

Topological Combinatorics ◽

Definition Of

We consider topological aspects of decision trees on simplicial complexes, concentrating on how to use decision trees as a tool in topological combinatorics. By Robin Forman's discrete Morse theory, the number of evasive faces of a given dimension $i$ with respect to a decision tree on a simplicial complex is greater than or equal to the $i$th reduced Betti number (over any field) of the complex. Under certain favorable circumstances, a simplicial complex admits an "optimal" decision tree such that equality holds for each $i$; we may hence read off the homology directly from the tree. We provide a recursive definition of the class of semi-nonevasive simplicial complexes with this property. A certain generalization turns out to yield the class of semi-collapsible simplicial complexes that admit an optimal discrete Morse function in the analogous sense. In addition, we develop some elementary theory about semi-nonevasive and semi-collapsible complexes. Finally, we provide explicit optimal decision trees for several well-known simplicial complexes.

Development of an ensemble machine learning prognostic model to predict 60-day risk of major adverse cardiac events in adults with chest pain

10.1101/2021.03.08.21252615 ◽

2021 ◽

Author(s):

Chris J. Kennedy ◽

Dustin G. Mark ◽

Jie Huang ◽

Mark J. van der Laan ◽

Alan E. Hubbard ◽

...

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Chest Pain ◽

Random Forest ◽

Decision Trees ◽

Low Risk ◽

Major Adverse Cardiac Events ◽

Risk Scores ◽

Cardiac Events ◽

Adverse Cardiac Events

Background: Chest pain is the second leading reason for emergency department (ED) visits and is commonly identified as a leading driver of low-value health care. Accurate identification of patients at low risk of major adverse cardiac events (MACE) is important to improve resource allocation and reduce over-treatment. Objectives: We sought to assess machine learning (ML) methods and electronic health record (EHR) covariate collection for MACE prediction. We aimed to maximize the pool of low-risk patients that are accurately predicted to have less than 0.5% MACE risk and may be eligible for reduced testing. Population Studied: 116,764 adult patients presenting with chest pain in the ED and evaluated for potential acute coronary syndrome (ACS). 60-day MACE rate was 1.9%. Methods: We evaluated ML algorithms (lasso, splines, random forest, extreme gradient boosting, Bayesian additive regression trees) and SuperLearner stacked ensembling. We tuned ML hyperparameters through nested ensembling, and imputed missing values with generalized low-rank models (GLRM). We benchmarked performance to key biomarkers, validated clinical risk scores, decision trees, and logistic regression. We explained the models through variable importance ranking and accumulated local effect visualization. Results: The best discrimination (area under the precision-recall [PR-AUC] and receiver operating characteristic [ROC-AUC] curves) was provided by SuperLearner ensembling (0.148, 0.867), followed by random forest (0.146, 0.862). Logistic regression (0.120, 0.842) and decision trees (0.094, 0.805) exhibited worse discrimination, as did risk scores [HEART (0.064, 0.765), EDACS (0.046, 0.733)] and biomarkers [serum troponin level (0.064, 0.708), electrocardiography (0.047, 0.686)]. The ensemble's risk estimates were miscalibrated by 0.2 percentage points. The ensemble accurately identified 50% of patients to be below a 0.5% 60-day MACE risk threshold. The most important predictors were age, peak troponin, HEART score, EDACS score, and electrocardiogram. GLRM imputation achieved 90% reduction in root mean-squared error compared to median-mode imputation. Conclusion: Use of ML algorithms, combined with broad predictor sets, improved MACE risk prediction compared to simpler alternatives, while providing calibrated predictions and interpretability. Standard risk scores may neglect important health information available in other characteristics and combined in nuanced ways via ML.
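
For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute ROC-AUC (as the probability that a randomly chosen positive case is scored above a randomly chosen negative one) and the share of patients falling under a risk threshold. The labels and risk scores are made-up illustrative values, and the function names are assumptions; the study's own pipeline is not reproduced here.

```python
def roc_auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def low_risk_fraction(scores, threshold=0.005):
    """Share of patients whose predicted risk is below the threshold (0.5% by default)."""
    return sum(s < threshold for s in scores) / len(scores)


# Hypothetical outcomes (1 = MACE within 60 days) and predicted risks.
labels = [0, 0, 0, 0, 1, 0, 1, 0]
scores = [0.001, 0.003, 0.004, 0.02, 0.03, 0.002, 0.10, 0.05]

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"Below 0.5% risk: {low_risk_fraction(scores):.0%}")
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; rank-based formulas give the same value more efficiently on cohorts of the size reported above.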

Learning Optimal Decision Trees using Constraint Programming (Extended Abstract)

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/662 ◽

2020 ◽

Cited By ~ 2

Author(s):

Hélène Verhaeghe ◽

Siegfried Nijssen ◽

Gilles Pesant ◽

Claude-Guy Quimper ◽

Pierre Schaus

Keyword(s):

Decision Trees ◽

Constraint Programming ◽

Greedy Algorithms ◽

Building Blocks ◽

Optimal Decision ◽

Programming Approach ◽

Learning Problem ◽

New Approach ◽

Good Classification ◽

Additional Constraints

Decision trees are among the most popular classification models in machine learning. Traditionally, they are learned using greedy algorithms. However, such algorithms have their disadvantages: it is difficult to limit the size of the decision trees while maintaining a good classification accuracy, and it is hard to impose additional constraints on the models that are learned. For these reasons, there has been a recent interest in exact and flexible algorithms for learning decision trees. In this paper, we introduce a new approach to learn decision trees using constraint programming. Compared to earlier approaches, we show that our approach obtains better performance, while still being sufficiently flexible to allow for the inclusion of constraints. Our approach builds on three key building blocks: (1) the use of AND/OR search, (2) the use of caching, (3) the use of the CoverSize global constraint proposed recently for the problem of itemset mining. This allows our constraint programming approach to deal in a much more efficient way with the decompositions in the learning problem.

PyDL8.5: a Library for Learning Optimal Decision Trees

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/750 ◽

2020 ◽

Author(s):

Gaël Aglin ◽

Siegfried Nijssen ◽

Pierre Schaus

Keyword(s):

Machine Learning ◽

Decision Trees ◽

Efficient Algorithm ◽

Optimal Decision ◽

Learning Tasks ◽

Explainable Ai ◽

Classification Tasks ◽

Interpretable Models ◽

Limited Depth

Decision Trees (DTs) are widely used Machine Learning (ML) models with a broad range of applications. The interest in these models has increased even further in the context of Explainable AI (XAI), as decision trees of limited depth are very interpretable models. However, traditional algorithms for learning DTs are heuristic in nature; they may produce trees that are of suboptimal quality under depth constraints. We introduce PyDL8.5, a Python library to infer depth-constrained Optimal Decision Trees (ODTs). PyDL8.5 provides an interface for DL8.5, an efficient algorithm for inferring depth-constrained ODTs. The library provides an easy-to-use scikit-learn compatible interface. It can be used not only for classification tasks, but also for regression, clustering, and other tasks. We introduce an interface that allows users to easily implement these other learning tasks. We provide a number of examples of how to use this library.
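
A small toy case illustrates why a greedy learner can be suboptimal under a depth limit. The sketch below is not DL8.5 and does not use the PyDL8.5 API: it is a plain brute-force search over binary features, with a hypothetical eight-row dataset whose label is the XOR of the first two features, plus a third feature that is only weakly related to the label.

```python
from collections import Counter


def leaf_errors(labels):
    """Misclassifications when a leaf predicts its majority label."""
    return len(labels) - max(Counter(labels).values()) if labels else 0


def split(rows, feature):
    """Partition rows by the value (0 or 1) of a binary feature."""
    left = [r for r in rows if r[0][feature] == 0]
    right = [r for r in rows if r[0][feature] == 1]
    return left, right


def one_split_errors(rows, feature):
    """Errors of a single split on `feature` with majority-label leaves."""
    left, right = split(rows, feature)
    return leaf_errors([y for _, y in left]) + leaf_errors([y for _, y in right])


def greedy_errors(rows, n_features, depth):
    """Errors of a tree grown top-down, taking the locally best split at each node."""
    if depth == 0 or not rows:
        return leaf_errors([y for _, y in rows])
    best = min(range(n_features), key=lambda f: one_split_errors(rows, f))
    left, right = split(rows, best)
    return (greedy_errors(left, n_features, depth - 1)
            + greedy_errors(right, n_features, depth - 1))


def optimal_errors(rows, n_features, depth):
    """Minimum errors over all trees with at most `depth` levels of splits."""
    leaf = leaf_errors([y for _, y in rows])
    if depth == 0 or leaf == 0:
        return leaf
    best = leaf
    for f in range(n_features):
        left, right = split(rows, f)
        best = min(best, optimal_errors(left, n_features, depth - 1)
                   + optimal_errors(right, n_features, depth - 1))
    return best


# Hypothetical data: ((a, b, c), label) with label = a XOR b.
rows = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((0, 1, 1), 1), ((0, 1, 0), 1),
    ((1, 0, 1), 1), ((1, 0, 1), 1), ((1, 1, 0), 0), ((1, 1, 1), 0),
]

print("greedy errors at depth 2: ", greedy_errors(rows, 3, 2))
print("optimal errors at depth 2:", optimal_errors(rows, 3, 2))
```

Here the greedy learner splits first on the third feature because it looks best in isolation, and then cannot recover within two levels, whereas the exhaustive search finds the error-free tree that splits on the first two features. Brute force scales poorly with the number of features and depth, which is the cost that dedicated exact algorithms aim to reduce.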

What drives forest fire in Fujian, China? Evidence from logistic regression and Random Forests

International Journal of Wildland Fire ◽

10.1071/wf15121 ◽

2016 ◽

Vol 25 (5) ◽

pp. 505 ◽

Cited By ~ 27

Author(s):

Futao Guo ◽

Guangyu Wang ◽

Zhangwen Su ◽

Huiling Liang ◽

Wenhui Wang ◽

...

Keyword(s):

Logistic Regression ◽

Random Forest ◽

Regional Scale ◽

Fire Risk ◽

Driving Factors ◽

Fire Season ◽

Fire Occurrence ◽

Climate Factors ◽

Local Factors ◽

Risk Zones

We applied logistic regression and Random Forest to evaluate drivers of fire occurrence on a provincial scale. Potential driving factors were divided into two groups according to scale of influence: ‘climate factors’, which operate on a regional scale, and ‘local factors’, which include infrastructure, vegetation, topographic and socioeconomic data. The groups of factors were analysed separately and then significant factors from both groups were analysed together. Both models identified significant driving factors, which were ranked in terms of relative importance. Results show that climate factors are the main drivers of fire occurrence in the forests of Fujian, China. In particular, sunshine hours, relative humidity (fire seasonal and daily), precipitation (fire season) and temperature (fire seasonal and daily) were seen to play a crucial role in fire ignition. Of the local factors, elevation, distance to railway and per capita GDP were found to be most significant. Random Forest demonstrated a higher predictive ability than logistic regression across all groups of factors (climate, local, and climate and local combined). Maps of the likelihood of fire occurrence in Fujian illustrate that the high fire-risk zones are distributed across administrative divisions; consequently, fire management strategies should be devised based on fire-risk zones, rather than on separate administrative divisions.

Abstract 14955: Machine Learning Predicts Hemodynamic Instability in Children After Cardiac Surgery in Pediatric Intensive Care Unit (PICU)

Circulation ◽

10.1161/circ.142.suppl_3.14955 ◽

2020 ◽

Vol 142 (Suppl_3) ◽

Author(s):

Koichi Sughimoto ◽

Jacob Levman ◽

Fazleem Baig ◽

Derek Berger ◽

Yoshihiro Oshima ◽

...

Keyword(s):

Machine Learning ◽

Cardiac Surgery ◽

Random Forest ◽

Decision Trees ◽

Blood Lactate ◽

Heart Surgery ◽

Ground Truth ◽

Hemodynamic Instability ◽

Serum Lactate ◽

Lactate Levels

Introduction: Despite improvements in management for children after cardiac surgery, a non-negligible proportion of patients suffer from cardiac arrest, which carries a poor prognosis. Although serum lactate levels are widely accepted markers of hemodynamic instability, measuring lactate requires discrete blood sampling. An alternative method to evaluate hemodynamic stability/instability continuously and non-invasively may assist in improving the standard of patient care. Hypothesis: We hypothesize that blood lactate in PICU patients can be predicted using machine learning applied to arterial waveforms and perioperative characteristics. Methods: Forty-eight children who underwent heart surgery were included. Patient characteristics and physiological measurements were acquired and analyzed using specialized software/hardware, including heart rate, lactate level, arterial waveform sharpness, and area under the curve. Predicting a patient’s blood lactate levels was accomplished using regression-based supervised learning algorithms, including regression decision trees, tuned decision trees, random forest regressor, tuned random forest, AdaBoost regressor, and hypertuned AdaBoost. All algorithms were compared with hold-out cross validation. Two approaches were considered: basing prediction on the currently acquired physiological measurements along with those acquired at admission, as well as adding the most recent lactate measurement and the time since that measurement as prediction parameters. The second approach supports updating the learning system’s predictive capacity whenever a patient has a new ground truth blood lactate reading acquired. Results: In both approaches, the best performing machine learning method was the tuned random forest, which yielded a mean absolute error of 5.60 mg/dL in the first approach, and 4.62 mg/dL when predicting blood lactate with updated ground truth. Conclusions: The tuned random forest is capable of predicting the level of serum lactate by analyzing perioperative variables, including the arterial pressure waveform. Machine learning can predict the patient’s hemodynamics non-invasively, continuously, and with accuracy that may demonstrate clinical utility.

Detecting Cognitive Distraction Using Random Forest by Considering Eye Movement Type

Intelligent Systems ◽

10.4018/978-1-5225-5643-5.ch069 ◽

2018 ◽

pp. 1587-1599

Author(s):

Hiroaki Koma ◽

Taku Harada ◽

Akira Yoshizawa ◽

Hirotoshi Iwasaki

Keyword(s):

Machine Learning ◽

Eye Movements ◽

Random Forest ◽

Decision Trees ◽

Eye Movement ◽

Learning Algorithm ◽

Machine Learning Algorithm ◽

Still Images ◽

Cognitive Distraction ◽

Movement Type

Detecting distracted states can be applied to various problems such as danger prevention when driving a car. A cognitively distracted state is one example of a distracted state. It is known that eye movements express cognitive distraction. Eye movements can be classified into several types. In this paper, the authors detect cognitive distraction using classified eye movement types when applying the Random Forest machine learning algorithm, which uses decision trees. They show the effectiveness of considering eye movement types for detecting cognitive distraction when applying Random Forest. The authors use visual experiments with still images for the detection.
