High-resolution digital mapping of soil organic carbon in permafrost terrain using machine learning: a case study in a sub-Arctic peatland environment

Matthias B. Siewert

doi:10.5194/bg-15-1663-2018

Biogeosciences ◽

10.5194/bg-15-1663-2018 ◽

2018 ◽

Vol 15 (6) ◽

pp. 1663-1682 ◽

Cited By ~ 20

Author(s):

Matthias B. Siewert

Keyword(s):

Machine Learning ◽

High Resolution ◽

Random Forest ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Land Cover ◽

Support Vector ◽

Small Scale ◽

Permafrost Degradation ◽

Discontinuous Permafrost

Abstract. Soil organic carbon (SOC) stored in northern peatlands and permafrost-affected soils is a key component in the global carbon cycle. This article quantifies SOC stocks in a sub-Arctic mountainous peatland environment in the discontinuous permafrost zone in Abisko, northern Sweden. Four machine-learning techniques are evaluated for SOC quantification: multiple linear regression, artificial neural networks, support vector machine and random forest. The random forest model performed best and was used to predict SOC for several depth increments at a spatial resolution of 1 m (1 × 1 m). A high-resolution (1 m) land cover classification generated for this study is the most relevant predictive variable. The landscape mean SOC storage (0–150 cm) is estimated to be 8.3 ± 8.0 kg C m−2 and the SOC stored in the top meter (0–100 cm) to be 7.7 ± 6.2 kg C m−2. The predictive modeling highlights the relative importance of wetland areas and in particular peat plateaus for the landscape's SOC storage. The total SOC was also predicted at reduced spatial resolutions of 2, 10, 30, 100, 250 and 1000 m and shows a significant drop in land cover class detail and a tendency to underestimate the SOC at resolutions > 30 m. This is associated with the occurrence of many small-scale wetlands forming local hot-spots of SOC storage that are omitted at coarse resolutions. Sharp transitions in SOC storage associated with land cover and permafrost distribution are the most challenging methodological aspect. However, in this study, at local, regional and circum-Arctic scales, the main factor limiting robust SOC mapping efforts is the scarcity of soil pedon data from across the entire environmental space. For the Abisko region, past SOC and permafrost dynamics indicate that most of the SOC is barely 2000 years old and very dynamic. Future research needs to investigate the geomorphic response of permafrost degradation and the fate of SOC across all landscape compartments in post-permafrost landscapes.
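
The underestimation at coarse resolutions can be illustrated with a toy example. The sketch below is not the author's workflow: it assumes a hypothetical 4 × 4 land cover grid, two made-up classes with made-up SOC values, and a simple majority-vote aggregation. It only shows why isolated small wetlands vanish from the estimate once cells are merged.

```python
from collections import Counter

# Hypothetical per-class SOC stocks in kg C m^-2 (illustrative values only,
# not taken from the study).
SOC_BY_CLASS = {"wetland": 40.0, "upland": 5.0}


def mean_soc(grid):
    """Mean SOC over a grid of land cover class labels."""
    cells = [SOC_BY_CLASS[label] for row in grid for label in row]
    return sum(cells) / len(cells)


def coarsen_majority(grid, factor):
    """Aggregate a square grid by giving each factor x factor block the most
    frequent class among its cells. The grid size must be divisible by factor;
    ties go to the class encountered first."""
    n = len(grid)
    coarse = []
    for r in range(0, n, factor):
        row = []
        for c in range(0, n, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            row.append(Counter(block).most_common(1)[0][0])
        coarse.append(row)
    return coarse


W, U = "wetland", "upland"
# Fine grid with three isolated single-cell wetlands.
fine = [
    [W, U, U, U],
    [U, U, U, W],
    [U, U, U, U],
    [U, W, U, U],
]

coarse = coarsen_majority(fine, 2)
fine_mean = mean_soc(fine)      # wetlands contribute 3 of 16 cells
coarse_mean = mean_soc(coarse)  # every wetland loses its block's majority vote
print(fine_mean, coarse_mean)
```

In this toy grid every wetland cell is outvoted within its block, so the coarse map contains only the upland class and the landscape mean drops accordingly.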


High-resolution digital mapping of soil organic carbon in permafrost terrain using machine-learning: A case study in a sub-Arctic peatland environment

10.5194/bg-2017-323 ◽

2017 ◽

Cited By ~ 1

Author(s):

Matthias B. Siewert

Keyword(s):

Machine Learning ◽

High Resolution ◽

Random Forest ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Land Cover ◽

Support Vector ◽

Small Scale ◽

Permafrost Degradation ◽

Discontinuous Permafrost

Abstract. Soil organic carbon (SOC) stored in northern peatlands and permafrost-affected soils is a key component in the global carbon cycle. I quantify SOC stocks in a sub-Arctic mountainous peatland environment in the discontinuous permafrost zone in Abisko, northern Sweden. Four machine-learning techniques are evaluated: multiple linear regression, artificial neural networks, support vector machine and random forest. The random forest approach performed best and was used to predict SOC for several depth increments at a spatial resolution of 2 × 2 m. A high-resolution (1 × 1 m) land cover classification generated for this study is the most relevant predictive variable. The landscape mean SOC storage (0–150 cm) is estimated to be 7.9 ± 8.0 kg C m−2 and the SOC stored in the top meter (0–100 cm) to be 7.0 ± 6.3 kg C m−2. The predictive modeling highlights the relative importance of wetland areas and in particular peat plateaus for the landscape SOC storage. A surprisingly large number of small-scale wetland areas are mapped, forming very local hot-spots of SOC storage. The results show that robust SOC predictions are possible with the available methods and very high-resolution remote sensing data. Strong environmental gradients associated with land cover and permafrost distribution are the most challenging methodological aspect. However, in this study, at local, regional and circum-Arctic scale, the main factor limiting robust, high-resolution SOC mapping efforts is the scarcity of soil pedon data from across the entire environmental space. For the Abisko region, past SOC and permafrost dynamics indicate that most of the SOC is barely 2000 years old and very dynamic in wetland areas with permafrost-related landforms. Future research needs to investigate the geomorphic response of permafrost degradation and the fate of SOC across all landscape compartments in post-permafrost landscapes.


High-resolution digital mapping of soil organic carbon and soil total nitrogen using DEM derivatives, Sentinel-1 and Sentinel-2 data based on machine learning algorithms

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.138244 ◽

2020 ◽

Vol 729 ◽

pp. 138244 ◽

Cited By ~ 2

Author(s):

Tao Zhou ◽

Yajun Geng ◽

Jie Chen ◽

Jianjun Pan ◽

Dagmar Haase ◽

...

Keyword(s):

Machine Learning ◽

High Resolution ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Total Nitrogen ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Soil Total Nitrogen ◽

Digital Mapping ◽

Sentinel 2


High-Resolution U-Net: Preserving Image Details for Cultivated Land Extraction

Sensors ◽

10.3390/s20154064 ◽

2020 ◽

Vol 20 (15) ◽

pp. 4064

Author(s):

Wenna Xu ◽

Xinping Deng ◽

Shanxin Guo ◽

Jinsong Chen ◽

Luyi Sun ◽

...

Keyword(s):

High Resolution ◽

Random Forest ◽

Land Cover ◽

Image Texture ◽

Support Vector ◽

Spectral Variation ◽

Cultivated Land ◽

Resource Monitoring ◽

K Nearest Neighbors ◽

Image Details

Accurate and efficient extraction of cultivated land data is of great significance for agricultural resource monitoring and national food security. Deep-learning-based classification of remote-sensing images overcomes the two difficulties of traditional learning methods (e.g., support vector machine (SVM), K-nearest neighbors (KNN), and random forest (RF)) when extracting cultivated land: (1) the limited performance when extracting the same land-cover type with high intra-class spectral variation, such as cultivated land with both vegetation and non-vegetation cover, and (2) the limited generalization ability for handling a large dataset to apply the model to different locations. However, the “pooling” process in most deep convolutional networks, which attempts to enlarge the sensing field of the kernel by involving the upscale process, leads to significant detail loss in the output, including the edges, gradients, and image texture details. To solve this problem, in this study we proposed a new end-to-end extraction algorithm, a high-resolution U-Net (HRU-Net), to preserve the image details by improving the skip connection structure and the loss function of the original U-Net. The proposed HRU-Net was tested in Xinjiang Province, China to extract the cultivated land from Landsat Thematic Mapper (TM) images. The results showed that the HRU-Net achieved better performance (Acc: 92.81%; kappa: 0.81; F1-score: 0.90) than the U-Net++ (Acc: 91.74%; kappa: 0.79; F1-score: 0.89), the original U-Net (Acc: 89.83%; kappa: 0.74; F1-score: 0.86), and the Random Forest model (Acc: 76.13%; kappa: 0.48; F1-score: 0.69). The robustness of the proposed model to intra-class spectral variation and the accuracy of the edge details were also compared, and this showed that the HRU-Net obtained more accurate edge details and was less affected by intra-class spectral variation. The model proposed in this study can be further applied to other land cover types that have more spectral diversity and require more details of extraction.
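
For readers unfamiliar with the three scores quoted above, the sketch below shows how overall accuracy, Cohen's kappa and the F1-score are commonly derived from a confusion matrix. It assumes a binary cultivated / non-cultivated classification; the function name and the pixel counts are hypothetical and are not results from the paper.

```python
def binary_metrics(tp, fp, fn, tn):
    """Overall accuracy, Cohen's kappa and F1-score from a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column marginals.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((fn + tn) / n) * ((fp + tn) / n)
    p_e = p_yes + p_no
    kappa = (accuracy - p_e) / (1 - p_e)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, kappa, f1


# Hypothetical pixel counts for illustration only.
acc, kappa, f1 = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(acc, kappa, f1)
```

Kappa corrects accuracy for chance agreement, which is why it can separate models more strongly than accuracy alone when one class dominates the scene.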


Performance of three machine learning algorithms for predicting soil organic carbon in German agricultural soil

10.5194/soil-2021-107 ◽

2021 ◽

Author(s):

Ali Sakhaee ◽

Anika Gebauer ◽

Mareike Ließ ◽

Axel Don

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Agricultural Soil ◽

Learning Algorithms ◽

Model Performance ◽

Machine Learning Algorithms ◽

Support Vector ◽

Organic Soils ◽

The Impact

Abstract. Soil organic carbon (SOC), as the largest terrestrial carbon pool, has the potential to influence climate change and mitigation, and consequently SOC monitoring is important in the frameworks of different international treaties. There is therefore a need for high resolution SOC maps. Machine learning (ML) offers new opportunities to do this due to its capability for data mining of large datasets. The aim of this study, therefore, was to test three commonly used algorithms in digital soil mapping – random forest (RF), boosted regression trees (BRT) and support vector machine for regression (SVR) – on the first German Agricultural Soil Inventory to model agricultural topsoil SOC content. Nested cross-validation was implemented for model evaluation and parameter tuning. Moreover, grid search and a differential evolution algorithm were applied to ensure that each algorithm was tuned and optimised suitably. The SOC content of the German Agricultural Soil Inventory was highly variable, ranging from 4 g kg−1 to 480 g kg−1. However, only 4 % of all soils contained more than 87 g kg−1 SOC and were considered organic or degraded organic soils. The results show that SVR provided the best performance with an RMSE of 32 g kg−1 when the algorithms were trained on the full dataset. However, the average RMSE of all algorithms decreased by 34 % when mineral and organic soils were modeled separately, with the best result from SVR with an RMSE of 21 g kg−1. Model performance is often limited by the size and quality of the available soil dataset for calibration and validation. Therefore, the impact of enlarging the training data was tested by including 1223 data points from the European Land Use/Land Cover Area Frame Survey for agricultural sites in Germany. Model performance improved by at most 1 % for mineral soils and 2 % for organic soils. Despite the capability of machine learning algorithms in general, and particularly SVR, in modelling SOC on a national scale, the study showed that the most important step for improving model performance was the separate modelling of mineral and organic soils.
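
Nested cross-validation, as mentioned in the abstract, separates parameter tuning (inner loop) from model evaluation (outer loop). The sketch below is a minimal illustration under stated assumptions: it swaps in a one-dimensional k-nearest-neighbour regressor instead of RF, BRT or SVR, tunes it with a tiny grid search over k, and uses made-up data. All function names are hypothetical.

```python
import math


def k_folds(indices, k):
    """Split a list of indices into k interleaved folds."""
    return [indices[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Predict y at x as the mean of the k nearest training targets (1-D)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def rmse(train, test, k):
    """Root mean squared error of a k-NN model fitted on train, scored on test."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
    return math.sqrt(sum(errors) / len(errors))


def nested_cv(data, candidate_ks, outer_k=3, inner_k=2):
    """Return one (chosen k, outer-test RMSE) pair per outer fold."""
    results = []
    for test_idx in k_folds(list(range(len(data))), outer_k):
        train_idx = [i for i in range(len(data)) if i not in test_idx]
        inner = k_folds(train_idx, inner_k)

        def inner_score(k):
            # Inner loop: score a candidate k using outer-training data only.
            scores = []
            for val_idx in inner:
                fit = [data[i] for i in train_idx if i not in val_idx]
                val = [data[i] for i in val_idx]
                scores.append(rmse(fit, val, k))
            return sum(scores) / len(scores)

        best_k = min(candidate_ks, key=inner_score)
        # Outer loop: evaluate the tuned model on data unseen during tuning.
        train = [data[i] for i in train_idx]
        test = [data[i] for i in test_idx]
        results.append((best_k, rmse(train, test, best_k)))
    return results


# Hypothetical (covariate, SOC) pairs for illustration only.
data = [(float(x), 2.0 * x + (1.0 if x % 2 else -1.0)) for x in range(12)]
results = nested_cv(data, candidate_ks=[1, 2, 3])
print(results)
```

Because the outer test fold never takes part in choosing k, the outer RMSE values are not biased by the tuning step.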


Predicting and Mapping of Soil Organic Carbon Using Machine Learning Algorithms in Northern Iran

Remote Sensing ◽

10.3390/rs12142234 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2234 ◽

Cited By ~ 6

Author(s):

Mostafa Emadi ◽

Ruhollah Taghizadeh-Mehrjardi ◽

Ali Cherati ◽

Majid Danesh ◽

Amir Mosavi ◽

...

Keyword(s):

Machine Learning ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Composite Surface ◽

Auxiliary Data ◽

Extreme Gradient Boosting

Estimation of the soil organic carbon (SOC) content is of utmost importance in understanding the chemical, physical, and biological functions of the soil. This study proposes machine learning algorithms of support vector machines (SVM), artificial neural networks (ANN), regression tree, random forest (RF), extreme gradient boosting (XGBoost), and conventional deep neural network (DNN) for advancing prediction models of SOC. Models are trained with 1879 composite surface soil samples, and 105 auxiliary data as predictors. The genetic algorithm is used as a feature selection approach to identify effective variables. The results indicate that precipitation is the most important predictor driving 14.9% of SOC spatial variability followed by the normalized difference vegetation index (12.5%), day temperature index of moderate resolution imaging spectroradiometer (10.6%), multiresolution valley bottom flatness (8.7%) and land use (8.2%), respectively. Based on 10-fold cross-validation, the DNN model was the superior algorithm, with the lowest prediction error and uncertainty. In terms of accuracy, DNN yielded a mean absolute error of 0.59%, a root mean squared error of 0.75%, a coefficient of determination of 0.65, and Lin’s concordance correlation coefficient of 0.83. The SOC content was the highest in the udic soil moisture regime class with a mean value of 3.71%, followed by the aquic (2.45%) and xeric (2.10%) classes, respectively. Soils in dense forestlands had the highest SOC contents, whereas soils of younger geological age and alluvial fans had lower SOC. The proposed DNN (hidden layers = 7, and size = 50) is a promising algorithm for handling large numbers of auxiliary data at a province-scale, and due to its flexible structure and the ability to extract more information from the auxiliary data surrounding the sampled observations, it had high accuracy for the prediction of the SOC base-line map and minimal uncertainty.
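
The four accuracy measures quoted above can be computed as in the following sketch. It assumes the common definitions (R² as one minus the ratio of residual to total sum of squares, and Lin's concordance correlation coefficient with population variances); the paper may use slightly different variants, and the sample values are made up.

```python
import math


def regression_metrics(obs, pred):
    """Return MAE, RMSE, R^2 and Lin's concordance correlation coefficient."""
    n = len(obs)
    mean_o = sum(obs) / n
    mean_p = sum(pred) / n
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot
    # Population (1/n) variances and covariance, as in Lin's definition.
    var_o = ss_tot / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred)) / n
    ccc = 2.0 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
    return mae, rmse, r2, ccc


# Hypothetical SOC contents in percent, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]
mae, rmse, r2, ccc = regression_metrics(observed, predicted)
print(mae, rmse, r2, ccc)
```

Unlike a plain correlation coefficient, the concordance coefficient penalises a constant offset between predictions and observations, which is why it falls below 1 here even though the two series are perfectly correlated.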


Evaluation of Light Gradient Boosted Machine Learning Technique in Large Scale Land Use and Land Cover Classification

Environments ◽

10.3390/environments7100084 ◽

2020 ◽

Vol 7 (10) ◽

pp. 84

Author(s):

Dakota Aaron McCarty ◽

Hyun Woo Kim ◽

Hye Kyung Lee

Keyword(s):

Machine Learning ◽

Land Use ◽

Support Vector Machines ◽

Random Forest ◽

Land Cover ◽

Large Scale ◽

Machine Learning Techniques ◽

Support Vector ◽

Light Gradient ◽

Vector Machines

The ability to rapidly produce accurate land use and land cover maps regularly and consistently has been a growing initiative as they have increasingly become an important tool in the efforts to evaluate, monitor, and conserve Earth’s natural resources. Algorithms for supervised classification of satellite images constitute a necessary tool for the building of these maps and they have made it possible to establish remote sensing as the most reliable means of map generation. In this paper, we compare three machine learning techniques: Random Forest, Support Vector Machines, and Light Gradient Boosted Machine, using a 70/30 training/testing evaluation model. Our research evaluates the accuracy of Light Gradient Boosted Machine models against the more classic and trusted Random Forest and Support Vector Machines when it comes to classifying land use and land cover over large geographic areas. We found that the Light Gradient Boosted model is marginally more accurate, with a 0.01 and 0.059 increase in the overall accuracy compared to Support Vector and Random Forests, respectively, and also performed around 25% quicker on average.
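
A 70/30 training/testing evaluation can be sketched as below. This is a generic illustration rather than the authors' pipeline: the function names, the fixed seed and the labelled samples are assumptions made for the example.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def overall_accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the reference class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical (pixel id, land cover class) samples for illustration only.
samples = [(i, "forest" if i % 2 else "urban") for i in range(10)]
train, test = train_test_split(samples)
print(len(train), len(test))
```

A classifier would be fitted on `train` only, and `overall_accuracy` would then be applied to its predictions for `test`, so that the reported accuracy reflects samples not seen during fitting.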


Supplementary material to "High-resolution digital mapping of soil organic carbon in permafrost terrain using machine-learning: A case study in a sub-Arctic peatland environment"

10.5194/bg-2017-323-supplement ◽

2017 ◽

Author(s):

Matthias B. Siewert

Keyword(s):

Machine Learning ◽

High Resolution ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Digital Mapping ◽

Supplementary Material


Machine-learning-based quantitative estimation of soil organic carbon content by VIS/NIR spectroscopy

PeerJ ◽

10.7717/peerj.5714 ◽

2018 ◽

Vol 6 ◽

pp. e5714 ◽

Cited By ~ 11

Author(s):

Jianli Ding ◽

Aixia Yang ◽

Jingzhe Wang ◽

Vasit Sagan ◽

Danlin Yu

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Organic Carbon ◽

Soil Organic Carbon ◽

Nir Spectroscopy ◽

Machine Learning Algorithms ◽

Recursive Feature Elimination ◽

Support Vector ◽

Hyperion Data ◽

Testing Set

Soil organic carbon (SOC) is an important soil property that has a profound impact on soil quality and plant growth. With 140 soil samples collected from Ebinur Lake Wetland National Nature Reserve, Xinjiang Uyghur Autonomous Region of China, this research evaluated the feasibility of visible/near infrared (VIS/NIR) spectroscopy data (350–2,500 nm) and simulated EO-1 Hyperion data to estimate SOC in arid wetland regions. Three machine learning algorithms including Ant Colony Optimization-interval Partial Least Squares (ACO-iPLS), Recursive Feature Elimination-Support Vector Machine (RF-SVM), and Random Forest (RF) were employed to select spectral features and further estimate SOC. Results indicated that the feature wavelengths pertaining to SOC were mainly within the ranges of 745–910 nm and 1,911–2,254 nm. The combination of RF-SVM and first derivative pre-processing produced the highest estimation accuracy with the optimal values of Rt (correlation coefficient of testing set), RMSEt and RPD of 0.91, 0.27% and 2.41, respectively. The simulated EO-1 Hyperion data combined with the Support Vector Machine (SVM) based recursive feature elimination algorithm produced the most accurate estimate of SOC content. For the testing set, Rt was 0.79, RMSEt was 0.19%, and RPD was 1.61. This practice provides an efficient, low-cost approach with potentially high accuracy to estimate SOC contents and hence supports better management and protection strategies for desert wetland ecosystems.
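
The RPD statistic reported above is usually defined as the standard deviation of the observed values divided by the RMSE of prediction. The sketch below assumes that definition with the sample (n − 1) standard deviation; the paper may use a different variant, and the values are made up.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD of observations over RMSE."""
    n = len(observed)
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return statistics.stdev(observed) / rmse


# Hypothetical SOC contents in percent, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.5, 1.5, 3.5, 3.5, 5.5]
value = rpd(observed, predicted)
print(value)
```

A larger RPD means the prediction error is small relative to the natural spread of the samples, so the statistic depends on the variability of the test set as well as on the model.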
