Evaluation of Machine Learning Approaches to Predict Soil Organic Matter and pH Using vis-NIR Spectra

Meihua Yang; Dongyun Xu; Songchao Chen; Hongyi Li; Zhou Shi

doi:10.3390/s19020263

Sensors ◽

10.3390/s19020263 ◽

2019 ◽

Vol 19 (2) ◽

pp. 263 ◽

Cited By ~ 16

Author(s):

Meihua Yang ◽

Dongyun Xu ◽

Songchao Chen ◽

Hongyi Li ◽

Zhou Shi

Keyword(s):

Machine Learning ◽

Organic Matter ◽

Soil Organic Matter ◽

Least Squares ◽

Paddy Soil ◽

Prediction Accuracy ◽

Accurate Determination ◽

Support Vector ◽

Learning Approaches ◽

Lower Yangtze

Soil organic matter (SOM) and pH are essential soil fertility indicators of paddy soil in the middle-lower Yangtze Plain. Rapid, non-destructive and accurate determination of SOM and pH is vital to preventing soil degradation caused by inappropriate land management practices. Visible-near infrared (vis-NIR) spectroscopy with multivariate calibration can be used to effectively estimate soil properties. In this study, 523 soil samples were collected from paddy fields in the Yangtze Plain, China. Four machine learning approaches, namely partial least squares regression (PLSR), least squares-support vector machines (LS-SVM), extreme learning machines (ELM) and the Cubist regression model (Cubist), were used to compare the prediction accuracy based on vis-NIR full bands and bands reduced using the genetic algorithm (GA). The coefficient of determination (R2), root mean square error (RMSE), and ratio of performance to inter-quartile distance (RPIQ) were used to assess the prediction accuracy. The ELM with GA-reduced bands was the best model for both SOM (R2 = 0.81, RMSE = 5.17, RPIQ = 2.87) and pH (R2 = 0.76, RMSE = 0.43, RPIQ = 2.15). The performance of the LS-SVM for pH prediction did not differ significantly between the model with GA (R2 = 0.75, RMSE = 0.44, RPIQ = 2.08) and without GA (R2 = 0.74, RMSE = 0.45, RPIQ = 2.07). Although only a slight improvement was observed when the ELM was used to predict SOM and pH with reduced bands (SOM: R2 = 0.81, RMSE = 5.17, RPIQ = 2.87; pH: R2 = 0.76, RMSE = 0.43, RPIQ = 2.15) compared with full bands (SOM: R2 = 0.81, RMSE = 5.18, RPIQ = 2.83; pH: R2 = 0.76, RMSE = 0.45, RPIQ = 2.07), the number of wavelengths was greatly reduced (SOM: 201 to 44; pH: 201 to 32). Thus, the ELM coupled with GA-reduced bands is recommended for prediction of paddy soil properties (SOM and pH) in the middle-lower Yangtze Plain.
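
The three accuracy measures named in this abstract can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `regression_metrics` is hypothetical, R2 is computed here as 1 - SS_res/SS_tot (some spectroscopy papers report the squared correlation instead), and RPIQ is taken as the inter-quartile range of the observed values divided by RMSE, with quartiles from the standard library's "inclusive" method.

```python
import math
import statistics


def regression_metrics(observed, predicted):
    """Return (R2, RMSE, RPIQ) for paired observed and predicted values.

    Assumptions (not stated in the abstract): R2 = 1 - SS_res / SS_tot,
    and RPIQ = (Q3 - Q1) of the observed values divided by RMSE.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    # Quartiles of the observed values; their spread is the IQR.
    q1, _median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    rpiq = (q3 - q1) / rmse
    return r2, rmse, rpiq


# Hypothetical values, for illustration only.
r2, rmse, rpiq = regression_metrics([1, 2, 3, 4, 5], [1.5, 2.5, 3.5, 4.5, 5.5])
print(r2, rmse, rpiq)
```

Because RPIQ scales RMSE by the spread of the observed data, it lets the SOM and pH models be compared even though their RMSE values are in different units.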

Abstract TP458: High Accuracy of Predictive Models for SAH Using Different Machine Learning Approaches

Stroke ◽

10.1161/str.51.suppl_1.tp458 ◽

2020 ◽

Vol 51 (Suppl_1) ◽

Author(s):

Paul Litvak ◽

Jeevan Medikonda ◽

Girish Menon ◽

Pitchaiah Mandava

Keyword(s):

Machine Learning ◽

Predictive Models ◽

Prediction Accuracy ◽

Support Vector ◽

World Federation ◽

Learning Approaches ◽

Flow Models ◽

Multi Stage ◽

Stage 1 ◽

Categorical Scale

Background: Patients suffering from subarachnoid hemorrhage (SAH) have poor long-term outcomes. There are predictive models for ischemic and hemorrhagic stroke. However, there is a paucity of models for SAH. Machine learning concepts were applied to build multi-stage Neural Network (NN), Support Vector Machine (SVM) and Keras/TensorFlow models to predict SAH outcomes. Methods: A database of ~800 aneurysmal SAH patients from Kasturba Medical College was utilized. Baseline variables of the World Federation of Neurosurgeons 5-point scale (WFNS 1-5), age, gender, and presence/absence of hypertension and diabetes were considered in Stage 1. Stage 2 included all Stage 1 variables along with presence/absence of radiologic signs of vasospasm and ischemia. Stage 3 included the earlier two stages and the discharge Glasgow Outcome Scale (GOS 1-5). GOS at 3 months was predicted using 2-layer NN/SVM/Keras-TensorFlow models on the five-point categorical scale as well as dichotomized to dead/alive and to favorable (GOS 4-5) or unfavorable (GOS 1-3). Prediction accuracy of the models was compared to the recorded GOS. Results: Prediction accuracy, shown as percentages (see Table), was similar across all three stages for the SVM, NN and Keras/TensorFlow models. Accuracy was remarkably higher with dichotomization than with the complete five-point GOS categorical scale. Conclusions: SVM, NN, and Keras-TensorFlow based machine learning models can be used to predict SAH outcomes to a high degree of accuracy. These powerful predictive models can be used to prognosticate and select patients into trials.
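
The dichotomization described in the Methods can be illustrated with a short Python sketch. It is not the study's code: the helper names `dichotomize_gos` and `accuracy` and all scores below are hypothetical, and treating GOS 1 as "dead" follows the usual Glasgow Outcome Scale convention rather than anything stated in the abstract.

```python
def dichotomize_gos(gos):
    """Collapse a 1-5 GOS score into the two binary outcomes.

    Assumption: GOS 1 means dead (standard GOS convention);
    favorable is GOS 4-5, as given in the abstract.
    """
    if gos not in (1, 2, 3, 4, 5):
        raise ValueError("GOS must be an integer from 1 to 5")
    return {"alive": gos != 1, "favorable": gos >= 4}


def accuracy(predicted, recorded):
    """Fraction of predictions that exactly match the recorded labels."""
    if len(predicted) != len(recorded) or not recorded:
        raise ValueError("need two equally long, non-empty sequences")
    hits = sum(p == r for p, r in zip(predicted, recorded))
    return hits / len(recorded)


# Hypothetical 3-month GOS scores, for illustration only.
recorded_gos = [1, 3, 4, 5, 2, 5]
predicted_gos = [1, 2, 5, 5, 3, 4]

five_point_acc = accuracy(predicted_gos, recorded_gos)
favorable_acc = accuracy(
    [dichotomize_gos(g)["favorable"] for g in predicted_gos],
    [dichotomize_gos(g)["favorable"] for g in recorded_gos],
)
print(five_point_acc, favorable_acc)
```

In this toy data most errors are off by one category within the same half of the scale, so they vanish after dichotomization, which is one plausible reason binary accuracy exceeds five-point accuracy.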

Environmental and Anthropogenic Factors Driving Changes in Paddy Soil Organic Matter: A Case Study in the Middle and Lower Yangtze River Plain of China

Pedosphere ◽

10.1016/s1002-0160(17)60383-7 ◽

2017 ◽

Vol 27 (5) ◽

pp. 926-937 ◽

Cited By ~ 5

Author(s):

Naijia GUO ◽

Xuezheng SHI ◽

Yongcun ZHAO ◽

Shengxiang XU ◽

Meiyan WANG ◽

...

Keyword(s):

Organic Matter ◽

Soil Organic Matter ◽

Paddy Soil ◽

Yangtze River ◽

Anthropogenic Factors ◽

Lower Yangtze ◽

River Plain

Ensemble Machine Learning Approaches for Proteogenomic Cancer Studies

10.21203/rs.3.rs-101902/v1 ◽

2020 ◽

Author(s):

Yulan Liang ◽

Amin Gharipour ◽

Erik Kelemen ◽

Arpad Kelemen

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Prediction Accuracy ◽

Performance Criteria ◽

Stable Set ◽

Support Vector ◽

Kappa Statistics ◽

Learning Approaches ◽

Ensemble Machine Learning ◽

Homogeneous Ensemble

Background: The identification of important proteins is critical for medical diagnosis and prognosis in common diseases. Diverse sets of computational tools were developed for omics data reductions and protein selections. However, standard statistical models with single feature selection involve the multi-testing burden of low power with the available limited samples. Furthermore, high correlations among proteins with high redundancy and moderate effects often lead to unstable selections and cause reproducibility issues. Ensemble feature selection in machine learning may identify a stable set of disease biomarkers that could improve the prediction performance of subsequent classification models, and thereby simplify their interpretability. In this study, we developed a three-stage homogeneous ensemble feature selection approach for both identifying proteins and improving prediction accuracy. This approach was implemented and applied to ovarian cancer proteogenomics data sets: 1) binary putative homologous recombination deficiency positive or negative; and 2) multiple mRNA classes (differentiated, proliferative, immunoreactive, mesenchymal, and unknown). We conducted and compared various machine learning approaches with homogeneous ensemble feature selection, including random forest, support vector machine, and neural network, for predicting both binary and multiple class outcomes. Various performance criteria, including sensitivity, specificity, and kappa statistics, were used to assess the prediction consistency and accuracy. Results: With the proposed three-stage homogeneous ensemble feature selection approaches, prediction accuracy can be improved with the limited sample through continuously reducing errors and redundancy; e.g., Treebag provided 83% prediction accuracy (85% sensitivity and 81% specificity) for binary ovarian outcomes. For mRNA multi-class classification, our approach provided even better accuracy with increased sample size.
Conclusions: Despite the different prediction accuracies from various models, the proposed homogeneous ensemble feature selection identified consistent sets of top-ranked important markers out of 9606 proteins linked to the binary disease and multiple mRNA class outcomes.

Ensemble Machine Learning Approach Improves Predicted Spatial Variation of Surface Soil Organic Carbon Stocks in Data-Limited Northern Circumpolar Region

Frontiers in Big Data ◽

10.3389/fdata.2020.528441 ◽

2020 ◽

Vol 3 ◽

Author(s):

Umakant Mishra ◽

Sagar Gautam ◽

William J. Riley ◽

Forrest M. Hoffman

Keyword(s):

Machine Learning ◽

Environmental Factors ◽

Soil Properties ◽

Spatial Variation ◽

Prediction Accuracy ◽

Gradient Boosting ◽

Support Vector ◽

Learning Approaches ◽

Regression Kriging ◽

Soc Stocks

Various approaches of differing mathematical complexities are being applied for spatial prediction of soil properties. Regression kriging is a widely used hybrid approach of spatial variation that combines correlation between soil properties and environmental factors with spatial autocorrelation between soil observations. In this study, we compared four machine learning approaches (gradient boosting machine, multivariate adaptive regression splines, random forest, and support vector machine) with regression kriging to predict the spatial variation of surface (0–30 cm) soil organic carbon (SOC) stocks at 250-m spatial resolution across the northern circumpolar permafrost region. We combined 2,374 soil profile observations (calibration datasets) with georeferenced datasets of environmental factors (climate, topography, land cover, bedrock geology, and soil types) to predict the spatial variation of surface SOC stocks. We evaluated the prediction accuracy at randomly selected sites (validation datasets) across the study area. We found that different techniques inferred different numbers of environmental factors and their relative importance for prediction of SOC stocks. Regression kriging produced lower prediction errors in comparison to multivariate adaptive regression splines and support vector machine, and comparable prediction accuracy to gradient boosting machine and random forest. However, the ensemble median prediction of SOC stocks obtained from all four machine learning techniques showed the highest prediction accuracy. Although the use of different approaches in spatial prediction of soil properties will depend on the availability of soil and environmental datasets and computational resources, we conclude that the ensemble median prediction obtained from multiple machine learning approaches provides greater spatial detail and produces the highest prediction accuracy.
Thus an ensemble prediction approach can be a better choice than any single prediction technique for predicting the spatial variation of SOC stocks.
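
The ensemble median idea is simple to sketch: at each site, take the median of the predictions from the individual models. The Python below is a minimal illustration under stated assumptions, not the authors' workflow: the function name `ensemble_median`, the model keys, and every number are hypothetical.

```python
import statistics


def ensemble_median(predictions_by_model):
    """Per-site median across models.

    predictions_by_model maps a model name to a list of predictions,
    with all lists ordered by the same sites (an assumption of this sketch).
    """
    series = list(predictions_by_model.values())
    if not series or any(len(s) != len(series[0]) for s in series):
        raise ValueError("every model must predict the same non-empty set of sites")
    # zip(*series) regroups the values site by site.
    return [statistics.median(site_values) for site_values in zip(*series)]


# Hypothetical SOC stock predictions at two sites from four models.
predictions = {
    "gbm": [10.0, 5.0],
    "mars": [20.0, 9.0],
    "rf": [12.0, 5.0],
    "svm": [14.0, 7.0],
}
combined = ensemble_median(predictions)
print(combined)
```

With an even number of models the median is the mean of the two middle predictions, so a single outlying model (here the hypothetical "mars" values) has limited influence on the combined estimate.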

Mapping Regional Soil Organic Matter Based on Sentinel-2A and MODIS Imagery Using Machine Learning Algorithms and Google Earth Engine

Remote Sensing ◽

10.3390/rs13152934 ◽

2021 ◽

Vol 13 (15) ◽

pp. 2934

Author(s):

Meiwei Zhang ◽

Meinan Zhang ◽

Haoxuan Yang ◽

Yuanliang Jin ◽

Xinle Zhang ◽

...

Keyword(s):

Machine Learning ◽

Organic Matter ◽

Soil Organic Matter ◽

Learning Algorithms ◽

Google Earth ◽

Machine Learning Algorithms ◽

Support Vector ◽

Full Band ◽

Google Earth Engine ◽

Sentinel 2A

Many studies have attempted to predict soil organic matter (SOM), whereas mapping high-precision and high-resolution SOM maps remains a challenge due to the difficulty of selecting appropriate satellite data sources and prediction algorithms. This study aimed to investigate the influence of different remotely sensed images and machine learning algorithms on SOM prediction. We constructed two comparative experiments, i.e., full-band and common-band variable datasets of Sentinel-2A and MODIS images using Google Earth Engine (GEE). The predictive performances of random forest (RF), artificial neural network (ANN), and support vector regression (SVR) algorithms were evaluated, and the SOM map was generated for the Songnen Plain. Results showed that the model based on the full-band Sentinel-2A dataset achieved the best performance. The application of Sentinel-2A data resulted in mean relative improvements (RIs) of 7.67% and 5.87%, respectively. The RF achieved a lower root mean squared error (RMSE = 0.68%) and a higher coefficient of determination (R2 = 0.67) in all of the predicted scenarios than ANN and SVR. The resultant SOM map accurately characterized the SOM spatial distribution. Therefore, the Sentinel-2A data have obvious advantages over MODIS due to their higher spectral and spatial resolutions, and the combination of the RF algorithm and GEE is an effective approach to SOM mapping.

Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the number of cancer patients and deaths continues to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to study the application of machine learning in predicting anticancer drug activity, some machine learning approaches such as Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB) were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes substantially to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drug activity are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.

Effect of Digestate and Straw Combined Application on Maintaining Rice Production and Paddy Environment

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18115714 ◽

2021 ◽

Vol 18 (11) ◽

pp. 5714

Author(s):

Xue Hu ◽

Hongyi Liu ◽

Chengyu Xu ◽

Xiaomin Huang ◽

Min Jiang ◽

...

Keyword(s):

Organic Matter ◽

Surface Water ◽

Soil Organic Matter ◽

Paddy Soil ◽

Rice Yield ◽

Soil Surface ◽

Rice Production ◽

Straw Decomposition ◽

Rice Grains ◽

Combined Application

Few studies have focused on the combined application of digestate and straw and its feasibility in rice production. Therefore, we conducted a two-year field experiment, including six treatments: without nutrients and straw (Control), digestate (D), digestate + fertilizer (DF), digestate + straw (DS), digestate + fertilizer + straw (DFS) and conventional fertilizer + straw (CS), to clarify the responses of rice growth and paddy soil nutrients to different straw and fertilizer combinations. Our results showed that digestate and straw combined application (i.e., treatment DFS) increased rice yield by 2.71 t ha−1 compared with the Control, and digestate combined with straw addition could distribute more nitrogen (N) to rice grains. Our results also showed that the straw decomposition rate at 0 cm depth under DS was 5% to 102% higher than that under CS. Activities of catalase, urease, sucrase and phosphatase at maturity under DS were all higher than those under both Control and CS. In addition, soil organic matter (SOM) and total nitrogen (TN) under DS and DFS were 20~26% and 11~12% higher than those under B and DF, respectively, suggesting that straw addition could benefit paddy soil quality. Moreover, coupling straw and digestate would help decrease the N content in soil surface water. Overall, our results demonstrated that digestate and straw combined application could maintain rice production and have potential positive effects on the paddy environment.

Machine Learning Methods Applied to the Prediction of Pseudo-nitzschia spp. Blooms in the Galician Rias Baixas (NW Spain)

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040199 ◽

2021 ◽

Vol 10 (4) ◽

pp. 199

Author(s):

Francisco M. Bellas Aláez ◽

Jesus M. Torres Palenzuela ◽

Evangelos Spyrakos ◽

Luis González Vilas

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Prediction Models ◽

Support Vector ◽

False Alarms ◽

Learning Approaches ◽

Learning Methods ◽

Machine Learning Methods ◽

Rías Baixas ◽

New Algorithms

This work presents new prediction models based on recent developments in machine learning methods, such as Random Forest (RF) and AdaBoost, and compares them with more classical approaches, i.e., support vector machines (SVMs) and neural networks (NNs). The models predict Pseudo-nitzschia spp. blooms in the Galician Rias Baixas. This work builds on a previous study by the authors (doi.org/10.1016/j.pocean.2014.03.003) but uses an extended database (from 2002 to 2012) and new algorithms. Our results show that RF and AdaBoost provide better prediction results compared to SVMs and NNs, as they show improved performance metrics and a better balance between sensitivity and specificity. Classical machine learning approaches show higher sensitivities, but at a cost of lower specificity and higher percentages of false alarms (lower precision). These results seem to indicate a greater adaptation of new algorithms (RF and AdaBoost) to unbalanced datasets. Our models could be operationally implemented to establish a short-term prediction system.
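
The trade-off described here (higher sensitivity at the cost of lower specificity and precision) follows directly from the confusion-matrix definitions. The Python sketch below is illustrative only: the function name `confusion_metrics` and the counts are hypothetical, and it assumes each denominator is non-zero.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, precision) from confusion counts.

    tp/fn: bloom events correctly detected / missed.
    tn/fp: non-bloom cases correctly rejected / raised as false alarms.
    Assumes tp + fn, tn + fp and tp + fp are all greater than zero.
    """
    sensitivity = tp / (tp + fn)  # share of real blooms that were detected
    specificity = tn / (tn + fp)  # share of non-blooms correctly rejected
    precision = tp / (tp + fp)    # share of alarms that were real blooms
    return sensitivity, specificity, precision


# Hypothetical unbalanced dataset: 10 blooms among 100 samples.
sens, spec, prec = confusion_metrics(tp=8, fp=4, tn=86, fn=2)
print(sens, spec, prec)
```

On unbalanced data like this toy example, a handful of false alarms barely moves specificity but cuts precision sharply, which is why the abstract reports both.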

Machine Learning Approach for Predicting Lane-Change Maneuvers using the SHRP2 Naturalistic Driving Study Data

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211003581 ◽

2021 ◽

pp. 036119812110035

Author(s):

Anik Das ◽

Mohamed M. Ahmed

Keyword(s):

Machine Learning ◽

Prediction Accuracy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Lane Change ◽

Adaptive Boosting ◽

Extreme Gradient Boosting ◽

Naturalistic Driving Study ◽

Naturalistic Driving ◽

Change Prediction

Accurate lane-change prediction information in real time is essential to safely operate Autonomous Vehicles (AVs) on the roadways, especially at the early stage of AV deployment, when AVs will interact with human-driven vehicles. This study proposed reliable lane-change prediction models considering features from vehicle kinematics, machine vision, driver, and roadway geometric characteristics using the trajectory-level SHRP2 Naturalistic Driving Study and Roadway Information Database. Several machine learning algorithms were trained, validated, tested, and comparatively analyzed based on six different sets of features, including Classification And Regression Trees (CART), Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), Support Vector Machine (SVM), K Nearest Neighbor (KNN), and Naïve Bayes (NB). In each feature set, relevant features were extracted through a wrapper-based algorithm named Boruta. The results showed that, considering all features, the XGBoost model outperformed all other models, with the highest overall prediction accuracy (97%) and F1-score (95.5%). However, the highest overall prediction accuracy of 97.3% and F1-score of 95.9% were observed in the XGBoost model based on vehicle kinematics features. Moreover, it was found that XGBoost was the only model that achieved a reliable and balanced prediction performance across all six feature sets. Furthermore, a simplified XGBoost model was developed for each feature set with practical implementation of the model in mind. The proposed prediction model could help in trajectory planning for AVs and could be used to develop more reliable advanced driver assistance systems (ADAS) in a cooperative connected and automated vehicle environment.

Practical CO2—WAG Field Operational Designs Using Hybrid Numerical-Machine-Learning Approaches

Energies ◽

10.3390/en14041055 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1055

Author(s):

Qian Sun ◽

William Ampomah ◽

Junyu You ◽

Martha Cather ◽

Robert Balch

Keyword(s):

Machine Learning ◽

Oil Recovery ◽

History Matching ◽

Optimization Problems ◽

Learning Technologies ◽

Petroleum Engineering ◽

Support Vector ◽

Learning Approaches ◽

Field Development ◽

Proxy Models

Machine-learning technologies have proven capable of solving many petroleum engineering problems. Their accurate predictions and fast computational speed make feasible a large volume of time-consuming engineering processes such as history-matching and field development optimization. The Southwest Regional Partnership on Carbon Sequestration (SWP) project requires rigorous history-matching and multi-objective optimization processes, which play to the strengths of machine-learning approaches. Although the machine-learning proxy models are trained and validated before being applied to practical problems, their error margin inevitably introduces uncertainties into the results. In this paper, a hybrid numerical machine-learning workflow for solving various optimization problems is presented. By coupling the expert machine-learning proxies with a global optimizer, the workflow successfully solves the history-matching and CO2 water-alternating-gas (WAG) design problems with low computational overhead. The history-matching work considers the heterogeneities of multiphase relative characteristics, and the CO2-WAG injection design takes multiple techno-economic objective functions into account. This work trained an expert response surface, a support vector machine, and a multi-layer neural network as proxy models to effectively learn the high-dimensional nonlinear data structure. The proposed workflow suggests revisiting the high-fidelity numerical simulator for validation purposes. The experience gained from this work should provide valuable guidance for similar CO2 enhanced oil recovery (EOR) projects.
