Using Machine Learning for Estimating Rice Chlorophyll Content from In Situ Hyperspectral Data

Gangqiang An; Minfeng Xing; Binbin He; Chunhua Liao; Xiaodong Huang; Jiali Shang; Haiqi Kang

doi:10.3390/rs12183104

Using Machine Learning for Estimating Rice Chlorophyll Content from In Situ Hyperspectral Data

Remote Sensing ◽

10.3390/rs12183104 ◽

2020 ◽

Vol 12 (18) ◽

pp. 3104

Author(s):

Gangqiang An ◽

Minfeng Xing ◽

Binbin He ◽

Chunhua Liao ◽

Xiaodong Huang ◽

...

Keyword(s):

Machine Learning ◽

Chlorophyll Content ◽

Precision Agriculture ◽

Rate Of Change ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Training Set ◽

Validation Set

Chlorophyll is an essential pigment for photosynthesis in crops, and leaf chlorophyll content can be used as an indicator for crop growth status and help guide nitrogen fertilizer applications. Estimating crop chlorophyll content plays an important role in precision agriculture. In this study, a variable, rate of change in reflectance between wavelengths ‘a’ and ‘b’ (RCRWa-b), derived from in situ hyperspectral remote sensing data combined with four advanced machine learning techniques, Gaussian process regression (GPR), random forest regression (RFR), support vector regression (SVR), and gradient boosting regression tree (GBRT), were used to estimate the chlorophyll content (measured by a portable soil–plant analysis development meter) of rice. The performances of the four machine learning models were assessed and compared using root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R2). The results revealed that four features of RCRWa-b, RCRW551.0–565.6, RCRW739.5–743.5, RCRW684.4–687.1 and RCRW667.9–672.0, were effective in estimating the chlorophyll content of rice, and the RFR model generated the highest prediction accuracy (training set: RMSE = 1.54, MAE =1.23 and R2 = 0.95; validation set: RMSE = 2.64, MAE = 1.99 and R2 = 0.80). The GPR model was found to have the strongest generalization (training set: RMSE = 2.83, MAE = 2.16 and R2 = 0.77; validation set: RMSE = 2.97, MAE = 2.30 and R2 = 0.76). We conclude that RCRWa-b is a useful variable to estimate chlorophyll content of rice, and RFR and GPR are powerful machine learning algorithms for estimating the chlorophyll content of rice.

Download Full-text

Estimating Crop Biophysical Parameters Using Machine Learning Algorithms and Sentinel-2 Imagery

Remote Sensing ◽

10.3390/rs13214314 ◽

2021 ◽

Vol 13 (21) ◽

pp. 4314

Author(s):

Mahlatse Kganyago ◽

Paidamwoyo Mhangara ◽

Clement Adjorlolo

Keyword(s):

Machine Learning ◽

Chlorophyll Content ◽

Precision Agriculture ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Biophysical Parameters ◽

Area Index ◽

Crop Monitoring ◽

Sentinel 2

Global food security is critical to eliminating hunger and malnutrition. In the changing climate, farmers in developing countries must adopt technologies and farming practices such as precision agriculture (PA). PA-based approaches enable farmers to cope with frequent and intensified droughts and heatwaves, optimising yields, increasing efficiencies, and reducing operational costs. Biophysical parameters such as Leaf Area Index (LAI), Leaf Chlorophyll Content (LCab), and Canopy Chlorophyll Content (CCC) are essential for characterising field-level spatial variability and thus are necessary for enabling variable rate application technologies, precision irrigation, and crop monitoring. Moreover, robust machine learning algorithms offer prospects for improving the estimation of biophysical parameters due to their capability to deal with non-linear data, small samples, and noisy variables. This study compared the predictive performance of sparse Partial Least Squares (sPLS), Random Forest (RF), and Gradient Boosting Machines (GBM) for estimating LAI, LCab, and CCC with Sentinel-2 imagery in Bothaville, South Africa and identified, using variable importance measures, the most influential bands for estimating crop biophysical parameters. The results showed that RF was superior in estimating all three biophysical parameters, followed by GBM which was better in estimating LAI and CCC, but not LCab, where sPLS was relatively better. Since all biophysical parameters could be achieved with RF, it can be considered a good contender for operationalisation. Overall, the findings in this study are significant for future biophysical product development using RF to reduce reliance on many algorithms for specific parameters, thus facilitating the rapid extraction of actionable information to support PA and crop monitoring activities.

Download Full-text

Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network Ann

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions; and may occur anytime in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks based on physical or chemical reactions in concrete matrix. It is imperative to investigate the healing performance that autogenous healing concrete possesses, to assess the extent of the cracking and to predict the extent of healing. In the research of self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a mass of instances may be needed to explore reliable concrete design. This study is thus the world’s first to establish six types of machine learning algorithms, which are capable of predicting the healing performance (HP) of self-healing concrete. These algorithms involve an artificial neural network (ANN), a k-nearest neighbours (kNN), a gradient boosting regression (GBR), a decision tree regression (DTR), a support vector regression (SVR) and a random forest (RF). Parameters of these algorithms are tuned utilising grid search algorithm (GSA) and genetic algorithm (GA). The prediction performance indicated by coefficient of determination (R2) and root mean square error (RMSE) measures of these algorithms are evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR performs higher prediction performance (R2GSA-GBR = 0.958) and stronger robustness (RMSEGSA-GBR = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance on the design of autogenous healing concrete can be achieved.

Download Full-text

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) Bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.

Download Full-text

Prediction and Analysis of Gold Prices using Ensemble Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36028 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4367-4374

Author(s):

Gudipally Chandrashakar

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Gold Price ◽

Machine Learning Algorithms ◽

Series Data ◽

Gradient Boosting ◽

Support Vector ◽

Average Value ◽

Ensemble Machine Learning

In this article, we used historical time series data up to the current day gold price. In this study of predicting gold price, we consider few correlating factors like silver price, copper price, standard, and poor’s 500 value, dollar-rupee exchange rate, Dow Jones Industrial Average Value. Considering the prices of every correlating factor and gold price data where dates ranging from 2008 January to 2021 February. Few algorithms of machine learning are used to analyze the time-series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, ExtraTrees Regressor and Gradient boosting Regression. While seeing the results the Extra Tree Regressor algorithm gives the predicted value of gold prices more accurately.

Download Full-text

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Abstract Wildfires in limited extent and intensity can be a boon for the forest ecosystem. However, recent episodes of wildfires of 2019 in Australia and Brazil are sad reminders of their heavy ecological and economical costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating it. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods like Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm in wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF has outperformed, followed by GBM in the prediction. Also, environmental features like average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Download Full-text

Techniques for Detecting Malware Traffic: A Comprehensive Approach to Feature Selection and Classification

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39088 ◽

2021 ◽

Vol 9 (12) ◽

pp. 1-10

Author(s):

Harsha A K

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Malware Detection ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Steady Increase ◽

Extreme Gradient Boosting

Abstract: Since the advent of encryption, there has been a steady increase in malware being transmitted over encrypted networks. Traditional approaches to detect malware like packet content analysis are inefficient in dealing with encrypted data. In the absence of actual packet contents, we can make use of other features like packet size, arrival time, source and destination addresses and other such metadata to detect malware. Such information can be used to train machine learning classifiers in order to classify malicious and benign packets. In this paper, we offer an efficient malware detection approach using classification algorithms in machine learning such as support vector machine, random forest and extreme gradient boosting. We employ an extensive feature selection process to reduce the dimensionality of the chosen dataset. The dataset is then split into training and testing sets. Machine learning algorithms are trained using the training set. These models are then evaluated against the testing set in order to assess their respective performances. We further attempt to tune the hyper parameters of the algorithms, in order to achieve better results. Random forest and extreme gradient boosting algorithms performed exceptionally well in our experiments, resulting in area under the curve values of 0.9928 and 0.9998 respectively. Our work demonstrates that malware traffic can be effectively classified using conventional machine learning algorithms and also shows the importance of dimensionality reduction in such classification problems. Keywords: Malware Detection, Extreme Gradient Boosting, Random Forest, Feature Selection.

Download Full-text

Evaluating Variable Selection and Machine Learning Algorithms for Estimating Forest Heights by Combining Lidar and Hyperspectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090507 ◽

2020 ◽

Vol 9 (9) ◽

pp. 507

Author(s):

Sanjiwana Arjasakusuma ◽

Sandiaga Swahyu Kusuma ◽

Stuart Phinn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Principal Component ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forest Height ◽

Extreme Gradient Boosting

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving high- spatial and spectral dimensionality data, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGbtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variables selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.

Download Full-text

Combination of NIR spectroscopy and machine learning for monitoring chili sauce adulterated with ripened papaya

E3S Web of Conferences ◽

10.1051/e3sconf/202018704001 ◽

2020 ◽

Vol 187 ◽

pp. 04001

Author(s):

Ravipat Lapcharoensuk ◽

Kitticheat Danupattanin ◽

Chaowarin Kanjanapornprapa ◽

Tawin Inkawee

Keyword(s):

Machine Learning ◽

Food Industry ◽

Partial Least Squares Regression ◽

Nir Spectroscopy ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approaches ◽

Least Squares Regression ◽

Validation Set ◽

Global Food

This research aimed to study the combination of NIR spectroscopy and machine learning for monitoring chilli sauce adulterated with papaya smoothie. The chilli sauce was produced by the famous community enterprise of chilli sauce processing in Thailand. The ingredients of the chilli sauce consisted of 45% chilli, 25% sugar, 20% garlic, 5% vinegar, and 5% salt. The chilli sauce sample was mixed with ripened papaya (Khaek Dam variety) smoothie with 9 levels from 10 to 90 %w/w. The NIR spectra of pure chilli sauce, papaya smoothie and 9 adulterated chilli sauce samples were recorded using FT-NIR spectrometer in the wavenumber range of 12500 and 4000 cm-1. Three machine learning algorithms were applied to develop a model for monitoring adulterated chilli sauce, including partial least squares regression (PLS), support vector machine (SVM), and backpropagation neural network (BPNN). All model presented performance of prediction in the validation set with R2al = 0.99 while RMSEP of PLS, SVM and BPNN were 1.71, 2.18 and 3.27% w/w respectively. This finding indicated that NIR spectroscopy coupled with machine learning approaches were shown to be an alternative technique to monitor papaya smoothie adulterated in chilli sauce in the global food industry.

Download Full-text

A Comparative Analysis of Enhanced Machine Learning Algorithms for Smart Grid Stability Prediction

10.36227/techrxiv.16863145.v1 ◽

2021 ◽

Author(s):

ANKIT GHOSH ◽

ALOK KOLE

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Smart Grid ◽

Smart Grids ◽

Machine Learning Algorithms ◽

Electricity Sector ◽

Stochastic Gradient Descent ◽

Energy Output ◽

Gradient Boosting ◽

Support Vector

<p>Smart grid is an essential concept in the transformation of the electricity sector into an intelligent digitalized energy network that can deliver optimal energy from the source to the consumers. Smart grids being self-sufficient systems are constructed through the integration of information, telecommunication, and advanced power technologies with the existing electricity systems. Artificial Intelligence (AI) is an important technology driver in smart grids. The application of AI techniques in smart grid is becoming more apparent because the traditional modelling optimization and control techniques have their own limitations. Machine Learning (ML) being a sub-set of AI enables intelligent decision-making and response to sudden changes in the customer energy demands, unexpected disruption of power supply, sudden variations in renewable energy output or any other catastrophic events in a smart grid. This paper presents the comparison among some of the state-of-the-art ML algorithms for predicting smart grid stability. The dataset that has been selected contains results from simulations of smart grid stability. Enhanced ML algorithms such as Support Vector Machine (SVM), Logistic Regression, K-Nearest Neighbour (KNN), Naïve Bayes (NB), Decision Tree (DT), Random Forest (RF), Stochastic Gradient Descent (SGD) classifier, XGBoost and Gradient Boosting classifiers have been implemented to forecast smart grid stability. A comparative analysis among the different ML models has been performed based on the following evaluation metrics such as accuracy, precision, recall, F1-score, AUC-ROC, and AUC-PR curves. The test results that have been obtained have been quite promising with the XGBoost classifier outperforming all the other models with an accuracy of 97.5%, recall of 98.4%, precision of 97.6%, F1-score of 97.9%, AUC-ROC of 99.8% and AUC-PR of 99.9%. </p>

Download Full-text

Predicting Forest Fires using Supervised and Ensemble Machine Learning Algorithms

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2878.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 3697-3705 ◽

Cited By ~ 1

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Forest Fires ◽

Principal Component ◽

Climatic Conditions ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Physical Factors

Forest fires have become one of the most frequently occurring disasters in recent years. The effects of forest fires have a lasting impact on the environment as it lead to deforestation and global warming, which is also one of its major cause of occurrence. Forest fires are dealt by collecting the satellite images of forest and if there is any emergency caused by the fires then the authorities are notified to mitigate its effects. By the time the authorities get to know about it, the fires would have already caused a lot of damage. Data mining and machine learning techniques can provide an efficient prevention approach where data associated with forests can be used for predicting the eventuality of forest fires. This paper uses the dataset present in the UCI machine learning repository which consists of physical factors and climatic conditions of the Montesinho park situated in Portugal. Various algorithms like Logistic regression, Support Vector Machine, Random forest, K-Nearest neighbors in addition to Bagging and Boosting predictors are used, both with and without Principal Component Analysis (PCA). Among the models in which PCA was applied, Logistic Regression gave the highest F-1 score of 68.26 and among the models where PCA was absent, Gradient boosting gave the highest score of 68.36.

Download Full-text