Estimation of Leaf Chlorophyll a, b and Carotenoid Contents and Their Ratios Using Hyperspectral Reflectance

Rei Sonobe; Hiroto Yamashita; Harumi Mihara; Akio Morita; Takashi Ikka

doi:10.3390/rs12193265

Remote Sensing ◽

10.3390/rs12193265 ◽

2020 ◽

Vol 12 (19) ◽

pp. 3265

Author(s):

Rei Sonobe ◽

Hiroto Yamashita ◽

Harumi Mihara ◽

Akio Morita ◽

Takashi Ikka

Keyword(s):

Chlorophyll A ◽

High Performance ◽

Machine Learning Algorithms ◽

Ion Concentration ◽

Gradient Boosting ◽

Support Vector ◽

Culture Methods ◽

Chl A ◽

Stochastic Gradient Boosting ◽

Processing Techniques

Japanese horseradish (wasabi) grows only under very specific conditions, and recent climate change has damaged wasabi production. In addition, the optimal culture methods are not well known, and it is becoming increasingly difficult for new farmers to cultivate it. Chlorophyll a, b and carotenoid contents, as well as their allocation, could be adequate indicators for evaluating its production and environmental stress; thus, developing an in situ method to monitor photosynthetic pigments based on reflectance could be useful for agricultural management. Besides original reflectance (OR), five pre-processing techniques, namely, first derivative reflectance (FDR), continuum-removed (CR), de-trending (DT), multiplicative scatter correction (MSC), and standard normal variate transformation (SNV), were compared to assess the accuracy of the estimation. Furthermore, five machine learning algorithms, namely random forest (RF), support vector machine (SVM), kernel-based extreme learning machine (KELM), Cubist, and Stochastic Gradient Boosting (SGB), were considered. To classify the samples under different pH or sulphur ion concentration conditions, the end of the red edge bands was effective for OR, FDR, DT, MSC, and SNV, while a green-peak band was effective for CR. Overall, KELM and Cubist showed high performance, and incorporating pre-processing techniques was effective for obtaining estimated values with high accuracy. The best combinations were found to be DT–KELM for chl a (RPD = 1.511–5.17, RMSE = 1.23–3.62 μg cm−2) and chl a:b (RPD = 0.73–3.17, RMSE = 0.13–0.60); CR–KELM for chl b (RPD = 1.92–5.06, RMSE = 0.41–1.03 μg cm−2) and chl a:car (RPD = 1.31–3.23, RMSE = 0.26–0.50); SNV–Cubist for car (RPD = 1.63–3.32, RMSE = 0.31–1.89 μg cm−2); and DT–Cubist for chl:car (RPD = 1.53–3.96, RMSE = 0.27–0.74).
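
For readers unfamiliar with the terms in the abstract above, the following minimal sketch shows how standard normal variate (SNV) scaling and the two reported accuracy metrics are commonly computed. It is not the authors' code: the function names and sample values are hypothetical, and RPD is assumed here to be the sample standard deviation of the reference values divided by the RMSE, which is one common definition.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum on its own mean, scale by its own SD."""
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)  # sample SD across wavelengths
    return [(value - mean) / sd for value in spectrum]


def rmse(observed, predicted):
    """Root-mean-square error between reference and estimated values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(statistics.fmean(squared))


def rpd(observed, predicted):
    """Ratio of performance to deviation (assumed: SD of reference values / RMSE)."""
    return statistics.stdev(observed) / rmse(observed, predicted)


# Hypothetical values, for illustration only.
reflectance = [0.05, 0.08, 0.12, 0.45, 0.50]
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

scaled = snv(reflectance)
print(round(rmse(observed, predicted), 3), round(rpd(observed, predicted), 3))
```

Because RPD divides the spread of the reference data by the error, a larger RPD indicates that the model error is small relative to the natural variation in the samples.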

A Comparative Study of Different Machine Learning Algorithms in Predicting the Content of Ilmenite in Titanium Placer

Applied Sciences ◽

10.3390/app10020635 ◽

2020 ◽

Vol 10 (2) ◽

pp. 635

Author(s):

Yingli LV ◽

Qui-Thao Le ◽

Hoang-Bac Bui ◽

Xuan-Nam Bui ◽

Hoang Nguyen ◽

...

Keyword(s):

Soft Computing ◽

Mean Squared Error ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Soft Computing Techniques ◽

Residuals Analysis ◽

Computing Models

In this study, the ilmenite content in beach placer sand was estimated using seven soft computing techniques, namely random forest (RF), artificial neural network (ANN), k-nearest neighbors (kNN), Cubist, support vector machine (SVM), stochastic gradient boosting (SGB), and classification and regression tree (CART). A total of 405 beach placer borehole samples were collected from the Southern Suoi Nhum deposit, Binh Thuan province, Vietnam, to test the feasibility of these techniques in estimating ilmenite content. Heavy mineral analysis indicated that the valuable minerals in the placer sand are zircon, ilmenite, leucoxene, rutile, anatase, and monazite. Five of these minerals, namely rutile, anatase, leucoxene, zircon, and monazite, were used as the input variables to estimate ilmenite content with the above-mentioned soft computing models. Of the whole dataset, 325 samples were used to build the models, and the remaining 80 samples were used for verification. Root-mean-squared error (RMSE), the coefficient of determination (R2), a simple ranking method, and residuals analysis were used as the criteria for assessing model performance. The numerical experiments revealed that soft computing techniques are capable of estimating the content of ilmenite with high accuracy. The residuals analysis also indicated that the SGB model was the most suitable for determining the ilmenite content in the context of this research.

Analyzing Fake News Based on Machine Learning Algorithms

Intelligent Systems and Computer Technology - Advances in Parallel Computing ◽

10.3233/apc200146 ◽

2020 ◽

Author(s):

Pawar A B ◽

Jawale M A ◽

Kyatanavar D N

Keyword(s):

Language Processing ◽

Gradient Descent ◽

Human Life ◽

Stochastic Gradient ◽

Machine Learning Algorithms ◽

Stochastic Gradient Descent ◽

Gradient Boosting ◽

Support Vector ◽

Fake News ◽

Processing Techniques

This research paper analyzes the use of Natural Language Processing techniques for detecting fake news. Fake news is misleading content spread by unreliable sources, and it can harm individuals and society. The analysis uses a dataset obtained from the web resource OpenSources.co, which is mainly part of Signal Media. TF-IDF of bi-grams was used in combination with PCFG (Probabilistic Context Free Grammar) features on a set of 11,000 documents extracted as news articles. This set was tested on classification algorithms, namely SVM (Support Vector Machines), Stochastic Gradient Descent, Bounded Decision Trees, and Gradient Boosting with Random Forests. The experimental analysis found that the combination of Stochastic Gradient Descent with TF-IDF of bi-grams gives an accuracy of 77.2% in detecting fake content, with the PCFG features showing slight recall defects.
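
As an illustration of the "TF-IDF of bi-grams" feature mentioned in the abstract above, here is a minimal sketch. It assumes whitespace tokenisation, term frequency normalised by document length, and an unsmoothed idf of log(N / df); the paper does not state which variant was used, and the documents and function names below are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Split on whitespace and return adjacent token pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(documents):
    """Return one {bigram: weight} dict per document (tf * log(N / df))."""
    counts = [Counter(bigrams(doc)) for doc in documents]
    doc_freq = Counter()
    for doc_counts in counts:
        doc_freq.update(doc_counts.keys())  # each bigram counted once per document
    n_docs = len(documents)
    weights = []
    for doc_counts in counts:
        total = sum(doc_counts.values())
        weights.append({
            bigram: (n / total) * math.log(n_docs / doc_freq[bigram])
            for bigram, n in doc_counts.items()
        })
    return weights


# Hypothetical mini-corpus, for illustration only.
docs = [
    "the senator said the report",
    "the senator denied the report",
    "aliens wrote the report",
]
weights = tfidf_bigrams(docs)
print(weights[2])
```

A bi-gram that occurs in every document receives zero weight under this variant, so only distinctive word pairs contribute to the feature vector passed to a classifier.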

Prediction of E. coli Concentrations in Agricultural Pond Waters: Application and Comparison of Machine Learning Algorithms

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.768650 ◽

2022 ◽

Vol 4 ◽

Author(s):

Matthew D. Stocker ◽

Yakov A. Pachepsky ◽

Robert L. Hill

Keyword(s):

Machine Learning ◽

Water Quality ◽

Quality Parameters ◽

Machine Learning Algorithms ◽

Water Quality Parameters ◽

Gradient Boosting ◽

Support Vector ◽

E Coli ◽

Stochastic Gradient Boosting ◽

Significant Difference

The microbial quality of irrigation water is an important issue as the use of contaminated waters has been linked to several foodborne outbreaks. To expedite microbial water quality determinations, many researchers estimate concentrations of the microbial contamination indicator Escherichia coli (E. coli) from the concentrations of physiochemical water quality parameters. However, these relationships are often non-linear and exhibit changes above or below certain threshold values. Machine learning (ML) algorithms have been shown to make accurate predictions in datasets with complex relationships. The purpose of this work was to evaluate several ML models for the prediction of E. coli in agricultural pond waters. Two ponds in Maryland were monitored from 2016 to 2018 during the irrigation season. E. coli concentrations along with 12 other water quality parameters were measured in water samples. The resulting datasets were used to predict E. coli using stochastic gradient boosting (SGB) machines, random forest (RF), support vector machines (SVM), and k-nearest neighbor (kNN) algorithms. The RF model provided the lowest RMSE value for predicted E. coli concentrations in both ponds in individual years and over consecutive years in almost all cases. For individual years, the RMSE of the predicted E. coli concentrations (log10 CFU 100 ml−1) ranged from 0.244 to 0.346 and 0.304 to 0.418 for Pond 1 and 2, respectively. For the 3-year datasets, these values were 0.334 and 0.381 for Pond 1 and 2, respectively. In most cases there was no significant difference (P > 0.05) between the RMSE of RF and other ML models when these RMSE were treated as statistics derived from 10-fold cross-validation performed with five repeats. Important E. coli predictors were turbidity, dissolved organic matter content, specific conductance, chlorophyll concentration, and temperature. Model predictive performance did not significantly differ when 5 predictors were used vs. 8 or 12, indicating that more tedious and costly measurements provide no substantial improvement in the predictive accuracy of the evaluated algorithms.
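
The "10-fold cross-validation performed with five repeats" mentioned above can be sketched as follows. This is only an illustration of the resampling scheme, not the authors' implementation; the function name, the seed, and the sample count of 23 are hypothetical.

```python
import random


def repeated_kfold(n_samples, n_folds=10, n_repeats=5, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # new random partition for every repeat
        for fold in range(n_folds):
            test = order[fold::n_folds]  # every n_folds-th shuffled index
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test


# Hypothetical dataset size, for illustration only.
splits = list(repeated_kfold(n_samples=23))
print(len(splits))
```

Each repeat yields ten RMSE values (one per held-out fold), so five repeats give fifty values per model, which is what allows the RMSE of different models to be compared as statistics.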

A Generalized Method for Modeling the Adsorption of Heavy Metals with Machine Learning Algorithms

Water ◽

10.3390/w12123490 ◽

2020 ◽

Vol 12 (12) ◽

pp. 3490

Author(s):

Noor Hafsa ◽

Sayeed Rushd ◽

Mohammed Al-Yaari ◽

Muhammad Rahman

Keyword(s):

Machine Learning ◽

Heavy Metals ◽

Mean Squared Error ◽

Learning Algorithms ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting

Applications of machine learning algorithms (MLAs) to modeling the adsorption efficiencies of different heavy metals have been limited by the adsorbate–adsorbent pair and the selection of specific MLAs. In the current study, adsorption efficiencies of fourteen heavy metal–adsorbent (HM-AD) pairs were modeled with a variety of ML models such as support vector regression with polynomial and radial basis function kernels, random forest (RF), stochastic gradient boosting, and Bayesian additive regression tree (BART). The wet experiment-based actual measurements were supplemented with synthetic data samples. The first batch of dry experiments was performed to model the removal efficiency of an HM with a specific AD. The ML modeling was then implemented on the whole dataset to develop a generalized model. A ten-fold cross-validation method was used for the model selection, while the comparative performance of the MLAs was evaluated with statistical metrics comprising Spearman’s rank correlation coefficient, coefficient of determination (R2), mean absolute error, and root-mean-squared-error. The regression tree methods, BART, and RF demonstrated the most robust and optimum performance with 0.96 ≤ R2 ≤ 0.99. The current study provides a generalized methodology to implement ML in modeling the efficiency of not only a specific adsorption process but also a group of comparable processes involving multiple HM-AD pairs.

Prediction of Healing Performance of Autogenous Healing Concrete Using Machine Learning

Materials ◽

10.3390/ma14154068 ◽

2021 ◽

Vol 14 (15) ◽

pp. 4068

Author(s):

Xu Huang ◽

Mirna Wasouf ◽

Jessada Sresakoolchai ◽

Sakdirat Kaewunruen

Keyword(s):

Machine Learning ◽

Search Algorithm ◽

Weather Conditions ◽

Prediction Performance ◽

Machine Learning Algorithms ◽

Coefficient Of Determination ◽

Gradient Boosting ◽

Support Vector ◽

Self Healing ◽

Artificial Neural Network Ann

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance of autogenous healing concrete, to assess the extent of the cracking and to predict the extent of healing. In research on self-healing concrete, testing the healing performance of concrete in a laboratory is costly, and a large number of instances may be needed to explore reliable concrete designs. This study is thus the world’s first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete. These algorithms are an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR) and random forest (RF). Their parameters are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, indicated by the coefficient of determination (R2) and the root mean square error (RMSE), is evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R2 = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Therefore, reliable prediction accuracy of the healing performance and efficient assistance on the design of autogenous healing concrete can be achieved.

Comparison of Ensemble Machine Learning Methods for Soil Erosion Pin Measurements

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10010042 ◽

2021 ◽

Vol 10 (1) ◽

pp. 42

Author(s):

Kieu Anh Nguyen ◽

Walter Chen ◽

Bor-Shiun Lin ◽

Uma Seeboonruang

Keyword(s):

Machine Learning ◽

Soil Erosion ◽

Ensemble Methods ◽

Machine Learning Algorithms ◽

Multivariate Adaptive Regression Splines ◽

Gradient Boosting ◽

Support Vector ◽

Ensemble Machine Learning ◽

Boosting Method ◽

Bagging Method

Although machine learning has been extensively used in various fields, it has only recently been applied to soil erosion pin modeling. To improve upon previous methods of quantifying soil erosion based on erosion pin measurements, this study explored the possible application of ensemble machine learning algorithms to the Shihmen Reservoir watershed in northern Taiwan. Three categories of ensemble methods were considered in this study: (a) bagging, (b) boosting, and (c) stacking. The bagging method in this study refers to bagged multivariate adaptive regression splines (bagged MARS) and random forest (RF), and the boosting method includes Cubist and gradient boosting machine (GBM). Finally, the stacking method is an ensemble method that uses a meta-model to combine the predictions of base models. This study used RF and GBM as the meta-models, and decision tree, linear regression, artificial neural network, and support vector machine as the base models. The dataset used in this study was sampled using stratified random sampling to achieve a 70/30 split for the training and test data, and the process was repeated three times. The performance of six ensemble methods in three categories was analyzed based on the average of three attempts. It was found that GBM performed the best among the ensemble models with the lowest root-mean-square error (RMSE = 1.72 mm/year), the highest Nash-Sutcliffe efficiency (NSE = 0.54), and the highest index of agreement (d = 0.81). This result was confirmed by the spatial comparison of the absolute differences (errors) between model predictions and observations using GBM and RF in the study area. In summary, the results show that as a group, the bagging method and the boosting method performed equally well, and the stacking method was third for the erosion pin dataset considered in this study.
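
The two less familiar metrics reported above, the Nash-Sutcliffe efficiency (NSE) and the index of agreement (d), can be sketched as follows. The sketch assumes the standard NSE formula and Willmott's original index of agreement; the abstract does not say which variant of d was used, and the function names and values are hypothetical.

```python
import statistics


def nash_sutcliffe(observed, predicted):
    """NSE = 1 - SSE / (sum of squared deviations of observations from their mean)."""
    mean_obs = statistics.fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def index_of_agreement(observed, predicted):
    """Willmott's d = 1 - SSE / sum((|p - mean_obs| + |o - mean_obs|)^2)."""
    mean_obs = statistics.fmean(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    potential = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - sse / potential


# Hypothetical erosion depths (mm/year), for illustration only.
observed = [1.0, 2.0, 3.0]
mean_only = [2.0, 2.0, 2.0]  # a model that always predicts the observed mean

print(nash_sutcliffe(observed, observed), nash_sutcliffe(observed, mean_only))
```

An NSE of 1 means a perfect fit and an NSE of 0 means the model is no better than predicting the observed mean, which helps put the reported NSE of 0.54 in context.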

Prediction and Analysis of Gold Prices using Ensemble Machine Learning Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.36028 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 4367-4374

Author(s):

Gudipally Chandrashakar

Keyword(s):

Machine Learning ◽

Time Series ◽

Time Series Data ◽

Gold Price ◽

Machine Learning Algorithms ◽

Series Data ◽

Gradient Boosting ◽

Support Vector ◽

Average Value ◽

Ensemble Machine Learning

In this article, we used historical time series data of gold prices up to the present day. To predict the gold price, we consider a few correlated factors: silver price, copper price, Standard and Poor’s 500 value, dollar-rupee exchange rate, and Dow Jones Industrial Average value. The data for the gold price and for every correlated factor range from January 2008 to February 2021. The machine learning algorithms used to analyze the time-series data are Random Forest Regression, Support Vector Regressor, Linear Regressor, ExtraTrees Regressor and Gradient Boosting Regression. Of these, the ExtraTrees Regressor predicted gold prices most accurately.

Identifying Children at Readmission Risk: At-Admission Versus Traditional At-Discharge Readmission Prediction Model

Healthcare ◽

10.3390/healthcare9101334 ◽

2021 ◽

Vol 9 (10) ◽

pp. 1334

Author(s):

Hasan Symum ◽

José Zayas-Castro

Keyword(s):

Prediction Model ◽

Information Exchange ◽

High Risk Patient ◽

Machine Learning Algorithms ◽

Polynomial Kernel ◽

Gradient Boosting ◽

Support Vector ◽

Hospital Discharges ◽

Discharge Model ◽

Readmission Risk

The timing of 30-day pediatric readmissions is skewed, with approximately 40% of the incidents occurring within the first week of hospital discharges. The skewed readmission time distribution, coupled with delay in health information exchange among healthcare providers, might offer a limited time to devise a comprehensive intervention plan. However, pediatric readmission studies are thus far limited to the development of the prediction model after hospital discharges. In this study, we proposed a novel pediatric readmission prediction model at the time of hospital admission which can improve the high-risk patient selection process. We also compared the proposed models with the standard at-discharge readmission prediction model. Using the Hospital Cost and Utilization Project database, this prognostic study included pediatric hospital discharges in Florida from January 2016 through September 2017. Four machine learning algorithms (logistic regression with backward stepwise selection, decision tree, support vector machines (SVM) with the polynomial kernel, and gradient boosting) were developed for at-admission and at-discharge models using a recursive feature elimination technique with a repeated cross-validation process. The performance of the at-admission and at-discharge models was measured by the area under the curve. The performance of the at-admission model was comparable with the at-discharge model for all four algorithms. SVM with the polynomial kernel outperformed all other algorithms for at-admission and at-discharge models. Important features associated with increased readmission risk varied widely across the type of prediction model and were mostly related to patients’ demographics, social determinants, clinical factors, and hospital characteristics. The proposed at-admission readmission risk decision support model could help hospitals and providers with additional time for intervention planning, particularly for those targeting social determinants of children’s overall health.

MODIS-FIRMS and ground-truthing based wildfire likelihood mapping of Sikkim Himalaya using machine learning algorithms.

10.21203/rs.3.rs-750123/v1 ◽

2021 ◽

Author(s):

Polash Banerjee

Keyword(s):

Machine Learning ◽

Machine Learning Algorithms ◽

Tree Cover ◽

Anthropogenic Factors ◽

Gradient Boosting ◽

Support Vector ◽

Learning Methods ◽

Sikkim Himalaya ◽

Environmental Features ◽

Machine Learning Methods

Wildfires of limited extent and intensity can be a boon for the forest ecosystem. However, the 2019 wildfires in Australia and Brazil are sad reminders of their heavy ecological and economic costs. Understanding the role of environmental factors in the likelihood of wildfires in a spatial context would be instrumental in mitigating them. In this study, 14 environmental features encompassing meteorological, topographical, ecological, in situ and anthropogenic factors have been considered for preparing the wildfire likelihood map of Sikkim Himalaya. A comparative study on the efficiency of machine learning methods like Generalized Linear Model (GLM), Support Vector Machine (SVM), Random Forest (RF) and Gradient Boosting Model (GBM) has been performed to identify the best performing algorithm in wildfire prediction. The study indicates that all the machine learning methods are good at predicting wildfires. However, RF performed best, followed by GBM. Also, environmental features like average temperature, average wind speed, proximity to roadways and tree cover percentage are the most important determinants of wildfires in Sikkim Himalaya. This study can be considered as a decision support tool for preparedness, efficient resource allocation and sensitization of people towards mitigation of wildfires in Sikkim.

Machine learning as a successful approach for predicting complex spatio–temporal patterns in animal species abundance

Animal Biodiversity and Conservation ◽

10.32800/abc.2021.44.0289 ◽

2021 ◽

pp. 289-301

Author(s):

B. Martín ◽

J. González–Arias ◽

J. A. Vicente–Vírseda

Keyword(s):

Machine Learning ◽

Random Forest ◽

Animal Species ◽

Temporal Patterns ◽

Additive Models ◽

Gradient Boosting ◽

Support Vector ◽

Stochastic Gradient Boosting ◽

Extreme Gradient Boosting ◽

Spatio Temporal

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap aggregation). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in abundance of migrating Balearic shearwaters based on data gathered within eBird. Derived from open–source datasets, proxies of frontal systems and ocean productivity domains that have been previously used to characterize the oceanographic habitats of seabirds were quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE value and R2). The correlation between observed and predicted abundance with this model was also considerably high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.
