A new approach in adsorption modeling using random forest regression, Bayesian multiple linear regression, and multiple linear regression: 2,4-D adsorption by a green adsorbent

Bahareh Beigzadeh; Mehdi Bahrami; Mohammad Javad Amiri; Mohammad Reza Mahmoudi

doi:10.2166/wst.2020.440

Water Science & Technology ◽

10.2166/wst.2020.440 ◽

2020 ◽

Vol 82 (8) ◽

pp. 1586-1602

Author(s):

Bahareh Beigzadeh ◽

Mehdi Bahrami ◽

Mohammad Javad Amiri ◽

Mohammad Reza Mahmoudi

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Synthetic Wastewater ◽

Random Forest Regression ◽

Water Quality Prediction ◽

Adsorbent Dosage ◽

Linear Relationships ◽

Rice Husk Biochar ◽

Dichlorophenoxy Acetic Acid

Abstract The use of mathematical models in water quality prediction has received growing interest in recent years. In this research, the potential of random forest regression (RFR), Bayesian multiple linear regression (BMLR), and multiple linear regression (MLR) was examined for predicting the removal of 2,4-dichlorophenoxy acetic acid (2,4-D) from synthetic wastewater by rice husk biochar, using five input operating parameters: initial 2,4-D concentration, adsorbent dosage, pH, reaction time, and temperature. The equilibrium and kinetic adsorption data were best fitted by the Freundlich and pseudo-first-order models, respectively. The thermodynamic parameters indicated that the adsorption is exothermic and spontaneous. The modeling results gave R² values of 0.994, 0.992, and 0.945 and RMSE values of 1.92, 6.17, and 2.10 for the relationship between the model-estimated and measured 2,4-D removal for RFR, BMLR, and MLR, respectively. Overall, RFR outperformed the BMLR and MLR models because it can capture the non-linear relationships between the input data and their associated removal capacities. The sensitivity analysis demonstrated that the 2,4-D adsorption process is most sensitive to initial 2,4-D concentration and adsorbent dosage. Thus, applying the suggested model makes it possible to monitor waters continuously and more cost-effectively.
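The abstract above compares models by R² and RMSE between model-estimated and measured removal. As a minimal sketch of how those two scores are commonly computed, the following uses only the Python standard library. The abstract does not state which definition of R² the authors used, so the coefficient of determination is an assumption here, and the `measured` and `predicted` values are hypothetical illustration data, not results from the paper.

```python
import math

def rmse(measured, predicted):
    """Root mean square error between measured and model-estimated values."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical removal values (not taken from the paper).
measured = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print("RMSE:", rmse(measured, predicted))
print("R2:  ", r_squared(measured, predicted))
```

Because RMSE carries the units of the predicted quantity while R² is unitless, the two scores can rank models differently, which is why listings like this one usually report both.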

Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression

Canadian Journal of Civil Engineering ◽

10.1139/cjce-2017-0642 ◽

2019 ◽

Vol 46 (5) ◽

pp. 353-363 ◽

Cited By ~ 6

Author(s):

Chaozhe Jiang ◽

Ping Huang ◽

Javad Lessan ◽

Liping Fu ◽

Chao Wen

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Prediction Accuracy ◽

High Speed ◽

Support Vector ◽

Random Forest Regression ◽

High Speed Railway ◽

Buffer Time ◽

Artificial Neural

Accurate prediction of recoverable train delay can support train dispatchers' decision-making in timetable rescheduling and improve service reliability. In this paper, we present the results of an effort to develop a primary delay recovery (PDR) prediction model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that contribute to delay, including dwell buffer time, running buffer time, magnitude of primary delay time, and individual sections' influence. Different models were then applied and calibrated to predict the PDR. The validation results on test datasets indicate that the random forest regression (RFR) model outperforms the three alternative models, namely multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in terms of prediction accuracy. Specifically, the evaluation results show that when the prediction tolerance is less than 1 min, the RFR model achieves a prediction accuracy of up to 80.4%, while the accuracy is 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
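The accuracy figures above are reported for a given prediction tolerance. One plausible reading, sketched below, is the share of test cases whose absolute prediction error falls below the tolerance. This definition is an assumption, since the abstract does not spell it out, and the function name and sample values are hypothetical.

```python
def accuracy_within_tolerance(actual, predicted, tolerance_min=1.0):
    """Fraction of predictions whose absolute error is below the tolerance (minutes)."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) < tolerance_min)
    return hits / len(actual)

# Hypothetical recovered-delay values in minutes (not from the paper).
actual_recovery = [5.0, 3.0, 8.0, 2.0, 6.0]
predicted_recovery = [5.4, 4.5, 7.2, 2.0, 9.0]

print(accuracy_within_tolerance(actual_recovery, predicted_recovery, tolerance_min=1.0))
```

A tolerance-based score of this kind rewards predictions that are close enough to be operationally useful and ignores how far off the misses are, so it complements error measures such as RMSE.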

A comparison of random forest regression and multiple linear regression for prediction in neuroscience

Journal of Neuroscience Methods ◽

10.1016/j.jneumeth.2013.08.024 ◽

2013 ◽

Vol 220 (1) ◽

pp. 85-91 ◽

Cited By ~ 31

Author(s):

Paul F. Smith ◽

Siva Ganesh ◽

Ping Liu

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Random Forest Regression

Evaluation of random forest regression and multiple linear regression for predicting indoor fine particulate matter concentrations in a highly polluted city

Environmental Pollution ◽

10.1016/j.envpol.2018.11.034 ◽

2019 ◽

Vol 245 ◽

pp. 746-753 ◽

Cited By ~ 30

Author(s):

Weiran Yuchi ◽

Enkhjargal Gombojav ◽

Buyantushig Boldbaatar ◽

Jargalsaikhan Galsuren ◽

Sarangerel Enkhmaa ◽

...

Keyword(s):

Particulate Matter ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Fine Particulate Matter ◽

Random Forest Regression ◽

Fine Particulate

The association and discordance between glycated hemoglobin A1c and glycated albumin, assessed using a blend of multiple linear regression and random forest regression

Clinica Chimica Acta ◽

10.1016/j.cca.2020.03.019 ◽

2020 ◽

Vol 506 ◽

pp. 44-49

Author(s):

Yuping Zeng ◽

He He ◽

Jun Zhou ◽

Mei Zhang ◽

Hengjian Huang ◽

...

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Hemoglobin A1c ◽

Glycated Hemoglobin ◽

Glycated Albumin ◽

Random Forest Regression ◽

Glycated Hemoglobin A1c

Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale

Hydrogeology Journal ◽

10.1007/s10040-018-1900-5 ◽

2018 ◽

Vol 27 (3) ◽

pp. 1081-1098 ◽

Cited By ~ 10

Author(s):

Issoufou Ouedraogo ◽

Pierre Defourny ◽

Marnik Vanclooster

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Nitrate Concentration ◽

African Continent ◽

Random Forest Regression

UHPLC–MS/MS-Based Nontargeted Metabolomics Analysis Reveals Biomarkers Related to the Freshness of Chilled Chicken

Foods ◽

10.3390/foods9091326 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1326

Author(s):

Tao Zhang ◽

Shanshan Zhang ◽

Lan Chen ◽

Hao Ding ◽

Pengfei Wu ◽

...

Keyword(s):

Regression Analysis ◽

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Linear Regression Analysis ◽

Detection Methods ◽

Metabolic Profiles ◽

Stepwise Multiple Linear Regression ◽

Random Forest Regression ◽

Metabolic Biomarkers

To identify metabolic biomarkers related to the freshness of chilled chicken, ultra-high-performance liquid chromatography–mass spectrometry (UHPLC–MS/MS) was used to obtain profiles of the metabolites present in chilled chicken stored for different lengths of time. Random forest regression analysis and stepwise multiple linear regression were used to identify key metabolic biomarkers related to the freshness of chilled chicken. A total of 265 differential metabolites were identified during storage of chilled chicken. Of these metabolites, 37 were selected as potential biomarkers by random forest regression analysis. Receiver operating characteristic (ROC) curve analysis indicated that the biomarkers identified using random forest regression analysis showed a strong correlation with the freshness of chilled chicken. Subsequently, stepwise multiple linear regression analysis of the biomarkers selected by random forest regression identified indole-3-carboxaldehyde, uridine monophosphate, S-phenylmercapturic acid, gluconic acid, tyramine, and serylphenylalanine as key metabolic biomarkers. In conclusion, our study characterized the metabolic profiles of chilled chicken stored for different lengths of time and identified six key metabolic biomarkers related to the freshness of chilled chicken. These findings can contribute to a better understanding of the changes in the metabolic profiles of chilled chicken during storage and provide a basis for the further development of novel detection methods for the freshness of chilled chicken.

Study on the distribution pattern and influencing factors of shrinking cities in Northeast China based on the random forest model

Journal of Geography and Cartography ◽

10.24294/jgc.v3i1.1305 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Guanghua Yan ◽

Xi Chen ◽

Yun Zhang

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Distribution Pattern ◽

Influencing Factors ◽

Unemployment Rate ◽

Northeast China ◽

Regression Method ◽

Shrinking Cities ◽

Random Forest Regression

Based on population change data for 2005-2009, 2010-2014, 2015-2019 and 2005-2019, the shrinking cities in Northeast China are identified and their spatial distribution pattern is analyzed. The influencing factors of shrinking cities in Northeast China and their effects are then explored using multiple linear regression and random forest regression. The results show that: 1) in space, the shrinking cities in Northeast China are mainly distributed in the “land edge” areas represented by Changbai Mountain, Sanjiang Plain, Xiaoxing’an Mountain and Daxing’an Mountain; in time, the contraction center shows an obvious trend of moving northward, while the expansion center shows the opposite trend of moving southward, and the shrinking cities become further clustered; 2) in the study of influencing factors, both multiple linear regression and random forest regression show that socio-economic factors play a major role in the formation of shrinking cities; 3) the precision of random forest regression is higher than that of multiple linear regression. The results also show that per capita GDP has the greatest impact on contraction intensity, followed by the unemployment rate, science and education expenses, and the average wage of on-the-job workers. Among these four influencing factors, only the unemployment rate promotes contraction; the other three inhibit the formation of shrinking cities to varying degrees.

COMPARISON OF RANDOM FOREST AND MULTIPLE LINEAR REGRESSION TO MODEL THE MASS BALANCE OF BIOSOLIDS FROM A COMPLEX BIOSOLIDS MANAGEMENT AREA

Water Environment Research ◽

10.1002/wer.1668 ◽

2021 ◽

Author(s):

Thaís Bremm Pluth ◽

Dominic A. Brose

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Mass Balance ◽

Management Area

Multiple linear regression and random forest to predict and map soil properties using data from portable X-ray fluorescence spectrometer (pXRF)

Ciência e Agrotecnologia ◽

10.1590/1413-70542017416010317 ◽

2017 ◽

Vol 41 (6) ◽

pp. 648-664 ◽

Cited By ~ 30

Author(s):

Sérgio Henrique Godinho Silva ◽

Anita Fernanda dos Santos Teixeira ◽

Michele Duarte de Menezes ◽

Luiz Roberto Guimarães Guilherme ◽

Fatima Maria de Souza Moreira ◽

...

Keyword(s):

Random Forest ◽

Linear Regression ◽

Soil Properties ◽

Multiple Linear Regression ◽

Low Cost ◽

High Accuracy ◽

Important Variable ◽

X Ray ◽

Element Contents ◽

Fluorescence Spectrometer

ABSTRACT Determination of soil properties helps in the correct management of soil fertility. The portable X-ray fluorescence spectrometer (pXRF) has recently been adopted to determine total chemical element contents in soils, allowing soil property inferences. However, these studies are still scarce in Brazil and other countries. The objectives of this work were to predict soil properties using pXRF data, comparing stepwise multiple linear regression (SMLR) and random forest (RF) methods, and to map and validate soil properties. A total of 120 soil samples were collected at three depths and submitted to laboratory analyses. The samples were scanned with pXRF and total element contents were determined. From the pXRF data, SMLR and RF were used to predict the soil laboratory results, reflecting soil properties, and the models were validated. The best method was used to spatialize soil properties. The SMLR models had high values of R² (≥0.8); however, the highest accuracy was obtained with RF modeling. Exchangeable Ca, Al, Mg, potential and effective cation exchange capacity, soil organic matter, pH, and base saturation had adequate adjustment and accurate predictions with RF. Eight out of the 10 soil properties predicted by RF using pXRF data had CaO as the most important variable helping predictions, followed by P2O5, Zn and Cr. Maps generated using RF from pXRF data had high accuracy for six soil properties, reaching R² up to 0.83. pXRF in association with RF can be used to predict soil properties with high accuracy at low cost and in little time, besides providing variables aiding digital soil mapping.

Applying Random Forest Model Algorithm to GFR Estimation

10.21203/rs.3.rs-74843/v1 ◽

2020 ◽

Author(s):

Peijia Liu ◽

Dong Yang ◽

Shaomin Li ◽

Yutian Chong ◽

Wentao Hu ◽

...

Keyword(s):

Random Forest ◽

Kidney Disease ◽

Linear Regression ◽

Regression Model ◽

Regression Models ◽

Random Forest Regression ◽

Variable Model ◽

Data Set ◽

Development Data ◽

Better Than

Abstract Background: The use of GFR-estimating equations is critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop an estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models surpassed that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P<0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P<0.01; 19.08 vs 20.60, P<0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P=0.10; 0.8 vs 0.78, P=0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.
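The abstract above scores GFR equations by bias, precision, 30% accuracy and RMSE without defining them. The sketch below uses definitions that are common in the GFR-estimation literature, all of which are assumptions here: bias as the median of estimated minus measured GFR, precision as the interquartile range of that difference, and 30% accuracy as the share of estimates within 30% of the measured value. The function name and sample values are hypothetical.

```python
import math
import statistics

def gfr_metrics(measured, estimated):
    """Return assumed-definition bias, precision (IQR), P30 and RMSE for GFR estimates."""
    diffs = [e - m for m, e in zip(measured, estimated)]
    q1, _, q3 = statistics.quantiles(diffs, n=4)  # quartile cut points of the differences
    within_30 = sum(1 for m, d in zip(measured, diffs) if abs(d) <= 0.30 * m)
    return {
        "bias": statistics.median(diffs),
        "precision_iqr": q3 - q1,
        "p30": within_30 / len(measured),
        "rmse": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

# Hypothetical measured and estimated GFR values (not from the study).
measured_gfr = [90.0, 60.0, 45.0, 30.0, 100.0]
estimated_gfr = [85.0, 70.0, 40.0, 45.0, 98.0]

metrics = gfr_metrics(measured_gfr, estimated_gfr)
for name, value in metrics.items():
    print(name, value)
```

Under these definitions a lower bias, precision and RMSE are better while a higher 30% accuracy is better, which matches the direction of the comparisons reported in the abstract.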
