A Comparative Study of Linear, Random Forest and AdaBoost Regressions for Modeling Non-Traditional Machining

G. Shanmugasundar; M. Vanitha; Robert Čep; Vikas Kumar; Kanak Kalita; M. Ramachandran

doi:10.3390/pr9112015

Processes ◽

10.3390/pr9112015 ◽

2021 ◽

Vol 9 (11) ◽

pp. 2015

Author(s):

G. Shanmugasundar ◽

M. Vanitha ◽

Robert Čep ◽

Vikas Kumar ◽

Kanak Kalita ◽

...

Keyword(s):

Random Forest ◽

Linear Regression ◽

Process Parameters ◽

Removal Rate ◽

Predictive Modelling ◽

Random Forest Regression ◽

Electro Discharge Machining ◽

Functional Knowledge ◽

Lower Material ◽

Significant Attention

Non-traditional machining (NTM) has gained significant attention in the last decade due to its ability to machine conventionally hard-to-machine materials. However, NTMs suffer from several disadvantages such as higher initial cost, lower material removal rate and higher power consumption. NTMs involve several process parameters, which must be tuned appropriately to obtain economical and suitable results. However, the costly and time-consuming nature of NTMs makes it tedious and expensive to investigate the appropriate process parameters manually. The NTM process parameters and responses are often not linearly related, so conventional statistical tools might not be enough to derive functional knowledge. In this paper, three popular machine learning (ML) methods (viz. linear regression, random forest regression and AdaBoost regression) are employed to develop predictive models for NTM processes. Case studies on two high-fidelity datasets from the literature, one on electro-discharge machining and one on wire electro-discharge machining, demonstrate the effectiveness of the ML methods. Linear regression is observed to be insufficient for accurately mapping the complex relationship between the process parameters and responses. Both random forest regression and AdaBoost regression are found to be suitable for predictive modelling of NTMs. However, AdaBoost regression is recommended, as it is found to be insensitive to the number of regressors and is thus more readily deployable.

Machine Learning-Based Predictive Modelling of Biodiesel Production—A Comparative Perspective

Energies ◽

10.3390/en14041122 ◽

2021 ◽

Vol 14 (4) ◽

pp. 1122

Author(s):

Krishna Kumar Gupta ◽

Kanak Kalita ◽

Ranjan Kumar Ghadai ◽

Manickam Ramachandran ◽

Xiao-Zhi Gao

Keyword(s):

Machine Learning ◽

Fatty Acid ◽

Free Fatty Acid ◽

Random Forest ◽

Production Process ◽

Process Parameters ◽

Predictive Modelling ◽

Biodiesel Production ◽

Random Forest Regression ◽

Physical Experiments

Owing to the ever-growing impetus towards the development of eco-friendly and low carbon footprint energy solutions, biodiesel production and usage have been the subject of tremendous research efforts. The biodiesel production process is driven by several process parameters, which must be maintained at optimum levels to ensure high productivity. Since biodiesel productivity and quality also depend on the various raw materials involved in transesterification, physical experiments are necessary to make any estimation regarding them. However, a brute force approach of carrying out physical experiments until the optimal process parameters have been found will not succeed, because of the large number of process parameters and the underlying non-linear relation between the process parameters and responses. In this regard, a machine learning-based prediction approach is used in this paper to quantify the response features of the biodiesel production process as a function of the process parameters. Three powerful machine learning algorithms (linear regression, random forest regression and AdaBoost regression) are comprehensively studied in this work. Furthermore, two separate examples are illustrated, one involving biodiesel yield and the other biodiesel free fatty acid conversion percentage. It is seen that both random forest regression and AdaBoost regression can achieve high accuracy in predictive modelling of biodiesel yield and free fatty acid conversion percentage. However, AdaBoost may be a more suitable approach for biodiesel production modelling, as it achieves the best accuracy amongst the tested algorithms. Moreover, AdaBoost can be deployed more quickly, as it was seen to be insensitive to the number of regressors used.

Multi-Objective Optimization of Wire Electro Discharge Machining (WEDM) Process Parameters Using Grey-Fuzzy Approach

Periodica Polytechnica Mechanical Engineering ◽

10.3311/ppme.12167 ◽

2018 ◽

Vol 63 (1) ◽

pp. 16-25 ◽

Cited By ~ 9

Author(s):

Partha Protim Das ◽

Sunny Diyaley ◽

Shankar Chakraborty ◽

Ranjan Kumar Ghadai

Keyword(s):

Process Parameters ◽

Removal Rate ◽

Machining Process ◽

Wire Tension ◽

Electro Discharge Machining ◽

Pulse On Time ◽

Pulse Off Time ◽

Wedm Process ◽

Fuzzy Logic Technique ◽

Off Time

Wire electro discharge machining (WEDM) is a versatile non-traditional machining process that is extensively used to machine components with intricate profiles and shapes. In WEDM, it is very important to select the optimal process parameters so as to enhance machine performance. This paper focuses on selecting the optimal parametric combination of the WEDM process for machining EN31 steel, using a grey-fuzzy logic technique. The process parameters considered were servo voltage, wire tension, pulse-on-time and pulse-off-time, and the responses were material removal rate (MRR) and surface roughness (SR). It was found that a pulse-on-time of 115 µs, a pulse-off-time of 35 µs, a servo voltage of 40 V and a wire tension of 5 kgf result in a larger value of the grey fuzzy reasoning grade (GFRG), which tends to maximize MRR and improve SR. Finally, analysis of variance (ANOVA) is applied to check the influence of each process parameter on the GFRG.
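The grey relational step that underlies a grey-fuzzy ranking can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' procedure: MRR is treated as larger-the-better and SR as smaller-the-better, the distinguishing coefficient is set to 0.5, and the fuzzy inference stage that produces the GFRG is replaced by a plain average of the coefficients. The function names and the response values are hypothetical.

```python
def normalise(values, larger_is_better):
    """Scale responses to [0, 1] so that 1 is always the preferred value."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("all responses are identical; nothing to rank")
    if larger_is_better:
        return [(v - lo) / span for v in values]
    return [(hi - v) / span for v in values]


def grey_relational_coefficients(normalised, zeta=0.5):
    """Grey relational coefficient of each run against the ideal value 1.0.

    zeta is the distinguishing coefficient (0.5 is an assumed, common choice).
    """
    deltas = [1.0 - x for x in normalised]
    d_min, d_max = min(deltas), max(deltas)
    return [(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas]


def grey_relational_grades(mrr, sr):
    """Average the MRR and SR coefficients per run (stand-in for the fuzzy step)."""
    c_mrr = grey_relational_coefficients(normalise(mrr, larger_is_better=True))
    c_sr = grey_relational_coefficients(normalise(sr, larger_is_better=False))
    return [(a + b) / 2 for a, b in zip(c_mrr, c_sr)]


# Hypothetical responses for three experimental runs.
example_mrr = [4.0, 6.0, 8.0]   # material removal rate, larger is better
example_sr = [2.0, 3.0, 2.5]    # surface roughness, smaller is better
example_grades = grey_relational_grades(example_mrr, example_sr)
best_run = example_grades.index(max(example_grades))
```

The run with the highest grade is taken as the best compromise between the two responses; in the grey-fuzzy variant, a fuzzy rule base would combine the coefficients instead of the simple mean used here.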

Applying Random Forest Model Algorithm to GFR Estimation

10.21203/rs.3.rs-74843/v1 ◽

2020 ◽

Author(s):

Peijia Liu ◽

Dong Yang ◽

Shaomin Li ◽

Yutian Chong ◽

Wentao Hu ◽

...

Keyword(s):

Random Forest ◽

Kidney Disease ◽

Linear Regression ◽

Regression Model ◽

Regression Models ◽

Random Forest Regression ◽

Variable Model ◽

Data Set ◽

Development Data ◽

Better Than

Abstract. Background: The use of GFR-estimating equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.
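The four performance measures named in the abstract can be computed as in the sketch below. The abstract does not define them, so the definitions here are assumptions based on common usage in GFR-equation studies: bias as the median of (estimated minus measured), precision as the interquartile range of those differences, 30% accuracy as the share of estimates within 30% of the measured value, and RMSE as the root mean square of the differences. The function name and sample values are hypothetical.

```python
import math
import statistics


def gfr_metrics(estimated, measured):
    """Return bias, precision, 30% accuracy and RMSE for paired GFR values."""
    if len(estimated) != len(measured) or len(measured) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    diffs = [e - m for e, m in zip(estimated, measured)]

    # Assumed definition: bias is the median difference.
    bias = statistics.median(diffs)

    # Assumed definition: precision is the interquartile range of differences.
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    precision = q3 - q1

    # Share of estimates that fall within 30% of the measured value.
    within = sum(abs(d) <= 0.3 * m for d, m in zip(diffs, measured))
    p30 = within / len(diffs)

    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return {"bias": bias, "precision": precision, "p30": p30, "rmse": rmse}


# Hypothetical estimated and measured GFR values (mL/min/1.73 m^2).
example_metrics = gfr_metrics([90, 60, 45, 30], [100, 50, 50, 20])
```

Under these definitions a lower bias magnitude, precision value and RMSE, and a higher 30% accuracy, indicate a better equation, which matches the direction of the comparisons reported in the abstract.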

Evaluation of Evaporation from Water Reservoirs in Local Conditions at Czech Republic

Hydrology ◽

10.3390/hydrology8040153 ◽

2021 ◽

Vol 8 (4) ◽

pp. 153

Author(s):

Eva Melišová ◽

Adam Vizina ◽

Martin Hanel ◽

Petr Pavlík ◽

Petra Šuhájková

Keyword(s):

Czech Republic ◽

Random Forest ◽

Linear Regression ◽

Statistical Methods ◽

Stepwise Regression ◽

The Other ◽

Water Reservoirs ◽

Random Forest Regression ◽

Local Conditions ◽

The Difference

Evaporation is an important factor in the overall hydrological balance. It is usually derived from the water balance of a catchment, as precipitation minus runoff and the change in water storage. The magnitude of actual evaporation is determined by the quantity of available water and is heavily influenced by climatic and meteorological factors. Currently, statistical methods such as linear regression and random forest regression, as well as other machine learning methods, are available to calculate evaporation. However, deriving these relationships requires observations from evaporation stations. In the present study, linear regression and random forest regression were used to calculate evaporation, with part of the models designed manually and the other part built using stepwise regression. Observed data from 24 evaporation stations and ERA5-Land climate reanalysis data were used to create the regression models. The proposed regression formulas were tested on 33 water reservoirs. The results show that manual regression is a more appropriate method for calculating evaporation than stepwise regression, with the caveat that it is more time consuming. The difference between linear and random forest regression lies in the variance of the data; random forest regression is better able to fit the observed data. On the other hand, the results of linear regression are simpler to interpret. The study showed that ERA5-Land reanalysis products, used with the random forest regression method, are suitable for calculating evaporation from water reservoirs in the conditions of the Czech Republic.
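The water-balance statement at the start of the abstract can be written as a short worked equation. The symbols are assumptions introduced here rather than notation from the paper: E is evaporation, P is precipitation, Q is runoff and ΔS is the change in water storage in the catchment, all over the same period and in the same units.

```latex
\[
E = P - Q - \Delta S
\]
```

For example, with hypothetical annual values of P = 700 mm, Q = 200 mm and ΔS = 20 mm, the balance gives E = 700 - 200 - 20 = 480 mm.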

Arbitrage Pricing Model Based on Factor Analysis-Random Forest Regression and its application

International Journal of Mathematical Models and Methods in Applied Sciences ◽

10.46300/9101.2020.14.15 ◽

2020 ◽

Vol 14 ◽

Keyword(s):

Random Forest ◽

Linear Regression ◽

Common Factor ◽

Linear Regression Method ◽

National Bureau ◽

Pricing Model ◽

Arbitrage Pricing ◽

Random Forest Regression ◽

The Common ◽

Factor Variable

First, this paper establishes a K-factor linear model and an arbitrage pricing model (ATP) according to ‘the Asset Pricing Model-Arbitrage Pricing Theory’. Then, 10 original factors, such as gross national product, gross industrial product and gross tertiary industry product, are collected from the Statistical Yearbook of the National Bureau of Statistics for 2001 to 2017. After synthesis and simplification, three common factors are extracted to replace the ten original factors. The first common factor variable reflects the overall economic level of the country; the second reflects the country's inflation rate; the third reflects the country's total annual net export trade. After the common factors are determined, their values are calculated from the original data. The annual returns of 10 stocks over 17 years are collected and random forest regression is performed twice to obtain the arbitrage pricing model. Then, based on the same common factor data, another arbitrage pricing model is obtained by following the linear regression method of previous similar papers. Comparing the pricing errors shows that the model obtained by random forest regression prices better than the model obtained by linear regression.

A Random Forest Machine Learning Approach for the Retrieval of Leaf Chlorophyll Content in Wheat

Remote Sensing ◽

10.3390/rs11080920 ◽

2019 ◽

Vol 11 (8) ◽

pp. 920 ◽

Cited By ~ 18

Author(s):

Syed Haleem Shah ◽

Yoseline Angel ◽

Rasmus Houborg ◽

Shawkat Ali ◽

Matthew F. McCabe

Keyword(s):

Machine Learning ◽

Random Forest ◽

Linear Regression ◽

Vegetation Indices ◽

Learning Approach ◽

Random Forest Regression ◽

Leaf Chlorophyll ◽

Machine Learning Approach ◽

Input Variables ◽

Non Destructive

Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods of direct tissue analysis or the use of handheld meters, are not able to capture chlorophyll variability at anything beyond point scales, so are not particularly useful for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlt), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlt was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlt and the selected vegetation indices. 
Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlt with an RMSE of 5.49 µg·cm⁻² and an R² of 0.89. Model accuracy was improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 µg·cm⁻², depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. Implementing this resulted in no significant effect on the RMSE, and showed that much the same prediction accuracy could be obtained by a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise in using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlt from hyperspectral data.

Applying Random Forest Model Algorithm to GFR estimation

10.21203/rs.3.rs-22422/v1 ◽

2020 ◽

Author(s):

Peijia Liu ◽

Dong Yang ◽

Shaomin Li ◽

Yutian Chong ◽

Ming Li ◽

...

Keyword(s):

Random Forest ◽

Kidney Disease ◽

Linear Regression ◽

Regression Model ◽

Regression Models ◽

Random Forest Regression ◽

Variable Model ◽

Data Set ◽

Development Data ◽

Better Than

Abstract. Background: The use of GFR-estimating equations is critical for kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could go beyond revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.

Effect of Process Parameters on Spark Energy and Material Removal Rate in Electro-Discharge Machining Process

Lecture Notes in Mechanical Engineering - Advances in Industrial and Production Engineering ◽

10.1007/978-981-33-4320-7_70 ◽

2021 ◽

pp. 789-800

Author(s):

Rudra Pratap Singh ◽

Ashish Pal ◽

Deepak Raghuvanshi

Keyword(s):

Process Parameters ◽

Material Removal Rate ◽

Material Removal ◽

Removal Rate ◽

Machining Process ◽

Electro Discharge Machining ◽

Spark Energy

Effects of Process Parameters on Material Removal in Vibration-Assisted Polishing of Micro-Optic Mold

Micromachines ◽

10.3390/mi9070349 ◽

2018 ◽

Vol 9 (7) ◽

pp. 349 ◽

Cited By ~ 3

Author(s):

Jiang Guo ◽

Hirofumi Suzuki

Keyword(s):

Surface Roughness ◽

Tool Wear ◽

Removal Efficiency ◽

Process Parameters ◽

Material Removal ◽

Removal Rate ◽

Grain Sizes ◽

Tool Influence Function ◽

Polishing Pressure ◽

Lower Material

Process parameter conditions such as vibrating motion, abrasives, pressure and tool wear play an important role in vibration-assisted polishing of micro-optic molds, as they strongly affect material removal efficiency and stability. This paper presents an analytical and experimental investigation of the effects of process parameters, aimed at quantitatively clarifying the interrelations between material removal and the process parameters that affect polishing. The material removal rate (MRR) and surface roughness, which represent the polishing characteristics, were examined under different vibrating motions, abrasive grain sizes and polishing pressures. The effects of pressure and tool wear conditions on the tool influence function were analyzed. The results showed that 2D vibrating motion generated better surface roughness with higher material removal efficiency, while a smaller abrasive grain size created better surface roughness but lower material removal efficiency. MRR gradually decreased with increasing polishing pressure once the pressure exceeded 345 kPa, and it was greatly affected by polisher wear when the wear diameter on the polisher’s head exceeded 300 μm.

Forecasting primary delay recovery of high-speed railway using multiple linear regression, supporting vector machine, artificial neural network, and random forest regression

Canadian Journal of Civil Engineering ◽

10.1139/cjce-2017-0642 ◽

2019 ◽

Vol 46 (5) ◽

pp. 353-363 ◽

Cited By ~ 6

Author(s):

Chaozhe Jiang ◽

Ping Huang ◽

Javad Lessan ◽

Liping Fu ◽

Chao Wen

Keyword(s):

Random Forest ◽

Linear Regression ◽

Multiple Linear Regression ◽

Prediction Accuracy ◽

High Speed ◽

Support Vector ◽

Random Forest Regression ◽

High Speed Railway ◽

Buffer Time ◽

Artificial Neural

Accurate prediction of recoverable train delay can support train dispatchers’ decision-making in timetable rescheduling and improve service reliability. In this paper, we present the results of an effort to develop a primary delay recovery (PDR) prediction model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that contribute to delay, including dwell buffer time, running buffer time, magnitude of primary delay time, and individual sections’ influence. Different models were then applied and calibrated to predict the PDR. The validation results on test datasets indicate that the random forest regression (RFR) model outperforms the other three models, namely multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in terms of prediction accuracy. Specifically, the evaluation results show that when the prediction tolerance is less than 1 min, the RFR model achieves a prediction accuracy of up to 80.4%, while the accuracy is 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
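The tolerance-based accuracy quoted in the abstract can be illustrated with a short sketch. The abstract does not spell out the formula, so this assumes a prediction counts as correct when its absolute error is strictly below the tolerance; the function name and the delay values are hypothetical.

```python
def accuracy_within_tolerance(predicted, observed, tolerance):
    """Share of predictions whose absolute error is below the tolerance."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(abs(p - o) < tolerance for p, o in zip(predicted, observed))
    return hits / len(observed)


# Hypothetical predicted and observed delay recovery times, in minutes.
example_predicted = [2.0, 3.5, 0.4, 5.0]
example_observed = [2.5, 5.0, 0.0, 5.9]
example_accuracy = accuracy_within_tolerance(
    example_predicted, example_observed, tolerance=1.0
)
```

With a 1 min tolerance, three of the four hypothetical predictions fall inside the band, so the measure rewards models whose errors are small in absolute terms, regardless of the sign of the error.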
