A Comparative Study of Linear, Random Forest and AdaBoost Regressions for Modeling Non-Traditional Machining

Processes ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 2015
Author(s):  
G. Shanmugasundar ◽  
M. Vanitha ◽  
Robert Čep ◽  
Vikas Kumar ◽  
Kanak Kalita ◽  
...  

Non-traditional machining (NTM) has gained significant attention in the last decade due to its ability to machine conventionally hard-to-machine materials. However, NTMs suffer from several disadvantages such as higher initial cost, lower material removal rate, more power consumption, etc. NTMs involve several process parameters, the appropriate tweaking of which is necessary to obtain economical and suitable results. However, the costly and time-consuming nature of the NTMs makes it a tedious and expensive task to manually investigate the appropriate process parameters. The NTM process parameters and responses are often not linearly related and thus, conventional statistical tools might not be enough to derive functional knowledge. Thus, in this paper, three popular machine learning (ML) methods (viz. linear regression, random forest regression and AdaBoost regression) are employed to develop predictive models for NTM processes. By considering two high-fidelity datasets from the literature on electro-discharge machining and wire electro-discharge machining, case studies are shown in the paper for the effectiveness of the ML methods. Linear regression is observed to be insufficient in accurately mapping the complex relationship between the process parameters and responses. Both random forest regression and AdaBoost regression are found to be suitable for predictive modelling of NTMs. However, AdaBoost regression is recommended as it is found to be insensitive to the number of regressors and thus is more readily deployable.
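The claim that a straight-line fit cannot capture a non-linear parameter-response relationship can be illustrated with a minimal sketch. It is not the authors' code or data: the response curve is synthetic, the helper names (`fit_line`, `fit_tree`, `rmse`) are hypothetical, and a single small regression tree stands in for the tree-based learners that random forest and AdaBoost regression combine.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single input."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_tree(xs, ys, depth):
    """Tiny regression tree on one input: split where squared error is lowest."""
    leaf_value = mean(ys)
    if depth == 0 or len(set(xs)) < 2:
        return lambda x: leaf_value
    best = None
    for t in sorted(set(xs))[1:]:  # every threshold leaves both sides non-empty
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = mean(left), mean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t)
    t = best[1]
    lo = fit_tree([x for x in xs if x < t],
                  [y for x, y in zip(xs, ys) if x < t], depth - 1)
    hi = fit_tree([x for x in xs if x >= t],
                  [y for x, y in zip(xs, ys) if x >= t], depth - 1)
    return lambda x: lo(x) if x < t else hi(x)


def rmse(model, xs, ys):
    """Root mean square error of a fitted model on the given points."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


# Synthetic, made-up response that peaks mid-range (not taken from the paper).
xs = [i / 2 for i in range(-6, 7)]
ys = [9.0 - x * x for x in xs]

linear_rmse = rmse(fit_line(xs, ys), xs, ys)
tree_rmse = rmse(fit_tree(xs, ys, depth=3), xs, ys)
print(f"linear RMSE = {linear_rmse:.2f}, tree RMSE = {tree_rmse:.2f}")
```

Both errors are measured on the training points, so the comparison only shows how well each model class can fit the curve, not how well it generalizes.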

Energies ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 1122
Author(s):  
Krishna Kumar Gupta ◽  
Kanak Kalita ◽  
Ranjan Kumar Ghadai ◽  
Manickam Ramachandran ◽  
Xiao-Zhi Gao

Owing to the ever-growing impetus towards the development of eco-friendly and low carbon footprint energy solutions, biodiesel production and usage have been the subject of tremendous research efforts. The biodiesel production process is driven by several process parameters, which must be maintained at optimum levels to ensure high productivity. Since biodiesel productivity and quality are also dependent on the various raw materials involved in transesterification, physical experiments are necessary to make any estimation regarding them. However, a brute force approach of carrying out physical experiments until the optimal process parameters have been achieved will not succeed, due to the large number of process parameters and the underlying non-linear relation between the process parameters and responses. In this regard, a machine learning-based prediction approach is used in this paper to quantify the response features of the biodiesel production process as a function of the process parameters. Three powerful machine learning algorithms (linear regression, random forest regression and AdaBoost regression) are comprehensively studied in this work. Furthermore, two separate examples are illustrated, one involving biodiesel yield and the other the biodiesel free fatty acid conversion percentage. It is seen that both random forest regression and AdaBoost regression can achieve high accuracy in predictive modelling of biodiesel yield and free fatty acid conversion percentage. However, AdaBoost may be a more suitable approach for biodiesel production modelling, as it achieves the best accuracy amongst the tested algorithms. Moreover, AdaBoost can be more quickly deployed, as it was seen to be insensitive to the number of regressors used.


2018 ◽  
Vol 63 (1) ◽  
pp. 16-25 ◽  
Author(s):  
Partha Protim Das ◽  
Sunny Diyaley ◽  
Shankar Chakraborty ◽  
Ranjan Kumar Ghadai

Wire electro discharge machining (WEDM) is a versatile non-traditional machining process that is extensively used to machine components with intricate profiles and shapes. In WEDM, it is very important to select the optimal process parameters so as to enhance the machine performance. This paper focuses on selecting the optimal parametric combination of the WEDM process when machining EN31 steel, using a grey-fuzzy logic technique. Process parameters such as servo voltage, wire tension, pulse-on-time and pulse-off-time were considered, together with multiple responses such as material removal rate (MRR) and surface roughness (SR). It was found that a pulse-on-time of 115 µs, pulse-off-time of 35 µs, servo voltage of 40 V and wire tension of 5 kgf result in a larger value of the grey fuzzy reasoning grade (GFRG), which tends to maximize MRR and improve SR. Finally, analysis of variance (ANOVA) is applied to check the influence of each process parameter on the estimation of GFRG.
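The grey relational step that underlies a grade such as GFRG can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the trial values are invented, the distinguishing coefficient of 0.5 is a common default rather than a value taken from the paper, and the grade here is a plain average of the coefficients, whereas the grey-fuzzy method passes them through a fuzzy inference system instead.

```python
def normalize(values, larger_is_better):
    """Scale a response to [0, 1] so that 1 is always the preferred end."""
    lo, hi = min(values), max(values)
    if larger_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def grey_coefficients(normalized, zeta=0.5):
    """Grey relational coefficient of each trial against the ideal value 1."""
    deltas = [1.0 - v for v in normalized]
    d_min, d_max = min(deltas), max(deltas)
    return [(d_min + zeta * d_max) / (d + zeta * d_max) for d in deltas]


# Hypothetical results of four trials (illustrative numbers only).
mrr = [4.0, 6.0, 8.0, 7.0]   # material removal rate: larger is better
sr = [2.0, 2.4, 3.0, 2.1]    # surface roughness: smaller is better

mrr_coef = grey_coefficients(normalize(mrr, larger_is_better=True))
sr_coef = grey_coefficients(normalize(sr, larger_is_better=False))

# Plain average as the grade; grey-fuzzy replaces this step with fuzzy inference.
grades = [(a + b) / 2 for a, b in zip(mrr_coef, sr_coef)]
best_trial = max(range(len(grades)), key=grades.__getitem__)
print("grades:", [round(g, 3) for g in grades], "best trial index:", best_trial)
```

The trial with the highest grade is the one that best balances a high MRR against a low SR, which is the role the GFRG plays in the abstract.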


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Wentao Hu ◽  
...  

Abstract Background: The use of GFR-estimating equations is critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.
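The four performance measures named in the abstract can be computed along the following lines. The abstract does not define them, so the definitions here are assumptions based on common usage in the GFR literature: bias as the median of estimated minus measured GFR, precision as the interquartile range of those differences, and 30% accuracy as the share of estimates within ±30% of the measured value. The function name `gfr_metrics` and the sample numbers are illustrative only.

```python
from statistics import median, quantiles


def gfr_metrics(estimated, measured):
    """Bias, precision, 30% accuracy and RMSE under the assumed definitions."""
    diffs = [e - m for e, m in zip(estimated, measured)]
    q1, _, q3 = quantiles(diffs, n=4)  # quartiles of the differences
    within_30 = sum(abs(e - m) <= 0.30 * m for e, m in zip(estimated, measured))
    return {
        "bias": median(diffs),
        "precision": q3 - q1,
        "p30": within_30 / len(measured),
        "rmse": (sum(d * d for d in diffs) / len(diffs)) ** 0.5,
    }


# Made-up measured and estimated GFR values, for illustration only.
measured = [90.0, 60.0, 45.0, 30.0, 100.0]
estimated = [95.0, 55.0, 60.0, 31.0, 80.0]

metrics = gfr_metrics(estimated, measured)
print(metrics)
```

Under these definitions, lower bias, precision and RMSE values and a higher 30% accuracy indicate a better equation, which matches the direction of the comparisons reported in the abstract.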


Hydrology ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 153
Author(s):  
Eva Melišová ◽  
Adam Vizina ◽  
Martin Hanel ◽  
Petr Pavlík ◽  
Petra Šuhájková

Evaporation is an important factor in the overall hydrological balance. It is usually derived as the difference between runoff, precipitation and the change in water storage in a catchment. The magnitude of actual evaporation is determined by the quantity of available water and heavily influenced by climatic and meteorological factors. Currently, statistical and machine learning methods, such as linear regression and random forest regression, are available to calculate evaporation. However, in order to derive these relationships, it is necessary to have observations of evaporation from evaporation stations. In the present study, linear regression and random forest regression were used to calculate evaporation, with part of the models being designed manually and the other part using stepwise regression. Observed data from 24 evaporation stations and ERA5-Land climate reanalysis data were used to create the regression models. The proposed regression formulas were tested on 33 water reservoirs. The results show that manual regression is a more appropriate method for calculating evaporation than stepwise regression, with the caveat that it is more time-consuming. The difference between linear and random forest regression lies in the variance of the data; random forest regression is better able to fit the observed data. On the other hand, the result of linear regression is simpler to interpret. The study showed that using reanalysis data (ERA5-Land products) with the random forest regression method is suitable for calculating evaporation from water reservoirs in the conditions of the Czech Republic.


First, this paper establishes a K-factor linear model and an arbitrage pricing model (APT) according to ‘the Asset Pricing Model-Arbitrage Pricing Theory’. Then, 10 original factors, such as gross national product, gross industrial product and gross tertiary industry product, are collected from the Statistical Yearbook of the National Bureau of Statistics for 2001 to 2017. After synthesis and simplification, three common factors are extracted to replace the ten original factors. The first common factor reflects the overall economic level of the country; the second reflects the country's inflation rate; the third reflects the country's total annual net export trade. Once the common factors are determined, their values are calculated from the original data. The annual returns of 10 stocks over 17 years are collected, and random forest regression is performed twice to obtain the arbitrage pricing model. Then, based on the same common factor data, another arbitrage pricing model is obtained by following the linear regression method of previous similar papers. Comparing the pricing errors shows that the model obtained by random forest regression prices better than the model obtained by linear regression.


2019 ◽  
Vol 11 (8) ◽  
pp. 920 ◽  
Author(s):  
Syed Haleem Shah ◽  
Yoseline Angel ◽  
Rasmus Houborg ◽  
Shawkat Ali ◽  
Matthew F. McCabe

Developing rapid and non-destructive methods for chlorophyll estimation over large spatial areas is a topic of much interest, as it would provide an indirect measure of plant photosynthetic response, be useful in monitoring soil nitrogen content, and offer the capacity to assess vegetation structural and functional dynamics. Traditional methods of direct tissue analysis or the use of handheld meters are not able to capture chlorophyll variability at anything beyond point scales, so they are not particularly useful for informing decisions on plant health and status at the field scale. Examining the spectral response of plants via remote sensing has shown much promise as a means to capture variations in vegetation properties, while offering a non-destructive and scalable approach to monitoring. However, determining the optimum combination of spectra or spectral indices to inform plant response remains an active area of investigation. Here, we explore the use of a machine learning approach to enhance the estimation of leaf chlorophyll (Chlt), defined as the sum of chlorophyll a and b, from spectral reflectance data. Using an ASD FieldSpec 4 Hi-Res spectroradiometer, 2700 individual leaf hyperspectral reflectance measurements were acquired from wheat plants grown across a gradient of soil salinity and nutrient levels in a greenhouse experiment. The extractable Chlt was determined from laboratory analysis of 270 collocated samples, each composed of three leaf discs. A random forest regression algorithm was trained against these data, with input predictors based upon (1) reflectance values from 2102 bands across the 400–2500 nm spectral range; and (2) 45 established vegetation indices. As a benchmark, a standard univariate regression analysis was performed to model the relationship between measured Chlt and the selected vegetation indices.
Results show that the root mean square error (RMSE) was significantly reduced when using the machine learning approach compared to standard linear regression. When exploiting the entire spectral range of individual bands as input variables, the random forest estimated Chlt with an RMSE of 5.49 µg·cm⁻² and an R² of 0.89. Model accuracy was improved when using vegetation indices as input variables, producing an RMSE ranging from 3.62 to 3.91 µg·cm⁻², depending on the particular combination of indices selected. In further analysis, input predictors were ranked according to their importance level, and a step-wise reduction in the number of input features (from 45 down to 7) was performed. Implementing this resulted in no significant effect on the RMSE, and showed that much the same prediction accuracy could be obtained by a smaller subset of indices. Importantly, the random forest regression approach identified many important variables that were not good predictors according to their linear regression statistics. Overall, the research illustrates the promise in using established vegetation indices as input variables in a machine learning approach for the enhanced estimation of Chlt from hyperspectral data.


2020 ◽  
Author(s):  
Peijia Liu ◽  
Dong Yang ◽  
Shaomin Li ◽  
Yutian Chong ◽  
Ming Li ◽  
...  

Abstract Background: The use of GFR-estimating equations is critical for managing kidney disease in the clinic. However, the performance of the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation has not improved substantially in the past eight years. Here we hypothesized that the random forest regression (RF) method could outperform revised linear regression, which is used to build the CKD-EPI equation. Methods: A total of 1732 participants were enrolled in this study (1333 in the development data set from Tianhe District and 399 in the external data set from Luogang District). Recursive feature elimination (RFE) was applied to the development data to select important variables and build random forest models. The same variables were then used to develop the estimated GFR equation with linear regression as a comparison. The performance of these equations was measured by bias, 30% accuracy, precision and root mean square error (RMSE). Results: Of all the variables, creatinine, cystatin C, weight, body mass index (BMI), age, uric acid (UA), blood urea nitrogen (BUN), hematocrit (HCT) and apolipoprotein B (APOB) were selected by the RFE method. The results revealed that the overall performance of the random forest regression models exceeded that of the revised regression models based on the same variables. In the 9-variable model, the RF model was better than revised linear regression in terms of bias, precision, 30% accuracy and RMSE (0.78 vs 2.98, 16.90 vs 23.62, 0.84 vs 0.80, 16.88 vs 18.70, all P < 0.01). In the 4-variable model, the random forest regression model showed an improvement in precision and RMSE compared with the revised regression model (20.82 vs 25.25, P < 0.01; 19.08 vs 20.60, P < 0.001). Bias and 30% accuracy were also better, but the differences were not statistically significant (0.34 vs 2.07, P = 0.10; 0.8 vs 0.78, P = 0.19, respectively). Conclusions: Random forest regression models perform better than revised linear regression models for GFR estimation.


Micromachines ◽  
2018 ◽  
Vol 9 (7) ◽  
pp. 349 ◽  
Author(s):  
Jiang Guo ◽  
Hirofumi Suzuki

Process parameter conditions such as vibrating motion, abrasives, pressure and tool wear play an important role in vibration-assisted polishing of micro-optic molds as they strongly affect material removal efficiency and stability. This paper presents an analytical and experimental investigation of the effects of process parameters, aimed at quantitatively clarifying the interrelations between material removal and the process parameters that affect polishing. The material removal rate (MRR) and surface roughness, which represent the polishing characteristics, were examined under different vibrating motions, abrasive grain sizes and polishing pressures. The effects of pressure and tool wear conditions on the tool influence function were analyzed. The results showed that 2D vibrating motion generated better surface roughness with higher material removal efficiency, while a smaller abrasive grain size created better surface roughness but lower material removal efficiency. MRR gradually decreased with increasing polishing pressure once the pressure exceeded 345 kPa, and it was greatly affected by polisher wear when the wear diameter on the polisher’s head exceeded 300 μm.


2019 ◽  
Vol 46 (5) ◽  
pp. 353-363 ◽  
Author(s):  
Chaozhe Jiang ◽  
Ping Huang ◽  
Javad Lessan ◽  
Liping Fu ◽  
Chao Wen

Accurate prediction of recoverable train delay can support train dispatchers’ decision-making in timetable rescheduling and improve service reliability. In this paper, we present the results of an effort to develop a primary delay recovery (PDR) predictor model using train operation records from the Wuhan-Guangzhou (W-G) high-speed railway. To this end, we first identified the main variables that contribute to delay, including dwell buffer time, running buffer time, magnitude of primary delay time, and individual sections’ influence. Different models are applied and calibrated to predict the PDR. The validation results on test datasets indicate that the random forest regression (RFR) model outperforms the three alternative models, namely multiple linear regression (MLR), support vector machine (SVM), and artificial neural networks (ANN), in prediction accuracy. Specifically, the evaluation results show that when the prediction tolerance is less than 1 min, the RFR model achieves a prediction accuracy of up to 80.4%, while the accuracy is 44.4%, 78.5%, and 78.5% for the MLR, SVM, and ANN models, respectively.
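A tolerance-based accuracy figure like the ones quoted above might be computed as in the sketch below. It assumes that a prediction counts as correct when its absolute error is under the tolerance, which is one reading of the abstract rather than a definition it gives; the function name and the delay-recovery values are made up for illustration.

```python
def accuracy_within_tolerance(predicted, observed, tolerance_min=1.0):
    """Share of predictions whose absolute error is below the tolerance (minutes)."""
    hits = sum(abs(p - o) < tolerance_min for p, o in zip(predicted, observed))
    return hits / len(observed)


# Hypothetical recovered delay times in minutes (not from the W-G records).
observed = [3.0, 5.0, 0.0, 8.0]
predicted = [3.4, 6.5, 0.2, 7.1]

accuracy = accuracy_within_tolerance(predicted, observed)
print(f"accuracy within 1 min: {accuracy:.0%}")
```

Unlike RMSE, this measure ignores how large a miss is once it falls outside the tolerance, so the two can rank models differently.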

