Big Data as a Tool for Building a Predictive Model of Mill Roll Wear

Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 859
Author(s):  
Natalia Vasilyeva ◽  
Elmira Fedorova ◽  
Alexandr Kolesnikov

Big data analysis is becoming a daily task for companies all over the world, including Russian companies. With advances in technology and reduced storage costs, companies today can collect and store large amounts of heterogeneous data. The next step, extracting knowledge and value from such data, is a challenge that will ultimately face every company seeking to maintain its competitiveness and place in the market. An approach to studying metallurgical processes through the analysis of a large array of operational control data is considered, using steel rolling production as an example. The aim of the work is to develop a predictive model of rolling mill roll wear based on a large array of operational control data containing information about the time of filling and unloading of rolls, the rolled assortment, the roll material, and the time during which each roll is in operation. Preliminary data preparation for modeling was carried out, including the removal of outliers, uncharacteristic and random measurement results (misses), and data gaps. Correlation analysis showed that the dimensions and grades of rolled steel sheets, as well as the roll material, have the greatest influence on the wear of rolling mill rolls. Based on this large array of operational control data, various predictive models of the technological process were designed. Model adequacy was assessed by the mean square error (MSE), the coefficient of determination (R2), and the Pearson correlation coefficient (R) between the calculated and experimental values of mill roll wear, as well as by the symmetry of the predicted values relative to the straight line Ypredicted = Yactual.
Linear models constructed using the least squares method and cross-validation turned out to be inadequate for the research object (the coefficient of determination R2 does not exceed 0.3). The following regressions were built on the same operational control database: multivariate linear regression, multivariate Lasso, multivariate Ridge, and multivariate ElasticNet. These models also proved inadequate, and testing them for symmetry showed that, in all cases, the predicted values are underestimated. Models based on algorithm composition were also built: random forest and gradient boosting. Both methods were found to be adequate for the research object (R2 = 0.798 for the random forest model; R2 = 0.847 for the gradient boosting model), but the gradient boosting algorithm is preferable owing to its higher accuracy. Checking the predictions for symmetry about the straight line Ypredicted = Yactual showed that the random forest model tends to underestimate (the calculated values lie below the line), whereas the gradient boosting predictions are located symmetrically about the line; the gradient boosting model is therefore preferred. The predictive model of mill roll wear will allow rational use of rolls in terms of minimizing overall roll wear: the proposed model makes it possible to redistribute the existing work rolls between the stands so as to reduce their total wear.
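The comparison described above can be sketched in a few lines of scikit-learn. This is not the authors' code: the synthetic "wear" response and feature names are hypothetical stand-ins for the operational control database, and the symmetry check about Ypredicted = Yactual is approximated by the mean residual (negative means systematic underestimation).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# hypothetical stand-ins for sheet dimensions, steel grade, roll material, time in operation
X = rng.uniform(size=(600, 4))
# nonlinear synthetic "wear" target
y = np.sin(3 * X[:, 0]) + X[:, 1] * X[:, 2] + 0.1 * rng.normal(size=600)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "linear": LinearRegression(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    scores[name] = {
        "mse": mean_squared_error(y_te, pred),
        "r2": r2_score(y_te, pred),
        "pearson_r": np.corrcoef(y_te, pred)[0, 1],
        # mean residual near 0 means predictions sit symmetrically
        # about y_pred = y_actual; a negative value means underestimation
        "mean_residual": float(np.mean(pred - y_te)),
    }
```

On data with nonlinear structure like this, the ensemble models typically recover a much higher R2 than the linear baseline, mirroring the abstract's finding.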

2021 ◽  
Author(s):  
Hossein Sahour ◽  
Vahid Gholami ◽  
Javad Torkman ◽  
Mehdi Vazifedan ◽  
Sirwe Saeedi

Abstract: Monitoring temporal variation of streamflow is necessary for many water resources management plans, yet such practices are constrained by the absence or paucity of data in many rivers around the world. Using a permanent river in the north of Iran as a test site, a machine learning framework was proposed to model the streamflow data in the three periods of growing seasons based on tree-rings and vessel features of the Zelkova carpinifolia species. First, full-disc samples were taken from 30 trees near the river, and the samples went through preprocessing, cross-dating, standardization, and time series analysis. Two machine learning algorithms, namely random forest (RF) and extreme gradient boosting (XGB), were used to model the relationships between dendrochronology variables (tree-rings and vessel features in the three periods of growing seasons) and the corresponding streamflow rates. The performance of each model was evaluated using statistical coefficients (coefficient of determination (R-squared), Nash-Sutcliffe efficiency (NSE), and normalized root-mean-square error (NRMSE)). Findings demonstrate that consideration should be given to the XGB model in streamflow modeling given its apparent enhanced performance (R-squared: 0.87; NSE: 0.81; and NRMSE: 0.43) over the RF model (R-squared: 0.82; NSE: 0.71; and NRMSE: 0.52). Further, the results showed that the models perform better in modeling the normal and low flows compared to extremely high flows. Finally, the tested models were used to reconstruct the temporal streamflow during the past decades (1970–1981).
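The two hydrology-specific scores used above have standard definitions, sketched here in plain NumPy (the normalization of NRMSE by the observations' standard deviation is one common convention and an assumption on our part; the paper may normalize differently):

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / total variance of observations.
    1 is a perfect fit; 0 means no better than the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nrmse(obs, sim):
    """RMSE normalized by the standard deviation of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return np.sqrt(np.mean((obs - sim) ** 2)) / obs.std()
```

For a perfect reconstruction NSE = 1 and NRMSE = 0; predicting the observed mean everywhere gives NSE = 0, which is why the reported NSE of 0.81 vs. 0.71 favors XGB.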


Author(s):  
Shu Cheng ◽  
Yanrui Ding

Background: Quantitative Structure-Activity Relationship (QSAR) methods based on machine learning play a vital role in predicting biological effects. Objective: Considering the characteristics of the binding interface between ligands and the inhibitory neurotransmitter gamma-aminobutyric acid type A (GABAA) receptor, we built a QSAR model of ligands that bind to the human GABAA receptor. Method: After feature selection with Mean Decrease Impurity, we selected 53 of 1,286 docked-ligand molecular descriptors. Three QSAR models were built using the gradient boosting regression tree algorithm, based on different combinations of docked-ligand molecular descriptors and ligand-receptor interaction characteristics. Results: The features of the optimal QSAR model contain both the docked-ligand molecular descriptors and the ligand-receptor interaction characteristics. The Leave-One-Out Cross-Validation coefficient (Q2 LOO) of the optimal QSAR model is 0.8974, the coefficient of determination (R2) for the testing set is 0.9261, and the mean square error (MSE) is 0.1862. We also used this model to predict the pIC50 of two new ligands; the differences between the predicted and experimental pIC50 are -0.02 and 0.03, respectively. Conclusion: We found that BELm2, BELe2, MATS1m, X5v, Mor08v, and Mor29m are crucial features that can help to build the QSAR model more accurately.
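Mean Decrease Impurity (MDI) feature selection, as named in the Method above, ranks descriptors by the impurity-based importances of a tree ensemble and keeps the top k. A minimal sketch follows; the synthetic regression data and the value k = 10 are illustrative stand-ins, not the paper's 1,286 descriptors or its cutoff of 53:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# synthetic stand-in for a descriptor matrix (200 ligands x 40 descriptors)
X, y = make_regression(n_samples=200, n_features=40, n_informative=5, random_state=0)

# fit a tree ensemble; feature_importances_ are the MDI scores (they sum to 1)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_

# keep the k descriptors with the largest Mean Decrease Impurity
top_k = 10
selected = np.argsort(importances)[::-1][:top_k]
X_selected = X[:, selected]
```

The reduced matrix X_selected would then feed the gradient boosting regression tree model, with Q2 estimated by leave-one-out cross-validation.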


2020 ◽  
Author(s):  
Anurag Sohane ◽  
Ravinder Agarwal

Abstract: Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may be adequate for clinical use but have drawbacks, such as high computational times, muscle redundancy, and low cost-effectiveness. Recently, there has been interest in developing supervised-learning-based prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal simulations was used to train and test four different ML models. This dataset was obtained from the AnyBody modeling software via AnyPyTools, where a human musculoskeletal model performed a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, and decision tree) in terms of mean square error (MSE), coefficient of determination (R2), and correlation (r). For the test dataset, the MSE values of predicted versus actual muscle forces obtained from the random forest model for Biceps Femoris, Rectus Femoris, Vastus Medialis, and Vastus Lateralis are 19.92, 9.06, 5.97, and 5.46; the correlations are 0.94, 0.92, 0.92, and 0.94; and the R2 values are 0.88, 0.84, 0.84, and 0.89, respectively.
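The setup above (three inputs, four muscle-force outputs, scored per muscle by MSE, R2, and r) can be sketched with a multi-output random forest. The force equations below are hypothetical smooth responses invented for illustration, not AnyBody outputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# inputs: height (m), mass (kg), knee angle (deg) for 500 synthetic subjects
X = rng.uniform([1.5, 50, 0], [2.0, 100, 120], size=(500, 3))
# hypothetical force responses for four muscles (one column each) + noise
Y = np.column_stack([
    X[:, 1] * np.sin(np.radians(X[:, 2])) * w + rng.normal(0, 2, 500)
    for w in (0.8, 0.5, 0.4, 0.6)
])

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=1)
model = RandomForestRegressor(random_state=1).fit(X_tr, Y_tr)
P = model.predict(X_te)

# per-muscle metrics, as reported in the abstract
mse = [mean_squared_error(Y_te[:, i], P[:, i]) for i in range(4)]
r2 = [r2_score(Y_te[:, i], P[:, i]) for i in range(4)]
r = [np.corrcoef(Y_te[:, i], P[:, i])[0, 1] for i in range(4)]
```

Scikit-learn's RandomForestRegressor handles the four-column target natively, which keeps the per-muscle evaluation loop simple.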


2021 ◽  
Vol 13 (5) ◽  
pp. 1021
Author(s):  
Hu Ding ◽  
Jiaming Na ◽  
Shangjing Jiang ◽  
Jie Zhu ◽  
Kai Liu ◽  
...  

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic high-accuracy mapping of artificial terraces is the basis of monitoring and related studies. Previous research achieved artificial terrace mapping based on high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) technologies is widely used. However, the selection of an appropriate classifier is of great importance for the terrace mapping task. In this study, the performance of an integrated framework using OBIA and ML for terrace mapping was tested. A catchment, Zhifanggou, in the Loess Plateau, China, was used as the study area. First, optimized image segmentation was conducted. Then, features from the DEMs and imagery were extracted, and the correlations between the features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, namely extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. Comparison with the ground truth, as delineated by field survey, indicated that random forest performed best, with a 95.60% overall accuracy (followed by 94.16% and 92.33% for XGBoost and KNN, respectively). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
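The final classifier-comparison step can be sketched as below. The data are a synthetic stand-in for per-object DEM/imagery features with terrace vs. non-terrace labels, and scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost to keep the sketch dependency-free:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# synthetic stand-in for per-segment terrain/spectral features; y = terrace (1) or not (0)
X, y = make_classification(n_samples=1000, n_features=12, n_informative=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

classifiers = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
    "random_forest": RandomForestClassifier(random_state=0),
    "knn": KNeighborsClassifier(),
}
# overall accuracy against the held-out "ground truth", as in the study
overall_accuracy = {
    name: accuracy_score(y_te, clf.fit(X_tr, y_tr).predict(X_te))
    for name, clf in classifiers.items()
}
```

In the study the ground truth comes from field survey rather than a held-out split, but the accuracy bookkeeping is the same.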


2021 ◽  
Vol 13 (6) ◽  
pp. 1147
Author(s):  
Xiangqian Li ◽  
Wenping Yuan ◽  
Wenjie Dong

To forecast the terrestrial carbon cycle and monitor food security, vegetation growth must be accurately predicted; however, current process-based ecosystem and crop-growth models are limited in their effectiveness. This study developed a machine learning model using the extreme gradient boosting method to predict vegetation growth throughout the growing season in China from 2001 to 2018. The model used satellite-derived vegetation data for the first month of each growing season, CO2 concentration, and several meteorological factors as data sources for the explanatory variables. Results showed that the model could reproduce the spatiotemporal distribution of vegetation growth as represented by the satellite-derived normalized difference vegetation index (NDVI). The predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas in China; the model represented seasonal variations in NDVI well. The coefficient of determination (R2) between the monthly observed and predicted NDVI was 0.83, and more than 69% of vegetated areas had an R2 > 0.8. The effectiveness of the model was examined for a severe drought year (2009), and results showed that the model could reproduce the spatiotemporal distribution of NDVI even under extreme conditions. This model provides an alternative method for predicting vegetation growth and has great potential for monitoring vegetation dynamics and crop growth.
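The headline statistic above ("predictive error for the growing season NDVI was less than 5% for more than 98% of vegetated areas") reduces to a per-pixel relative-error threshold, sketched here in NumPy as an assumed reading of that metric:

```python
import numpy as np

def fraction_within_relative_error(observed, predicted, threshold=0.05):
    """Share of pixels whose relative error |predicted - observed| / |observed|
    falls below the given threshold (default 5%)."""
    observed = np.asarray(observed, float)
    predicted = np.asarray(predicted, float)
    rel_err = np.abs(predicted - observed) / np.abs(observed)
    return float(np.mean(rel_err < threshold))
```

Applied to observed and XGBoost-predicted NDVI maps flattened to 1-D arrays, this returns the fraction of vegetated area meeting the 5% criterion.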


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4068
Author(s):  
Xu Huang ◽  
Mirna Wasouf ◽  
Jessada Sresakoolchai ◽  
Sakdirat Kaewunruen

Cracks typically develop in concrete due to shrinkage, loading actions, and weather conditions, and may occur at any time in its life span. Autogenous healing concrete is a type of self-healing concrete that can automatically heal cracks through physical or chemical reactions in the concrete matrix. It is imperative to investigate the healing performance that autogenous healing concrete possesses, to assess the extent of cracking, and to predict the extent of healing. In self-healing concrete research, testing the healing performance of concrete in a laboratory is costly, and a large number of specimens may be needed to explore a reliable concrete design. This study is thus the world's first to establish six types of machine learning algorithms capable of predicting the healing performance (HP) of self-healing concrete: an artificial neural network (ANN), k-nearest neighbours (kNN), gradient boosting regression (GBR), decision tree regression (DTR), support vector regression (SVR), and random forest (RF). The parameters of these algorithms are tuned using a grid search algorithm (GSA) and a genetic algorithm (GA). The prediction performance of these algorithms, indicated by the coefficient of determination (R2) and root mean square error (RMSE), is evaluated on the basis of 1417 data sets from the open literature. The results show that GSA-GBR achieves higher prediction performance (R2 = 0.958) and stronger robustness (RMSE = 0.202) than the other five types of algorithms employed to predict the healing performance of autogenous healing concrete. Reliable prediction of the healing performance and efficient assistance in the design of autogenous healing concrete can therefore be achieved.
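The best-performing combination above, grid search over a gradient boosting regressor (GSA-GBR), maps directly onto scikit-learn's GridSearchCV. The data and the parameter grid below are illustrative assumptions, not the paper's 1417-sample database or its actual search space:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

# synthetic stand-in for the healing-performance dataset
X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

# small illustrative grid; the study's grid would be wider
param_grid = {
    "n_estimators": [100, 200],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid, scoring="r2", cv=3,
)
search.fit(X, y)
best_params = search.best_params_   # tuned hyperparameters
best_r2 = search.best_score_        # mean cross-validated R2 of the best combination
```

A genetic algorithm would replace the exhaustive grid with an evolved population of parameter sets, but the fit/score loop is the same.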


Sensors ◽  
2021 ◽  
Vol 21 (1) ◽  
pp. 256
Author(s):  
Pengfei Han ◽  
Han Mei ◽  
Di Liu ◽  
Ning Zeng ◽  
Xiao Tang ◽  
...  

Pollutant gases such as CO, NO2, O3, and SO2 affect human health, and low-cost sensors are an important complement to regulatory-grade instruments in pollutant monitoring. Previous studies focused on one or a few species, while comprehensive assessments of multiple sensors remain limited. We conducted a 12-month field evaluation of four Alphasense sensors in Beijing and used single linear regression (SLR), multiple linear regression (MLR), random forest regressor (RFR), and neural network (long short-term memory (LSTM)) methods to calibrate and validate the measurements against nearby reference measurements from national monitoring stations. In terms of both the coefficient of determination (R2) and root mean square error (RMSE), performance ranked CO > O3 > NO2 > SO2. Compared with the SLR, the MLR did not increase R2 after accounting for temperature and relative humidity (R2 remained at approximately 0.6 for O3 and 0.4 for NO2). However, the RFR and LSTM models significantly improved the O3, NO2, and SO2 performance, with R2 increasing from 0.3–0.5 to >0.7 for O3 and NO2 and the RMSE decreasing from 20.4 to 13.2 ppb for NO2. The SLR showed relatively large biases, while the LSTMs maintained a mean relative bias close to zero (e.g., <5% for O3 and NO2), indicating that these sensors combined with LSTMs are suitable for hot-spot detection. We highlight that the performance of LSTM is better than that of the random forest and linear methods. This study assessed four electrochemical air quality sensors and different calibration models; the methodology and results can benefit assessments of other low-cost sensors.
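The SLR-vs-MLR comparison above amounts to regressing the reference concentration on the raw sensor signal alone versus on the signal plus temperature and relative humidity. A minimal sketch follows, on synthetic data with an invented temperature-dependent sensor bias (all coefficients are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(2)
n = 1000
temp = rng.uniform(0, 35, n)        # air temperature, degC
rh = rng.uniform(20, 90, n)         # relative humidity, %
true_o3 = rng.uniform(10, 80, n)    # "reference station" O3, ppb
# hypothetical raw sensor response with temperature/humidity interference
raw = 0.9 * true_o3 + 0.3 * temp - 0.05 * rh + rng.normal(0, 3, n)

# SLR: raw signal only; MLR: raw signal + temperature + relative humidity
slr = LinearRegression().fit(raw.reshape(-1, 1), true_o3)
mlr = LinearRegression().fit(np.column_stack([raw, temp, rh]), true_o3)

pred_slr = slr.predict(raw.reshape(-1, 1))
pred_mlr = mlr.predict(np.column_stack([raw, temp, rh]))
r2_slr = r2_score(true_o3, pred_slr)
r2_mlr = r2_score(true_o3, pred_mlr)
# the bias metric the study reports for hot-spot suitability
mean_relative_bias = float(np.mean((pred_mlr - true_o3) / true_o3))
```

The RFR and LSTM calibrations would consume the same (raw, temp, rh) inputs, the latter as time sequences.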


Author(s):  
Marcelo N. de Sousa ◽  
Ricardo Sant’Ana ◽  
Rigel P. Fernandes ◽  
Julio Cesar Duarte ◽  
José A. Apolinário ◽  
...  

Abstract: In outdoor RF localization systems, particularly where line of sight cannot be guaranteed or where multipath effects are severe, information about the terrain may improve the performance of the position estimate. Given the difficulties in obtaining real data, a ray-tracing fingerprint is a viable option. Nevertheless, although they present good simulation results, systems trained only with simulated features suffer performance degradation when employed to process real-life data. This work intends to improve localization accuracy when using ray-tracing fingerprints and a few field data obtained from an adverse environment where a large number of measurements is not an option. We employ machine learning (ML) algorithms to explore the multipath information, selecting random forest and gradient boosting, both considered efficient tools in the literature. In a strict simulation scenario (simulated data for training, validation, and testing), we obtained the same good results found in the literature (error around 2 m). In a real-world system (simulated data for training, real data for validation and testing), both ML algorithms resulted in a mean positioning error of around 100 m. We also obtained experimental results for noisy features (with artificially added Gaussian noise) and mismatched features (with a subset of features nulled). The simulations carried out in this work revealed that enhancing the ML model with a few real-world data points improves the overall localization performance. We also observed that, under noisy conditions, the random forest algorithm achieved a slightly better result than the gradient boosting algorithm, although they achieved similar results in the mismatch experiment.
This work's practical implication is that multipath information, once rejected in older localization techniques, now represents a significant source of information whenever we have prior knowledge with which to train the ML algorithm.
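The noisy-feature experiment described above can be sketched as follows: train on clean fingerprint features, corrupt the test features with Gaussian noise, and compare the mean positioning error of random forest versus gradient boosting. The fingerprint features, noise level, and position mapping are all synthetic placeholders:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
# hypothetical multipath fingerprint features and 2-D positions (arbitrary units)
X = rng.uniform(size=(800, 6))
pos = np.column_stack([X[:, 0] + 0.2 * X[:, 1], X[:, 2] - 0.1 * X[:, 3]])

X_tr, X_te, p_tr, p_te = train_test_split(X, pos, random_state=3)
# artificially added Gaussian noise, as in the paper's noisy-feature experiment
X_te_noisy = X_te + rng.normal(0, 0.05, X_te.shape)

def mean_position_error(model_cls):
    # one independent regressor per coordinate (GradientBoostingRegressor
    # is single-output), then Euclidean error averaged over test points
    preds = np.column_stack([
        model_cls(random_state=3).fit(X_tr, p_tr[:, i]).predict(X_te_noisy)
        for i in range(2)
    ])
    return float(np.mean(np.linalg.norm(preds - p_te, axis=1)))

errors = {
    "random_forest": mean_position_error(RandomForestRegressor),
    "gradient_boosting": mean_position_error(GradientBoostingRegressor),
}
```

Which ensemble degrades less under noise depends on the data; the paper reports a slight edge for random forest in this regime.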

