Prediction of Compressive Strength of Fly Ash Based Concrete Using Individual and Ensemble Algorithm

Ayaz Ahmad; Furqan Farooq; Pawel Niewiadomski; Krzysztof Ostrowski; Arslan Akbar; Fahid Aslam; Rayed Alyousef

doi:10.3390/ma14040794

Application of Novel Machine Learning Techniques for Predicting the Surface Chloride Concentration in Concrete Containing Waste Material

Materials, 2021, Vol 14 (9), pp. 2297. doi:10.3390/ma14092297

Author(s): Ayaz Ahmad; Furqan Farooq; Krzysztof Adam Ostrowski; Klaudia Śliwa-Wieczorek; Slawomir Czarnecki

Keyword(s): Machine Learning; Mean Square Error; Gene Expression Programming; Chloride Concentration; Absolute Error; Chloride Ions; Machine Learning Techniques; Extensive Literature; Mean Square; Chloride Concentrations

Structures located on the coast are subjected to the long-term influence of chloride ions, which cause the corrosion of steel reinforcements in concrete elements. This corrosion severely affects the performance of the elements and may shorten the lifespan of an entire structure. Laboratory experiments can be used to study the problem, but they are time-consuming and costly. Thus, the application of individual machine learning (ML) techniques has been investigated to predict surface chloride concentrations (Cc) in marine structures. For this purpose, the values of Cc in tidal, splash, and submerged zones were collected from an extensive literature survey and incorporated into the article. Gene expression programming (GEP), the decision tree (DT), and an artificial neural network (ANN) were used to predict the surface chloride concentrations, and the most accurate algorithm was then selected. The GEP model was the most accurate when compared to ANN and DT, which was confirmed by K-fold cross-validation and by the linear correlation coefficient (R²), mean absolute error (MAE), mean square error (MSE), and root mean square error (RMSE) parameters. As shown in the article, the proposed method is an effective and accurate way to predict the surface chloride concentration without the inconveniences of laboratory tests.
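The error measures named in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `regression_metrics` and `k_fold_indices` are hypothetical, R² is computed here as the coefficient of determination (an assumption, since the abstract does not give its formula), and the folds are contiguous and unshuffled for simplicity.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot  # assumption: coefficient of determination
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds of near-equal size.

    Returns a list of (train_indices, test_indices) pairs.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        test = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds

if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 6]))
    print(k_fold_indices(5, 2))
```

In a K-fold evaluation, a model would be fitted on each `train` index set and `regression_metrics` computed on the matching `test` set, with the per-fold values then averaged.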

Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature

Geoscientific Model Development, 2014, Vol 7 (3), pp. 1247-1250. doi:10.5194/gmd-7-1247-2014. Cited by ~1200

Author(s): T. Chai; R. R. Draxler

Keyword(s): Root Mean Square Error; Mean Square Error; Root Mean Square; Model Evaluation; Mean Absolute Error; Evaluation Studies; Model Performance; Absolute Error; Average Error; Mean Square

Abstract. Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error, and thus the MAE would be a better metric for that purpose. While some concerns over using RMSE raised by Willmott and Matsuura (2005) and Willmott et al. (2009) are valid, the proposed avoidance of RMSE in favor of MAE is not the solution. Citing the aforementioned papers, many researchers chose MAE over RMSE to present their model evaluation statistics when presenting or adding the RMSE measures could be more beneficial. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric, whereas Willmott et al. (2009) indicated that the sums-of-squares-based statistics do not satisfy this rule. Finally, we discuss some circumstances where using the RMSE will be more beneficial. However, we do not contend that the RMSE is superior to the MAE. Instead, a combination of metrics, including but certainly not limited to RMSEs and MAEs, is often required to assess model performance.
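Two points from this abstract can be checked numerically: the RMSE weights large errors more heavily than the MAE, and the RMSE between two series behaves as a distance that satisfies the triangle inequality. The sketch below uses made-up numbers and hypothetical helper names (`mae`, `rmse`, `rmse_distance`); it illustrates the claims and does not reproduce the note's own derivation.

```python
import math

def mae(errors):
    """Mean absolute error of a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def rmse_distance(a, b):
    """RMSE between two equal-length series, viewed as a distance."""
    return rmse([x - y for x, y in zip(a, b)])

if __name__ == "__main__":
    # Same MAE, different RMSE: one large error raises the RMSE.
    even = [1.0, 1.0, 1.0, 1.0]
    spiky = [0.0, 0.0, 0.0, 4.0]
    print(mae(even), rmse(even))    # MAE and RMSE coincide for equal errors
    print(mae(spiky), rmse(spiky))  # RMSE exceeds MAE when errors vary

    # Triangle inequality: d(a, c) <= d(a, b) + d(b, c)
    a = [0.0, 0.0, 0.0, 0.0]
    b = [1.0, 2.0, 3.0, 4.0]
    c = [2.0, 2.0, 5.0, 1.0]
    print(rmse_distance(a, c) <= rmse_distance(a, b) + rmse_distance(b, c))
```

Because the RMSE is the Euclidean norm of the error vector scaled by 1/sqrt(n), the inequality holds for any three series, not just the ones chosen here.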

Root mean square error (RMSE) or mean absolute error (MAE)?

Geoscientific Model Development Discussions, 2014, Vol 7 (1), pp. 1525-1534. doi:10.5194/gmdd-7-1525-2014. Cited by ~93

Author(s): T. Chai; R. R. Draxler

Keyword(s): Root Mean Square Error; Mean Square Error; Root Mean Square; Model Evaluation; Mean Absolute Error; Evaluation Studies; Model Performance; Absolute Error; Average Error; Mean Square

Abstract. Both the root mean square error (RMSE) and the mean absolute error (MAE) are regularly employed in model evaluation studies. Willmott and Matsuura (2005) have suggested that the RMSE is not a good indicator of average model performance and might be a misleading indicator of average error and thus the MAE would be a better metric for that purpose. Their paper has been widely cited and may have influenced many researchers in choosing MAE when presenting their model evaluation statistics. However, we contend that the proposed avoidance of RMSE and the use of MAE is not the solution to the problem. In this technical note, we demonstrate that the RMSE is not ambiguous in its meaning, contrary to what was claimed by Willmott et al. (2009). The RMSE is more appropriate to represent model performance than the MAE when the error distribution is expected to be Gaussian. In addition, we show that the RMSE satisfies the triangle inequality requirement for a distance metric.
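For the Gaussian case mentioned in the abstract, the link between the two measures can be written out. The block below is a standard textbook result, not a quotation from the note; the symbols $e_i$ (individual errors), $n$ (sample size), and $\sigma$ (error standard deviation) are introduced here for illustration.

```latex
\begin{align}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n} \lvert e_i \rvert,
&
\mathrm{RMSE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^{2}}, \\
e_i \sim \mathcal{N}(0,\sigma^{2})
\;\Rightarrow\;
\mathbb{E}\,\lvert e_i \rvert &= \sigma\sqrt{\tfrac{2}{\pi}} \approx 0.798\,\sigma,
&
\sqrt{\mathbb{E}\,e_i^{2}} &= \sigma .
\end{align}
```

Under this assumption the RMSE estimates $\sigma$ directly, while the MAE estimates a fixed fraction of it, so for zero-mean Gaussian errors the two carry the same information up to a constant factor.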

Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance

Climate Research, 2005, Vol 30, pp. 79-82. doi:10.3354/cr030079. Cited by ~1408

Author(s): CJ Willmott; K Matsuura

Keyword(s): Root Mean Square Error; Mean Square Error; Root Mean Square; Mean Absolute Error; Model Performance; Absolute Error; Mean Square; Average Model; The Mean

Investigating Tree Family Machine Learning Techniques for a Predictive System to Unveil Software Defects

Complexity, 2020, Vol 2020, pp. 1-21. doi:10.1155/2020/6688075

Author(s): Rashid Naseem; Bilal Khan; Arshad Ahmad; Ahmad Almogren; Saima Jabeen; ...

Keyword(s): Machine Learning; Decision Tree; Software Development; Absolute Error; Error Rates; Machine Learning Techniques; Defect Prediction; Software Defects; Squared Error; Learning Techniques

Predicting software defects early in the software development life cycle remains a critical task. Accurate defect prediction helps assure the quality of software systems and has remained an active research topic in recent years. Quickly identifying defective modules lets the development team use existing resources efficiently and deliver quality software products within a short timeline. To date, several researchers have developed defect prediction models using statistical and machine learning techniques, which are effective approaches for pinpointing defective modules. Tree family machine learning techniques are considered among the best and most commonly used supervised learning methods. In this study, different tree family machine learning techniques are employed for software defect prediction using ten benchmark datasets. These techniques include Credal Decision Tree (CDT), Cost-Sensitive Decision Forest (CS-Forest), Decision Stump (DS), Forest by Penalizing Attributes (Forest-PA), Hoeffding Tree (HT), Decision Tree (J48), Logistic Model Tree (LMT), Random Forest (RF), Random Tree (RT), and REP-Tree (REP-T). Performance of each technique is evaluated using different measures, i.e., mean absolute error (MAE), relative absolute error (RAE), root mean squared error (RMSE), root relative squared error (RRSE), specificity, precision, recall, F-measure (FM), G-measure (GM), Matthews correlation coefficient (MCC), and accuracy. The overall results indicate that the RF technique performs best, reducing error rates and increasing accuracy on five datasets, i.e., AR3, PC1, PC2, PC3, and PC4. The average accuracy achieved by RF is 90.2238%. The comprehensive results of this study can be used as a reference point for other researchers. Any claimed improvement in prediction from a new model, technique, or framework can be benchmarked and verified against them.
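Several of the classification measures listed in this abstract follow directly from the four cells of a binary confusion matrix. The sketch below shows the standard formulas for precision, recall, specificity, accuracy, F-measure, and MCC. The function name `classification_metrics` and the example counts are hypothetical, and G-measure, RAE, and RRSE are left out because the abstract does not state which definitions the study used.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Compute common binary classification measures from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "F-measure": f_measure,
        "MCC": mcc,
    }

if __name__ == "__main__":
    # Hypothetical counts: 40 defective modules found, 5 missed,
    # 10 clean modules flagged by mistake, 45 correctly passed.
    print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```

MCC is often reported alongside accuracy on defect datasets because it stays informative when defective modules are a small minority of the data.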

Prediction of short-term mortality in acute heart failure patients using minimal electronic health record data

BioData Mining, 2021, Vol 14 (1). doi:10.1186/s13040-021-00255-w

Author(s): Ashwath Radhachandran; Anurag Garikipati; Nicole S. Zelin; Emily Pellegrini; Sina Ghandian; ...

Keyword(s): Heart Failure; Logistic Regression; Decision Tree; Risk Stratification; Acute Heart Failure; Model Performance; Machine Learning Techniques; Electronic Health Record Data; Test Set; Patient Disposition

Abstract. Background: Acute heart failure (AHF) is associated with significant morbidity and mortality. Effective patient risk stratification is essential to guiding hospitalization decisions and the clinical management of AHF. Clinical decision support systems can be used to improve predictions of mortality made in emergency care settings for the purpose of AHF risk stratification. In this study, several models for the prediction of seven-day mortality among AHF patients were developed by applying machine learning techniques to retrospective patient data from 236,275 total emergency department (ED) encounters, 1881 of which were considered positive for AHF and were used for model training and testing. The models used varying subsets of age, sex, vital signs, and laboratory values. Model performance was compared to the Emergency Heart Failure Mortality Risk Grade (EHMRG) model, a commonly used system for prediction of seven-day mortality in the ED with similar (or, in some cases, more extensive) inputs. Model performance was assessed in terms of area under the receiver operating characteristic curve (AUROC), sensitivity, and specificity. Results: When trained and tested on a large academic dataset, the best-performing model and EHMRG demonstrated test set AUROCs of 0.84 and 0.78, respectively, for prediction of seven-day mortality. Given only measurements of respiratory rate, temperature, mean arterial pressure, and FiO2, one model produced a test set AUROC of 0.83. Neither a logistic regression comparator nor a simple decision tree outperformed EHMRG. Conclusions: A model using only the measurements of four clinical variables outperforms EHMRG in the prediction of seven-day mortality in AHF. With these inputs, the model could not be replaced by logistic regression or reduced to a simple decision tree without significant performance loss. In ED settings, this minimal-input risk stratification tool may assist clinicians in making critical decisions about patient disposition by providing early and accurate insights into individual patients' risk profiles.
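The AUROC reported in this abstract can be read as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison and also shows sensitivity and specificity at a fixed threshold. The function names, labels, and scores are illustrative assumptions and are unrelated to the study's data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(labels, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for l, s in zip(labels, scores) if l == 1 and s >= threshold)
    fn = sum(1 for l, s in zip(labels, scores) if l == 1 and s < threshold)
    tn = sum(1 for l, s in zip(labels, scores) if l == 0 and s < threshold)
    fp = sum(1 for l, s in zip(labels, scores) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

if __name__ == "__main__":
    labels = [0, 0, 1, 1]            # 1 = event occurred (made-up data)
    scores = [0.1, 0.4, 0.35, 0.8]   # model risk scores (made-up data)
    print(auroc(labels, scores))
    print(sensitivity_specificity(labels, scores, threshold=0.5))
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; rank-based formulas give the same value more efficiently on large datasets.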

Detection and Severity Evaluation of Combined Rail Defects Using Deep Learning

Vibration, 2021, Vol 4 (2), pp. 341-356. doi:10.3390/vibration4020022

Author(s): Jessada Sresakoolchai; Sakdirat Kaewunruen

Keyword(s): Neural Network; Machine Learning; Deep Learning; Mean Absolute Error; Absolute Error; Machine Learning Techniques; Rolling Stock; Raw Data; Learning Techniques; Combined Defects

Various techniques have been developed to detect railway defects. One of the popular techniques is machine learning. This unprecedented study applies deep learning, which is a branch of machine learning techniques, to detect and evaluate the severity of combined rail defects. The combined defects in the study are settlement and dipped joint. Features used to detect and evaluate the severity of combined defects are axle box accelerations simulated using a verified rolling stock dynamic behavior simulation called D-Track. A total of 1650 simulations are run to generate numerical data. Deep learning techniques used in the study are deep neural network (DNN), convolutional neural network (CNN), and recurrent neural network (RNN). Simulated data are used in two ways: simplified data and raw data. Simplified data are used to develop the DNN model, while raw data are used to develop the CNN and RNN models. For simplified data, features are extracted from raw data, which are the weight of rolling stock, the speed of rolling stock, and three peak and bottom accelerations from two wheels of rolling stock. In total, there are 14 features used as simplified data for developing the DNN model. For raw data, time-domain accelerations are used directly to develop the CNN and RNN models without processing and data extraction. Hyperparameter tuning is performed by grid search to ensure that the performance of each model is optimized. To detect the combined defects, the study proposes two approaches. The first approach uses one model to detect settlement and dipped joint, and the second approach uses two models to detect settlement and dipped joint separately. The results show that the CNN models of both approaches provide the same accuracy of 99%, so one model is good enough to detect settlement and dipped joint. To evaluate the severity of the combined defects, the study applies classification and regression concepts. Classification is used to evaluate the severity by categorizing defects into light, medium, and severe classes, and regression is used to estimate the size of defects. From the study, the CNN model is suitable for evaluating dipped joint severity with an accuracy of 84% and mean absolute error (MAE) of 1.25 mm, and the RNN model is suitable for evaluating settlement severity with an accuracy of 99% and mean absolute error (MAE) of 1.58 mm.

A Practical Tutorial for Decision Tree Induction

ACM Computing Surveys, 2021, Vol 54 (1), pp. 1-38. doi:10.1145/3429739

Author(s): Víctor Adrián Sosa Hernández; Raúl Monroy; Miguel Angel Medina-Pérez; Octavio Loyola-González; Francisco Herrera

Keyword(s): Decision Tree; Decision Trees; Machine Learning Techniques; Evaluation Measures; Decision Tree Induction; Learning Techniques; Tree Models; Evaluation Measure; Main Components; Support Decision Making

Experts from different domains have resorted to machine learning techniques to produce explainable models that support decision-making. Among existing techniques, decision trees have been useful in many application domains for classification. Decision trees can make decisions in a language that is closer to that of the experts. Many researchers have attempted to create better decision tree models by improving the components of the induction algorithm. One of the main components that has been studied and improved is the evaluation measure for candidate splits. In this article, we introduce a tutorial that explains decision tree induction. Then, we present an experimental framework to assess the performance of 21 evaluation measures that produce different C4.5 variants considering 110 databases, two performance measures, and 10 × 10-fold cross-validation. Furthermore, we compare and rank the evaluation measures by using a Bayesian statistical analysis. From our experimental results, we present the first two performance rankings in the literature of C4.5 variants. Moreover, we organize the evaluation measures into two groups according to their performance. Finally, we introduce meta-models that automatically determine the group of evaluation measures to produce a C4.5 variant for a new database and some further opportunities for decision tree models.
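An evaluation measure for candidate splits scores how well a proposed partition of the training labels separates the classes. As a minimal sketch, the code below implements entropy, information gain, and the gain ratio, which is the measure classically associated with C4.5. The function names and toy labels are illustrative, and the 21 measures compared in the article are not reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, partitions):
    """Entropy reduction obtained by splitting `labels` into `partitions`."""
    n = len(labels)
    remainder = sum((len(p) / n) * entropy(p) for p in partitions if p)
    return entropy(labels) - remainder

def gain_ratio(labels, partitions):
    """Information gain normalised by the split information of the partition sizes."""
    n = len(labels)
    split_info = -sum(
        (len(p) / n) * math.log2(len(p) / n) for p in partitions if p
    )
    if split_info == 0:
        return 0.0  # a split that keeps everything in one branch is useless
    return information_gain(labels, partitions) / split_info

if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    pure_split = [["yes", "yes"], ["no", "no"]]     # separates the classes
    useless_split = [["yes", "no"], ["yes", "no"]]  # leaves them mixed
    print(gain_ratio(labels, pure_split))
    print(gain_ratio(labels, useless_split))
```

During induction, the candidate split with the highest score under the chosen measure is selected at each node, which is why swapping the measure yields a different tree variant.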

Evolutionary Algorithm for Improving Decision Tree with Global Discretization in Manufacturing

Sensors, 2021, Vol 21 (8), pp. 2849. doi:10.3390/s21082849

Author(s): Sungbum Jun

Keyword(s): Decision Tree; Evolutionary Algorithm; Decision Trees; Manufacturing Systems; Ensemble Methods; Machine Learning Techniques; Learning Techniques; Industrial Internet; Tree Models; Real World Datasets

Due to recent advances in the industrial Internet of Things (IoT) in manufacturing, the vast amount of data from sensors has triggered the need for leveraging such big data for fault detection. In particular, interpretable machine learning techniques, such as tree-based algorithms, have drawn attention as a way to implement reliable manufacturing systems and identify the root causes of faults. However, despite the high interpretability of decision trees, tree-based models make a trade-off between accuracy and interpretability. In order to improve the tree's performance while maintaining its interpretability, an evolutionary algorithm for discretization of multiple attributes, called Decision tree Improved by Multiple sPLits with Evolutionary algorithm for Discretization (DIMPLED), is proposed. The experimental results with two real-world datasets from sensors showed that the decision tree improved by DIMPLED outperformed the single-decision-tree models (C4.5 and CART) that are widely used in practice, and it proved competitive compared to the ensemble methods, which have multiple decision trees. Even though the ensemble methods could produce slightly better performances, the proposed DIMPLED has a more interpretable structure, while maintaining an appropriate performance level.

Designing an efficient predictor model using PSNN and crow search based optimization technique for gold price prediction

Intelligent Decision Technologies, 2021, pp. 1-9. doi:10.3233/idt-200093

Author(s): Rajashree Dash; Anuradha Routray; Rasmita Dash; Rasmita Rautray

Keyword(s): Mean Square Error; Optimization Technique; Absolute Error; Time Frame; Gold Price; Future Price; Mean Square; Learning Speed; Predictor Model; Hidden Layer

Predicting the future price of gold has always been an intriguing field of investigation for researchers as well as for investors who wish to invest now and profit in the future. Since ancient times, gold has been regarded as a leading asset in monetary business. Because the worth of gold changes within confined boundaries, reducing the effect of inflation, it is a beneficial asset favoured by many stakeholders. Hence, there is a continuing need for a more reliable model for forecasting the gold price based on its changes over a previous time frame. This study focuses on designing an efficient predictor model using a Pi-Sigma Neural Network (PSNN) for forecasting the future gold price. The underlying motivation for using PSNN is its quick learning and easy implementation compared to other neural networks. The fixed unit weights between the hidden and output layers of the PSNN help it achieve a faster learning speed than other similar types of networks. However, estimating the unknown weights between the input and hidden layers is still a major challenge in its design phase. Because the final outcome of the network is highly influenced by its weights, a novel Crow Search based nature-inspired optimization algorithm (CSA) is proposed to estimate these adjustable weights. The proposed model is also compared with Particle Swarm Optimization (PSO) and Differential Evolution (DE) based learning of PSNN. The model is validated over two historical datasets, Gold/INR and Gold/AED, using three statistical errors: Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). Empirical observations clearly show that the developed CSA-PSNN predictor model provides better prediction results than the PSO-PSNN and DE-PSNN models.
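The structure described in this abstract, with trainable weights only between the input and hidden layers and fixed unit weights into the output, can be sketched as a forward pass. This is a generic Pi-Sigma layout under stated assumptions, not the authors' implementation: the hidden units are linear sums, the output is the sigmoid of their product, and the names `psnn_forward` and `mse_fitness` and all numeric values are hypothetical.

```python
import math

def psnn_forward(x, weights, biases):
    """Forward pass of a Pi-Sigma network with one output.

    x       : list of input features
    weights : one weight vector per hidden (summing) unit; these are trainable
    biases  : one bias per hidden unit; also trainable
    The hidden-to-output weights are fixed at 1, so the output unit simply
    multiplies the hidden sums before applying the activation.
    """
    sums = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    product = math.prod(sums)                 # "pi" stage with unit weights
    return 1.0 / (1.0 + math.exp(-product))   # assumption: sigmoid output

def mse_fitness(weights, biases, samples):
    """Mean square error over (inputs, target) pairs.

    An optimizer such as CSA, PSO, or DE would search for the weights and
    biases that minimise a fitness of this kind.
    """
    errors = [psnn_forward(x, weights, biases) - target for x, target in samples]
    return sum(e * e for e in errors) / len(errors)

if __name__ == "__main__":
    weights = [[0.5, 0.0], [0.0, 0.5]]  # two hidden units, two inputs (made up)
    biases = [0.0, 0.0]
    print(psnn_forward([1.0, 2.0], weights, biases))
    print(mse_fitness(weights, biases, [([1.0, 2.0], 0.7)]))
```

Because only the input-to-hidden parameters are adjustable, a population-based optimizer can treat them as one flat vector and evaluate each candidate with a fitness function like the one above.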
