Influence of Data Splitting on Performance of Machine Learning Models in Prediction of Shear Strength of Soil

Mathematical Problems in Engineering ◽

10.1155/2021/4832864 ◽

2021 ◽

Vol 2021 ◽

pp. 1-15 ◽

Cited By ~ 1

Author(s):

Quang Hung Nguyen ◽

Hai-Bang Ly ◽

Lanh Si Ho ◽

Nadhir Al-Ansari ◽

Hiep Van Le ◽

...

Keyword(s):

Machine Learning ◽

Monte Carlo ◽

Shear Strength ◽

Mean Squared Error ◽

Absolute Error ◽

Engineering Properties ◽

Stable Model ◽

Predictive Capability ◽

Effective Manner ◽

Soil Shear Strength

The main objective of this study is to evaluate and compare the performance of different machine learning (ML) algorithms, namely Artificial Neural Network (ANN), Extreme Learning Machine (ELM), and Boosting Trees (Boosted), considering the influence of various training-to-testing ratios in predicting soil shear strength, one of the most critical geotechnical properties in civil engineering design and construction. To this end, a database of 538 soil samples collected from the Long Phu 1 power plant project, Vietnam, was used to generate the datasets for the modeling process. Different ratios (i.e., 10/90, 20/80, 30/70, 40/60, 50/50, 60/40, 70/30, 80/20, and 90/10) were used to divide the data into training and testing datasets for the performance assessment of the models. Popular statistical indicators, such as Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Correlation Coefficient (R), were employed to evaluate the predictive capability of the models under the different training/testing ratios. In addition, Monte Carlo simulation was carried out to evaluate the performance of the proposed models, taking into account the random sampling effect. The results showed that although all three ML models performed well, the ANN was the most accurate and statistically stable model after 1000 Monte Carlo simulations (Mean R = 0.9348) compared with Boosted (Mean R = 0.9192) and ELM (Mean R = 0.8703). The predictive capability of the ML models was greatly affected by the training/testing ratio, with the 70/30 ratio giving the best performance. In short, the results presented herein provide an effective way of selecting the appropriate dataset ratio and the best ML model to predict soil shear strength accurately, which would be helpful in the design and engineering phases of construction projects.
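
The evaluation scheme described in this abstract (repeated random splits at a fixed training/testing ratio, scored with RMSE, MAE, and R) can be sketched as follows. This is a minimal illustration only: the data are synthetic, a least-squares line stands in for the ANN, ELM, and Boosted models, and the function names are assumptions rather than the authors' code.

```python
import math
import random
import statistics


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between targets and predictions."""
    mt, mp = statistics.fmean(y_true), statistics.fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def fit_line(xs, ys):
    """Least-squares line; a stand-in model, NOT one of the paper's algorithms."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def monte_carlo_split(xs, ys, train_fraction, n_runs, seed=0):
    """Mean and standard deviation of test-set R over n_runs random splits."""
    rng = random.Random(seed)
    n_train = round(train_fraction * len(xs))
    indices = list(range(len(xs)))
    scores = []
    for _ in range(n_runs):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in test]
        scores.append(pearson_r([ys[i] for i in test], preds))
    return statistics.fmean(scores), statistics.stdev(scores)


# Synthetic data with the same sample count as the abstract (538); values are invented.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(538)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
for train_pct in (10, 30, 50, 70, 90):
    mean_r, sd_r = monte_carlo_split(xs, ys, train_pct / 100, n_runs=100)
    print(f"{train_pct}/{100 - train_pct}: mean R = {mean_r:.4f}, sd = {sd_r:.4f}")
```

Reporting the spread across runs alongside the mean is what allows a model to be called statistically stable, not only accurate.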

Extreme Learning Machine Based Prediction of Soil Shear Strength: A Sensitivity Analysis Using Monte Carlo Simulations and Feature Backward Elimination

Sustainability ◽

10.3390/su12062339 ◽

2020 ◽

Vol 12 (6) ◽

pp. 2339 ◽

Cited By ~ 15

Author(s):

Binh Thai Pham ◽

Trung Nguyen-Thoi ◽

Hai-Bang Ly ◽

Manh Duc Nguyen ◽

Nadhir Al-Ansari ◽

...

Keyword(s):

Monte Carlo ◽

Shear Strength ◽

Monte Carlo Simulations ◽

Extreme Learning Machine ◽

Mean Squared Error ◽

Majority Vote ◽

Absolute Error ◽

Liquid Limit ◽

Backward Elimination ◽

Learning Machine

Machine Learning (ML) has been applied widely to solve many real-world problems. However, this approach is very sensitive to the selection of input variables for modeling and simulation. The main objective of this study is to analyze the sensitivity of an advanced ML method, the Extreme Learning Machine (ELM) algorithm, under different feature selection scenarios for the prediction of soil shear strength. Feature backward elimination supported by Monte Carlo simulations was applied to evaluate the importance of the factors used for the modeling. A database constructed from 538 samples collected from the Long Phu 1 power plant project was used for the analysis. Well-known statistical indicators, such as the correlation coefficient (R), root mean squared error (RMSE), and mean absolute error (MAE), were used to evaluate the performance of the ELM algorithm. In each elimination step, a majority vote based on six elimination indicators decided which variable to exclude. A total of 30,000 simulations were conducted to find the most relevant variables for predicting soil shear strength using ELM. The results show that the performance of ELM is good but varies considerably under different combinations of input factors. The moisture content, liquid limit, and plastic limit were found to be the most critical variables for the prediction of soil shear strength using the ML model.
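
One elimination step decided by majority vote might look like the sketch below. The abstract does not name the six elimination indicators, so the mean and standard deviation of R, RMSE, and MAE are used here purely as a hypothetical choice, and all scores are invented for illustration.

```python
from collections import Counter

# Hypothetical indicators (not listed in the abstract).
# True means larger is better, False means smaller is better.
INDICATORS = {
    "mean_R": True,
    "std_R": False,
    "mean_RMSE": False,
    "std_RMSE": False,
    "mean_MAE": False,
    "std_MAE": False,
}


def vote_for_removal(scores):
    """Pick the feature to drop by majority vote.

    scores[feature][indicator] is the indicator value obtained when the model
    is trained WITHOUT that feature. Each indicator votes for the feature whose
    removal gives the best value. Ties are broken by dictionary order.
    """
    votes = Counter()
    for name, higher_is_better in INDICATORS.items():
        pick = max if higher_is_better else min
        best_feature = pick(scores, key=lambda feature: scores[feature][name])
        votes[best_feature] += 1
    return votes.most_common(1)[0][0], votes


# Invented scores for three candidate features.
scores = {
    "moisture_content": {"mean_R": 0.70, "std_R": 0.05, "mean_RMSE": 0.09,
                         "std_RMSE": 0.010, "mean_MAE": 0.07, "std_MAE": 0.008},
    "clay_content": {"mean_R": 0.86, "std_R": 0.03, "mean_RMSE": 0.05,
                     "std_RMSE": 0.006, "mean_MAE": 0.04, "std_MAE": 0.004},
    "specific_gravity": {"mean_R": 0.85, "std_R": 0.02, "mean_RMSE": 0.06,
                         "std_RMSE": 0.005, "mean_MAE": 0.05, "std_MAE": 0.005},
}
to_drop, votes = vote_for_removal(scores)
print(to_drop, dict(votes))
```

In a full backward elimination, this step would be repeated on the remaining features, with the scores recomputed from fresh Monte Carlo runs each time.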

Prediction of Shear Strength of Soil Using Direct Shear Test and Support Vector Machine Model

The Open Construction and Building Technology Journal ◽

10.2174/1874836802014010041 ◽

2020 ◽

Vol 14 (1) ◽

pp. 41-50 ◽

Cited By ~ 2

Author(s):

Hai-Bang Ly ◽

Binh Thai Pham

Keyword(s):

Support Vector Machine ◽

Shear Strength ◽

Moisture Content ◽

Mean Squared Error ◽

Learning Algorithm ◽

Liquid Limit ◽

Support Vector ◽

Plastic Limit ◽

Svm Model ◽

Soil Shear Strength

Background: Shear strength of soil, the magnitude of shear stress that a soil can sustain, is an important factor in geotechnical engineering. Objective: The main objective of this study is to develop a machine learning algorithm, namely Support Vector Machine (SVM), to predict the shear strength of soil based on 6 input variables: clay content, moisture content, specific gravity, void ratio, liquid limit, and plastic limit. Methods: A large set of experimental measurements, comprising more than 500 samples, was gathered from the Long Phu 1 power plant project’s technical reports. The accuracy of the proposed SVM was evaluated using statistical indicators such as the coefficient of correlation (R), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) over 200 simulations, taking into account the random sampling effect. Finally, the most accurate SVM model was used to interpret the prediction results using Partial Dependence Plots (PDP). Results: Validation results showed that the SVM model performed well in predicting soil shear strength (R = 0.9 to 0.95), and the moisture content, liquid limit, and plastic limit were found to be the three features that most affect the prediction of soil shear strength. Conclusion: This study might help in the quick and accurate prediction of soil shear strength for practical purposes in civil engineering.
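
A partial dependence curve of the kind mentioned in the Methods can be computed as in the sketch below: one feature is forced to each value on a grid for every sample, and the predictions are averaged. The `toy_model` function and its coefficients are hypothetical stand-ins for a trained SVM, and the sample values are invented.

```python
from statistics import fmean


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Copy every row, overwriting only the feature of interest.
        modified = [row[:feature_index] + [value] + row[feature_index + 1:] for row in rows]
        curve.append(fmean(predict(row) for row in modified))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a trained model; coefficients are invented."""
    moisture, liquid_limit = row
    return 1.0 - 0.02 * moisture + 0.005 * liquid_limit


# Invented samples: [moisture content, liquid limit].
rows = [[20.0, 40.0], [30.0, 50.0], [40.0, 60.0]]
curve = partial_dependence(toy_model, rows, feature_index=0, grid=[10.0, 20.0, 30.0])
print(curve)
```

A steep curve for a feature suggests the model's prediction depends strongly on it, which is one way a ranking of influential features can be read off.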

Machine learning and Grad-Cam based vascular aging assessment using photoplethysmogram (Preprint)

10.2196/preprints.31709 ◽

2021 ◽

Author(s):

Hangsik Shin

Keyword(s):

Machine Learning ◽

Correlation Coefficient ◽

Age Estimation ◽

Mean Squared Error ◽

Mean Absolute Error ◽

Absolute Error ◽

Coefficient Of Determination ◽

Vascular Aging ◽

Squared Error ◽

Vascular Age

BACKGROUND Arterial stiffness due to vascular aging is a major indicator for evaluating cardiovascular risk. OBJECTIVE In this study, we propose a method of estimating age by applying machine learning to the photoplethysmogram for non-invasive vascular age assessment. METHODS The machine learning-based age estimation model, which consists of three convolutional layers and two fully connected layers, was developed using pulse-segmented photoplethysmograms from a total of 752 adults aged 19–87 years. The performance of the developed model was quantitatively evaluated using mean absolute error, root mean squared error, Pearson’s correlation coefficient, and coefficient of determination. Grad-Cam was used to explain the contribution of photoplethysmogram waveform characteristics to vascular age estimation. RESULTS A mean absolute error of 8.03, a root mean squared error of 9.96, a correlation coefficient of 0.62, and a coefficient of determination of 0.38 were obtained through 10-fold cross validation. Grad-Cam, used to determine the weight that the input signal contributes to the result, confirmed that the contribution of the photoplethysmogram segment to the age estimation was high around the systolic peak. CONCLUSIONS The machine learning-based vascular aging analysis method using the PPG waveform showed comparable or superior performance compared to previous studies, without complex feature detection, in evaluating vascular aging. CLINICALTRIAL 2015-0104
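
The reported correlation coefficient (0.62) and coefficient of determination (0.38) are numerically consistent with the latter being the square of the former, since 0.62² ≈ 0.38. The two quantities are not the same in general, though. The sketch below, using invented numbers, shows predictions that correlate perfectly with the targets yet have a strongly negative R² when it is computed as 1 − SS_res/SS_tot.

```python
import math
from statistics import fmean


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient."""
    mt, mp = fmean(y_true), fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    st = math.sqrt(sum((t - mt) ** 2 for t in y_true))
    sp = math.sqrt(sum((p - mp) ** 2 for p in y_pred))
    return cov / (st * sp)


def r2_score(y_true, y_pred):
    """Coefficient of determination, defined as 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented example: predictions are exactly twice the targets.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(y_true, y_pred), r2_score(y_true, y_pred))
print(round(0.62 ** 2, 2))  # square of the reported correlation coefficient
```

The correlation is insensitive to scale and offset errors in the predictions, whereas 1 − SS_res/SS_tot penalizes them, so it helps to know which definition a reported R² follows.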

Machine Learning-Based Gully Erosion Susceptibility Mapping: A Case Study of Eastern India

Sensors ◽

10.3390/s20051313 ◽

2020 ◽

Vol 20 (5) ◽

pp. 1313 ◽

Cited By ~ 15

Author(s):

Sunil Saha ◽

Jagabandhu Roy ◽

Alireza Arabameri ◽

Thomas Blaschke ◽

Dieu Tien Bui

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Absolute Error ◽

Gully Erosion ◽

Machine Learning Techniques ◽

Weight Of Evidence ◽

Validation Dataset ◽

Boosted Regression Tree ◽

Area Index ◽

Statistical Measures

Gully erosion is a form of natural disaster and one of the land loss mechanisms causing severe problems worldwide. This study aims to delineate the areas with the most severe gully erosion susceptibility (GES) using the machine learning techniques Random Forest (RF), Gradient Boosted Regression Tree (GBRT), Naïve Bayes Tree (NBT), and Tree Ensemble (TE). The gully inventory map (GIM) consists of 120 gullies. Of these, 84 gullies (70%) were used for training and 36 gullies (30%) were used to validate the models. Fourteen gully conditioning factors (GCFs) were used for GES modeling, and the relationships between the GCFs and gully erosion were assessed using the weight-of-evidence (WofE) model. The GES maps were prepared using RF, GBRT, NBT, and TE and were validated using the area under the receiver operating characteristic (AUROC) curve, the seed cell area index (SCAI), and five statistical measures: precision (PPV), false discovery rate (FDR), accuracy, mean absolute error (MAE), and root mean squared error (RMSE). Nearly 7% of the basin has high to very high susceptibility to gully erosion. Validation results demonstrated the excellent ability of these models to predict the GES. Of the analyzed models, the RF (AUROC = 0.96, PPV = 1.00, FDR = 0.00, accuracy = 0.87, MAE = 0.11, RMSE = 0.19 for the validation dataset) is accurate enough for modeling and better suited for GES modeling than the other models. Therefore, the RF model can be used to model the GES areas not only in this river basin but also in other areas with the same geo-environmental conditions.
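
Precision (PPV), false discovery rate, and accuracy all follow from the counts of a confusion matrix, and FDR is by definition 1 − PPV, which is why PPV = 1.00 and FDR = 0.00 appear together above. The sketch below illustrates the formulas with invented counts; they are not the study's confusion matrix.

```python
def precision(tp, fp):
    """Positive predictive value: share of predicted positives that are correct."""
    return tp / (tp + fp)


def false_discovery_rate(tp, fp):
    """Share of predicted positives that are wrong; equals 1 - precision."""
    return fp / (tp + fp)


def accuracy(tp, fp, tn, fn):
    """Share of all samples classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


# Invented counts for illustration only.
tp, fp, tn, fn = 30, 0, 35, 5
print(precision(tp, fp), false_discovery_rate(tp, fp), accuracy(tp, fp, tn, fn))
```

With no false positives, precision is perfect even though accuracy is not, because the false negatives lower accuracy only.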

Comparison of Machine Learning Algorithms for Discharge Prediction of Multipurpose Dam

Water ◽

10.3390/w13233369 ◽

2021 ◽

Vol 13 (23) ◽

pp. 3369

Author(s):

Jiyeong Hong ◽

Seoro Lee ◽

Gwanjae Lee ◽

Dongseok Yang ◽

Joo Hyun Bae ◽

...

Keyword(s):

Machine Learning ◽

Mean Squared Error ◽

Learning Algorithms ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Physical Models ◽

Gradient Boosting ◽

Activity Schedules ◽

Discharge Data ◽

Dam Inflow

For effective water management in the downstream area of a dam, it is necessary to estimate the amount of discharge from the dam to quantify the flow downstream of the dam. In this study, a machine learning model was constructed to predict the amount of discharge from Soyang River Dam using precipitation and dam inflow/discharge data from 1980 to 2020. Decision tree, multilayer perceptron, random forest, gradient boosting, RNN-LSTM, and CNN-LSTM were used as algorithms. The RNN-LSTM model achieved a Nash–Sutcliffe efficiency (NSE) of 0.796, root-mean-squared error (RMSE) of 48.996 m³/s, mean absolute error (MAE) of 10.024 m³/s, R of 0.898, and R² of 0.807, showing the best results in dam discharge prediction. The prediction of dam discharge using machine learning algorithms showed that it is possible to predict the amount of discharge, addressing limitations of physical models, such as the difficulty in applying human activity schedules and the need for various input data.
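
For reference, the Nash–Sutcliffe efficiency is conventionally defined as below. The symbols are assumptions, since the abstract gives no notation: Q_obs,t and Q_sim,t denote observed and simulated discharge at time step t, and the bar denotes the mean of the observations.

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - Q_{\mathrm{sim},t} \right)^{2}}
                        {\sum_{t=1}^{T} \left( Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}} \right)^{2}}
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the model is no better than predicting the observed mean. The reported R and R² are approximately consistent with R² being the squared correlation, since 0.898² ≈ 0.806.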

Visualization & Prediction of COVID-19 Future Outbreak by Using Machine Learning

International Journal of Information Technology and Computer Science ◽

10.5815/ijitcs.2021.03.02 ◽

2021 ◽

Vol 13 (3) ◽

pp. 16-32

Author(s):

Ahmed Hassan Mohammed Hassan ◽

Arfan Ali Mohammed Qasem ◽

Walaa Faisal Mohammed Abdalla ◽

Omer H. Elhassan

Keyword(s):

Machine Learning ◽

Polynomial Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Future Perspective ◽

Support Vector ◽

Squared Error ◽

Vector Machines ◽

The World ◽

Negative Factors

The cumulative incidence of COVID-19 is increasing rapidly day by day. Following the spread of the epidemic and the death of more than a million people around the world, scientists and researchers have turned to modern technologies such as machine learning to help the world overcome the Coronavirus (COVID-19) epidemic. Machine Learning (ML) can be deployed very effectively to track and predict the disease. ML techniques have been applied in areas that need to identify dangerous negative factors and define their priorities. The purpose of the proposed system is to predict the number of people infected with COVID-19 using ML. Four standard models are used for COVID-19 prediction: Neural Network (NN), Support Vector Machines (SVM), Bayesian Network (BN), and Polynomial Regression (PR). The data used to test these models consist of the numbers of deaths, newly infected cases, and recoveries in the next 20 days. Five measures were used to evaluate the performance of each model, namely root mean squared error (RMSE), mean squared error (MSE), mean absolute error (MAE), explained variance score, and R² score. The proposed system offers a promising mechanism for applying these models to the current scenario of the COVID-19 epidemic. The results showed that NN outperformed the other models, while SVM performed poorly in all predictions on the available dataset. Our results indicate that cases will increase slightly in the coming days. We also find that the results give rise to hope due to the low death rate. For future work, case explanation and data amalgamation must be kept up persistently.

Linear Attribute Projection and Performance Assessment for Signifying the Absenteeism at Work using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c4405.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 1262-1267

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Mean Squared Error ◽

Working Hours ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Working Environment ◽

Experimental Result ◽

Technological Advancement ◽

Development Environment

In recent times, with technological advancement, industries and organizations are transforming all their inflow and outflow operations into digital form. The reputation of an organization is also in the hands of its employees. One of the major needs of employees in the working environment is to take leave or vacation according to their family circumstances. Based on the health condition and needs of the employee, the organization must grant leave to keep employees satisfied. The performance of an employee is also predicted from the number of days worked in the organization. With this in view, this paper attempts to analyze employee performance and the number of working hours using machine learning algorithms. The Absenteeism at work dataset from the UCI Machine Learning Repository is used for the prediction analysis. The prediction of absent hours is carried out in three steps. Firstly, the correlations between the dataset attributes are found and depicted as a histogram. Secondly, the most highly correlated features are identified and fitted directly to regression models such as Linear regression, SRD regression, RANSAC regression, Ridge regression, Huber regression, ARD regression, Passive Aggressive regression, and Theil-Sen regression. Thirdly, performance is analyzed using metrics such as Mean Squared Error, Mean Absolute Error, R2 Score, Explained Variance Score, and Mean Squared Log Error. The implementation is done in Python using the Spyder Integrated Development Environment in Anaconda Navigator. Experimental results show that Passive Aggressive regression achieved effective prediction of the number of absent hours, with a minimum MSE of 0.04, MAE of 0.16, EVS of 0.03, MSLE of 0.32, and a reasonable R2 Score of 0.89.
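
Two of the less common metrics named above, Mean Squared Log Error (MSLE) and Explained Variance Score (EVS), can be computed from their standard definitions as in this sketch. The input values are invented, and the functions are illustrative rather than the paper's implementation.

```python
import math
from statistics import fmean, pvariance


def mean_squared_log_error(y_true, y_pred):
    """Mean of squared differences of log(1 + value); inputs must be >= 0."""
    return fmean((math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred))


def explained_variance_score(y_true, y_pred):
    """1 - Var(residuals) / Var(targets); a constant bias is not penalized."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


# Invented example: every prediction is too high by exactly one unit.
y_true = [1.0, 2.0, 3.0]
y_pred = [2.0, 3.0, 4.0]
print(mean_squared_log_error(y_true, y_pred))
print(explained_variance_score(y_true, y_pred))
```

Because EVS ignores a constant offset in the predictions while the R2 score does not, the two can differ for the same model.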

Machine learning meets pKa

F1000Research ◽

10.12688/f1000research.22090.2 ◽

2020 ◽

Vol 9 ◽

pp. 113 ◽

Cited By ~ 2

Author(s):

Marcel Baltruschat ◽

Paul Czodrowski

Keyword(s):

Machine Learning ◽

Cross Validation ◽

Mean Squared Error ◽

Mean Absolute Error ◽

External Validation ◽

Absolute Error ◽

Source Model ◽

Squared Error ◽

Fold Cross Validation ◽

Better Than

We present a small molecule pKa prediction tool written entirely in Python. It predicts the macroscopic pKa value and is trained on a literature compilation of monoprotic compounds. Different machine learning models were tested, and random forest performed best in a five-fold cross-validation (mean absolute error = 0.682, root mean squared error = 1.032, correlation coefficient r² = 0.82). We test our model on two external validation sets, where it performs comparably to Marvin and better than a recently published open source model. Our Python tool and all data are freely available at https://github.com/czodrowskilab/Machine-learning-meets-pKa.
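
Five-fold cross-validation partitions the samples into five folds and uses each fold once as the held-out set. The index-splitting step can be sketched as below; the function name and the shuffling scheme are assumptions for illustration, not taken from the linked repository.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for fold_number, (train, test) in enumerate(k_fold_indices(23, k=5), start=1):
    print(fold_number, len(train), len(test))
```

Every sample is held out exactly once, so the averaged metric uses all of the data for testing without ever scoring a model on samples it was trained on.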

Predicting and Analysing the Behaviour of COVID-19

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit217213 ◽

2021 ◽

pp. 40-46

Author(s):

Gaurav Singh ◽

Shivam Rai ◽

Himanshu Mishra ◽

Manoj Kumar

Keyword(s):

Machine Learning ◽

Polynomial Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Systems Science ◽

Data Repository ◽

Support Vector ◽

Squared Error

The prime objective of this work is to predict and analyse the Covid-19 pandemic around the world using Machine Learning algorithms such as Polynomial Regression, Support Vector Machine, and Ridge Regression. We further assess and compare the performance of these regression algorithms in terms of metrics such as R squared, Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. In this work, we used the dataset available in the Covid-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We analyzed Covid-19 cases from 22/1/2020 to the present. We applied a supervised machine learning prediction model to forecast the possible confirmed cases for the next ten days.
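
A polynomial regression forecast of the kind described can be sketched with NumPy's `polyfit` and `polyval`. The case counts below are synthetic (an exact quadratic), and the degree and ten-day horizon are illustrative assumptions. Neither the CSSE data nor the authors' code is reproduced here.

```python
import numpy as np

# Synthetic cumulative case counts for days 0..9 (invented, exactly quadratic).
days = np.arange(10)
cases = 3 * days**2 + 2 * days + 5

# Fit a degree-2 polynomial by least squares.
coefficients = np.polyfit(days, cases, deg=2)

# Extrapolate to the next ten days.
future_days = np.arange(10, 20)
forecast = np.polyval(coefficients, future_days)
print(np.round(forecast).astype(int))
```

Polynomial extrapolation can diverge quickly beyond the fitted range, so a short horizon such as ten days is a cautious choice.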

Regressor Fitting Of Feature Importance For Customer Segment Prediction With Ensembling Schemes Using Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f8255.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 952-956 ◽

Cited By ~ 2

Keyword(s):

Machine Learning ◽

Logistic Regression ◽

Mean Squared Error ◽

Absolute Error ◽

Machine Learning Algorithms ◽

Manufacturing Companies ◽

Data Set ◽

Feature Importance ◽

Customer Segment ◽

Feature Scaling

Predicting client behavior and feedback remains a challenging task for manufacturing companies. Companies struggle to increase their profit and annual turnover because they lack accurate predictions of customer likes and dislikes. This has led to the use of machine learning algorithms for predicting customer demand. This paper attempts to identify the important features of the wine data set, taken from the UCI Machine Learning Repository, for the prediction of customer segment. The important features are extracted using various ensembling methods: AdaBoost regressor, AdaBoost classifier, Random Forest regressor, Extra Trees regressor, and Gradient Boosting regressor. The features selected by each ensembling method are then fitted with logistic regression to analyze the performance. The same features are also subjected to feature scaling and then fitted with logistic regression to analyze the performance. The performance analysis uses metrics such as Mean Squared Error (MSE), Mean Absolute Error (MAE), R2 Score, Explained Variance Score (EVS), and Mean Squared Log Error (MSLE). Experimental results show that, after applying feature scaling, the feature importance extracted from the Extra Trees regressor is the most effective, with an MSE of 0.04, MAE of 0.03, R2 Score of 94%, EVS of 0.9, and MSLE of 0.01, compared with the other ensembling methods.
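
The abstract does not say which feature scaling method was used. Standardization (zero mean and unit variance per feature) is one common choice, sketched below with invented values as an assumption rather than the paper's method.

```python
from statistics import fmean, pstdev


def standardize_columns(rows):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 to avoid division by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Invented samples with two features on very different scales.
rows = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize_columns(rows)
print(scaled)
```

After scaling, features measured in different units contribute on a comparable scale, which matters for models such as logistic regression when it is fitted with regularization or gradient-based solvers.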
