Efficient Water Quality Prediction Using Supervised Machine Learning

Water ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 2210 ◽  
Author(s):  
Umair Ahmed ◽  
Rafia Mumtaz ◽  
Hirra Anwar ◽  
Asad A. Shah ◽  
Rabia Irfan ◽  
...  

Water covers about 70% of the earth's surface and is one of the most important resources vital to sustaining life. Rapid urbanization and industrialization have led to a deterioration of water quality at an alarming rate, resulting in harrowing diseases. Water quality has conventionally been estimated through expensive and time-consuming lab and statistical analyses, which render the contemporary notion of real-time monitoring moot. The alarming consequences of poor water quality necessitate an alternative method that is quicker and less expensive. With this motivation, this research explores a series of supervised machine learning algorithms to estimate the water quality index (WQI), a single index describing the general quality of water, and the water quality class (WQC), a class defined on the basis of the WQI. The proposed methodology employs four input parameters, namely temperature, turbidity, pH and total dissolved solids. Of all the employed algorithms, gradient boosting, with a learning rate of 0.1, and polynomial regression, with a degree of 2, predict the WQI most efficiently, with mean absolute errors (MAE) of 1.9642 and 2.7273, respectively, while a multi-layer perceptron (MLP) with a configuration of (3, 7) classifies the WQC most efficiently, with an accuracy of 0.8507. The proposed methodology achieves reasonable accuracy using a minimal number of parameters, validating its possible use in real-time water quality detection systems.
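
A minimal sketch of the modelling setup described above, assuming the data sit in a CSV with one column per input parameter; the file and column names here are hypothetical:

```python
# Sketch: gradient boosting and degree-2 polynomial regression for the WQI,
# and a (3, 7) multi-layer perceptron for the WQC, as named in the abstract.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import mean_absolute_error, accuracy_score

df = pd.read_csv("water_quality.csv")  # hypothetical file and columns
X = df[["temperature", "turbidity", "ph", "tds"]]
X_tr, X_te, wqi_tr, wqi_te, wqc_tr, wqc_te = train_test_split(
    X, df["wqi"], df["wqc"], test_size=0.2, random_state=0)

# WQI regression: gradient boosting with the paper's learning rate of 0.1
gb = GradientBoostingRegressor(learning_rate=0.1).fit(X_tr, wqi_tr)
print("GB MAE:", mean_absolute_error(wqi_te, gb.predict(X_te)))

# WQI regression: polynomial regression of degree 2
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_tr, wqi_tr)
print("Poly MAE:", mean_absolute_error(wqi_te, poly.predict(X_te)))

# WQC classification: MLP with hidden layers of 3 and 7 units
mlp = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(3, 7), max_iter=2000))
mlp.fit(X_tr, wqc_tr)
print("MLP accuracy:", accuracy_score(wqc_te, mlp.predict(X_te)))
```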

Author(s):  
Gaurav Singh ◽  
Shivam Rai ◽  
Himanshu Mishra ◽  
Manoj Kumar

The prime objective of this work is to predict and analyse the COVID-19 pandemic around the world using machine learning algorithms such as Polynomial Regression, Support Vector Machine and Ridge Regression, and, furthermore, to assess and compare the performance of these regression algorithms in terms of metrics such as R squared, Mean Absolute Error, Mean Squared Error and Root Mean Squared Error. In this work, we have used the dataset available in the COVID-19 Data Repository maintained by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. We have analyzed COVID-19 cases from 22 January 2020 onwards and applied a supervised machine learning prediction model to forecast the possible confirmed cases for the next ten days.
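
A hedged sketch of this forecasting step: fit the three regressors to the day index of a cumulative-cases series and extrapolate ten days ahead. The file name, the polynomial degree and the SVR settings are illustrative assumptions, not the paper's reported configuration:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Daily cumulative confirmed cases since 22 January 2020 (hypothetical
# one-column export of the CSSE time series).
cases = np.loadtxt("global_confirmed.csv", delimiter=",")
days = np.arange(len(cases)).reshape(-1, 1)

models = {
    "Polynomial Regression": make_pipeline(PolynomialFeatures(degree=4),
                                           LinearRegression()),
    "Ridge Regression": make_pipeline(PolynomialFeatures(degree=4), Ridge()),
    "Support Vector Machine": SVR(kernel="rbf", C=1e4),
}
future = np.arange(len(cases), len(cases) + 10).reshape(-1, 1)
for name, model in models.items():
    model.fit(days, cases)
    fit = model.predict(days)
    print(name,
          "R2:", r2_score(cases, fit),
          "MAE:", mean_absolute_error(cases, fit),
          "RMSE:", mean_squared_error(cases, fit) ** 0.5)
    print(name, "next 10 days:", model.predict(future).round())
```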


2020 ◽  
Vol 17 (9) ◽  
pp. 4703-4708
Author(s):  
K. Anitha Kumari ◽  
Avinash Sharma ◽  
S. Nivethitha ◽  
V. Dharini ◽  
V. Sanjith ◽  
...  

Electrical appliances most commonly comprise two kinds of electrical devices, namely electrical motors and transformers. Electrical motors are typically used for all sorts of industrial purposes, and failures of such motors result in serious problems for their host systems, such as overheating, shutdown and even burnout, so more attention has to be paid to detecting outliers. Similarly, to avoid unexpected power reliability problems and system damage, predicting failures in transformers is expected to quantify their impact. By predicting failures, the lifetime of transformers increases and unnecessary accidents are avoided. Therefore, this paper presents the detection of outliers in electrical motors and of failures in transformers using supervised machine learning algorithms. Machine learning techniques such as Support Vector Machine (SVM) and Random Forest (RF), and regression techniques such as Support Vector Regression (SVR) and Polynomial Regression (PR), are used to analyze use cases of different motor specifications. The efficiency of the findings is evaluated using accuracy, precision, F-measure and recall for motors; Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and R-squared Error (R2) are considered as metrics for transformers. The proposed approach helps to identify anomalies such as vibration loss, copper loss and overheating in industrial motors and to determine abnormal functioning of the transformer, which in turn helps ascertain its lifetime. The proposed system analyses the behaviour of the electrical machines using energy meter data and reports outliers to users. It also analyses abnormalities occurring in the transformer using the parameters involved in the degradation of the paper-oil insulation system and the operating voltage, which as a whole leads to predicting the lifetime.
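
A minimal illustration, under assumptions, of the two-part pipeline: classifiers flag motor outliers from energy-meter features, and a regression model estimates a transformer lifetime target. All feature and target names are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, mean_absolute_error

# Part 1: motor outlier detection from energy-meter readings (hypothetical data)
motors = pd.read_csv("motor_energy_meter.csv")
Xm = motors[["vibration", "current", "temperature"]]
ym = motors["outlier"]  # 1 = anomalous reading
Xm_tr, Xm_te, ym_tr, ym_te = train_test_split(Xm, ym, test_size=0.25,
                                              random_state=0)
for clf in (SVC(), RandomForestClassifier()):
    clf.fit(Xm_tr, ym_tr)
    # accuracy, precision, recall and F-measure, as in the abstract
    print(type(clf).__name__)
    print(classification_report(ym_te, clf.predict(Xm_te)))

# Part 2: transformer lifetime regression from insulation parameters
transformers = pd.read_csv("transformer_insulation.csv")  # hypothetical
Xt = transformers[["oil_moisture", "acidity", "voltage"]]
yt = transformers["remaining_life_years"]
svr = SVR().fit(Xt, yt)
print("SVR MAE (in-sample, for brevity):",
      mean_absolute_error(yt, svr.predict(Xt)))
```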


Water ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 2927
Author(s):  
Jiyeong Hong ◽  
Seoro Lee ◽  
Joo Hyun Bae ◽  
Jimin Lee ◽  
Woon Ji Park ◽  
...  

Predicting dam inflow is necessary for effective water management. This study created machine learning algorithms to predict the amount of inflow into the Soyang River Dam in South Korea, using weather and dam inflow data for 40 years. A total of six algorithms were used, as follows: decision tree (DT), multilayer perceptron (MLP), random forest (RF), gradient boosting (GB), recurrent neural network–long short-term memory (RNN–LSTM), and convolutional neural network–LSTM (CNN–LSTM). Among these models, the multilayer perceptron model showed the best results in predicting dam inflow, with a Nash–Sutcliffe efficiency (NSE) of 0.812, root mean squared error (RMSE) of 77.218 m3/s, mean absolute error (MAE) of 29.034 m3/s, correlation coefficient (R) of 0.924, and determination coefficient (R2) of 0.817. However, when the amount of dam inflow was below 100 m3/s, the ensemble models (random forest and gradient boosting) performed better than the MLP. Therefore, two combined machine learning (CombML) models (RF_MLP and GB_MLP) were developed that predict dam inflow using the ensemble methods (RF and GB) at precipitation below 16 mm and the MLP at precipitation above 16 mm; 16 mm is the average daily precipitation at inflows of 100 m3/s or more. Verification yielded NSE 0.857, RMSE 68.417 m3/s, MAE 18.063 m3/s, R 0.927, and R2 0.859 for RF_MLP, and NSE 0.829, RMSE 73.918 m3/s, MAE 18.093 m3/s, R 0.912, and R2 0.831 for GB_MLP, indicating that the combined models predict dam inflow most accurately. The CombML results show that it is possible to predict inflow while accounting for flow characteristics, such as flow regimes, by combining several machine learning algorithms.
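
A minimal sketch of the RF_MLP switching rule under assumptions: the 16 mm precipitation threshold from the abstract routes each sample to either the ensemble model (low-flow regime) or the MLP (high-flow regime). The `precip_mm` column name is hypothetical:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

class CombML:
    """Route predictions by the 16 mm daily precipitation threshold."""

    def __init__(self, threshold_mm=16.0):
        self.threshold = threshold_mm
        self.low = RandomForestRegressor(random_state=0)  # precip < 16 mm
        self.high = MLPRegressor(max_iter=2000)           # precip >= 16 mm

    def fit(self, X, y):
        # Train each sub-model only on the days in its own regime.
        mask = X["precip_mm"] < self.threshold
        self.low.fit(X[mask], y[mask])
        self.high.fit(X[~mask], y[~mask])
        return self

    def predict(self, X):
        # Dispatch each row to the sub-model matching its regime.
        mask = (X["precip_mm"] < self.threshold).to_numpy()
        out = np.empty(len(X))
        out[mask] = self.low.predict(X[mask])
        out[~mask] = self.high.predict(X[~mask])
        return out
```

Swapping `RandomForestRegressor` for `GradientBoostingRegressor` gives the GB_MLP variant.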


Genes ◽  
2020 ◽  
Vol 11 (9) ◽  
pp. 985 ◽  
Author(s):  
Thomas Vanhaeren ◽  
Federico Divina ◽  
Miguel García-Torres ◽  
Francisco Gómez-Vela ◽  
Wim Vanhoof ◽  
...  

The role of three-dimensional genome organization as a critical regulator of gene expression has become increasingly clear over the last decade. Most of our understanding of this association comes from the study of long-range chromatin interaction maps provided by Chromatin Conformation Capture-based techniques, which have greatly improved in recent years. Since these procedures are experimentally laborious and expensive, in silico prediction has emerged as an alternative strategy to generate virtual maps in cell types and conditions for which experimental data on chromatin interactions are not available. Several methods have been based on predictive models trained on one-dimensional (1D) sequencing features, yielding promising results. However, different approaches vary both in the way they model chromatin interactions and in the machine learning-based strategy they rely on, making it challenging to carry out a performance comparison of existing methods. In this study, we use publicly available 1D sequencing signals to model cohesin-mediated chromatin interactions in two human cell lines and evaluate the prediction performance of six popular machine learning algorithms: decision trees, random forests, gradient boosting, support vector machines, multi-layer perceptron and deep learning. Our approach accurately predicts long-range interactions and reveals that gradient boosting significantly outperforms the other five methods, yielding accuracies of about 95%. We show that chromatin features in close genomic proximity to the anchors cover most of the predictive information, as has been previously reported. Moreover, we demonstrate that gradient boosting models trained with different subsets of chromatin features, unlike the other methods tested, are able to produce accurate predictions. In this regard, and besides architectural proteins, transcription factors are shown to be highly informative. Our study provides a framework for the systematic prediction of long-range chromatin interactions, identifies gradient boosting as the best suited algorithm for this task and highlights cell-type specific binding of transcription factors at the anchors as important determinants of chromatin wiring mediated by cohesin.
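
A schematic of the benchmarking protocol, under assumptions: anchor pairs are labelled as interacting or not and described by 1D signal features, and five of the six classifiers are compared by cross-validated accuracy (the deep learning model is omitted here for brevity). The feature file and label column are hypothetical:

```python
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

# Hypothetical table: one row per anchor pair, 1D chromatin signals as
# columns, plus a binary "interacting" label.
data = pd.read_csv("anchor_features.csv")
X, y = data.drop(columns="interacting"), data["interacting"]

models = {
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(),
    "gradient boosting": GradientBoostingClassifier(),
    "SVM": SVC(),
    "MLP": MLPClassifier(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```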


Diagnostics ◽  
2019 ◽  
Vol 9 (1) ◽  
pp. 33 ◽  
Author(s):  
Joshua Gawlitza ◽  
Timo Sturm ◽  
Kai Spohrer ◽  
Thomas Henzler ◽  
Ibrahim Akin ◽  
...  

Introduction: Quantitative computed tomography (qCT) is an emergent technique for diagnostics and research in patients with chronic obstructive pulmonary disease (COPD). qCT parameters demonstrate a correlation with pulmonary function tests and symptoms. However, qCT provides only anatomical, not functional, information. We evaluated five distinct, partially machine learning-based mathematical models to predict lung function parameters from qCT values in comparison with pulmonary function tests. Methods: 75 patients with diagnosed COPD underwent body plethysmography and a dose-optimized qCT examination on a third-generation, dual-source CT in inspiration and expiration. Delta values (inspiration minus expiration) were calculated afterwards. Four parameters were quantified: mean lung density, lung volume, low-attenuated volume, and full width at half maximum. Five models were evaluated for best prediction: average prediction, median prediction, k-nearest neighbours (kNN), gradient boosting, and multilayer perceptron. Results: The lowest mean relative error (MRE) was calculated for the kNN model, at 16%. Similarly low MREs were found for polynomial regression as well as gradient boosting-based prediction. The other models led to higher MREs and thereby worse predictive performance. Beyond the sole MRE, distinct differences in prediction performance, dependent on the initial dataset (expiration, inspiration, delta), were found. Conclusion: Different, partially machine learning-based models allow the prediction of lung function values from static qCT parameters within a reasonable margin of error. Therefore, qCT parameters may contain more information than we currently utilize and can potentially augment standard functional lung testing.
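
A small sketch of one of the five predictors: k-nearest-neighbour regression from the four qCT parameters to a lung function value, scored by mean relative error (MRE). The file, column names and target are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsRegressor

df = pd.read_csv("qct_parameters.csv")  # hypothetical
X = df[["mean_lung_density", "lung_volume", "lav", "fwhm"]]
y = df["residual_volume"]  # hypothetical lung function target

# Cross-validated predictions, then mean relative error as in the abstract
pred = cross_val_predict(KNeighborsRegressor(n_neighbors=5), X, y, cv=5)
mre = np.mean(np.abs(pred - y) / np.abs(y))
print(f"MRE: {mre:.1%}")
```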


Network security is an important aspect of communication-related activities. In recent times, the advent of more sophisticated technologies has changed the way information is shared with everyone in any part of the world. Concurrently, these advancements are mishandled to compromise end-user devices and steal their personal information. The number of attacks made on targeted devices is increasing over time, and even though the security mechanisms used to defend the network are enhanced and updated periodically, intruders develop new advanced methods to penetrate systems. To counter these threats, effective strategies must be applied to enhance the security measures in the network. In this paper, a machine learning-based approach is proposed to identify the patterns of different categories of attacks made in the past. The KDD Cup 1999 dataset is used to develop this predictive model. A bat optimization algorithm identifies the optimal parameter subset, and supervised machine learning algorithms are employed to train the model from the data to make predictions. The performance of the system is evaluated through metrics such as accuracy and precision. Four classification algorithms were used, of which the gradient boosting model outperformed the benchmarked algorithms, proving its suitability for this data classification task based on the accuracy obtained.
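
A hedged sketch of the classification stage: train a gradient boosting model on a pre-selected feature subset of KDD Cup 1999 records and report accuracy and precision. The bat-algorithm feature selection is abstracted here into `selected_features`, a hypothetical precomputed list (the names shown are real KDD'99 fields):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, precision_score

data = pd.read_csv("kddcup99.csv")  # hypothetical local copy of the dataset
selected_features = ["duration", "src_bytes", "dst_bytes", "count"]  # assumed
X, y = data[selected_features], data["attack_category"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0, stratify=y)
gb = GradientBoostingClassifier().fit(X_tr, y_tr)
pred = gb.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("precision:", precision_score(y_te, pred, average="macro"))
```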


Water ◽  
2021 ◽  
Vol 13 (23) ◽  
pp. 3369
Author(s):  
Jiyeong Hong ◽  
Seoro Lee ◽  
Gwanjae Lee ◽  
Dongseok Yang ◽  
Joo Hyun Bae ◽  
...  

For effective water management downstream of a dam, it is necessary to estimate the amount of discharge from the dam in order to quantify the downstream flow. In this study, a machine learning model was constructed to predict the amount of discharge from the Soyang River Dam using precipitation and dam inflow/discharge data from 1980 to 2020. Decision tree, multilayer perceptron, random forest, gradient boosting, RNN–LSTM, and CNN–LSTM were used as algorithms. The RNN–LSTM model achieved a Nash–Sutcliffe efficiency (NSE) of 0.796, root mean squared error (RMSE) of 48.996 m3/s, mean absolute error (MAE) of 10.024 m3/s, R of 0.898, and R2 of 0.807, showing the best results in dam discharge prediction. These results show that machine learning can predict the amount of discharge while sidestepping limitations of physical models, such as the difficulty of incorporating human activity schedules and the need for extensive input data.
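
A minimal RNN–LSTM sketch under assumptions: daily features (e.g. precipitation, inflow) and past discharge are arranged into sliding windows to predict next-day discharge. The file, window length and layer sizes are illustrative, not the paper's configuration:

```python
import numpy as np
from tensorflow import keras

def make_windows(series, lookback=7):
    """Stack `lookback`-day windows of all features to predict the
    next day's discharge (assumed to be the last column)."""
    X, y = [], []
    for i in range(len(series) - lookback):
        X.append(series[i:i + lookback, :])
        y.append(series[i + lookback, -1])
    return np.array(X), np.array(y)

series = np.load("soyang_daily.npy")  # hypothetical (days, features) array
X, y = make_windows(series)

model = keras.Sequential([
    keras.layers.Input(shape=X.shape[1:]),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=50, validation_split=0.2)
```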


2020 ◽  
pp. 1-26
Author(s):  
Joshua Eykens ◽  
Raf Guns ◽  
Tim C.E. Engels

We compare two supervised machine learning algorithms, Multinomial Naïve Bayes and Gradient Boosting, to classify social science articles using textual data. The high level of granularity of the classification scheme used and the possibility that multiple categories are assigned to a document make this task challenging. To collect the training data, we query three discipline-specific thesauri to retrieve articles corresponding to specialties in the classification. The resulting dataset consists of 113,909 records and covers 245 specialties, aggregated into 31 subdisciplines from three disciplines. Experts were consulted to validate the thesauri-based classification. The resulting multi-label dataset is used to train the machine learning algorithms in different configurations. We deploy a multi-label classifier chaining model, allowing an arbitrary number of categories to be assigned to each document. The best results are obtained with Gradient Boosting. The approach does not rely on citation data, so it can be applied in settings where such information is not available. We conclude that fine-grained text-based classification of social science publications at a subdisciplinary level is a hard task, for humans and machines alike. A combination of human expertise and machine learning is suggested as a way forward to improve the classification of social science documents.
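
One way to realize this setup, under assumptions, is scikit-learn's `ClassifierChain` over TF-IDF text features with a gradient boosting base learner, so each label prediction can condition on the previous labels in the chain; the paper's exact implementation is not stated, and the file and column names are hypothetical:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import ClassifierChain

df = pd.read_csv("articles.csv")  # hypothetical
# Multi-label targets: one 0/1 indicator column per specialty
label_cols = [c for c in df.columns if c.startswith("cat_")]

X = TfidfVectorizer(max_features=20000).fit_transform(df["title_abstract"])
Y = df[label_cols].to_numpy()

chain = ClassifierChain(GradientBoostingClassifier(),
                        order="random", random_state=0)
chain.fit(X, Y)
# Any number of labels (including several) can be predicted per document
pred = chain.predict(X[:5])
```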


Author(s):  
Hemant Raheja ◽  
Arun Goel ◽  
Mahesh Pal

The present paper deals with the performance evaluation of three machine learning algorithms, namely Deep neural network (DNN), Gradient boosting machine (GBM) and Extreme gradient boosting (XGBoost), for evaluating groundwater indices over a study area in Haryana state (India). To investigate the applicability of these models, two water quality indices, namely the Entropy Water Quality Index (EWQI) and the Water Quality Index (WQI), are employed in the present study. Analysis of the results demonstrated that the DNN exhibited comparatively lower error values and performed better in the prediction of both indices, i.e., EWQI and WQI. Values of the Correlation Coefficient (CC = 0.989), Root Mean Square Error (RMSE = 0.037), Nash–Sutcliffe efficiency (NSE = 0.995) and Index of agreement (d = 0.999) for EWQI, and CC = 0.975, RMSE = 0.055, NSE = 0.991 and d = 0.998 for WQI, were obtained. From the variable importance of the input parameters, Electrical conductivity (EC) was observed to be the most significant and pH the least significant parameter in predicting EWQI and WQI with these three models. It is envisaged that the results of the study can be used to reliably predict the EWQI and WQI of groundwater to decide its potability.
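
A hedged sketch of one of the three models: XGBoost regression from hydrochemical parameters to the EWQI, scored with CC, RMSE and NSE as in the abstract, plus variable importances. The file and column names are hypothetical:

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("haryana_groundwater.csv")  # hypothetical
X = df[["ec", "ph", "tds", "hardness", "chloride"]]
y = df["ewqi"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=500, learning_rate=0.05).fit(X_tr, y_tr)
pred = model.predict(X_te)

cc = np.corrcoef(y_te, pred)[0, 1]
rmse = np.sqrt(np.mean((y_te - pred) ** 2))
nse = 1 - np.sum((y_te - pred) ** 2) / np.sum((y_te - y_te.mean()) ** 2)
print(f"CC={cc:.3f} RMSE={rmse:.3f} NSE={nse:.3f}")

# Variable importance (the abstract reports EC as most influential)
print(dict(zip(X.columns, model.feature_importances_)))
```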


Energies ◽  
2019 ◽  
Vol 12 (2) ◽  
pp. 269 ◽  
Author(s):  
Alexandre Lucas ◽  
Ricardo Barranco ◽  
Nazir Refa

The adoption of electric vehicles (EV) has to be complemented with the right charging infrastructure roll-out. This infrastructure is already in place in many cities throughout the main markets of China, the EU and the USA. Public policies are taken at the regional and/or city level, targeting both EV adoption and charging infrastructure management. A growing trend is the increase in idle time over the years (the time an EV is connected without charging), which directly impacts the sizing of the infrastructure, and hence its cost or availability. Such a phenomenon can be regarded as an opportunity, but it may very well undermine the very initiatives being taken to promote adoption; in any case, it must be measured, studied, and managed. The time an EV takes to charge depends on its initial/final state of charge (SOC) and the power being supplied to it. The problem, however, is to estimate the time the EV remains parked after charging (idle time), as it depends on many factors that simple statistical analysis cannot tackle. In this study we apply supervised machine learning to a dataset from the Netherlands and analyze three regression algorithms, Random Forest, Gradient Boosting and XGBoost, identifying the most accurate one and the main influencing parameters. The model can provide useful information for EV users, policy makers and network owners to better manage the network, targeting specific variables. The best performing model is XGBoost, with an R2 score of 60.32% and a mean absolute error of 1.11. The parameters influencing the model the most are the time of day at which the charging session starts and the total energy supplied, with contributions of 22.35% and 15.57%, respectively. Partial dependencies of variables and model performances are presented, and implications for public policies are discussed.
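
A sketch, under assumptions, of the idle-time regression: XGBoost on charging-session features, reporting R2 and MAE and inspecting feature importances (the abstract highlights session start time and total energy supplied). The session file and column names are hypothetical:

```python
import pandas as pd
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error

sessions = pd.read_csv("charging_sessions_nl.csv")  # hypothetical
X = sessions[["start_hour", "energy_kwh", "weekday", "connection_hours"]]
y = sessions["idle_hours"]  # time parked after charging has finished

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = XGBRegressor().fit(X_tr, y_tr)
pred = model.predict(X_te)

print("R2:", r2_score(y_te, pred), "MAE:", mean_absolute_error(y_te, pred))
print(dict(zip(X.columns, model.feature_importances_)))
```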

