Comparative Analysis of Machine Learning Regression Algorithms on Air Pollution Dataset

Author(s):  
Sumit Upadhyay

Air pollution has both acute and chronic effects on human health, affecting a number of different systems and organs. Examining and protecting air quality has become one of the most essential activities for governments in many industrial and urban areas today. Air pollutants, such as carbon monoxide (CO), sulfur dioxide (SO2), nitrogen oxides (NOx), volatile organic compounds (VOCs), ozone (O3), heavy metals, and respirable particulate matter (PM2.5 and PM10), differ in their chemical composition, reaction properties, emission, time of disintegration and ability to diffuse over long or short distances. The main objective of this paper is to build a model for predicting the Air Quality Index (AQI) of specific cities using several machine learning algorithms, namely Multiple Linear Regression, K Nearest Neighbours (KNN), Support Vector Machine (SVM) and Decision Tree, and to evaluate and compare the performance of each algorithm based on its accuracy score and errors. The air pollution dataset is publicly available on different government sites. In the implementation phase, 80% of the dataset is used for training the models and the rest is used for testing them.
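The 80/20 protocol described in this abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the helper names `train_test_split` and `knn_predict`, the choice of k = 3, the fixed seed, and the synthetic rows are all assumptions made for the example, and only one of the four named algorithms (KNN) is shown.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows reproducibly and split them into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def knn_predict(train, features, k=3):
    """Predict a target as the mean target of the k nearest training rows."""
    def squared_distance(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], features))

    nearest = sorted(train, key=squared_distance)[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Synthetic placeholder rows of the form (features, aqi); not real measurements.
rows = [((float(i), float(i % 5)), 2.0 * i + (i % 5)) for i in range(50)]

train, test = train_test_split(rows)

# Mean absolute error on the held-out 20% only.
mae = sum(abs(knn_predict(train, x) - y) for x, y in test) / len(test)
print(len(train), len(test), mae)
```

Scoring only on the held-out rows is what makes the comparison between algorithms meaningful; an error computed on the training rows would favour models that memorise the data.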

2019 ◽  
Vol 9 (19) ◽  
pp. 4069 ◽  
Author(s):  
Huixiang Liu ◽  
Qing Li ◽  
Dongbing Yu ◽  
Yu Gu

Air pollution has become an important environmental issue in recent decades. Forecasts of air quality play an important role in warning people about and controlling air pollution. We used support vector regression (SVR) and random forest regression (RFR) to build regression models for predicting the Air Quality Index (AQI) in Beijing and the nitrogen oxides (NOX) concentration in an Italian city, based on two publicly available datasets. The root-mean-square error (RMSE), correlation coefficient (r), and coefficient of determination (R2) were used to evaluate the performance of the regression models. Experimental results showed that the SVR-based model performed better in the prediction of the AQI (RMSE = 7.666, R2 = 0.9776, and r = 0.9887), and the RFR-based model performed better in the prediction of the NOX concentration (RMSE = 83.6716, R2 = 0.8401, and r = 0.9180). This work also illustrates that combining machine learning with air quality prediction is an efficient and convenient way to solve some related environment problems.
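For readers unfamiliar with the three scores named above, the following sketch computes them with their standard textbook definitions. It assumes R2 is defined as 1 - SS_res / SS_tot; the abstract does not state which definition the authors used, and the numbers below are made-up values rather than data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def r_squared(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up AQI values for illustration only.
observed = [50.0, 80.0, 120.0, 160.0]
predicted = [55.0, 78.0, 110.0, 170.0]
print(rmse(observed, predicted), pearson_r(observed, predicted), r_squared(observed, predicted))
```

Under these definitions, a prediction that is offset from the truth by a constant still has r = 1 but a lower R2, which is one reason to report both.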


Atmosphere ◽  
2021 ◽  
Vol 12 (12) ◽  
pp. 1647
Author(s):  
Zhenyu Song ◽  
Cheng Tang ◽  
Jin Qian ◽  
Bin Zhang ◽  
Yuki Todo

With the rapid development of the global economy, air pollution, which restricts sustainable development and threatens human health, has become an important focus of environmental governance worldwide. The modeling and reliable prediction of air quality remain substantial challenges because uncertainties residing in emissions data are unknown and the dynamic processes are not well understood. A number of machine learning approaches have been used to predict air quality to help alleviate air pollution, since accurate air quality estimation may result in significant social-economic development. From this perspective, a novel air quality estimation approach is proposed, which consists of two components: newly-designed dendritic neural regression (DNR) and customized scale-free network-based differential evolution (SFDE). The DNR can adaptively utilize spatio-temporal information to capture the nonlinear correlation between observations and air pollutant concentrations. Since the landscape of the weight space in DNR is vast and multimodal, SFDE is used as the optimization algorithm due to its powerful search ability. Extensive experimental results demonstrate that the proposed approach can provide stable and reliable performances in the estimation of both PM2.5 and PM10 concentrations, being significantly better than several commonly-used machine learning algorithms, such as support vector regression and long short-term memory.


2019 ◽  
Vol 8 (4) ◽  
pp. 7489-7492

The global environment is presently facing a key issue of air pollution. The four air pollutants that pose a growing threat to human health are respirable particulate matter, nitrogen oxide, particulate matter, and sulfur dioxide. A vast amount of air quality data is collected at different monitoring stations throughout the world. The collected data can be analyzed to forecast the future air quality index (AQI). This paper proposes machine learning algorithms such as random forest, support vector machine, and self-adaptive resource allocation to predict the future AQI. The Tamil Nadu Pollution Control Board (TNPCB) deployed air pollution monitoring stations in five regions. The air pollutants PM10, PM2.5, SO2 and NO2 are monitored and the AQI is calculated. The data collected by TNPCB from January 2019 to November 2019, together with the AQI of the previous five years, were used. This system attempts to predict the levels of the pollutants PM, SO2 and NO2 in the air to determine the AQI.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-23 ◽  
Author(s):  
Mauro Castelli ◽  
Fabiana Martins Clemente ◽  
Aleš Popovič ◽  
Sara Silva ◽  
Leonardo Vanneschi

Predicting air quality is a complex task due to the dynamic nature, volatility, and high variability in time and space of pollutants and particulates. At the same time, being able to model, predict, and monitor air quality is becoming more and more relevant, especially in urban areas, due to the observed critical impact of air pollution on citizens’ health and the environment. In this paper, we employ a popular machine learning method, support vector regression (SVR), to forecast pollutant and particulate levels and to predict the air quality index (AQI). Among the various tested alternatives, radial basis function (RBF) was the type of kernel that allowed SVR to obtain the most accurate predictions. Using the whole set of available variables revealed a more successful strategy than selecting features using principal component analysis. The presented results demonstrate that SVR with RBF kernel allows us to accurately predict hourly pollutant concentrations, like carbon monoxide, sulfur dioxide, nitrogen dioxide, ground-level ozone, and particulate matter 2.5, as well as the hourly AQI for the state of California. Classification into six AQI categories defined by the US Environmental Protection Agency was performed with an accuracy of 94.1% on unseen validation data.
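The final classification step can be illustrated as follows. The sketch maps a numeric AQI to the six US EPA categories using the standard breakpoints (0-50, 51-100, 101-150, 151-200, 201-300, 301 and above) and scores category-level accuracy. The function names and sample values are hypothetical, and the sketch assumes integer AQI values; it is not the authors' code, and the regression model that produces the predictions is outside its scope.

```python
from bisect import bisect_left

# Upper bound of each of the first five categories; anything above 300 is Hazardous.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi):
    """Return the US EPA category name for a non-negative integer AQI."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]


def category_accuracy(true_aqi, predicted_aqi):
    """Fraction of samples whose predicted AQI lands in the true category."""
    hits = sum(aqi_category(t) == aqi_category(p) for t, p in zip(true_aqi, predicted_aqi))
    return hits / len(true_aqi)


# Hypothetical values: the second prediction crosses a category boundary.
print(category_accuracy([40, 120, 310], [45, 160, 305]))
```

Category accuracy is more forgiving than a regression error: a prediction can be numerically off and still count as correct as long as it stays within the same band.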


Author(s):  
K. Alpan ◽  
B. Sekeroglu

Abstract. Air pollution, which is one of the biggest problems created by the developing world, reaches severe levels, especially in urban areas. Weather stations established at certain points in countries regularly obtain data and inform people about air quality. Smart City applications aim to perform this process with higher speed and accuracy by collecting data with thousands of sensors based on the Internet of Things. At this stage, artificial intelligence and machine learning play a vital role in analyzing the data to be obtained. In this study, six pollutant concentrations, namely particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), and carbon monoxide (CO), were predicted using three basic machine learning algorithms (random forest, decision tree and support vector regression) by considering only meteorological data. Experiments on two different datasets showed that the random forest has a high prediction capacity (R2: 0.74–0.86), and that high-accuracy predictions of pollutant concentrations can be made using only meteorological data. This and further studies based on meteorological data would help to reduce the number of devices in Smart City applications and make them more cost-effective.


Generally, air pollution refers to the release of various pollutants into the air that threaten human health and the planet. Air pollution is among the most dangerous threats humanity has ever faced. It causes major damage to animals, plants and other living things, and if it continues, human beings will face serious situations in the coming years. The major pollutants come from transport and industry, so to address this problem the major sectors have to predict the air quality resulting from transport and industry. The existing project has many disadvantages. It estimates the PM2.5 concentration using a photograph-based method. But the photographic method alone is not sufficient, because it calculates only PM2.5, one of several pollutant concentrations, so the other major pollutants and the information needed for controlling pollution are missing. We therefore propose machine learning techniques delivered through a GUI application. In this approach, multiple datasets from different sources can be combined to form a generalized dataset, and various machine learning algorithms are applied to obtain results with maximum accuracy. By comparing the machine learning algorithms, we can identify the one with the best accuracy. Our evaluation provides a comprehensive guide to the sensitivity of model parameters with regard to overall performance in predicting air quality pollutants, measured by accuracy. Additionally, we discuss and compare the performance of the machine learning algorithms on the dataset and evaluate the GUI-based air quality prediction by attributes.


2021 ◽  
Author(s):  
Cong Cao

In this paper, we explore the impact of changes in traffic flow on local air pollution under specific meteorological conditions by integrating hourly traffic flow data, air pollution data and meteorological data, using generalized linear regression models and advanced machine learning algorithms: support vector machines and decision trees. The geographical location is Oslo, the capital of Norway, and the period we selected is from February 2020 to September 2020. We also selected 24-hour data for May 11 and 16 of the same year, representing weekday and holiday traffic flow, respectively, as a subset to explore further. Finally, we selected data from July 2020 for robustness testing and algorithm performance verification. We found that the maximum traffic flow on holidays is significantly higher than that on weekdays, but the holidays produce lower NOx concentrations throughout the month; the peak arrival time of NOx, NO2 and NO concentrations is later than the peak arrival time of traffic flow. Among them, NOx has a very significant variation, so we chose NOx concentration as the air pollution indicator to measure the effect of traffic flow variation on air pollution. We also find that NOx concentration is negatively correlated with hourly precipitation, and its variation trend is similar to that of minimum air temperature. We used multiple imputation methods to fill in the missing values. The decision tree results show that when traffic volumes are high (>81%), low temperatures generate higher NOx concentrations than high temperatures (an increase of 3.1%). Higher NOx concentrations (2.4%) are also generated when traffic volumes are low (no less than 22%) but there is some precipitation ≥ 0.27%. In the evaluation of the prediction accuracy of the machine learning algorithms, the support vector machine has the best prediction performance, with a high R-squared and small MAE, MSE and RMSE, indicating that the support vector machine better explains air pollution caused by traffic flow, while the decision tree is the second best and the generalized linear regression model is the worst. The selected data for July 2020 yielded results consistent with the overall dataset.


Livers ◽  
2021 ◽  
Vol 1 (4) ◽  
pp. 294-312
Author(s):  
Fahad Mostafa ◽  
Easin Hasan ◽  
Morgan Williamson ◽  
Hafiz Khan

Medical diagnoses have important implications for improving patient care, research, and policy. For a medical diagnosis, health professionals use different kinds of pathological methods to make decisions on medical reports in terms of the patients’ medical conditions. Recently, clinicians have been actively engaged in improving medical diagnoses. The use of artificial intelligence and machine learning in combination with clinical findings has further improved disease detection. In the modern era, with the advantage of computers and technologies, one can collect data and visualize many hidden outcomes such as dealing with missing data in medical research. Statistical machine learning algorithms based on specific problems can assist one to make decisions. Machine learning (ML), data-driven algorithms can be utilized to validate existing methods and help researchers to make potential new decisions. The purpose of this study was to extract significant predictors for liver disease from the medical analysis of 615 humans using ML algorithms. Data visualizations were implemented to reveal significant findings such as missing values. Multiple imputations by chained equations (MICEs) were applied to generate missing data points, and principal component analysis (PCA) was used to reduce the dimensionality. Variable importance ranking using the Gini index was implemented to verify significant predictors obtained from the PCA. Training data (ntrain=399) for learning and testing data (ntest=216) in the ML methods were used for predicting classifications. The study compared binary classifier machine learning algorithms (i.e., artificial neural network, random forest (RF), and support vector machine), which were utilized on a published liver disease data set to classify individuals with liver diseases, which will allow health professionals to make a better diagnosis. 
The synthetic minority oversampling technique was applied to oversample the minority class to regulate overfitting problems. The RF significantly contributed (p<0.001) to a higher accuracy score of 98.14% compared to the other methods. Thus, this suggests that ML methods predict liver disease by incorporating the risk factors, which may improve the inference-based diagnosis of patients.


Air pollution has a serious impact on human health. It occurs because of natural and man-made factors. The major contribution of this research is a comparison between different methodologies and techniques of mathematical and machine learning models. The process began with integrating data from different sources at different time intervals. The preprocessing phase resulted in two different datasets: a one-hour dataset and a five-minute dataset. Next, we established a forecasting model for particulate matter PM2.5, which is one of the most prevalent air pollutants and whose concentration affects air quality. Additionally, we completed a multivariate analysis to predict the PM2.5 value and check the effects of other air pollutants, traffic, and weather. The algorithms used are support vector regression, k-nearest neighbors and decision tree models. The results showed that for the one-hour dataset, of the three algorithms, support vector regression has the lowest root-mean-square error (RMSE) and the lowest mean absolute error (MAE). For the five-minute dataset, by contrast, we found that the auto-regression model showed the lowest RMSE and MAE; however, this model only predicts short-term PM2.5.
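The remark that the auto-regression model is only useful in the short term can be made concrete with a first-order sketch. The abstract does not give the model order or fitting method, so the AR(1) form x[t] = c + phi * x[t-1], the least-squares fit, the function names, and the toy series below are all assumptions made for illustration.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    previous = series[:-1]
    current = series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    sxx = sum((a - mean_prev) ** 2 for a in previous)
    sxy = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(previous, current))
    phi = sxy / sxx
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast(series, c, phi, steps):
    """Iterate the fitted recurrence forward from the last observed value."""
    predictions = []
    last = series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


# Toy PM2.5-like series generated by x[t] = 2 + 0.5 * x[t-1]; not real data.
series = [10.0, 7.0, 5.5, 4.75, 4.375]
c, phi = fit_ar1(series)
print(forecast(series, c, phi, 3))
```

When |phi| < 1, repeated forecasts converge to the constant c / (1 - phi), so predictions far ahead carry almost no information about the current state. That behaviour is consistent with the short-term limitation noted above.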


Author(s):  
Suprateek Halsana

Air pollution is the “world’s largest environmental health threat”[1], causing 7 million deaths[1] worldwide every year. Its major constituents are PM2.5, PM10 and the harmful greenhouse gases SO2, NO2, CO and other effluents from vehicles and factories, affecting not only humans but also other living organisms both on land and at sea. The only effective solution to this global issue is to implement machine learning algorithms to predict the AQI (Air Quality Index), which can make people aware of the condition of the air in a given region so that the government can take action to improve air quality in the future. The prime objective of this project is to predict the AQI based on the concentrations of PM2.5, PM10, SO2, NO2 and CO as well as weather conditions such as temperature, pressure and humidity[2]. The dataset is therefore combined from various web sources, such as cpcb.nic.in and the UCI repository, in order to improve the accuracy of the prediction and to judge whether the air quality is suitable or not. The prediction is made with several supervised machine learning algorithms, and the observations and results state which algorithm gives better accuracy in predicting the AQI and which one gives less error.

