A Machine Learning Approach to Predict Air Quality in California

Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-23 ◽  
Author(s):  
Mauro Castelli ◽  
Fabiana Martins Clemente ◽  
Aleš Popovič ◽  
Sara Silva ◽  
Leonardo Vanneschi

Predicting air quality is a complex task due to the dynamic nature, volatility, and high variability in time and space of pollutants and particulates. At the same time, being able to model, predict, and monitor air quality is becoming increasingly relevant, especially in urban areas, due to the observed critical impact of air pollution on citizens’ health and the environment. In this paper, we employ a popular machine learning method, support vector regression (SVR), to forecast pollutant and particulate levels and to predict the air quality index (AQI). Among the various tested alternatives, the radial basis function (RBF) was the type of kernel that allowed SVR to obtain the most accurate predictions. Using the whole set of available variables proved to be a more successful strategy than selecting features using principal component analysis. The presented results demonstrate that SVR with an RBF kernel allows us to accurately predict hourly pollutant concentrations, such as carbon monoxide, sulfur dioxide, nitrogen dioxide, ground-level ozone, and particulate matter 2.5, as well as the hourly AQI for the state of California. Classification into the six AQI categories defined by the US Environmental Protection Agency was performed with an accuracy of 94.1% on unseen validation data.
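The six-category classification mentioned above can be illustrated with a minimal sketch. It assumes the standard US EPA AQI bands (0-50, 51-100, 101-150, 151-200, 201-300, above 300) and scores a prediction as correct when the predicted and observed AQI fall in the same band. The function names `aqi_category` and `category_accuracy` are hypothetical and are not taken from the paper; the authors' actual evaluation procedure may differ.

```python
from bisect import bisect_left

# Upper bounds of the first five US EPA AQI categories.
# Any value above 300 falls into the sixth category, "Hazardous".
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_category(aqi: float) -> str:
    """Map a numeric AQI value to its US EPA category name."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left returns the index of the first bound >= aqi,
    # so a value exactly on a bound (e.g. 50) stays in the lower band.
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]


def category_accuracy(predicted, actual) -> float:
    """Fraction of samples whose predicted AQI lands in the observed category."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(aqi_category(p) == aqi_category(a) for p, a in zip(predicted, actual))
    return matches / len(actual)
```

In this setup a regression model (such as SVR) produces the numeric AQI, and the category accuracy is derived afterwards from the predicted values.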

Author(s):  
K. Alpan ◽  
B. Sekeroglu

Abstract. Air pollution, one of the biggest problems created by the developing world, reaches severe levels, especially in urban areas. Weather stations established at certain points in countries regularly obtain data and inform people about air quality. Smart City applications aim to perform this process with higher speed and accuracy by collecting data from thousands of sensors based on the Internet of Things. At this stage, artificial intelligence and machine learning play a vital role in analyzing the data to be obtained. In this study, six pollutant concentrations, namely particulate matter (PM2.5 and PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), ozone (O3), and carbon monoxide (CO), were predicted using three basic machine learning algorithms (random forest, decision tree, and support vector regression) by considering only meteorological data. Experiments on two different datasets showed that the random forest has a high prediction capacity (R2: 0.74–0.86) and that high-accuracy predictions of pollutant concentrations can be made using only meteorological data. This and further studies based on meteorological data would help to reduce the number of devices in Smart City applications and make them more cost-effective.
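The R2 range quoted above refers to the coefficient of determination. As a rough illustration of how that score is computed from observed and predicted concentrations, here is a minimal sketch; the function name `r_squared` is hypothetical, and the study itself presumably relied on a library implementation.

```python
def r_squared(y_true, y_pred) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are identical")
    return 1 - ss_res / ss_tot
```

A score of 1 means perfect predictions, while 0 means the model does no better than predicting the mean of the observations.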


Author(s):  
Sumit Upadhyay

Air pollution has both acute and chronic effects on human health, affecting a number of different systems and organs. Examining and protecting air quality has become one of the most essential activities for the government in many industrial and urban areas today. Air pollutants, such as carbon monoxide (CO), sulfur dioxide (SO2), nitrogen oxides (NOx), volatile organic compounds (VOCs), ozone (O3), heavy metals, and respirable particulate matter (PM2.5 and PM10), differ in their chemical composition, reaction properties, emission, time of disintegration, and ability to diffuse over long or short distances. The main objective of this paper is to build a model for predicting the Air Quality Index (AQI) of specific cities using several machine learning algorithms, namely Multiple Linear Regression, K-Nearest Neighbours (KNN), Support Vector Machine (SVM), and Decision Tree, and to evaluate and compare the performance of each algorithm based on its accuracy score and errors. The air pollution dataset is publicly available on different government sites. In the implementation phase, the dataset is divided so that 80% is used for training the models and the rest is used for testing them.
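The 80/20 division described above can be sketched as follows. The abstract does not say whether the rows are shuffled before splitting, so the shuffle and the fixed seed here are assumptions, and `split_80_20` is a hypothetical helper name.

```python
import random


def split_80_20(rows, seed: int = 0):
    """Return (train, test) with 80% of the rows in train and the rest in test.

    Assumption: rows are shuffled first with a fixed seed for reproducibility.
    """
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 4) // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]
```

For time-ordered air quality records, a chronological split without shuffling may be more appropriate, since shuffling lets the model see observations from later than some of its test points.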


2019 ◽  
Vol 8 (4) ◽  
pp. 3244-3249

In the fast-moving technological business sector, acquiring a new customer is more expensive and time-consuming than adopting methods to retain existing customers. The business sector therefore needs research on retaining existing customers using current technology. Retaining existing customers with high reliability is a challenging task. With this view, we focus on predicting customer churn for a banking application. This paper uses the customer churn bank modeling data set extracted from the UCI Machine Learning Repository. The Anaconda Navigator IDE along with Spyder is used for implementing the Python code. Our contribution is fivefold. First, the data preprocessing is done and the relationships between the attributes are identified. Second, the data set is reduced with principal component analysis to form a 2-component feature-reduced dataset. Third, the raw dataset and the 2-component PCA-reduced dataset are fitted to various solvers of logistic regression classifiers, and the performance is analyzed with the confusion matrix. Fourth, the raw dataset and the 2-component PCA-reduced dataset are fitted to various neighboring algorithms of K-Nearest Neighbors classifiers, and the performance is analyzed with the confusion matrix. Fifth, the raw dataset and the 2-component PCA-reduced dataset are fitted to various kernels of Support Vector Machine classifiers, and the performance is analyzed with the confusion matrix. The implementation is carried out in Python using Anaconda Navigator. Experimental results show that the RBF kernel of the Support Vector Machine classifier is the most effective, with an accuracy of 85.8% before applying PCA and 80.9% after applying PCA, compared to the other classifiers.
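The confusion-matrix analysis used in each step can be illustrated with a minimal sketch for the binary churn case. It assumes labels are encoded as 1 (churned) and 0 (retained) and uses the row/column layout [[TN, FP], [FN, TP]]; the function names are hypothetical and this is not the authors' code.

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have equal length")
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index = actual class, column index = predicted class.
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix) -> float:
    """Share of correct predictions: (TN + TP) / total."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tn + tp) / total
```

Accuracy figures such as the 85.8% and 80.9% reported above are read off the diagonal of this matrix; the off-diagonal cells show whether errors are mostly missed churners or false alarms.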


Due to the critical impacts of air pollution, prediction and monitoring of air quality in urban areas are essential tasks. However, because of the dynamic nature and high spatio-temporal variability of pollutants, predicting air pollutant concentrations is a complex spatio-temporal problem. The data are collected for a specific area and include climate conditions and vehicular pollutants occurring in the peak hours. The prediction process is used to compare the artificial neural network and support vector machine algorithms. This paper presents a survey on air quality prediction using artificial intelligence.


Generally, air pollution refers to the release of toxins into the air that are harmful to human well-being and the entire planet. It can be described as one of the most dangerous threats that humanity has ever faced. It causes damage to animals, crops, forests, etc. To address this problem, the transport sector has to predict air quality from pollutants using machine learning techniques. Consequently, air quality assessment and prediction has become a significant research area. The aim is to investigate machine learning based techniques for air quality prediction. The air quality dataset is preprocessed with univariate, bivariate, and multivariate analysis, missing value treatment, data validation, and data cleaning/preparation. Then, air quality is predicted using supervised machine learning techniques such as Logistic Regression, Random Forest, K-Nearest Neighbors, Decision Tree, and Support Vector Machines. The performance of the various machine learning algorithms is compared with respect to Precision, Recall, and F1 Score. It is found that the Decision Tree algorithm works well for predicting air quality. This application can help the meteorological department in predicting air quality. In the future, this work can be optimized by applying artificial intelligence techniques.
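The three comparison metrics named above can be written down compactly. The sketch below covers the single-class (binary) case and takes raw counts of true positives, false positives, and false negatives as input; how the study averages these over multiple air quality classes is not stated, and the function name is hypothetical.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) for one class from raw counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    f1        = harmonic mean of precision and recall
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Precision penalizes false alarms and recall penalizes missed cases, so a model that ranks first on one may not rank first on the other; F1 balances the two.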


2020 ◽  
Vol 15 ◽  
Author(s):  
Shuwen Zhang ◽  
Qiang Su ◽  
Qin Chen

Abstract: Major animal diseases pose a great threat to animal husbandry and human beings. With the deepening of globalization and the abundance of data resources, the prediction and analysis of animal diseases using big data are becoming more and more important. The focus of machine learning is to make computers learn from data and use the learned experience to analyze and predict. First, this paper introduces the animal epidemic situation and machine learning. Then it briefly introduces the application of machine learning in animal disease analysis and prediction. Machine learning is mainly divided into supervised learning and unsupervised learning. Supervised learning includes support vector machines, naive Bayes, decision trees, random forests, logistic regression, artificial neural networks, deep learning, and AdaBoost. Unsupervised learning includes the expectation-maximization algorithm, principal component analysis, hierarchical clustering, and maxent. Through the discussion in this paper, readers can gain a clearer concept of machine learning and understand its application prospects in animal diseases.


2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found the stacking ensemble delivers consistently superior performance for R2 and RMSE, while AdaBoost provides best results for MAE.
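The finding that one model can lead on RMSE while another leads on MAE follows from how the two metrics weight errors. The sketch below uses made-up numbers (not data from the study) and hypothetical function names to show the effect: RMSE squares each error, so a single large miss costs more than several moderate ones.

```python
import math


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Illustrative values only.
observed = [10.0, 20.0, 30.0, 40.0]
model_a = [10.0, 20.0, 30.0, 44.0]   # exact three times, one large miss
model_b = [11.5, 21.5, 31.5, 41.5]   # consistently off by a moderate amount

scores = {
    "model_a": (mae(observed, model_a), rmse(observed, model_a)),
    "model_b": (mae(observed, model_b), rmse(observed, model_b)),
}
```

Here `model_a` has the lower MAE and `model_b` has the lower RMSE, so which model counts as best depends on whether occasional large errors or typical errors matter more for the application.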


2021 ◽  
Vol 21 (9) ◽  
pp. 7373-7394
Author(s):  
Jérôme Barré ◽  
Hervé Petetin ◽  
Augustin Colette ◽  
Marc Guevara ◽  
Vincent-Henri Peuch ◽  
...  

Abstract. This study provides a comprehensive assessment of NO2 changes across the main European urban areas induced by COVID-19 lockdowns using satellite retrievals from the Tropospheric Monitoring Instrument (TROPOMI) onboard the Sentinel-5p satellite, surface site measurements, and simulations from the Copernicus Atmosphere Monitoring Service (CAMS) regional ensemble of air quality models. Some recent TROPOMI-based estimates of changes in atmospheric NO2 concentrations have neglected the influence of weather variability between the reference and lockdown periods. Here we provide weather-normalized estimates based on a machine learning method (gradient boosting) along with an assessment of the biases that can be expected from methods that omit the influence of weather. We also compare the weather-normalized satellite-estimated NO2 column changes with weather-normalized surface NO2 concentration changes and the CAMS regional ensemble, composed of 11 models, using recently published estimates of emission reductions induced by the lockdown. All estimates show similar NO2 reductions. Locations where the lockdown measures were stricter show stronger reductions, and, conversely, locations where softer measures were implemented show milder reductions in NO2 pollution levels. Average reduction estimates based on either satellite observations (−23 %), surface stations (−43 %), or models (−32 %) are presented, showing the importance of vertical sampling but also the horizontal representativeness. Surface station estimates are significantly changed when sampled to the TROPOMI overpasses (−37 %), pointing out the importance of the variability in time of such estimates. Observation-based machine learning estimates show a stronger temporal variability than model-based estimates.


Surveys regarding air pollution show that it has grown rapidly due to emissions of fuels and exhaust gases from factories. The Air Quality Index (AQI) was introduced to report the current status of air quality. The intent of the AQI is to help every individual know how regional air quality will affect them. The Environmental Protection Agency assesses the AQI for five major air pollutants, namely nitrogen dioxide (NO2), ground-level ozone (O3), particle pollution (PM10, PM2.5), carbon monoxide (CO), and sulphur dioxide (SO2). The intent of the project is to gather the real-time Air Quality Index from distinct monitoring stations across India, analyse the data, and report on it. The real-time data are collected using the API key provided by the Open Government Data (OGD) Platform India. Microsoft Business Intelligence (MSBI) and Power BI tools are used to transform, analyse, and visualize the data. This project can be used to develop programs such as Ozone today in Europe and mobile applications that act as an alert system to protect people from air pollution.


Author(s):  
Gonzalo Vergara ◽  
Juan J. Carrasco ◽  
Jesus Martínez-Gómez ◽  
Manuel Domínguez ◽  
José A. Gámez ◽  
...  

The study of energy efficiency in buildings is an active field of research. Modeling and predicting energy-related magnitudes makes it possible to analyze electric power consumption and can yield economic benefits. In this study, classical time series analysis and machine learning techniques, introducing clustering in some models, are applied to predict active power in buildings. The real data acquired correspond to time, environmental, and electrical data of 30 buildings belonging to the University of León (Spain). First, we segmented buildings in terms of their energy consumption using principal component analysis. Afterwards, we applied state-of-the-art machine learning methods and compared them. Finally, we predicted daily electric power consumption profiles and compared them with actual data for different buildings. Our analysis shows that multilayer perceptrons have the lowest error, followed by support vector regression and clustered extreme learning machines. We also analyze daily load profiles on weekdays and weekends for different buildings.

