Estimation of Pollutant Emissions in Real Driving Conditions Based on Data from OBD and Machine Learning

Sensors ◽  
2021 ◽  
Vol 21 (19) ◽  
pp. 6344
Author(s):  
Néstor Diego Rivera-Campoverde ◽  
José Luis Muñoz-Sanz ◽  
Blanca del Valle Arenas-Ramirez

This article proposes a methodology for estimating emissions in real driving conditions, based on on-board diagnostics (OBD) data and machine learning, motivated by the lack of models for estimating pollutants that do not require large measurement campaigns. For this purpose, driving data are obtained by means of a data logger, and emissions through a portable emissions measurement system, in a real driving emissions test. The data obtained are used to train artificial neural networks that estimate emissions, the relative importance of the variables having first been estimated with random forest techniques. Then, by applying the K-means algorithm, labels are obtained to implement a classification tree and thereby determine the gear selected by the driver. These models were fed with a generated data set covering 1218.19 km of driving. The results were compared with those obtained by applying the international vehicle emissions model and with the results of the real driving emissions test, showing similar outcomes. The main contribution of this article is that the generated model is robust across different traffic conditions and performs well over the speed range, with small differences at low average driving speeds, because more than half of the vehicle's trips occur in urban areas under completely random driving conditions. These results can be useful for estimating emission factors, with potential application in vehicle homologation processes and the compilation of vehicular emission inventories.
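The modeling chain described above (random-forest feature ranking, a neural-network emission estimator, and K-means labels refit with a classification tree for gear selection) can be sketched with scikit-learn. Every variable name, shape and value below is an illustrative assumption on synthetic data, not the authors' actual OBD/PEMS features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))            # mock driving signals (speed, rpm, ...)
y = 2.0 * X[:, 0] + X[:, 1] + rng.normal(scale=0.1, size=300)  # mock emission rate

# 1) Rank variables by random-forest importance and keep the strongest ones.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:2]

# 2) Train a neural network on the selected variables to estimate emissions.
ann = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                   max_iter=2000, random_state=0)
ann.fit(X[:, top], y)

# 3) Cluster driving samples with K-means, then fit a classification tree
#    that reproduces the cluster labels (a stand-in for the gear model).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
gear_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, labels)
print(sorted(int(i) for i in top), round(ann.score(X[:, top], y), 3))
```

The three fitted objects mirror the abstract's three stages; in practice each would be trained and validated on separate drive cycles.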

2016 ◽  
Vol 4 (2) ◽  
pp. 34-42 ◽  
Author(s):  
Marek Sołtysiak ◽  
Marcin Blachnik ◽  
Dominika Dąbrowska

Abstract. Amphibian species have been considered useful ecological indicators. They are used as indicators of environmental contamination, ecosystem health and habitat quality. Amphibian species are sensitive to changes in the aquatic environment and may therefore form the basis for a classification of water bodies. Water bodies hosting a large number of amphibian species are especially valuable, even if they are located in urban areas. Automating the classification process allows for a faster evaluation of the presence of amphibian species in water bodies. Three machine-learning methods (artificial neural networks, decision trees and the k-nearest neighbours algorithm) have been used to classify water bodies in Chorzów – one of 19 cities in the Upper Silesia Agglomeration. In this case, classification is a supervised data-mining method consisting of several stages: building the model, the testing phase and prediction. Seven natural and anthropogenic features of the water bodies (e.g. the type of water body, aquatic plants, the intended use of the water body, its position in relation to any nearby buildings, its condition, the degree of littering, the shore type and fishing activities) have been taken into account in the classification. The data set used in this study comprised information about 71 different water bodies and the 9 amphibian species living in them. The results showed that the best average classification accuracy was obtained with the multilayer perceptron neural network.
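The three classifiers named above can be compared by cross-validated accuracy in a few lines of scikit-learn. The data set here is a synthetic stand-in with the same dimensions (71 samples, 7 features); the real pond features and species labels are not reproduced:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in: 71 "water bodies" with 7 features each.
X, y = make_classification(n_samples=71, n_features=7, n_informative=5,
                           random_state=0)
models = {
    "MLP": MLPClassifier(hidden_layer_sizes=(16,), solver="lbfgs",
                         max_iter=3000, random_state=0),
    "tree": DecisionTreeClassifier(random_state=0),
    "kNN": KNeighborsClassifier(n_neighbors=5),
}
# Mean 5-fold cross-validated accuracy per model.
scores = {name: cross_val_score(m, X, y, cv=5).mean()
          for name, m in models.items()}
print(scores)
```

With only 71 samples, cross-validation (rather than a single split) is the appropriate way to compare average accuracies, as the study does.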


2021 ◽  
Author(s):  
Melisa Diaz Resquin ◽  
Pablo Lichtig ◽  
Diego Alessandrello ◽  
Marcelo De Oto ◽  
Darío Gómez ◽  
...  

Abstract. The COVID-19 (COronaVIrus Disease 2019) pandemic provided the unique opportunity to evaluate the role of a sudden and deep decline in air pollutant emissions in the ambient air of numerous cities worldwide. Argentina, in general, and the Metropolitan Area of Buenos Aires (MABA), in particular, were under strict control measures from March to May 2020. Private vehicle restrictions were intense, and primary pollutant concentrations decreased substantially. To quantify the changes in CO, NO, NO2, PM10, SO2 and O3 concentrations under the stay-at-home orders imposed against COVID-19, we compared the observations during the different lockdown phases with both observations during the same period in 2019 and concentrations that would have occurred under a business-as-usual (BAU) scenario under no restrictions. We employed a Random Forest (RF) algorithm to estimate the BAU concentration levels. This approach exhibited a high predictive performance based on only a handful of available indicators (meteorological variables, air quality concentrations and emission temporal variations) at a low computational cost. Results during testing showed that the model captured the observed daily variations and the diurnal cycles of these pollutants with a normalized mean bias (NMB) of less than 11 % and Pearson correlation coefficients of the diurnal variations of between 0.65 and 0.89 for all the pollutants considered. Based on the Random Forest results, we estimated that the lockdown implied concentration decreases of up to 47 % (CO), 60 % (NOx) and 36 % (PM10) during the strictest mobility restrictions. Higher O3 concentrations (up to 87 %) were also observed, which is consistent with the response in a VOC-limited chemical regime to the decline in NOx emissions. 
Relative changes with respect to the 2019 observations were consistent with those estimated with the Random Forest model, but indicated that larger decreases in primary pollutants and smaller increases in O3 would have occurred. This points to the need to account not only for differences in emissions but also in meteorological variables when evaluating the lockdown effects on air quality. The findings of this study may be valuable for formulating emission control strategies that do not disregard their implications for secondary pollutants. The data set used in this study and an introductory machine learning code are openly available at https://data.mendeley.com/datasets/h9y4hb8sf8/1 (Diaz Resquin et al., 2021).
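The two skill metrics quoted in the abstract, normalized mean bias (NMB) and the Pearson correlation of predicted versus observed concentrations, are straightforward to compute. The arrays below are illustrative values, not the MABA measurements:

```python
import numpy as np

obs = np.array([10.0, 12.0, 9.0, 15.0, 11.0])   # observed concentrations
pred = np.array([11.0, 11.5, 9.5, 14.0, 12.0])  # model (BAU-style) estimates

nmb = (pred - obs).sum() / obs.sum()            # NMB = sum(P - O) / sum(O)
r = np.corrcoef(pred, obs)[0, 1]                # Pearson correlation
print(round(nmb, 3), round(r, 2))               # -> 0.018 0.95
```

An NMB below 0.11 (11 %) and correlations of 0.65–0.89, as reported for the diurnal cycles, indicate that the Random Forest reproduces both the level and the shape of the observed series.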


Author(s):  
R. Roscher ◽  
M. Volpi ◽  
C. Mallet ◽  
L. Drees ◽  
J. D. Wegner

Abstract. In order to reach the goal of reliably solving Earth monitoring tasks, automated and efficient machine learning methods are necessary for large-scale scene analysis and interpretation. A typical bottleneck of supervised learning approaches is the availability of accurate (manually) labeled training data, which is particularly important for training state-of-the-art (deep) learning methods. We present SemCity Toulouse, a publicly available, very high resolution, multi-spectral benchmark data set for training and evaluating sophisticated machine learning models. The benchmark acts as a test bed for single building instance segmentation, which has rarely been considered before in densely built urban areas. Additional information is provided in the form of a multi-class semantic segmentation annotation covering the same area, plus an adjacent area three times larger. The data set addresses interested researchers from various communities, such as photogrammetry and remote sensing, but also computer vision and machine learning.


Author(s):  
Rajesh Kumar Gupta ◽  
L. N. Padhy ◽  
Sanjay Kumar Padhi

Traffic congestion on road networks is one of the most significant problems faced in almost all urban areas. Driving under traffic congestion compels frequent idling, acceleration and braking, which increase energy consumption and wear and tear on vehicles. By efficiently maneuvering vehicles, traffic flow can be improved. An Adaptive Cruise Control (ACC) system in a car automatically detects its leading vehicle and adjusts the headway by using both the throttle and the brake. Conventional ACC systems are not suitable in congested traffic conditions due to their response delay. For this purpose, the development of smart technologies that contribute to improved traffic flow, throughput and safety is needed. In today's traffic, achieving a safe inter-vehicle distance, improving safety and avoiding congestion require analyzing the constraints imposed by drivers' limited perception of traffic conditions and their reaction characteristics. Moreover, erroneous human driving may generate shockwaves that cause traffic flow instabilities. In this paper, to achieve a safe inter-vehicle distance and improved throughput, we consider a Cooperative Adaptive Cruise Control (CACC) system, which is then implemented in a smart driving system. For better performance, wireless communication is used to exchange information between individual vehicles. By introducing vehicle-to-vehicle (V2V) and vehicle-to-roadside-infrastructure (V2R) communications, a vehicle receives information not only from the vehicles immediately ahead of and behind it, but also from vehicles further ahead and behind. This enables a vehicle to follow its predecessor at a closer distance under tighter control.
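As a concrete illustration of the headway adjustment described above, CACC controllers are commonly built around a constant time-gap spacing policy plus a feed-forward term for the lead vehicle's V2V-reported acceleration. This is a generic textbook sketch under assumed gains, not the paper's controller design:

```python
def cacc_accel(gap, ego_v, lead_v, lead_a,
               time_gap=0.6, standstill=2.0,
               kp=0.45, kd=0.25, kff=1.0):
    """Commanded acceleration (m/s^2) from the spacing error, the closing
    rate, and the lead vehicle's acceleration received over V2V."""
    desired_gap = standstill + time_gap * ego_v   # constant time-gap policy
    spacing_error = gap - desired_gap             # > 0: we are too far back
    closing_rate = lead_v - ego_v                 # > 0: the gap is opening
    return kp * spacing_error + kd * closing_rate + kff * lead_a

# Too close and approaching the leader: the controller commands braking.
print(cacc_accel(gap=5.0, ego_v=20.0, lead_v=18.0, lead_a=0.0))
```

The feed-forward term `kff * lead_a` is what V2V communication enables: the follower reacts to the leader's braking before the gap visibly shrinks, which is why CACC tolerates shorter time gaps than sensor-only ACC.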


2020 ◽  
Author(s):  
Marc Philipp Bahlke ◽  
Natnael Mogos ◽  
Jonny Proppe ◽  
Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger experimentally relevant complexes. Exchange spin coupling is similarly easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
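The property the abstract highlights, that Gaussian process regression returns an uncertainty estimate with every prediction, can be shown with scikit-learn. The two-feature "angle/distance" descriptor and the target function here are schematic stand-ins, not the authors' data or kernels:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(1)
X = rng.uniform(size=(40, 2))          # stand-in: Cu-bridge angle, Cu-Cu distance
y = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]  # stand-in coupling constant

gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), random_state=1)
gpr.fit(X, y)

# Each prediction comes with a standard deviation ("error bar").
mean, std = gpr.predict(X[:5], return_std=True)
print(mean.shape, std.shape, bool((std >= 0).all()))
```

The predictive standard deviation grows for inputs far from the training set, which is exactly what makes GPR attractive for flagging unreliable extrapolations toward larger complexes.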


Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set with the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison among RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
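The "comparison" concept mentioned above can be sketched as follows: instead of classifying single poses on a heavily unbalanced set (one native pose among many decoys), the model is trained on feature differences between two poses and predicts which of the pair is more native-like. The features below are synthetic, not GARF pair-potential terms:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
native = rng.normal(loc=1.0, size=(200, 8))   # mock native-pose feature vectors
decoy = rng.normal(loc=0.0, size=(200, 8))    # mock decoy-pose feature vectors

# Each training row is (pose_a - pose_b); label 1 means pose_a is native.
# Pairing both orderings makes the two classes exactly balanced.
X = np.vstack([native - decoy, decoy - native])
y = np.array([1] * 200 + [0] * 200)

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(X.shape, round(rf.score(X, y), 3))
```

At scoring time, the native pose is the candidate that wins the most pairwise comparisons against the other poses of the same ligand.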


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade and is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large data set using machine learning methods. Taking advantage of its ability to find non-intuitive regularities in high-dimensional datasets, machine learning can be used to build effective predictive models. A total of 6554 descriptors for each compound were collected, and an efficient descriptor selection method was chosen to find the appropriate descriptors. Four different methods, including multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT) and Support Vector Machine (SVM), were implemented to build prediction models with these selected descriptors. Results: The SVM model was the best among these methods, with R2=0.84, MSE=0.55 for the training set and R2=0.83, MSE=0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is very helpful for designing novel thrombin inhibitors.
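The y-randomization test mentioned above is simple to reproduce: retrain the model on shuffled activity values and check that the cross-validated R² collapses, which rules out chance correlation as the source of the original score. The descriptors below are synthetic stand-ins, not the study's 6554 thrombin descriptors:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                         # mock descriptors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=200)  # mock Ki proxy

# Cross-validated R^2 of the real model ...
r2_true = cross_val_score(SVR(kernel="linear", C=10.0), X, y,
                          cv=5, scoring="r2").mean()
# ... versus the same model retrained on shuffled activities.
r2_rand = cross_val_score(SVR(kernel="linear", C=10.0), X, rng.permutation(y),
                          cv=5, scoring="r2").mean()
print(round(r2_true, 2), round(r2_rand, 2))
```

A large gap between the two scores is the desired outcome; if the shuffled-target R² stayed high, the original model would be fitting noise.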


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that works as a bridge between businesses and data science. With the involvement of data science, the business goal focuses on finding valuable insights in the available data. A large part of Indian cinema is Bollywood, a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop, applying machine learning techniques for classification and prediction. To build a classifier or prediction model, the first step is the learning stage, in which the training data set is used to train the model with some technique or algorithm; the rules generated in this stage constitute the model and are used to predict future trends in different types of organizations. Methods: Techniques for classification and prediction such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and KNN are applied in order to find efficient and effective results. All these functionalities can be applied with GUI-based workflows available under various categories such as Data, Visualize, Model and Evaluate. Conclusion: This paper focuses on a comparative analysis, based on parameters such as accuracy and the confusion matrix, to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate, to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets, and from these patterns relationships are also discovered that solve business problems and help predict forthcoming trends. This prediction can help production houses with advertisement propaganda; they can also plan their costs and, by accounting for these factors, make the movie more profitable.


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This proposed work develops an improved and robust machine learning model for predicting Myocardial Infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of Myocardial Infarction (MI), using the Framingham Heart Study dataset for validation and evaluation. This proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model utilizes mean imputation to fill in missing values in the data set, then applies principal component analysis (PCA) to extract the optimal features and enhance the performance of the classifiers. After PCA, the reduced features are partitioned into a training set and a test set: 70% of the data are given as input to four well-known classifiers (support vector machine, k-nearest neighbor, logistic regression and decision tree) to train them, and the remaining 30% are used to evaluate the output of the machine learning model using performance metrics such as the confusion matrix, classification accuracy, precision, sensitivity, F1-score and the AUC-ROC curve. Results: The outputs of the classifiers are evaluated using these performance measures, and we observed that logistic regression provides higher accuracy than the K-NN, SVM and decision tree classifiers, and that PCA performs well as a feature extraction method for enhancing the performance of the proposed model. From these analyses, we conclude that logistic regression has good mean accuracy and standard deviation of accuracy compared with the other three algorithms. The AUC-ROC curves of the proposed classifiers, analyzed in Figures 4 and 5, show that logistic regression exhibits a good AUC-ROC score, i.e. around 70%, compared to the k-NN and decision tree algorithms.
Conclusion: From the result analysis, we infer that this proposed machine learning model will act as an optimal decision-making system to predict acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models, and that it is capable of predicting the presence of acute myocardial infarction in humans using heart disease risk factors, in order to decide when to start lifestyle modification and medical treatment to prevent heart disease.
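The preprocessing-and-classification chain described above maps directly onto a scikit-learn Pipeline. The data here are synthetic (the Framingham data set is not reproduced), and the 70/30 split, mean imputation, PCA and logistic regression follow the abstract's description:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=400, n_features=12, random_state=0)
X[::17, 3] = np.nan                       # inject some missing values

pipe = make_pipeline(
    SimpleImputer(strategy="mean"),       # mean imputation of missing values
    PCA(n_components=6),                  # feature extraction
    LogisticRegression(max_iter=1000),    # best-performing classifier in the study
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pipe.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, pipe.predict_proba(X_te)[:, 1])
print(round(auc, 3))
```

Wrapping the imputer and PCA inside the Pipeline ensures they are fitted on the training split only, so the 30% test AUC is not contaminated by information leakage.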

