Vayu: An Open-Source Toolbox for Visualization and Analysis of Crowd-Sourced Sensor Data

Sachit Mahajan

doi:10.3390/s21227726

Matrix Profile-Based Approach to Industrial Sensor Data Analysis Inside RDBMS

Mathematics ◽

10.3390/math9172146 ◽

2021 ◽

Vol 9 (17) ◽

pp. 2146

Author(s):

Mikhail Zymbler ◽

Elena Ivanova

Keyword(s):

Time Series ◽

Time Series Data ◽

Wide Spectrum ◽

Database Management System ◽

Third Party ◽

Sensor Data ◽

Data Conversion ◽

Series Data ◽

Time Interval ◽

Short Time Interval

Currently, big sensor data arise in a wide spectrum of Industry 4.0, Internet of Things, and Smart City applications. In such subject domains, sensors tend to have a high frequency and produce massive time series in a relatively short time interval. The data collected from the sensors are subject to mining in order to make strategic decisions. In the article, we consider the problem of choosing a Time Series Database Management System (TSDBMS) to provide efficient storing and mining of big sensor data. We overview InfluxDB, OpenTSDB, and TimescaleDB, which are among the most popular state-of-the-art TSDBMSs, and represent different categories of such systems, namely native, add-ons over NoSQL systems, and add-ons over relational DBMSs (RDBMSs), respectively. Our overview shows that, at present, TSDBMSs offer a modest built-in toolset to mine big sensor data. This leads to the use of third-party mining systems and unwanted overhead costs due to exporting data outside a TSDBMS, data conversion, and so on. We propose an approach to managing and mining sensor data inside RDBMSs that exploits the Matrix Profile concept. A Matrix Profile is a data structure that annotates a time series through the index of and the distance to the nearest neighbor of each subsequence of the time series and serves as a basis to discover motifs, anomalies, and other time-series data mining primitives. This approach is implemented as a PostgreSQL extension that allows an application programmer both to compute matrix profiles and mining primitives and to represent them as relational tables. Experimental case studies show that our approach surpasses the above-mentioned out-of-TSDBMS competitors in terms of performance since it assumes that sensor data are mined inside a TSDBMS at no significant overhead costs.

Download Full-text

TATSSI: A Free and Open-Source Platform for Analyzing Earth Observation Products with Quality Data Assessment

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10040267 ◽

2021 ◽

Vol 10 (4) ◽

pp. 267

Author(s):

Inder Tecuapetla-Gómez ◽

Gerardo López-Saldaña ◽

María Isabel Cruz-López ◽

Rainer Ressl

Keyword(s):

Time Series ◽

Open Source ◽

Satellite Data ◽

Time Series Data ◽

Earth Observation ◽

Quality Analysis ◽

Quality Data ◽

Series Data ◽

Data Handling ◽

Geographical Regions

Earth observation (EO) data play a crucial role in monitoring ecosystems and environmental processes. Time series of satellite data are essential for long-term studies in this context. Working with large volumes of satellite data, however, can still be a challenge, as the computational environment with respect to storage, processing and data handling can be demanding, which sometimes can be perceived as a barrier when using EO data for scientific purposes. In particular, open-source developments which comprise all components of EO data handling and analysis are still scarce. To overcome this difficulty, we present Tools for Analyzing Time Series of Satellite Imagery (TATSSI), an open-source platform written in Python that provides routines for downloading, generating, gap-filling, smoothing, analyzing and exporting EO time series. Since TATSSI integrates quality assessment and quality control flags when generating time series, data quality analysis is the backbone of any analysis made with the platform. We discuss TATSSI’s 3-layered architecture (data handling, engine and three application programming interfaces (API)); by allowing three APIs (a native graphical user interface, some Jupyter Notebooks and the Python command line) this development is exceptionally user-friendly. Furthermore, to demonstrate the application potential of TATSSI, we evaluated MODIS time series data for three case studies (irrigation area changes, evaluation of moisture dynamics in a wetland ecosystem and vegetation monitoring in a burned area) in different geographical regions of Mexico. Our analyses were based on methods such as the spatio-temporal distribution of maxima over time, statistical trend analysis and change-point decomposition, all of which were implemented in TATSSI. Our results are consistent with other scientific studies and results in these areas and with related in-situ data.

Download Full-text

Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph182212213 ◽

2021 ◽

Vol 18 (22) ◽

pp. 12213

Author(s):

Taesung Kim ◽

Jinhee Kim ◽

Wonho Yang ◽

Hunjoo Lee ◽

Jaegul Choo

Keyword(s):

Time Series ◽

Deep Learning ◽

Air Quality ◽

Time Series Data ◽

Quality Data ◽

Series Data ◽

Missing Value ◽

Missing Value Imputation ◽

Spatio Temporal ◽

Air Quality Data

To prevent severe air pollution, it is important to analyze time-series air quality data, but this is often challenging as the time-series data is usually partially missing, especially when it is collected from multiple locations simultaneously. To solve this problem, various deep-learning-based missing value imputation models have been proposed. However, often they are barely interpretable, which makes it difficult to analyze the imputed data. Thus, we propose a novel deep learning-based imputation model that achieves high interpretability as well as shows great performance in missing value imputation for spatio-temporal data. We verify the effectiveness of our method through quantitative and qualitative results on a publicly available air-quality dataset.

Download Full-text

Joint Statement on new opportunities for air quality sensing - lower-cost sensors for public authorities and citizen science initiatives

Research Ideas and Outcomes ◽

10.3897/rio.5.e34059 ◽

2019 ◽

Vol 5 ◽

Cited By ~ 1

Author(s):

Sven Schade ◽

Wiebke Herding ◽

Arne Fellermann ◽

Alexander Kotsev

Keyword(s):

Air Quality ◽

Data Quality ◽

Citizen Science ◽

Lower Cost ◽

Sensor Data ◽

Quality Data ◽

Monitoring Networks ◽

Lower Accuracy ◽

Reference Quality ◽

The Right

Low-cost air quality sensors continue to spread. While their measurement quality does not compete with high-end instrumentation deployed in official air quality monitoring stations, they have a great potential to complement existing air quality assessments. However, we still see challenges related to data quality, data interoperability, and for collaborating on data assimilation and calibration. In order to move ahead we gathered as a group of 38 organisations from 14 different countries, including governmental authorities, network operators, citizen science initiatives, environmental Non-Governmental Organisations (NGOs), and academic researchers to explore how we can collaborate and better leverage each other’s work. This statement captures our joint findings and recommendations. Our key observations include: Co-operation between official monitoring networks (reference quality data) and lower-cost sensor operators is a key to make air quality data more usable. To be able to combine forces and benefit from each other’s expertise, the different perspectives of all stakeholders should be taken into account. There is a need to ensure that all users understand the possibilities and the limitations of making sense out of observations from different sensors. It is not realistic to expect that in the near future the data quality of lower-cost sensors will be as good as that of the official data. A way to make use of data that is of lower accuracy is by employing them in air quality modelling. Transparency about data quality is important to build more trust in the data, and to avoid unrealistic expectations. The need for interoperability should be clearly articulated and promoted by potential data users. There a need (and an opportunity) to provide guidance and standard operating procedures for the deployment and calibration of lower-cost sensors in order to increase the data quality delivered by participants of citizen science projects. Presently, we prefer to consider fixed-stationary sensors in a network instead of mobile sensor data. Furthermore, stationary data should not be aggregated with data from mobile sensors. Publishing and sharing this statement is only small step in the right direction and further actions have to be taken, inlcuding more in-depth discussions of the recommendations in smaller groups and follow-up meetings on dedicated topics. Co-operation between official monitoring networks (reference quality data) and lower-cost sensor operators is a key to make air quality data more usable. To be able to combine forces and benefit from each other’s expertise, the different perspectives of all stakeholders should be taken into account. There is a need to ensure that all users understand the possibilities and the limitations of making sense out of observations from different sensors. It is not realistic to expect that in the near future the data quality of lower-cost sensors will be as good as that of the official data. A way to make use of data that is of lower accuracy is by employing them in air quality modelling. Transparency about data quality is important to build more trust in the data, and to avoid unrealistic expectations. The need for interoperability should be clearly articulated and promoted by potential data users. There a need (and an opportunity) to provide guidance and standard operating procedures for the deployment and calibration of lower-cost sensors in order to increase the data quality delivered by participants of citizen science projects. Presently, we prefer to consider fixed-stationary sensors in a network instead of mobile sensor data. Furthermore, stationary data should not be aggregated with data from mobile sensors. Publishing and sharing this statement is only small step in the right direction and further actions have to be taken, inlcuding more in-depth discussions of the recommendations in smaller groups and follow-up meetings on dedicated topics.

Download Full-text

Joint Statement on new opportunities for air quality sensing - lower-cost sensors for public authorities and citizen science initiatives

Research Ideas and Outcomes ◽

10.3897/rio.5.e37478 ◽

2019 ◽

Vol 5 ◽

Author(s):

Sven Schade ◽

Wiebke Herding ◽

Arne Fellermann ◽

Alexander Kotsev ◽

Michel Gerboles ◽

...

Keyword(s):

Air Quality ◽

Data Quality ◽

Citizen Science ◽

Lower Cost ◽

Sensor Data ◽

Quality Data ◽

Monitoring Networks ◽

Lower Accuracy ◽

Reference Quality ◽

The Right

Low-cost air quality sensors continue to spread. While their measurement quality does not compete with high-end instrumentation deployed in official air quality monitoring stations, they have a great potential to complement existing air quality assessments. However, we still see challenges related to data quality, data interoperability, and for collaborating on data assimilation and calibration. In order to move ahead we gathered as a group of 38 organisations from 14 different countries, including governmental authorities, network operators, citizen science initiatives, environmental Non-Governmental Organisations (NGOs), and academic researchers to explore how we can collaborate and better leverage each other’s work. This statement captures our joint findings and recommendations. Our key observations include: Co-operation between official monitoring networks (reference quality data) and lower-cost sensor operators is a key to make air quality data more usable. To be able to combine forces and benefit from each other’s expertise, the different perspectives of all stakeholders should be taken into account. There is a need to ensure that all users understand the possibilities and the limitations of making sense out of observations from different sensors. It is not realistic to expect that in the near future the data quality of lower-cost sensors will be as good as that of the official data. A way to make use of data that is of lower accuracy is by employing them in air quality modelling. Transparency about data quality is important to build more trust in the data, and to avoid unrealistic expectations. The need for interoperability should be clearly articulated and promoted by potential data users. There a need (and an opportunity) to provide guidance and standard operating procedures for the deployment and calibration of lower-cost sensors in order to increase the data quality delivered by participants of citizen science projects. Presently, we prefer to consider fixed-stationary sensors in a network instead of mobile sensor data. Furthermore, stationary data should not be aggregated with data from mobile sensors. Publishing and sharing this statement is only small step in the right direction and further actions have to be taken, inlcuding more in-depth discussions of the recommendations in smaller groups and follow-up meetings on dedicated topics. Co-operation between official monitoring networks (reference quality data) and lower-cost sensor operators is a key to make air quality data more usable. To be able to combine forces and benefit from each other’s expertise, the different perspectives of all stakeholders should be taken into account. There is a need to ensure that all users understand the possibilities and the limitations of making sense out of observations from different sensors. It is not realistic to expect that in the near future the data quality of lower-cost sensors will be as good as that of the official data. A way to make use of data that is of lower accuracy is by employing them in air quality modelling. Transparency about data quality is important to build more trust in the data, and to avoid unrealistic expectations. The need for interoperability should be clearly articulated and promoted by potential data users. There a need (and an opportunity) to provide guidance and standard operating procedures for the deployment and calibration of lower-cost sensors in order to increase the data quality delivered by participants of citizen science projects. Presently, we prefer to consider fixed-stationary sensors in a network instead of mobile sensor data. Furthermore, stationary data should not be aggregated with data from mobile sensors. Publishing and sharing this statement is only small step in the right direction and further actions have to be taken, inlcuding more in-depth discussions of the recommendations in smaller groups and follow-up meetings on dedicated topics.

Download Full-text

Remaining Useful Life Prediction Using Temporal Convolution with Attention

AI ◽

10.3390/ai2010005 ◽

2021 ◽

Vol 2 (1) ◽

pp. 48-70

Author(s):

Wei Ming Tan ◽

T. Hui Teo

Keyword(s):

Neural Network ◽

Time Series ◽

Time Series Data ◽

Remaining Useful Life ◽

Sensor Data ◽

Series Data ◽

Multiple Time ◽

Data Set ◽

Form Complex ◽

Useful Life

Prognostic techniques attempt to predict the Remaining Useful Life (RUL) of a subsystem or a component. Such techniques often use sensor data which are periodically measured and recorded into a time series data set. Such multivariate data sets form complex and non-linear inter-dependencies through recorded time steps and between sensors. Many current existing algorithms for prognostic purposes starts to explore Deep Neural Network (DNN) and its effectiveness in the field. Although Deep Learning (DL) techniques outperform the traditional prognostic algorithms, the networks are generally complex to deploy or train. This paper proposes a Multi-variable Time Series (MTS) focused approach to prognostics that implements a lightweight Convolutional Neural Network (CNN) with attention mechanism. The convolution filters work to extract the abstract temporal patterns from the multiple time series, while the attention mechanisms review the information across the time axis and select the relevant information. The results suggest that the proposed method not only produces a superior accuracy of RUL estimation but it also trains many folds faster than the reported works. The superiority of deploying the network is also demonstrated on a lightweight hardware platform by not just being much compact, but also more efficient for the resource restricted environment.

Download Full-text

The Decomposition Analysis of Carbon Emissions: Theoretical Basis, Methods and Their Evaluations

Chinese Journal of Urban and Environmental Studies ◽

10.1142/s2345748120500207 ◽

2020 ◽

Vol 08 (04) ◽

pp. 2050020

Author(s):

Shenning QU

Keyword(s):

Environmental Economics ◽

Carbon Emissions ◽

Time Series Data ◽

Decomposition Analysis ◽

Series Data ◽

Process Time ◽

Structural Decomposition Analysis ◽

Systematic Analysis ◽

Index Decomposition ◽

Laspeyres Index

As an analytical framework for studying the characteristics of changes in things and their action mechanisms, the decomposition analysis of greenhouse gas emissions has been increasingly used in environmental economics research. The author introduces several decomposition methods commonly used at present and compares them. The index decomposition analysis (IDA) of carbon emissions usually uses energy identities to express carbon emissions as the product of several factor indexes, and decomposes them according to different weight-determining methods to clarify the incremental share of each index, in which way it is possible to decompose the models that contain less factors, process time series data, and conduct cross-country comparisons. It mainly includes the Laspeyres index decomposition and the Divisia index decomposition. Among them, the LMDI I method has been widely used for its advantages such as generating no residuals and easy to use. The structural decomposition analysis (SDA) can be used to conduct a more systematic analysis, decompose models with more influencing factors, and analyze the impacts of various factors on emissions, but this method has higher requirements for data collection. The biggest difference between the SDA method and the IDA methods of carbon emissions is that the former is based on an input–output system, while the latter only needs to use sectors’ aggregate data.

Download Full-text

Seasonal Influence on the Performance of Low-Cost NO2 Sensor Calibrations

Sensors ◽

10.3390/s21237919 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7919

Author(s):

Sjoerd van Ratingen ◽

Jan Vonk ◽

Christa Blokhuis ◽

Joost Wesseling ◽

Erik Tielemans ◽

...

Keyword(s):

Low Cost ◽

Added Value ◽

Sensor Data ◽

Sensor Technology ◽

Monitoring Networks ◽

Technical Problems ◽

Eu Directive ◽

Short Period ◽

No2 Sensor ◽

One Year

Low-cost sensor technology has been available for several years and has the potential to complement official monitoring networks. The current generation of nitrogen dioxide (NO2) sensors suffers from various technical problems. This study explores the added value of calibration models based on (multiple) linear regression including cross terms on the performance of an electrochemical NO2 sensor, the B43F manufactured by Alphasense. Sensor data were collected in duplicate at four reference sites in the Netherlands over a period of one year. It is shown that a calibration, using O3 and temperature in addition to a reference NO2 measurement, improves the prediction in terms of R2 from less than 0.5 to 0.69–0.84. The uncertainty of the calibrated sensors meets the Data Quality Objective for indicative methods specified by the EU directive in some cases and it was verified that the sensor signal itself remains an important predictor in the multilinear regressions. In practice, these sensors are likely to be calibrated over a period (much) shorter than one year. This study shows the dependence of the quality of the calibrated signal on the choice of these short (monthly) calibration and validation periods. This information will be valuable for determining short-period calibration strategies.

Download Full-text

NanoSen-AQM: From Sensors to Users

International Journal of Online and Biomedical Engineering (iJOE) ◽

10.3991/ijoe.v16i04.11871 ◽

2020 ◽

Vol 16 (04) ◽

pp. 51

Author(s):

Pedro Lucas ◽

Jorge Silva ◽

Filipe Araujo ◽

Catarina Silva ◽

Paulo Gil ◽

...

Keyword(s):

Air Quality ◽

Low Cost ◽

Sensor Data ◽

Quality Data ◽

Easy Access ◽

The Public ◽

Nano Sensors ◽

The Cost ◽

Air Quality Data ◽

Monitoring Platform

With the raising of environmental concerns regarding pollution, interest in monitoring air quality is increasing. However, air pollution data is mostly originated from a limited number of government-owned sensors, which can only capture a small fraction of reality. Improving air quality coverage in-volves reducing the cost of sensors and making data widely available to the public. To this end, the NanoSen-AQM project proposes the usage of low-cost nano-sensors as the basis for an air quality monitoring platform, capa-ble of collecting, aggregating, processing, storing, and displaying air quality data. Being an end-to-end system, the platform allows sensor owners to manage their sensors, as well as define calibration functions, that can im-prove data reliability. The public can visualize sensor data in a map, define specific clusters (groups of sensors) as favorites and set alerts in the event of bad air quality in certain sensors. The NanoSen-AQM platform provides easy access to air quality data, with the aim of improving public health.

Download Full-text

PREDICTION OF OZONE (O3) VALUES USING SUPPORT VECTOR REGRESSION METHOD

Jurnal Informatika Polinema ◽

10.33795/jip.v7i4.777 ◽

2021 ◽

Vol 7 (4) ◽

pp. 81-88

Author(s):

Chasandra Puspitasari ◽

Nur Rokhman ◽

Wahyono

Keyword(s):

Air Quality ◽

Support Vector Regression ◽

Time Series Data ◽

Motor Vehicles ◽

Primary Data ◽

Series Data ◽

Polynomial Kernel ◽

Support Vector ◽

Linear Kernel ◽

Air Quality Prediction

A large number of motor vehicles that cause congestion is a major factor in the poor air quality in big cities. Ozone (O3) is one of the main indicators in measuring the level of air pollution in the city of Surabaya to find out how air quality. Prediction of Ozone (O3) value is important as a support for the community and government in efforts to improve the air quality. This study aims to predict the value of Ozone (O3) in the form of time series data using the Support Vector Regression (SVR) method with the Linear, Polynomial, RBF, and ANOVA kernels. The data used in this study are 549 primary data from the daily average of ozone (O3) value of Surabaya in the period 1 July 2017 - 31 December 2018. The data will be used in the training and testing process until prediction results are obtained. The results obtained from this study are the Linear kernel produces the best prediction model with a MAPE value of 21.78% with a parameter value 𝜆 = 0.3; 𝜀 = 0.00001; cLR = 0.005; and C = 0.5. The results of the Polynomial kernel are not much different from the Linear kernel which has a MAPE value of 21.83%. While the RBF and ANOVA kernels each produce a model with MAPE value of 24.49% and 22.0%. These results indicate that the SVR method with the kernels used can predict Ozone values quite well.

Download Full-text