A Sustainable Method for Publishing Interoperable Open Data on the Web

Data ◽  
2021 ◽  
Vol 6 (8) ◽  
pp. 93
Author(s):  
Raf Buyle ◽  
Brecht Van de Vyvere ◽  
Julián Rojas Meléndez ◽  
Dwight Van Lancker ◽  
Eveline Vlassenroot ◽  
...  

Smart cities need (sensor) data for better decision-making. However, while vast amounts of data about and from cities are available, an intermediary is needed to connect and interpret (sensor) data at Web scale. Today, governments in Europe are struggling to publish open data in a sustainable, predictable and cost-effective way. Our research question asks which methods for publishing Linked Open Data time series, in particular air quality data, are sustainable and cost-effective. Furthermore, we demonstrate the cross-domain applicability of our data publishing approach through a different use case: Linked Open Data on railway infrastructure. Based on scenarios co-created with various governmental stakeholders, we researched methods to promote data interoperability, scalability and flexibility. The results show that applying a Linked Data Fragments-based approach to public endpoints for air quality and railway infrastructure data lowers the cost of publishing and increases availability thanks to better Web caching strategies.
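The Linked Data Fragments idea behind this result can be sketched in a few lines: observations are grouped into fixed-interval documents that become immutable once their interval closes, so ordinary HTTP caches can serve them and the server's per-request cost stays low. The URLs, fragment size and data values below are illustrative, not the published interface.

```python
from datetime import datetime, timezone

# Hypothetical observations: (ISO timestamp, NO2 value in µg/m³)
observations = [
    ("2021-03-01T10:05:00", 41.2),
    ("2021-03-01T10:35:00", 39.8),
    ("2021-03-01T11:10:00", 44.1),
    ("2021-03-01T11:50:00", 46.3),
]

FRAGMENT_SECONDS = 3600  # one cacheable document per hour

def fragment_key(iso_ts: str) -> int:
    """Map a timestamp to the start (epoch seconds) of its fragment interval."""
    ts = datetime.fromisoformat(iso_ts).replace(tzinfo=timezone.utc)
    epoch = int(ts.timestamp())
    return epoch - epoch % FRAGMENT_SECONDS

def build_fragments(obs):
    """Group observations into interval fragments, each linking to the next,
    mimicking a Linked Data Fragments-style paged interface."""
    fragments = {}
    for iso_ts, value in obs:
        fragments.setdefault(fragment_key(iso_ts), []).append((iso_ts, value))
    keys = sorted(fragments)
    return [
        {
            "id": f"/fragments/{k}",  # stable URL: closed intervals never change
            "next": f"/fragments/{keys[i + 1]}" if i + 1 < len(keys) else None,
            "observations": fragments[k],
        }
        for i, k in enumerate(keys)
    ]

for frag in build_fragments(observations):
    print(frag["id"], "->", frag["next"], len(frag["observations"]), "observations")
```

A client follows the `next` links to replay the series; because closed fragments are immutable, a cache in front of the endpoint can answer most requests without touching the server.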

2020 ◽  
Vol 10 (17) ◽  
pp. 5882
Author(s):  
Federico Desimoni ◽  
Sergio Ilarri ◽  
Laura Po ◽  
Federica Rollo ◽  
Raquel Trillo-Lado

Modern cities face pressing problems with transportation systems, including, but not limited to, traffic congestion, safety, health, and pollution. To tackle them, public administrations have implemented roadside infrastructures such as cameras and sensors to collect data about environmental and traffic conditions. In the case of traffic sensor data, not only the real-time data are essential; historical values also need to be preserved and published. When real-time and historical data of smart cities become available, everyone can join an evidence-based debate on the city's future evolution. The TRAFAIR (Understanding Traffic Flows to Improve Air Quality) project seeks to understand how traffic affects urban air quality. The project develops a platform to provide real-time and predicted values of air quality in several European cities, encompassing tasks such as the deployment of low-cost air quality sensors, data collection and integration, modeling and prediction, the publication of open data, and the development of applications for end-users and public administrations. This paper focuses explicitly on the modeling and semantic annotation of traffic data. We present the tools and techniques used in the project and validate our strategies for data modeling and semantic enrichment in two cities: Modena (Italy) and Zaragoza (Spain). An experimental evaluation shows that our approach to publishing Linked Data is effective.
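Semantic annotation of a single traffic reading can be illustrated with a minimal sketch that emits RDF N-Triples using the W3C SOSA vocabulary; the namespace, sensor identifier and choice of properties are illustrative and do not reproduce the project's actual ontology.

```python
# Minimal sketch: annotate one traffic-sensor reading as RDF N-Triples.
# The base namespace and sensor naming scheme are hypothetical.

SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
BASE = "http://example.org/traffic/"  # hypothetical namespace

def annotate_reading(sensor_id: str, iso_time: str, vehicles_per_hour: int) -> str:
    """Render a reading as three N-Triples: who observed it, when, and the value."""
    obs = f"{BASE}observation/{sensor_id}/{iso_time}"
    triples = [
        f"<{obs}> <{SOSA}madeBySensor> <{BASE}sensor/{sensor_id}> .",
        f'<{obs}> <{SOSA}resultTime> "{iso_time}"^^<{XSD}dateTime> .',
        f'<{obs}> <{SOSA}hasSimpleResult> "{vehicles_per_hour}"^^<{XSD}integer> .',
    ]
    return "\n".join(triples)

print(annotate_reading("modena-042", "2020-06-01T08:00:00", 731))
```

Once readings are expressed as triples like these, they can be loaded into any RDF store and queried together with air quality observations that share the same vocabulary.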


Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

1. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
2. Data deposited in trusted repositories and/or supplementary files are described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
4. Data are published in structured, semantically enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines.
5. Linked Open Data (LOD) are extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.

The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5).
These approaches represent different aspects of the prospective scholarly publishing of biodiversity data which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv and various end users.


Author(s):  
Pedro Lucas ◽  
Jorge Silva ◽  
Filipe Araujo ◽  
Catarina Silva ◽  
Paulo Gil ◽  
...  

With the rise of environmental concerns regarding pollution, interest in monitoring air quality is increasing. However, air pollution data mostly originates from a limited number of government-owned sensors, which can only capture a small fraction of reality. Improving air quality coverage involves reducing the cost of sensors and making data widely available to the public. To this end, the NanoSen-AQM project proposes the use of low-cost nano-sensors as the basis for an air quality monitoring platform capable of collecting, aggregating, processing, storing, and displaying air quality data. Being an end-to-end system, the platform allows sensor owners to manage their sensors and to define calibration functions that can improve data reliability. The public can visualise sensor data on a map, mark specific clusters (groups of sensors) as favourites, and set alerts in the event of bad air quality at certain sensors. The NanoSen-AQM platform provides easy access to air quality data, with the aim of improving public health.
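The calibration functions mentioned above can be sketched as simple callables applied to raw readings before display. The linear form and the coefficients below are assumptions; in practice the coefficients would come from co-locating each nano-sensor with a reference instrument.

```python
def make_linear_calibration(slope: float, intercept: float):
    """Return a calibration function mapping a raw sensor reading to a
    corrected concentration. Coefficients here are made up; real ones
    would be fitted against a reference instrument."""
    def calibrate(raw: float) -> float:
        return slope * raw + intercept
    return calibrate

# Hypothetical coefficients for one low-cost NO2 nano-sensor
calibrate_no2 = make_linear_calibration(slope=1.18, intercept=-3.5)

raw_readings = [22.0, 30.5, 41.0]
corrected = [round(calibrate_no2(r), 2) for r in raw_readings]
print(corrected)  # each raw value scaled and offset
```

Storing the function per sensor, as the platform does, means a recalibration only replaces the coefficients while the raw data stream stays untouched and can be re-corrected later.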


2020 ◽  
Vol 10 (7) ◽  
pp. 2401 ◽  
Author(s):  
Ditsuhi Iskandaryan ◽  
Francisco Ramos ◽  
Sergio Trilles

The influence of machine learning technologies is rapidly increasing and penetrating almost every field, and air pollution prediction is no exception. This paper reviews studies on air pollution prediction using machine learning algorithms based on sensor data in the context of smart cities. Using the most popular databases and applying the corresponding filters, the most relevant papers were selected. After thoroughly reviewing those papers, the main features were extracted, which served as a basis to link and compare them to each other. As a result, we can conclude that: (1) instead of using simple machine learning techniques, authors now apply advanced and sophisticated techniques; (2) China was the leading country in terms of case studies; (3) particulate matter with a diameter of 2.5 micrometers (PM2.5) was the main prediction target; (4) in 41% of the publications the authors carried out the prediction for the next day; (5) 66% of the studies used data with an hourly rate; (6) 49% of the papers used open data, a share that has tended to increase since 2016; and (7) for efficient air quality prediction it is important to consider external factors such as weather conditions, spatial characteristics, and temporal features.


Sensors ◽  
2020 ◽  
Vol 20 (3) ◽  
pp. 720 ◽  
Author(s):  
Gonçalo Marques ◽  
Nuno Miranda ◽  
Akash Kumar Bhoi ◽  
Begonya Garcia-Zapirain ◽  
Sofiane Hamrioui ◽  
...  

This paper presents a real-time air quality monitoring system based on the Internet of Things. Air quality is particularly relevant to enhanced living environments and well-being. The Environmental Protection Agency and the World Health Organization have acknowledged the material impact of air quality on public health and have defined standards and policies to regulate and improve it. However, there is a significant need for cost-effective methods to monitor and control air quality that provide modularity, scalability, portability, easy installation and configuration, and integration with mobile computing technologies. The proposed method allows the measurement and mapping of air quality levels together with their spatial-temporal information. The system incorporates a cyber-physical system for data collection and mobile computing software for data consultation. Moreover, it provides a cost-effective and efficient solution for air quality supervision and can be installed in vehicles to monitor air quality while travelling. The results obtained confirm the implementation of the system and represent a relevant contribution to enhanced living environments in smart cities. This supervision solution provides real-time identification of unhealthy behaviours and supports the planning of possible interventions to improve air quality.


Author(s):  
Mariya Dimitrova ◽  
Raïssa Meyer ◽  
Pier Luigi Buttigieg ◽  
Teodor Georgiev ◽  
Georgi Zhelezov ◽  
...  

Data papers have emerged as a powerful instrument for open data publishing, obtaining credit, and establishing priority for datasets generated in scientific experiments. Academic publishing improves data and metadata quality through peer review and increases the impact of datasets by enhancing their visibility, accessibility, and re-usability. We aimed to establish a new type of article structure and template for omics studies: the omics data paper. To improve data interoperability and further incentivise researchers to publish high-quality datasets, we created a workflow for streamlined import of omics metadata directly into a data paper manuscript. An omics data paper template was designed by defining key article sections which encourage the description of omics datasets and methodologies. The workflow was based on REpresentational State Transfer (REST) services and XPath to extract information from the European Nucleotide Archive, ArrayExpress and BioSamples databases, which follow community-agreed standards. The workflow for automatic import of standard-compliant metadata into an omics data paper manuscript facilitates the authoring process. It demonstrates the importance and potential of creating machine-readable and standard-compliant metadata. The omics data paper structure and the workflow to import omics metadata improve the data publishing landscape by providing a novel mechanism for creating high-quality, enhanced metadata records, and for peer reviewing and publishing them. It constitutes a powerful addition for the distribution, visibility, reproducibility and re-usability of scientific data. We hope that streamlined metadata re-use for scholarly publishing encourages authors to improve the quality of their metadata to achieve a truly FAIR data world.
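The extraction step can be sketched with XPath-style queries over a simplified record using Python's standard library. The XML shape below only imitates a sequencing-study entry; it is not the real ENA schema, and the element names and accession are invented.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a sequencing-study metadata record; the element
# names imitate, but do not reproduce, the real ENA XML schema.
SAMPLE_XML = """
<STUDY_SET>
  <STUDY accession="PRJEB0000">
    <DESCRIPTOR>
      <STUDY_TITLE>Soil microbiome survey</STUDY_TITLE>
      <STUDY_ABSTRACT>16S rRNA amplicon sequencing of soils.</STUDY_ABSTRACT>
    </DESCRIPTOR>
  </STUDY>
</STUDY_SET>
"""

def extract_study_fields(xml_text: str) -> dict:
    """Pull key metadata fields out of a study record with XPath-style
    queries, as a data-paper import step might do."""
    root = ET.fromstring(xml_text)
    study = root.find("./STUDY")
    return {
        "accession": study.get("accession"),
        "title": study.findtext("./DESCRIPTOR/STUDY_TITLE"),
        "abstract": study.findtext("./DESCRIPTOR/STUDY_ABSTRACT"),
    }

print(extract_study_fields(SAMPLE_XML))
```

In the real workflow the XML would be fetched from the archive's REST endpoint and the extracted fields mapped into the corresponding sections of the data paper template.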


The design and development of a cloud-based non-intrusive load monitoring (NILM) system is presented. It serves to monitor and disaggregate aggregated data, such as smart metering readings, into appliance-level load information using cloud computing and machine learning algorithms implemented in the cloud. Existing NILM systems lack scalability and are limited in computing resources (computation and data storage) due to their dedicated, closed and proprietary characteristics; they cannot openly access the variety of heterogeneous data (electrical and non-electrical) that could improve NILM performance. Therefore, this paper proposes a novel cloud-based NILM system that enables the collection of such open data for load monitoring and other energy-related services. The collected data, for example from a smart meter or data acquisition unit (DAQ), are pre-processed and uploaded to the cloud platform. A classifier algorithm based on an Artificial Neural Network (ANN) is implemented in Azure ML Studio (AzureML), and the classifier is tested with different combinations of feature sets for performance comparison. Furthermore, a web service is deployed to provide web APIs (Application Programming Interfaces) to applications such as smart grids and smart cities. The results show that the ANN classifier for multiclass classification has improved performance when harmonic features are added to the active and reactive powers already used. They also demonstrate the feasibility of the proposed cloud-based classifier model for load monitoring. The proposed solution therefore offers a convenient and cost-effective way of load monitoring via cloud computing technology for smart grid and smart home applications. Further work includes the use of other ML algorithms for the classifier, performance analysis, development of cloud-based universal appliance data, and use cases.
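The value of harmonic features can be illustrated with a toy classifier. A nearest-centroid rule stands in for the paper's ANN, and all signature values below are made up: the point is only that extending the feature vector beyond active/reactive power helps separate loads with similar P/Q draw.

```python
import math

# Toy appliance signatures: feature vector = (active power W, reactive power
# VAR, 3rd-harmonic current magnitude). All values are illustrative.
SIGNATURES = {
    "kettle": (2000.0, 50.0, 0.1),
    "fridge": (120.0, 80.0, 0.4),
    "laptop": (60.0, 30.0, 1.2),  # switch-mode supply: strong harmonics
}

def classify(features):
    """Nearest-centroid stand-in for the paper's ANN classifier: the
    harmonic term helps separate loads whose P/Q draw alone is similar."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(SIGNATURES, key=lambda name: dist(features, SIGNATURES[name]))

print(classify((115.0, 78.0, 0.5)))  # close to the fridge signature
```

An actual NILM pipeline would extract these features per detected switching event from the metered waveform and feed them to the trained ANN in the cloud.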


Sensors ◽  
2021 ◽  
Vol 21 (22) ◽  
pp. 7726
Author(s):  
Sachit Mahajan

Recent advances in sensor technology and the availability of low-cost and low-power sensors have changed the air quality monitoring paradigm. These sensors are being widely used by scientists and citizens for monitoring air quality at finer spatial-temporal resolution. Such practices are opening up opportunities to enhance the traditional monitoring networks, but at the same time, these sensors are producing large data sets that can become overwhelming and challenging when it comes to the scientific tools and skills required to analyze the data. To address this challenge, an open-source, robust, and cross-platform sensor data analysis toolbox called Vayu is developed that allows researchers and citizens to do detailed and reproducible analyses of air quality data. Vayu combines the power of visualization and statistical analysis using a simple and intuitive graphical user interface. Additionally, it offers a comprehensive set of tools for systematic analysis such as data conversion, interpolation, aggregation, and prediction. Even though Vayu was developed with air quality research in mind, it can be used to analyze different kinds of time-series data.
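One of the systematic analysis steps such a toolbox needs, interpolation onto a regular grid, can be sketched as follows; the readings and the hourly grid are illustrative, not Vayu's own implementation.

```python
from datetime import datetime, timedelta

# Hypothetical PM2.5 readings with a gap at 02:00; timestamps are ISO strings.
readings = [
    ("2021-05-01T00:00", 12.0),
    ("2021-05-01T01:00", 16.0),
    # 02:00 missing
    ("2021-05-01T03:00", 24.0),
]

def fill_hourly_gaps(series):
    """Linearly interpolate missing points onto a regular hourly grid,
    one of the systematic steps (interpolation, aggregation) an
    air-quality analysis toolbox must offer."""
    parsed = [(datetime.fromisoformat(t), v) for t, v in series]
    out = []
    for (t0, v0), (t1, v1) in zip(parsed, parsed[1:]):
        out.append((t0, v0))
        step = t0 + timedelta(hours=1)
        while step < t1:
            frac = (step - t0) / (t1 - t0)  # timedelta division -> float in [0, 1]
            out.append((step, v0 + frac * (v1 - v0)))
            step += timedelta(hours=1)
    out.append(parsed[-1])
    return out

for t, v in fill_hourly_gaps(readings):
    print(t.isoformat(), round(v, 1))
```

Regularising the grid first is what makes later aggregation and prediction steps well-defined, since they can then assume one value per interval.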


2020 ◽  
Author(s):  
Li Sun ◽  
Peng Wei ◽  
Jieqing He ◽  
Dane Westerdahl ◽  
Zhi Ning

Public transport interchanges (PTIs) are special transportation-impacted micro-environments in Hong Kong where public transport such as buses, taxis and mini-buses pass through or terminate, and where passengers queue for transport. Hong Kong has 65 PTIs in total, most of which are located under residential or commercial buildings. Mechanical ventilation is included in all PTIs because natural ventilation is very limited, and it is intended to limit the accumulation of air pollution from the various vehicles. However, numerous complaints have been reported concerning PTIs' air quality, and data are lacking to characterise pollution in these places. The purpose of this study was to determine the overall nature of pollutants in a sample of PTIs, to identify whether hot spots were present, and to examine how these might be related to both ventilation practices and bus activities in PTIs.

Eight PTIs were selected for simultaneous measurement of NO, NO2 and PM2.5 over 4 days of sampling. A monitoring network was formed by a group of sensor-based air monitoring stations deployed at multiple points in the passenger waiting areas of each selected PTI and at the ventilation intakes. Specific data calibration and validation protocols were designed for sensor data quality control and assurance in this near-source monitoring application.

NO, NO2 and PM2.5 measured inside the PTIs were compared with the ambient air quality data reported by nearby routine air quality monitoring stations. The average NOx concentration levels were about 4-16 times higher than the ambient levels. NO, NO2 and NOx in the PTIs themselves showed similar, daily repeatable variation patterns, and NO concentration levels were always higher than those of NO2 during the daytime, while ambient air showed the opposite pattern. This indicates that NOx pollutants inside the PTIs were mainly produced by local bus activities. NO and NO2 measured at some ventilation intakes reached even higher concentration levels than those inside the PTIs, which means the existing ventilation systems were generally inadequate to control pollution concentrations and could sometimes even make the problem worse. Exceedances of the 1-hour NO2 concentration limit (0.30 mg/m³) were observed at several monitoring sites, mainly located in the middle of the PTIs where ventilation is poorer, or close to bus stops occupied by older buses. PM2.5 measured inside the PTIs followed the patterns of ambient PM2.5 and showed comparable concentration levels, which implies that traffic emissions, especially bus exhaust in the PTIs, may not be the main source of particle pollution.

Concern was raised about the implementation of pollution mitigation plans inside PTIs to meet the urgent health protection needs of commuters and of the bus company staff who work there. To effectively control PTI pollution and limit exposures, it is necessary to consider bus volume, bus emission type and ventilation design.
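The exceedance check described above reduces to averaging sub-hourly samples into 1-hour means and comparing them with the 0.30 mg/m³ limit; the sample values below are invented for illustration.

```python
NO2_HOURLY_LIMIT = 0.30  # mg/m³, the 1-hour limit cited in the study

# Hypothetical 15-minute NO2 samples (mg/m³) keyed by hour of day
samples = {
    8: [0.21, 0.25, 0.28, 0.24],
    9: [0.29, 0.33, 0.35, 0.31],  # morning bus peak
    10: [0.22, 0.20, 0.19, 0.21],
}

def hourly_exceedances(samples_by_hour, limit=NO2_HOURLY_LIMIT):
    """Average sub-hourly samples into 1-hour means and flag hours
    whose mean exceeds the limit."""
    return sorted(
        hour
        for hour, values in samples_by_hour.items()
        if sum(values) / len(values) > limit
    )

print(hourly_exceedances(samples))
```

Run per monitoring site, this kind of check is how the hot spots near poorly ventilated mid-PTI locations and older-bus stops would be flagged.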

