Assessment and Benchmarking of Spatially Enabled RDF Stores for the Next Generation of Spatial Data Infrastructure

2019
Vol 8 (7)
pp. 310
Author(s):
Weiming Huang
Syed Amir Raza
Oleg Mirzov
Lars Harrie

Geospatial information is indispensable for various real-world applications and is thus a prominent part of today’s data science landscape. Geospatial data is primarily maintained and disseminated through spatial data infrastructures (SDIs). However, current SDIs are facing challenges in terms of data integration and semantic heterogeneity because of their partially siloed data organization. In this context, linked data provides a promising means to address these challenges, and it is seen as one of the key factors moving SDIs toward the next generation. In this study, we investigate the technical environment of the support for geospatial linked data by assessing and benchmarking some popular and well-known spatially enabled RDF stores (RDF4J, GeoSPARQL-Jena, Virtuoso, Stardog, and GraphDB), with a focus on GeoSPARQL compliance and query performance. The tests were performed in two different scenarios. In the first scenario, geospatial data forms a part of a large-scale data infrastructure and is integrated with other types of data. In this scenario, we used ICOS Carbon Portal’s metadata—a real-world Earth Science linked data infrastructure. In the second scenario, we benchmarked the RDF stores in a dedicated SDI environment that contains purely geospatial data, and we used geospatial datasets with both crowd-sourced and authoritative data (the same test data used in a previous benchmark study, the Geographica benchmark). The assessment and benchmarking results demonstrate that the GeoSPARQL compliance of the RDF stores has encouragingly advanced in the last several years. Query performance is generally acceptable, and spatial indexing is imperative when handling a large number of geospatial objects. Nevertheless, query correctness remains a challenge for cross-database interoperability. In conclusion, the results indicate that the spatial capabilities of the RDF stores have become increasingly mature, which could benefit the development of future SDIs.
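To make the kind of workload such benchmarks exercise concrete, the sketch below composes a GeoSPARQL query using the standard `geof:sfIntersects` filter function. The feature-modelling pattern (`geo:hasGeometry`/`geo:asWKT`) follows the GeoSPARQL vocabulary; the query polygon is an illustrative example, not taken from the paper's test data.

```python
# A minimal sketch of a GeoSPARQL query of the kind exercised by such
# benchmarks: find features whose geometry intersects a query polygon.

def build_intersects_query(wkt_polygon: str) -> str:
    """Compose a GeoSPARQL query using the geof:sfIntersects filter function."""
    return f"""
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?feature ?wkt WHERE {{
  ?feature geo:hasGeometry ?geom .
  ?geom geo:asWKT ?wkt .
  FILTER(geof:sfIntersects(?wkt,
         "{wkt_polygon}"^^geo:wktLiteral))
}}
"""

query = build_intersects_query("POLYGON((13 55, 14 55, 14 56, 13 56, 13 55))")
```

Each store in the study would receive a query like this through its SPARQL endpoint; the benchmark then compares both the response time and the correctness of the returned feature set.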

2014
Vol 71 (4)
Author(s):
Azman Ariffin
Nabila Ibrahim
Ghazali Desa
Uznir Ujang
Hishamuddin Mohd Ali
...

This paper addresses the need to develop a Local Geospatial Data Infrastructure (LGDI) for sustainable urban development and highlights an effective and efficient framework for developing such local infrastructure. The framework presented combines domain-based and goal-based approaches. The research is based on a case study conducted in a Malaysian city, whose main focus was measuring and assessing sustainability. Six conceptual frameworks were produced based on six key dimensions of sustainability, and the developed framework consists of six conceptual data models and six conceptual data structures. It was concluded that 30 spatial data layers are needed, of which 12 are point layers, 17 are polygon layers, and 1 is a line layer.


Author(s):  
A. K. Tripathi ◽  
S. Agrawal ◽  
R. D. Gupta

Abstract. Sharing and management of geospatial data among different communities and users is a challenge that is suitably addressed by Spatial Data Infrastructure (SDI). SDI helps people in the discovery, editing, processing and visualization of spatial data. Users can download data from an SDI and process it using local resources. However, the large volume and heterogeneity of the data make this processing difficult at the client end. This problem can be resolved by orchestrating the Web Processing Service (WPS) with SDI. WPS is a service interface through which geoprocessing can be done over the internet. In this paper, a WPS-enabled SDI framework with OGC-compliant services is conceptualized and developed. It is based on a three-tier client-server architecture. OGC services are provided through GeoServer, and the WPS extension of GeoServer is used to perform geospatial data processing and analysis. The developed framework is utilized to create a public health SDI prototype using Open Source Software (OSS). The integration of WPS with SDI demonstrates how the various data analysis operations of WPS can be performed over the web on distributed data sources provided by SDI.
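A WPS interaction like the one described is driven by an XML Execute request POSTed to the server. The sketch below builds a minimal WPS 1.0.0 Execute request; the process name (`JTS:buffer`) and the input identifiers are illustrative assumptions about a GeoServer-style deployment, not details from the paper.

```python
# A hedged sketch of a WPS 1.0.0 Execute request such as a client might
# POST to a GeoServer WPS endpoint. Process and parameter names are
# illustrative assumptions, not taken from the paper.
from xml.etree import ElementTree as ET

def build_wps_execute(process: str, wkt: str, distance: float) -> str:
    """Compose a minimal Execute request buffering a WKT geometry."""
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<wps:Execute version="1.0.0" service="WPS"
    xmlns:wps="http://www.opengis.net/wps/1.0.0"
    xmlns:ows="http://www.opengis.net/ows/1.1">
  <ows:Identifier>{process}</ows:Identifier>
  <wps:DataInputs>
    <wps:Input>
      <ows:Identifier>geom</ows:Identifier>
      <wps:Data><wps:ComplexData mimeType="application/wkt">{wkt}</wps:ComplexData></wps:Data>
    </wps:Input>
    <wps:Input>
      <ows:Identifier>distance</ows:Identifier>
      <wps:Data><wps:LiteralData>{distance}</wps:LiteralData></wps:Data>
    </wps:Input>
  </wps:DataInputs>
  <wps:ResponseForm>
    <wps:RawDataOutput><ows:Identifier>result</ows:Identifier></wps:RawDataOutput>
  </wps:ResponseForm>
</wps:Execute>"""

request_xml = build_wps_execute("JTS:buffer", "POINT(81.8 25.4)", 0.01)
root = ET.fromstring(request_xml)  # sanity check: the request is well-formed XML
```

In the framework described above, requests of this shape would let a thin client push the heavy geoprocessing to the server rather than downloading the data and processing it locally.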


Author(s):  
Carlos Granell ◽  
Laura Díaz ◽  
Michael Gould

The development of geographic information systems (GISs) has been highly influenced by the overall progress of information technology (IT). These systems evolved from monolithic systems to become personal desktop GISs, with all or most data held locally, and then evolved to the Internet GIS paradigm in the form of Web services (Peng & Tsou, 2001). The highly distributed Web services model is such that geospatial data are loosely coupled with the underlying systems used to create and handle them, and geospatial processing functionalities are made available as remote, interoperable, discoverable geospatial services. In recent years the software industry has moved from tightly coupled application architectures such as CORBA (Common Object Request Broker Architecture; Vinoski, 1997) toward service-oriented architectures (SOAs) based on a network of interoperable, well-described services accessible via Web protocols. This has led to de facto standards for delivery of services such as Web Service Description Language (WSDL) to describe the functionality of a service, Simple Object Access Protocol (SOAP) to encapsulate Web service messages, and Universal Description, Discovery, and Integration (UDDI) to register and provide access to service offerings. Adoption of this Web services technology as an option to monolithic GISs is an emerging trend to provide distributed geospatial access, visualization, and processing. The GIS approach to SOA-based applications is perhaps best represented by the spatial data infrastructure (SDI) paradigm, in which standardized interfaces are the key to allowing geographic services to communicate with each other in an interoperable manner. This article focuses on standard interfaces and also on current implementations of geospatial data processing over the Web, commonly used in SDI environments.
We also mention several challenges yet to be met, such as those concerned with semantics, discovery, and chaining of geospatial processing services and also with the extension of geospatial processing capabilities to the SOA world.
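To ground the SOAP piece of the stack described above, the sketch below wraps an operation call in a minimal SOAP 1.1 envelope, the message format that WSDL-described services exchange. The operation name and target namespace are illustrative, not drawn from any real WSDL.

```python
# A minimal SOAP 1.1 envelope such as a client might wrap around a
# geospatial service call. Operation name and namespace are hypothetical.
from xml.etree import ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(operation: str, params: dict, target_ns: str) -> str:
    """Wrap an operation call in a SOAP 1.1 Envelope/Body."""
    args = "".join(f"<{k}>{v}</{k}>" for k, v in params.items())
    return (
        f'<soap:Envelope xmlns:soap="{SOAP_NS}">'
        f'<soap:Body>'
        f'<op:{operation} xmlns:op="{target_ns}">{args}</op:{operation}>'
        f'</soap:Body>'
        f'</soap:Envelope>'
    )

msg = build_soap_request("GetFeatureCount",
                         {"layer": "rivers"},
                         "http://example.org/geoservice")
body = ET.fromstring(msg).find(f"{{{SOAP_NS}}}Body")  # parse back the Body
```

WSDL describes which operations such an envelope may carry, and UDDI registries are where a client would discover the service endpoint in the first place.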


2019
Vol 11 (1)
pp. 70-93
Author(s):
Kody MOODLEY
Pedro V HERNANDEZ-SERRANO
Amrapali J ZAVERI
Marcel GH SCHAPER
Michel DUMONTIER
...

This contribution explores the application of data science and artificial intelligence to legal research, more specifically an element that has not received much attention: the research infrastructure required to make such analysis possible. In recent years, EU law has become increasingly digitised and published in online databases such as EUR-Lex and HUDOC. However, the main barrier inhibiting legal scholars from analysing this information is a lack of training in data analytics. Legal analytics software can mitigate this problem to an extent. However, current systems are dominated by the commercial sector. In addition, most systems focus on searching legal information but do not facilitate advanced visualisation and analytics. Finally, free-to-use systems that do provide such features are either too complex for general legal scholars to use, or are not rich enough in their analytics tools. In this paper, we motivate the case for building a software platform that addresses these limitations. Such software can provide a powerful platform for visualising and exploring connections and correlations in EU case law, helping to unravel the “DNA” behind EU legal systems. It will also serve to train researchers and students in schools and universities to analyse legal information using state-of-the-art methods in data science, without requiring technical proficiency in the underlying methods. We also suggest that the software should be powered by a data infrastructure and management paradigm following the seminal FAIR (Findable, Accessible, Interoperable and Reusable) principles.


Author(s):  
Tavinder Kaur Ark ◽  
Sarah Kesselring ◽  
Brent Hills ◽  
Kim McGrail

Background: Population Data BC (PopData) was established as a multi-university data and education resource to support training and education, data linkage, and access to individual-level, de-identified data for research in a wide variety of areas, including human and community development and well-being. Approach: A combination of deterministic and probabilistic linkage is conducted based on the quality and availability of identifiers for data linkage. PopData utilizes a harmonized data request and approval process for data stewards and researchers to increase efficiency and ease of access to linked data. Researchers access linked data through a secure research environment (SRE) that is equipped with a wide variety of tools for analysis. The SRE also allows for ongoing management and control of data. PopData continues to expand its data holdings and to evolve its services, governance, and data access process. Discussion: PopData has provided efficient and cost-effective access to linked data sets for research. After two decades of learning, future planned developments for the organization include, but are not limited to, policies to facilitate programs of research, access to reusable datasets, and evaluation and use of new data linkage techniques such as privacy-preserving record linkage (PPRL). Conclusion: PopData continues to maintain and grow the number and type of data holdings available for research. Its existing models support a number of large-scale research projects and demonstrate the benefits of having a third-party data linkage and provisioning center for research purposes. Building further connections with existing data holders and governing bodies will be important to ensure ongoing access to data and that policy changes continue to facilitate access for researchers.
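The deterministic-plus-probabilistic linkage the abstract describes can be sketched in a few lines: exact identifiers decide the clear cases, and a Fellegi-Sunter-style agreement weight scores the rest. The field names and m/u probabilities below are illustrative assumptions, not PopData's actual configuration.

```python
# A toy sketch of probabilistic record linkage: each compared field adds
# a log-likelihood-ratio weight depending on whether the values agree.
import math

# Illustrative m-probabilities (agreement given a true match) and
# u-probabilities (agreement given a non-match) per field.
M = {"surname": 0.95, "birth_year": 0.98, "postcode": 0.90}
U = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}

def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 likelihood-ratio weights over the compared fields."""
    weight = 0.0
    for field in M:
        if rec_a.get(field) == rec_b.get(field):
            weight += math.log2(M[field] / U[field])
        else:
            weight += math.log2((1 - M[field]) / (1 - U[field]))
    return weight

a = {"surname": "Ng", "birth_year": 1980, "postcode": "V6T"}
b = {"surname": "Ng", "birth_year": 1980, "postcode": "V5K"}
score = match_weight(a, b)  # agreement on two of three fields
```

In practice a linkage unit sets upper and lower thresholds on this weight: pairs above the upper threshold are linked, pairs below the lower are rejected, and the band in between goes to clerical review.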


Author(s):  
E. Folmer ◽  
W. Beek ◽  
L. Rietveld

<p><strong>Abstract.</strong> The Land Registry and Mapping Agency of the Netherlands (‘Kadaster’ in Dutch) is developing an online publication platform for sharing its geospatial data assets called KDP (‘Kadaster Data Platform’ in Dutch). One of the main goals of this platform is to better share geospatial data with the wider, web-oriented world, including its developers, approaches, and standards. Linked Open Data (W3C), GeoSPARQL (OGC), and Open APIs (OpenAPI Specification) are the predominant standardized approaches for this purpose. As a result, the most important spatial datasets of the Netherlands – including several key registries – are now being published as Linked Open Data that can be accessed through a SPARQL endpoint and a collection of REST APIs. In addition to providing raw access to the data, Kadaster Data Platform also offers developers functionalities that allow them to gain a better understanding about the contents of its datasets. These functionalities include various ways for viewing Linked Data. This paper focuses on two of the main components the Kadaster Data Platform is using for this purpose: FacetCheck and Data Stories.</p>


Author(s):  
Carlos Granell ◽  
Sven Schade ◽  
Gobe Hobona

A Spatial Data Infrastructure (SDI) is an information infrastructure for enhancing geospatial data sharing and access. At the moment, the service-oriented second generation of SDI is transitioning to a third generation, which is characterized by user-centric approaches. This new movement closes the gap between classical SDI and user-contributed content, also known as Volunteered Geographic Information (VGI). Public use and acquisition of information provides additional challenges within and beyond the geospatial domain. Linked Data has been suggested recently as a possible overall solution. This notion refers to a best practice for exposing, sharing, and connecting resources in the (Semantic) Web. This chapter details the Linked Data approach to SDI and suggests it as a possibility to combine SDI with VGI. Thus, a Spatial Linked Data Infrastructure could apply solutions for Linked Data to classical SDI standards. The chapter highlights different implementing strategies, gives examples, and argues for benefits, while at the same time trying to outline possible pitfalls; hopefully, this contribution will light the way towards a single shared information space.


2021
Vol 20 (1)
Author(s):
Rainer Schnell
Jonas Klingwort
James M. Farrow

Abstract Background: We introduce and study a recently proposed method for privacy-preserving distance computations which has received little attention in the scientific literature so far. The method, based on intersecting sets of randomly labeled grid points and henceforth denoted ISGP, allows calculating approximate distances between masked spatial data. Coordinates are replaced by sets of hash values. The method allows the computation of distances between locations even when the locations at different points in time are not known simultaneously: the distance between $L_1$ and $L_2$ could be computed even when $L_2$ does not exist at $t_1$ and $L_1$ has been deleted at $t_2$. An example would be patients from a medical data set and locations of later hospitalizations. ISGP is a new tool for privacy-preserving handling of geo-referenced data sets in general. Furthermore, this technique can be used to include geographical identifiers as additional information for privacy-preserving record linkage. To show that the technique can be implemented in most high-level programming languages with a few lines of code, a complete implementation within the statistical programming language R is given. The properties of the method are explored using simulations based on large-scale real-world data of hospitals (n = 850) and residential locations (n = 13,000). The method has already been used in a real-world application. Results: ISGP yields very accurate results. Our simulation study showed that, with appropriately chosen parameters, 99% accuracy in the approximated distances is achieved. Conclusion: We discussed a new method for privacy-preserving distance computations in microdata. The method is highly accurate, fast, has low computational burden, and does not require excessive storage.
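The grid-intersection idea can be sketched as follows: each location is masked as the set of hashed grid points within radius r, and the distance between two locations is recovered from the size of the intersection of their sets via the overlap (lens) area of two circles. The grid spacing, radius, and salted-hash scheme below are illustrative simplifications; the published ISGP method and parameters differ in detail.

```python
# A simplified sketch of grid-intersection distance masking: locations
# become sets of hashed grid points; the intersection count implies the
# circle-overlap area, which is inverted to an approximate distance.
import hashlib
import math

R = 10.0   # masking radius
G = 0.25   # grid spacing
SALT = b"shared-secret"  # hypothetical shared key for the salted hashes

def masked_set(x: float, y: float) -> set:
    """Hash every grid point within distance R of (x, y)."""
    cells = set()
    for i in range(int((x - R) / G) - 1, int((x + R) / G) + 2):
        for j in range(int((y - R) / G) - 1, int((y + R) / G) + 2):
            if (i * G - x) ** 2 + (j * G - y) ** 2 <= R * R:
                cells.add(hashlib.sha256(SALT + f"{i},{j}".encode()).hexdigest())
    return cells

def lens_area(d: float) -> float:
    """Overlap area of two radius-R circles whose centres are d apart."""
    if d >= 2 * R:
        return 0.0
    return 2 * R * R * math.acos(d / (2 * R)) - (d / 2) * math.sqrt(4 * R * R - d * d)

def approx_distance(s1: set, s2: set) -> float:
    """Invert the lens area implied by the intersection count by bisection."""
    target = len(s1 & s2) * G * G  # grid density is one point per G*G area
    lo, hi = 0.0, 2 * R            # lens_area is decreasing on [0, 2R]
    for _ in range(60):
        mid = (lo + hi) / 2
        if lens_area(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

d_est = approx_distance(masked_set(0, 0), masked_set(3, 4))  # true distance 5
```

Note the privacy property the abstract highlights: `approx_distance` only ever sees the two hash sets, so the distance can be computed after the original coordinates have been discarded.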


2019
Vol 8 (11)
pp. 513
Author(s):
Luiz Fernando F. G. Assis
Karine Reis Ferreira
Lubia Vinhas
Luis Maurano
Claudio Almeida
...

The physical phenomena derived from an analysis of remotely sensed imagery provide a clearer understanding of the spectral variations of a large number of land use and cover (LUC) classes. The creation of LUC maps has corroborated this view by enabling the scientific community to estimate the parameter heterogeneity of the Earth’s surface. Along with descriptions of features and statistics for aggregating spatio-temporal information, government programs have disseminated thematic maps to further the implementation of effective public policies and foster sustainable development. In Brazil, PRODES and DETER have shown that they are committed to systematically monitoring large-scale deforestation with data quality assurance. However, these programs are so complex that they require the design, implementation and deployment of a spatial data infrastructure with extensive data analytics features, so that users who lack an understanding of standard spatial interfaces can still carry out research on them. With this in mind, the Brazilian National Institute for Space Research (INPE) has designed TerraBrasilis, a spatial data analytics infrastructure that provides interfaces found not only in traditional geographic information systems but also in data analytics environments with complex algorithms. To achieve the best performance, we leveraged a micro-service architecture with virtualized computer resources to enable high availability, a small footprint, simple incremental releases, reliable change management, and fault tolerance in unstable computer network scenarios. In addition, we tuned and optimized our databases both to adjust to the input format of the complex algorithms and to speed up the loading of the web application relative to other systems.


2021
Author(s):
Mihal Miu
Xiaokun Zhang
M. Ali Akber Dewan
Junye Wang

Geospatial information plays an important role in environmental modelling, resource management, business operations, and government policy. However, little or no commonality between the formats of various geospatial data has made it difficult to utilize the available geospatial information. These disparate data sources must be aggregated before further extraction and analysis may be performed. The objective of this paper is to develop a framework called PlaniSphere, which aggregates various geospatial datasets, synthesizes raw data, and allows for third-party customizations of the software. PlaniSphere uses NASA World Wind to access remote data and map servers using Web Map Service (WMS) as the underlying protocol, which supports service-oriented architecture (SOA). The results show that PlaniSphere can aggregate and parse files that reside in local storage and conform to the following formats: GeoTIFF, ESRI shapefiles, and KML. Spatial data retrieved over the Internet using WMS can be combined into geospatial data sets (map data) from multiple sources, regardless of who the data providers are. The plug-in function of this framework can be expanded for wider uses, such as aggregating and fusing geospatial data from different data sources, by providing customizations to serve future uses; in contrast, the capacity of the commercial ESRI ArcGIS software to add libraries and tools is limited by its closed-source architecture and proprietary data structures. Analysis and the increasing availability of geo-referenced data may provide an effective way to manage spatial information by combining large-scale storage, multidimensional data management, and Online Analytical Processing (OLAP) capabilities in one system.
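The WMS retrieval that underlies this kind of map aggregation boils down to a GetMap request with a handful of standard parameters. The sketch below composes one; the server URL and layer name are illustrative assumptions, not from the paper.

```python
# A sketch of a WMS 1.1.1 GetMap URL of the kind a client such as
# NASA World Wind issues to fetch one map tile from a server.
from urllib.parse import urlencode

def build_getmap_url(base_url: str, layer: str, bbox: tuple,
                     size=(512, 512)) -> str:
    """Compose a WMS 1.1.1 GetMap URL for one layer and bounding box."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
        "WIDTH": size[0],
        "HEIGHT": size[1],
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"

url = build_getmap_url("https://example.org/wms", "landcover",
                       (-114.0, 49.0, -110.0, 54.0))
```

Because every WMS server answers the same request shape, an aggregator like PlaniSphere can overlay layers from multiple providers without caring who operates each endpoint.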

