Effective Tooling for Linked Data Publishing in Scientific Research

Library organizations have enthusiastically undertaken semantic web initiatives and in particular the data publishing as linked data. Nevertheless, different surveys report the experimental nature of initiatives and the consumer difficulty in re-using data. These barriers are a hindrance for using linked datasets, as an infrastructure that enhances the library and related information services. This paper presents an approach for encoding, as a Linked Vocabulary, the “tacit” knowledge of the information system that manages the data source. The objective is the improvement of the interpretation process of the linked data meaning of published datasets. We analyzed a digital library system, as a case study, for prototyping the “semantic data management” method, where data and its knowledge are natively managed, taking into account the linked data pillars. The ultimate objective of the semantic data management is to curate the correct consumers’ interpretation of data, and to facilitate the proper re-use. The prototype defines the ontological entities representing the knowledge, of the digital library system, that is not stored in the data source, nor in the existing ontologies related to the system’s semantics. Thus we present the local ontology and its matching with existing ontologies, Preservation Metadata Implementation Strategies (PREMIS) and Metadata Objects Description Schema (MODS), and we discuss linked data triples prototyped from the legacy relational database, by using the local ontology. We show how the semantic data management, can deal with the inconsistency of system data, and we conclude that a specific change in the system developer mindset, it is necessary for extracting and “codifying” the tacit knowledge, which is necessary to improve the data interpretation process.

Download Full-text

Improving Access to the Dutch Historical Censuses with Linked Open Data

Research Data Journal for the Humanities and Social Sciences ◽

10.1163/24523666-01000010 ◽

2018 ◽

Vol 3 (1) ◽

pp. 13-26

Author(s):

Albert Meroño-Peñuela ◽

Ashkan Ashkpour ◽

Valentijn Gilissen ◽

Jan Jonker ◽

Tom Vreugdenhil ◽

...

Keyword(s):

Economic History ◽

Linked Data ◽

Census Data ◽

Social Economic ◽

Open Data ◽

Data Cube ◽

Data Publishing ◽

Database Interface ◽

Knowledge Intensive ◽

The Web

The Dutch Historical Censuses (1795–1971) contain statistics that describe almost two centuries of History in the Netherlands. These censuses were conducted once every 10 years (with some exceptions) from 1795 to 1971. Researchers have used its wealth of demographic, occupational, and housing information to answer fundamental questions in social economic history. However, accessing these data has traditionally been a time consuming and knowledge intensive task. In this paper, we describe the outcomes of the cedar project, which make access to the digitized assets of the Dutch Historical Censuses easier, faster, and more reliable. This is achieved by using the data publishing paradigm of Linked Data from the Semantic Web. We use a digitized sample of 2,288 census tables to produce a linked dataset of more than 6.8 million statistical observations. The dataset is modeled using the rdf Data Cube, Open Annotation, and prov vocabularies. The contributions of representing this dataset as Linked Data are: (1) a uniform database interface for efficient querying of census data; (2) a standardized and reproducible data harmonization workflow; and (3) an augmentation of the dataset through richer connections to related resources on the Web.

Download Full-text

IMPROVED LINKED DATA INTERACTION THROUGH AN AUTOMATIC INFORMATION ARCHITECTURE

International Journal of Software Engineering and Knowledge Engineering ◽

10.1142/s0218194012400062 ◽

2012 ◽

Vol 22 (03) ◽

pp. 325-343 ◽

Cited By ~ 2

Author(s):

JOSEP MARIA BRUNETTI ◽

ROSA GIL ◽

JUAN MANUEL GIMENO ◽

ROBERTO GARCIA

Keyword(s):

User Experience ◽

Linked Data ◽

Open Data ◽

Information Architecture ◽

Data Publishing ◽

Tabular Data ◽

Automatic Information ◽

Web Of Data ◽

New Perspective ◽

The Web

Thanks to Open Data initiatives the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making explicit both the structure and semantics of the data. However, from the user experience viewpoint, published datasets continue to be monolithic files which are completely opaque or difficult to explore by making complex semantic queries. Our objective is to facilitate the user to grasp what kind of entities are in the dataset, how they are interrelated, which are their main properties and values, etc. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that facilitate getting an insight of the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users that discover a whole new perspective of the Web of Data.

Download Full-text

Semantic micro-contributions with decentralized nanopublication services

PeerJ Computer Science ◽

10.7717/peerj-cs.387 ◽

2021 ◽

Vol 7 ◽

pp. e387

Author(s):

Tobias Kuhn ◽

Ruben Taelman ◽

Vincent Emonet ◽

Haris Antonatos ◽

Stian Soiland-Reyes ◽

...

Keyword(s):

Linked Data ◽

Prior Experience ◽

Data Publishing ◽

Efficient Manner ◽

Heavy Weight ◽

Weight One ◽

Central Bottleneck ◽

User Friendly

While the publication of Linked Data has become increasingly common, the process tends to be a relatively complicated and heavy-weight one. Linked Data is typically published by centralized entities in the form of larger dataset releases, which has the downside that there is a central bottleneck in the form of the organization or individual responsible for the releases. Moreover, certain kinds of data entries, in particular those with subjective or original content, currently do not fit into any existing dataset and are therefore more difficult to publish. To address these problems, we present here an approach to use nanopublications and a decentralized network of services to allow users to directly publish small Linked Data statements through a simple and user-friendly interface, called Nanobench, powered by semantic templates that are themselves published as nanopublications. The published nanopublications are cryptographically verifiable and can be queried through a redundant and decentralized network of services, based on the grlc API generator and a new quad extension of Triple Pattern Fragments. We show here that these two kinds of services are complementary and together allow us to query nanopublications in a reliable and efficient manner. We also show that Nanobench makes it indeed very easy for users to publish Linked Data statements, even for those who have no prior experience in Linked Data publishing.

Download Full-text

Challenges as enablers for high quality linked data: Insights from the semantic publishing challenge

10.7287/peerj.preprints.2616v1 ◽

2016 ◽

Author(s):

Anastasia Dimou ◽

Sahar Vahdati ◽

Angelo Di Iorio ◽

Christoph Lange ◽

Ruben Verborgh ◽

...

Keyword(s):

Best Practices ◽

Semantic Web ◽

Linked Data ◽

Lessons Learned ◽

Data Publishing ◽

High Quality ◽

Scholarly Publications ◽

Input Dataset ◽

Definition Of ◽

Semantic Publishing

While most challenges organized so far in the Semantic Web domain are focused on comparing tools with respect to different criteria such as their features and competencies, or exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications, and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, building the input dataset and forming the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the used tools, the preferred vocabularies and the results produced in the three editions of 2014, 2015 and 2016. We compared these lessons to other Semantic Web Evaluation challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as a result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.

Download Full-text

Metadata management, interoperability and Linked Data publishing support for Natural History Museums

International Journal on Digital Libraries ◽

10.1007/s00799-014-0114-2 ◽

2014 ◽

Vol 14 (3-4) ◽

pp. 127-140 ◽

Cited By ~ 3

Author(s):

Giannis Skevakis ◽

Konstantinos Makris ◽

Varvara Kalokyri ◽

Polyxeni Arapi ◽

Stavros Christodoulakis

Keyword(s):

Natural History ◽

Linked Data ◽

Data Publishing ◽

Metadata Management ◽

Natural History Museums ◽

History Museums

Download Full-text

Toward sustainable publishing and querying of distributed Linked Data archives

Journal of Documentation ◽

10.1108/jd-03-2017-0040 ◽

2018 ◽

Vol 74 (1) ◽

pp. 195-222 ◽

Cited By ~ 5

Author(s):

Miel Vander Sande ◽

Ruben Verborgh ◽

Patrick Hochstenbach ◽

Herbert Van de Sompel

Keyword(s):

Linked Data ◽

Low Cost ◽

Building Blocks ◽

Data Sources ◽

Data Publishing ◽

Financial Barriers ◽

Content Type ◽

Data Archives ◽

Publishing Strategy ◽

Data Collections

Purpose The purpose of this paper is to detail a low-cost, low-maintenance publishing strategy aimed at unlocking the value of Linked Data collections held by libraries, archives and museums (LAMs). Design/methodology/approach The shortcomings of commonly used Linked Data publishing approaches are identified, and the current lack of substantial collections of Linked Data exposed by LAMs is considered. To improve on the discussed status quo, a novel approach for publishing Linked Data is proposed and demonstrated by means of an archive of DBpedia versions, which is queried in combination with other Linked Data sources. Findings The authors show that the approach makes publishing Linked Data archives easy and affordable, and supports distributed querying without causing untenable load on the Linked Data sources. Research limitations/implications The proposed approach significantly lowers the barrier for publishing, maintaining, and making Linked Data collections queryable. As such, it offers the potential to substantially grow the distributed network of queryable Linked Data sources. Because the approach supports querying without causing unacceptable load on the sources, the queryable interfaces are expected to be more reliable, allowing them to become integral building blocks of robust applications that leverage distributed Linked Data sources. Originality/value The novel publishing strategy significantly lowers the technical and financial barriers that LAMs face when attempting to publish Linked Data collections. The proposed approach yields Linked Data sources that can reliably be queried, paving the way for applications that leverage distributed Linked Data sources through federated querying.

Download Full-text

Challenges as enablers for high quality Linked Data: insights from the Semantic Publishing Challenge

PeerJ Computer Science ◽

10.7717/peerj-cs.105 ◽

2017 ◽

Vol 3 ◽

pp. e105 ◽

Cited By ~ 6

Author(s):

Anastasia Dimou ◽

Sahar Vahdati ◽

Angelo Di Iorio ◽

Christoph Lange ◽

Ruben Verborgh ◽

...

Keyword(s):

Best Practices ◽

Semantic Web ◽

Linked Data ◽

Lessons Learned ◽

Data Publishing ◽

High Quality ◽

Scholarly Publications ◽

Input Dataset ◽

Definition Of ◽

Semantic Publishing

While most challenges organized so far in the Semantic Web domain are focused on comparing tools with respect to different criteria such as their features and competencies, or exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications, and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, building the input dataset and forming the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the used tools, the preferred vocabularies and the results produced in the three editions of 2014, 2015 and 2016. We compared these lessons to other Semantic Web Evaluation Challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as a result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.

Download Full-text

Challenges as enablers for high quality linked data: Insights from the semantic publishing challenge

10.7287/peerj.preprints.2616 ◽

2016 ◽

Author(s):

Anastasia Dimou ◽

Sahar Vahdati ◽

Angelo Di Iorio ◽

Christoph Lange ◽

Ruben Verborgh ◽

...

Keyword(s):

Best Practices ◽

Semantic Web ◽

Linked Data ◽

Lessons Learned ◽

Data Publishing ◽

High Quality ◽

Scholarly Publications ◽

Input Dataset ◽

Definition Of ◽

Semantic Publishing

While most challenges organized so far in the Semantic Web domain are focused on comparing tools with respect to different criteria such as their features and competencies, or exploiting semantically enriched data, the Semantic Web Evaluation Challenges series, co-located with the ESWC Semantic Web Conference, aims to compare them based on their output, namely the produced dataset. The Semantic Publishing Challenge is one of these challenges. Its goal is to involve participants in extracting data from heterogeneous sources on scholarly publications, and producing Linked Data that can be exploited by the community itself. This paper reviews lessons learned from both (i) the overall organization of the Semantic Publishing Challenge, regarding the definition of the tasks, building the input dataset and forming the evaluation, and (ii) the results produced by the participants, regarding the proposed approaches, the used tools, the preferred vocabularies and the results produced in the three editions of 2014, 2015 and 2016. We compared these lessons to other Semantic Web Evaluation challenges. In this paper, we (i) distill best practices for organizing such challenges that could be applied to similar events, and (ii) report observations on Linked Data publishing derived from the submitted solutions. We conclude that higher quality may be achieved when Linked Data is produced as a result of a challenge, because the competition becomes an incentive, while solutions become better with respect to Linked Data publishing best practices when they are evaluated against the rules of the challenge.

Download Full-text