Standardised Globally Unique Specimen Identifiers

2018 ◽  
Vol 2 ◽  
pp. e26658 ◽  
Author(s):  
Anton Güntsch ◽  
Quentin Groom ◽  
Roger Hyam ◽  
Simon Chagnoux ◽  
Dominik Röpert ◽  
...  

A simple, permanent and reliable specimen identifier system is needed to take the informatics of collections into a new era of interoperability. A system of identifiers based on HTTP URIs (Uniform Resource Identifiers), endorsed by the Consortium of European Taxonomic Facilities (CETAF), has now been rolled out to 14 member organisations (Güntsch et al. 2017). CETAF Identifiers have a Linked Open Data redirection mechanism for both human- and machine-readable access and, if fully implemented, provide Resource Description Framework (RDF)-encoded specimen data following best practices continuously improved by members of the initiative. To date, more than 20 million physical collection objects have been equipped with CETAF Identifiers (Groom et al. 2017). To facilitate the implementation of stable identifiers, simple redirection scripts and guidelines for deciding on the local identifier syntax have been compiled (http://cetafidentifiers.biowikifarm.net/wiki/Main_Page). Furthermore, the "CETAF Specimen URI Tester" (http://herbal.rbge.info/) provides an easy-to-use service for testing whether existing identifiers are operational. Active links to the source information are critically important for the usability and potential of any identifier system associated with evolving data objects. This is particularly true for natural history collections facing the next wave of industrialised mass digitisation, where specimens come online with only basic, but rapidly evolving, label data. Specimen identifier systems must therefore include components for monitoring the availability and correct implementation of individual data objects. Our next implementation steps will involve the development of a "Semantic Specimen Catalogue", which will hold a list of all existing specimen identifiers together with their latest RDF metadata snapshots. The catalogue will be used for semantic inference across collections, as well as serving as the basis for periodic testing of identifiers.
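
The redirection mechanism rests on HTTP content negotiation: the same specimen URI resolves to a web page for humans and to RDF for machines. A minimal sketch of that pattern in Python with Flask follows; the route layout and the redirect targets are illustrative assumptions, not the CETAF reference implementation.

    from flask import Flask, redirect, request

    app = Flask(__name__)

    @app.route("/specimen/<specimen_id>")
    def dereference(specimen_id):
        # pick the best representation the client accepts
        best = request.accept_mimetypes.best_match(
            ["text/html", "application/rdf+xml", "text/turtle"])
        if best in ("application/rdf+xml", "text/turtle"):
            # machine-readable request: 303-redirect to the RDF document
            return redirect(f"/data/{specimen_id}.rdf", code=303)
        # human-readable request: 303-redirect to the web page
        return redirect(f"/page/{specimen_id}", code=303)

The 303 "See Other" status is the usual Linked Open Data convention for distinguishing the physical specimen from documents about it.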

Author(s):  
Alberto Nogales Moyano ◽  
Miguel Angel Sicilia ◽  
Elena Garcia Barriocanal

This article describes how the Web of Data has emerged as the realization of a machine-readable web, relying on the Resource Description Framework language to provide richer semantics to datasets. While the Web of Data is based on similar principles to the original Web, with interlinking being the principal mechanism for relating information, the differences in the structure of the information are evident. Several studies have analysed the graph structure of the Web, yielding important insights that were used in relevant applications. However, those findings cannot be transposed to the Web of Data, due to fundamental differences in production, link creation and usage. This article reports on a study of the graph structure of the Web of Data using methods and techniques from similar studies of the Web. Results show that the Web of Data also conforms to the bow-tie model. Other characteristics include low distances between nodes and low closeness and degree centrality. Regarding the datasets, the largest is Open Data Euskadi, but the one with the most connections to other datasets is DBpedia.
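
The quantities reported (bow-tie core, node distances, closeness and degree centrality) can be reproduced in kind with standard graph tooling. A sketch in Python with networkx on a toy dataset-level link graph; the nodes and links are invented for illustration, not the study's corpus.

    import networkx as nx

    g = nx.DiGraph()
    g.add_edges_from([
        ("OpenDataEuskadi", "DBpedia"),
        ("DBpedia", "GeoNames"),
        ("GeoNames", "DBpedia"),
        ("LinkedGeoData", "DBpedia"),
    ])

    # the largest strongly connected component plays the role of the
    # bow-tie "core"; distances are measured inside it
    core = g.subgraph(max(nx.strongly_connected_components(g), key=len))
    print(nx.average_shortest_path_length(core))

    # closeness and degree centrality, both reported as low in the study
    print(nx.closeness_centrality(g))
    print(nx.degree_centrality(g))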


2017 ◽  
Vol 44 (2) ◽  
pp. 203-229 ◽  
Author(s):  
Javier D Fernández ◽  
Miguel A Martínez-Prieto ◽  
Pablo de la Fuente Redondo ◽  
Claudio Gutiérrez

The publication of semantic web data, commonly represented in the Resource Description Framework (RDF), has experienced outstanding growth over the last few years. Data from all fields of knowledge are shared publicly and interconnected in active initiatives such as Linked Open Data. However, despite the increasing availability of applications managing large-scale RDF information, such as RDF stores and reasoning tools, little attention has been given to the structural features emerging in real-world RDF data. Our work addresses this issue by proposing specific metrics to characterise RDF data. We specifically focus on revealing the redundancy of each dataset, as well as common structural patterns. We evaluate the proposed metrics on several datasets, which cover a wide range of designs and models. Our findings provide a basis for more efficient RDF data structures, indexes and compressors.
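
To give a flavour of such metrics, a sketch in Python with rdflib computing two simplified structural indicators over a small in-memory graph; the metric definitions here are illustrative stand-ins, not the paper's exact proposals.

    from collections import Counter
    from rdflib import Graph

    g = Graph()
    g.parse(data="""
        @prefix ex: <http://example.org/> .
        ex:a ex:knows ex:b , ex:c .
        ex:b ex:knows ex:c .
    """, format="turtle")

    subjects = Counter(s for s, _, _ in g)
    predicates = Counter(p for _, p, _ in g)

    # subject out-degree distribution hints at common structural patterns
    print("mean out-degree:", sum(subjects.values()) / len(subjects))
    # predicate reuse is a rough proxy for schema-level redundancy
    print("triples per predicate:", {str(p): n for p, n in predicates.items()})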


Author(s):  
Franck Cotton ◽  
Daniel Gillman

Linked Open Statistical Metadata (LOSM) is Linked Open Data (LOD) applied to statistical metadata. LOD is a model for identifying, structuring, interlinking, and querying data published directly on the web. It builds on the standards of the semantic web defined by the W3C. LOD uses the Resource Description Framework (RDF), a simple data model that expresses content as predicates linking resources to each other or to literal properties. The simplicity of the model allows it to represent any data, including metadata. We define statistical data as data produced through some statistical process or intended for statistical analyses, and statistical metadata as metadata describing statistical data. LOSM promotes automated discovery of the meaning and structure of statistical data. Consequently, it helps with understanding and interpreting data and with preventing inadequate or flawed visualizations of statistical data. This enhances statistical literacy and efforts at visualizing statistics.
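
The triple model just described is compact enough to show directly. A sketch in Python with rdflib: one statement links a resource to another resource, the other attaches a literal value. The statistical dataset URI and publisher are invented; the DCTERMS vocabulary is real.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCTERMS

    EX = Namespace("http://example.org/stats/")
    g = Graph()
    dataset = EX["unemployment-2020"]

    # predicate linking two resources
    g.add((dataset, DCTERMS.publisher, URIRef(EX["agency"])))
    # predicate attaching a literal property
    g.add((dataset, DCTERMS.title, Literal("Unemployment rate, 2020")))

    print(g.serialize(format="turtle"))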


Author(s):  
E. Hietanen ◽  
L. Lehto ◽  
P. Latvala

In this study, a prototype service to provide data from a Web Feature Service (WFS) as linked data is implemented. At first, persistent and unique Uniform Resource Identifiers (URIs) are created for all spatial objects in the dataset. The objects are available from those URIs in the Resource Description Framework (RDF) data format. Next, a Web Ontology Language (OWL) ontology is created to describe the dataset's information content using the Open Geospatial Consortium's (OGC) GeoSPARQL vocabulary. The existing data model is modified to take the linked data principles into account. The implemented service produces an HTTP response dynamically. The data for the response is first fetched from the existing WFS. Then the Geography Markup Language (GML) output of the WFS is transformed on the fly into the RDF format. Content negotiation is used to serve the data in different RDF serialization formats. This solution facilitates the use of a dataset in different applications without replicating the whole dataset. In addition, individual spatial objects in the dataset can be referred to with URIs. Furthermore, the needed information content of the objects can be easily extracted from the RDF serializations available from those URIs.

A solution for linking data objects to the dataset URI is also introduced using the Vocabulary of Interlinked Datasets (VoID). The dataset is divided into subsets and each subset is given its own persistent and unique URI. This enables the whole dataset to be explored with a web browser and all individual objects to be indexed by search engines.
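
A minimal sketch of the on-the-fly pattern described above, in Python with Flask, requests and rdflib: fetch a feature from a WFS, build GeoSPARQL RDF for it, and serialize it in the format the client negotiated. The WFS endpoint, the feature URI scheme and the hard-coded geometry are placeholders, not the authors' implementation.

    import requests
    from flask import Flask, Response, request
    from rdflib import Graph, Literal, Namespace, URIRef

    GEO = Namespace("http://www.opengis.net/ont/geosparql#")
    app = Flask(__name__)

    @app.route("/feature/<fid>")
    def feature(fid):
        # 1. fetch the source GML from the upstream WFS (placeholder URL)
        gml = requests.get("https://example.org/wfs", params={
            "service": "WFS", "request": "GetFeature", "featureID": fid}).text
        # 2. build RDF; a real service would parse the geometry out of the
        #    GML above, a fixed WKT literal stands in for that step here
        g = Graph()
        subj = URIRef(f"https://example.org/feature/{fid}")
        g.add((subj, GEO.asWKT, Literal("POINT(24.94 60.17)",
                                        datatype=GEO.wktLiteral)))
        # 3. content negotiation: serialize in the format the client accepts
        mime = request.accept_mimetypes.best_match(
            ["text/turtle", "application/rdf+xml"]) or "text/turtle"
        fmt = "turtle" if mime == "text/turtle" else "xml"
        return Response(g.serialize(format=fmt), mimetype=mime)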


2016 ◽  
Author(s):  
Michel Dumontier ◽  
Alasdair J G Gray ◽  
M. Scott Marshall ◽  
Vladimir Alexiev ◽  
Peter Ansell ◽  
...  

Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. However, while there are many relevant vocabularies for the annotation of a dataset, none sufficiently captures all the necessary metadata. This prevents uniform indexing and querying of dataset repositories. Towards providing a practical guide for producing a high-quality description of biomedical datasets, the W3C Semantic Web for Health Care and the Life Sciences Interest Group (HCLSIG) identified Resource Description Framework (RDF) vocabularies that could be used to specify common metadata elements and their value sets. The resulting guideline covers elements of description, identification, attribution, versioning, provenance, and content summarization. This guideline reuses existing vocabularies, and is intended to meet key functional requirements including indexing, discovery, exchange, query, and retrieval of datasets, thereby enabling the publication of FAIR data. The resulting metadata profile is generic and could be used by other domains with an interest in providing machine-readable descriptions of versioned datasets.
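
A sketch of what such a dataset description can look like in practice, built in Python with rdflib from existing vocabularies (DCTERMS, DCAT, PAV). The element selection mirrors the abstract's list of covered elements; the full guideline defines more, and the dataset itself is invented.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import DCAT, DCTERMS

    PAV = Namespace("http://purl.org/pav/")
    d = URIRef("http://example.org/dataset/biomed-example")

    g = Graph()
    g.add((d, DCTERMS.title, Literal("Example biomedical dataset")))   # description
    g.add((d, DCTERMS.publisher, URIRef("http://example.org/org/lab")))  # attribution
    g.add((d, PAV.version, Literal("2.1")))                            # versioning
    g.add((d, DCAT.distribution,                                       # content access
           URIRef("http://example.org/dataset/biomed-example.ttl")))

    print(g.serialize(format="turtle"))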


Heritage ◽  
2019 ◽  
Vol 2 (2) ◽  
pp. 1471-1498 ◽  
Author(s):  
Ikrom Nishanbaev ◽  
Erik Champion ◽  
David A. McMeekin

The amount of digital cultural heritage data produced by cultural heritage institutions is growing rapidly. Digital cultural heritage repositories have therefore become an efficient and effective way to disseminate and exploit digital cultural heritage data. However, many digital cultural heritage repositories worldwide share technical challenges such as data integration and interoperability among national and regional digital cultural heritage repositories. The result is dispersed and poorly linked cultural heritage data, backed by non-standardized search interfaces, which thwart users’ attempts to contextualize information from distributed repositories. The recently introduced geospatial semantic web is being adopted by many new and existing digital cultural heritage repositories to overcome these challenges. However, no conceptual survey of geospatial semantic web concepts has yet been conducted for a cultural heritage audience; such a survey is therefore needed. It equips cultural heritage professionals and practitioners with an overview of the necessary tools and of the free and open source semantic web and geospatial semantic web platforms that can be used to implement geospatial semantic web-based cultural heritage repositories. Hence, this article surveys the state-of-the-art geospatial semantic web concepts pertinent to the cultural heritage field. It then proposes a framework to turn geospatial cultural heritage data into machine-readable and processable Resource Description Framework (RDF) data for use in the geospatial semantic web, with a case study to demonstrate its applicability. Furthermore, it outlines key free and open source semantic web and geospatial semantic web platforms for cultural heritage institutions. In addition, it examines leading cultural heritage projects employing the geospatial semantic web. Finally, the article discusses attributes of the geospatial semantic web that require more attention and that can generate new ideas and research questions for both the geospatial semantic web and cultural heritage fields.
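
To make the RDF conversion step concrete, a sketch in Python with rdflib turning one tabular cultural heritage record into GeoSPARQL-compliant RDF. The GeoSPARQL terms are real; the record, URIs and coordinates are illustrative assumptions, not the article's case study.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS

    GEO = Namespace("http://www.opengis.net/ont/geosparql#")
    EX = Namespace("http://example.org/heritage/")

    site = {"id": "fort-1", "name": "Old Fort", "lon": 115.86, "lat": -31.95}

    g = Graph()
    s = EX[site["id"]]
    geom = EX[site["id"] + "-geom"]
    g.add((s, RDF.type, GEO.Feature))
    g.add((s, RDFS.label, Literal(site["name"])))
    g.add((s, GEO.hasGeometry, geom))
    g.add((geom, GEO.asWKT, Literal(f"POINT({site['lon']} {site['lat']})",
                                    datatype=GEO.wktLiteral)))

    print(g.serialize(format="turtle"))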


Publications ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 38 ◽  
Author(s):  
Lyubomir Penev ◽  
Mariya Dimitrova ◽  
Viktor Senderov ◽  
Georgi Zhelezov ◽  
Teodor Georgiev ◽  
...  

Hundreds of years of biodiversity research have resulted in the accumulation of a substantial pool of communal knowledge; however, most of it is stored in silos isolated from each other, such as published articles or monographs. The need for a system to store and manage collective biodiversity knowledge in a community-agreed and interoperable open format has evolved into the concept of the Open Biodiversity Knowledge Management System (OBKMS). This paper presents OpenBiodiv: an OBKMS that utilizes semantic publishing workflows, text and data mining, common standards, ontology modelling and graph database technologies to establish a robust infrastructure for managing biodiversity knowledge. The system is presented as a Linked Open Dataset generated from scientific literature. OpenBiodiv encompasses data extracted from more than 5000 scholarly articles published by Pensoft and many more taxonomic treatments extracted by Plazi from the journals of other publishers. The data from both sources are converted to the Resource Description Framework (RDF) and integrated in a graph database using the OpenBiodiv-O ontology and an RDF version of the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Through the application of semantic technologies, the project showcases the value of open publishing of Findable, Accessible, Interoperable, Reusable (FAIR) data towards the establishment of open science practices in the biodiversity domain.
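
Data integrated into such a graph database is typically consumed through SPARQL. A sketch in Python with SPARQLWrapper; both the endpoint URL and the query pattern are placeholders for illustration, not OpenBiodiv's documented interface or schema.

    from SPARQLWrapper import JSON, SPARQLWrapper

    # endpoint URL is a placeholder; substitute the real OpenBiodiv endpoint
    sparql = SPARQLWrapper("https://example.org/openbiodiv/sparql")
    sparql.setQuery("""
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?article ?title WHERE {
            ?article rdfs:label ?title .   # placeholder triple pattern
        } LIMIT 5
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["title"]["value"])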


Author(s):  
Karen Coyle

Application profiles fulfill functions similar to other forms of metadata documentation, such as data dictionaries. The preference is for application profiles to be machine-readable and machine-actionable, so that they can provide validation and processing instructions, much as XML Schema does for XML documents. These goals underlie the work the Dublin Core Metadata Initiative has done over the last decade to develop application profiles for data that uses the Resource Description Framework model of the World Wide Web Consortium.
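
One common way to make an RDF application profile machine-actionable is to express its constraints in SHACL, the W3C shapes language; whether a given profile uses SHACL or another mechanism varies, so treat the choice here as an assumption. A minimal validation sketch in Python with pyshacl, with invented shapes and data: the profile requires every book to carry exactly one title.

    from pyshacl import validate
    from rdflib import Graph

    shapes = Graph().parse(data="""
        @prefix sh:  <http://www.w3.org/ns/shacl#> .
        @prefix dct: <http://purl.org/dc/terms/> .
        @prefix ex:  <http://example.org/> .
        ex:BookShape a sh:NodeShape ;
            sh:targetClass ex:Book ;
            sh:property [ sh:path dct:title ; sh:minCount 1 ; sh:maxCount 1 ] .
    """, format="turtle")

    data = Graph().parse(data="""
        @prefix ex: <http://example.org/> .
        ex:b1 a ex:Book .    # missing dct:title, so validation fails
    """, format="turtle")

    conforms, _, report_text = validate(data, shacl_graph=shapes)
    print(conforms)
    print(report_text)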


2020 ◽  
pp. 016555152093095
Author(s):  
Gustavo Candela ◽  
Pilar Escobar ◽  
Rafael C Carrasco ◽  
Manuel Marco-Such

Cultural heritage institutions have recently started to share their metadata as Linked Open Data (LOD) in order to disseminate and enrich them. The publication of large bibliographic data sets as LOD is a challenge that requires the design and implementation of custom methods for the transformation, management, querying and enrichment of the data. In this report, the methodology defined by previous research for evaluating the quality of LOD is analysed and adapted to the specific case of Resource Description Framework (RDF) triples containing standard bibliographic information. The specified quality measures are reported for four highly relevant libraries.
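
One of the simplest quality measures in such methodologies is completeness of a mandatory property. A sketch in Python with rdflib over two invented bibliographic records; real evaluations run over full catalogues and many more measures.

    from rdflib import Graph, Namespace
    from rdflib.namespace import DCTERMS, RDF

    EX = Namespace("http://example.org/bib/")
    g = Graph().parse(data="""
        @prefix dct: <http://purl.org/dc/terms/> .
        @prefix ex:  <http://example.org/bib/> .
        ex:r1 a ex:Record ; dct:title "A catalogue record" .
        ex:r2 a ex:Record .
    """, format="turtle")

    records = set(g.subjects(RDF.type, EX.Record))
    titled = {s for s in records if (s, DCTERMS.title, None) in g}
    print("dct:title completeness:", len(titled) / len(records))  # 0.5 here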


Information ◽  
2018 ◽  
Vol 9 (11) ◽  
pp. 274 ◽  
Author(s):  
Frances Gillis-Webber

The English-Xhosa Dictionary for Nurses (EXDN) is a bilingual, unidirectional printed dictionary in the public domain, with English and isiXhosa as the language pair. Extending the digitisation of EXDN from a human-readable digital object to a machine-readable state, using the Resource Description Framework (RDF) as the data model, creates semantically interoperable structured data, enabling EXDN’s data to be reused, aggregated and integrated with other language resources, where it can serve as a potential aid in the development of future language resources for isiXhosa, an under-resourced language in South Africa. The methodological guidelines for the construction of a Linguistic Linked Data framework (LLDF) for a lexicographic resource, as applied to EXDN, are described, where an LLDF can be defined as a framework: (1) which describes data in RDF, (2) using a model designed for the representation of linguistic information, (3) which adheres to Linked Data principles, and (4) which supports versioning, allowing for change. The result is a bidirectional lexicographic resource, previously bounded and static, now unbounded and evolving, with the ability to extend to multilingualism.
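
A model designed for representing linguistic information, as requirement (2) calls for, could for instance be OntoLex-Lemon, the W3C community model for lexical resources; the abstract does not name the model, so treat that choice as an assumption. A sketch in Python with rdflib of one English-isiXhosa entry pair, with an invented URI scheme and example forms:

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    ONTOLEX = Namespace("http://www.w3.org/ns/lemon/ontolex#")
    VARTRANS = Namespace("http://www.w3.org/ns/lemon/vartrans#")
    EX = Namespace("http://example.org/exdn/")   # invented URI scheme

    g = Graph()
    en, xh = EX["nurse-en"], EX["umongikazi-xh"]
    for entry, form, rep, lang in [
            (en, EX["nurse-en-form"], "nurse", "en"),
            (xh, EX["umongikazi-xh-form"], "umongikazi", "xh")]:
        g.add((entry, RDF.type, ONTOLEX.LexicalEntry))
        g.add((entry, ONTOLEX.canonicalForm, form))
        g.add((form, ONTOLEX.writtenRep, Literal(rep, lang=lang)))

    # a unidirectional print dictionary becomes a bidirectional linked resource
    g.add((en, VARTRANS.translatableAs, xh))
    print(g.serialize(format="turtle"))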

