Linked Open Data for Taxonomic Databases: The Nordic/Baltic implementation

Author(s): Johan Liljeblad, Tapani Lahti

Starting with Finland and Sweden and a subset of taxonomic groups, the Nordic/Baltic countries are connecting national checklists using Linked Open Data standards (Auer et al. 2007) and agreed vocabularies. We use HTTP Uniform Resource Identifiers as globally unique, persistent identifiers for taxon concepts (Chawuthai et al. 2013). Currently, we provide both human-readable (HTML) and machine-readable (XML) responses to client requests via a central checklist, TAXONID.ORG, which itself needs to be managed; we hope it can be replaced by Catalogue of Life Plus in the not-too-distant future. While we are initially exchanging taxonomic information, our ultimate goal is also to share information on genetics, images, and traits, as well as conservation status and observations, in a standardized way. The work is part of the NeIC DeepDive project, which is funded by the Nordic e-Infrastructure Collaboration (neic.no/deepdive). The vision is to establish a regional infrastructure network of Nordic and Baltic data centers and information systems and to provide seamlessly operating regional data services, tools, and virtual laboratories.

Author(s): Khadidja Bouchelouche, Abdessamed Réda Ghomari, Leila Zemmouchi-Ghomari

Open Government Data (OGD) is a movement that has spread worldwide, enabling the publication of thousands of datasets on the Web with the aim of realizing transparency and participatory governance. This initiative can create value by linking data that describe the same phenomenon from different perspectives, using traditional Web and Semantic Web technologies. One framework built on these technologies is the Linked Data movement, which guides the publication and interconnection of data in a machine-readable form, enabling automatic interpretation and exploitation. Nevertheless, publishing Open Government Data as Linked Open Data (LOD) is not a trivial task, owing to several obstacles such as data heterogeneity. Many works dealing with this transformation process have been published, and they need to be investigated thoroughly to identify the general trends and open issues in this field. The current work proposes a classification of existing methods for OGD-to-LOD transformation and a synthesis study highlighting their main trends and challenges.


Author(s): Johan Liljeblad, Tapani Lahti, Matts Djos

Taxonomic information is dynamic, i.e. changes are made continuously, so scientific names alone are insufficient to track changes in taxon circumscription. The principles of Linked Open Data (LOD), as defined by the World Wide Web Consortium, can be applied to document the relationships of taxon circumscriptions over time and between checklists of taxa. In our scheme, each checklist and each taxon in a checklist is assigned a globally unique, persistent identifier. Following the LOD principles, HTTP Uniform Resource Identifiers (URIs) are used as identifiers, providing both human-readable (HTML) and machine-readable (XML) responses to client requests. Common vocabularies are needed in the machine-readable responses to HTTP URIs. We use SKOS (Simple Knowledge Organization System) as the basic vocabulary, describing checklists as instances of the class skos:ConceptScheme and taxa as instances of the class skos:Concept. Set relationships between taxon circumscriptions are described using the properties skos:broader and skos:narrower. The Darwin Core vocabulary is used to describe taxon properties in the checklists, such as scientific names, taxonomic ranks, and authorship strings. Instead of directly linking taxon circumscriptions between checklists, we define an HTTP URI for each unique circumscription. This common identifier is then mapped, using the property skos:exactMatch, to the internal checklist identifiers matching the circumscription. For the management of the URIs, the domain name TAXONID.ORG has been registered. In a pilot study, our approach has been applied to linking the taxon circumscriptions of selected taxa between the national checklists of Sweden and Finland. In the future, the national checklists of other Nordic/Baltic countries (Norway, Denmark, Iceland, Estonia) can easily be linked together as well. The work is part of the NeIC DeepDive project (neic.no).
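The vocabulary usage described above can be sketched in a few lines of Python. The SKOS and Darwin Core namespaces below are the real published ones, but every other URI and taxon value (the TAXONID.ORG-style identifier, the checklist URIs, the taxon data) is a hypothetical illustration, not an actual record:

```python
# A sketch of the SKOS + Darwin Core description outlined above. The
# namespace prefixes are the published SKOS and Darwin Core ones; all
# URIs and taxon values below them are hypothetical illustrations.

PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix dwc:  <http://rs.tdwg.org/dwc/terms/> .\n"
)

def taxon_turtle(shared_uri, name, rank, matches):
    """Render one taxon circumscription as Turtle: a shared circumscription
    URI carrying Darwin Core properties, mapped via skos:exactMatch to the
    internal identifiers of each national checklist."""
    lines = [
        f"<{shared_uri}> a skos:Concept ;",
        f'    dwc:scientificName "{name}" ;',
        f'    dwc:taxonRank "{rank}" ;',
    ]
    lines += [f"    skos:exactMatch <{m}> ;" for m in matches]
    lines[-1] = lines[-1][:-1] + "."   # close the final statement
    return "\n".join(lines)

ttl = PREFIXES + "\n" + taxon_turtle(
    "http://taxonid.org/example-0001",          # hypothetical shared URI
    "Parus major Linnaeus, 1758", "species",
    ["https://example-dyntaxa.se/taxon/1",      # hypothetical internal
     "https://example-laji.fi/taxon/2"],        # checklist identifiers
)
print(ttl)
```

The point of the indirection is visible here: both national checklists point at the same shared circumscription URI, so neither has to know about the other's internal identifier scheme.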


Author(s): Johan Liljeblad, Tapani Lahti

While the technology behind Linked Open Data is relatively straightforward, establishing and managing links between identical taxon concepts in different databases is not. Machine-matching of similar or identical names is just a start: not only do you need a checklist with stable identifiers tied to taxon concepts rather than names, you also need to engage taxonomic experts to identify problematic names and to find a way to communicate taxonomic changes over time. In the end, this means a lot of time and money, and before committing to such an investment you also need a plan for keeping things updated. However, once these links are established and additional trait standards are agreed upon, the field is open for the exchange of a multitude of species information. This process is illustrated with a Nordic/Baltic example, focusing on Dyntaxa, the Swedish Taxonomic Database, which also hosts the Icelandic checklist.


Author(s): Amina Meherehera, Imane Mekideche, Leila Zemmouchi-Ghomari, Abdessamed Réda Ghomari

A large amount of the data available over the Web, and open data in particular, has heterogeneous formats and is not machine-readable. One promising solution to the problems of heterogeneity and automatic interpretation is the Linked Data initiative, which aims to provide unified practices for publishing and contextually linking data on the Web using World Wide Web Consortium standards and Semantic Web technologies. Linked Data promotes the Web's transformation from a web of documents into a web of data, ensuring that machines and software agents can interpret the semantics of data correctly and can therefore infer new facts and return relevant results for web data searches. This paper presents an automatic, generic transformation approach that converts several input formats of open web data into Linked Open Data. This work aims to participate actively in the movement toward publishing data compliant with Linked Data principles.
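The kind of transformation the abstract describes can be illustrated in miniature. The following is a generic sketch, not the authors' pipeline: it maps one tabular open-data format (CSV) to RDF-style triples. The base URI and the column-to-property mapping are hypothetical (foaf:name is a real FOAF term; the city property is invented for the example):

```python
# A generic sketch of tabular open data -> triples (not the authors'
# approach). The base URI and column mapping below are hypothetical.
import csv
import io

BASE = "http://example.org/resource/"              # hypothetical namespace
PROPS = {
    "name": "http://xmlns.com/foaf/0.1/name",      # real FOAF term
    "city": "http://example.org/prop/city",        # invented property
}

def rows_to_triples(csv_text, key="id"):
    """Yield one (subject, predicate, object) triple per mapped,
    non-empty cell of each CSV row; the key column mints the subject."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = BASE + row[key]
        for column, prop in PROPS.items():
            if row.get(column):
                yield (subject, prop, row[column])

triples = list(rows_to_triples("id,name,city\n1,Alice,Algiers\n"))
```

Real OGD-to-LOD pipelines add the hard parts this sketch skips: vocabulary selection, URI design, datatype handling, and interlinking with existing LOD datasets.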


Author(s): Kingsley Okoye

Today, one of the state-of-the-art technologies that has shown its importance for data integration and analysis is linked open data (LOD) systems and applications. LOD consists of machine-readable resources and mechanisms that are useful for describing data properties. However, one issue with existing systems and data models is the need not only to represent the derived information (data) in formats that humans can easily understand, but also to create systems that can process the information they contain or support. Technically, the main mechanism for developing such information-processing systems is aggregating or computing metadata descriptions for the various process elements, because there is an ever-increasing need for a more generalized and standard definition of data (or information) in order to build systems capable of providing understandable formats for the different data types and sources. To this end, this chapter proposes a semantic-based linked open data framework (SBLODF) that integrates the different elements (entities) within information systems or models with semantics (metadata descriptions) to produce explicit and implicit information in response to users' searches or queries. In essence, this work introduces a machine-readable and machine-understandable system that is useful for encoding knowledge about different process domains and for providing the discovered information (knowledge) at a more conceptual level.


Author(s): Caio Saraiva Coneglian, José Eduardo Santarem Segundo

The emergence of new technologies has introduced means of disseminating and making information available more efficiently. One initiative, called Europeana, has been promoting this adaptation of informational objects to the Web, and more specifically to Linked Data. This study therefore aims to present a discussion of the relationship between the Digital Humanities and Linked Open Data, as embodied by Europeana. To this end, we use an exploratory methodology that examines questions related to Europeana's data model, EDM, by means of SPARQL. As a result, we characterize the EDM through the use of SPARQL, and we also identify the importance that the concept of Digital Humanities has within the context of Europeana. Keywords: Semantic Web. Linked Open Data. Digital Humanities. Europeana. EDM. Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031
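The kind of SPARQL exploration of the EDM described above can be sketched as follows. The class edm:ProvidedCHO (Europeana's class for provided cultural heritage objects) and dc:title are real EDM / Dublin Core terms, but the query itself is a generic illustration assembled for this example, not one taken from the study:

```python
# A generic illustration of a SPARQL query over the Europeana Data Model.
# edm:ProvidedCHO and dc:title are real EDM / Dublin Core terms; the query
# is illustrative, not taken from the study.

def edm_titles_query(limit=10):
    """Build a SPARQL query selecting cultural heritage objects and titles."""
    return f"""
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX dc:  <http://purl.org/dc/elements/1.1/>

SELECT ?object ?title WHERE {{
    ?object a edm:ProvidedCHO ;
            dc:title ?title .
}}
LIMIT {limit}
"""

query = edm_titles_query(5)
```

Running such a query against an EDM triple store returns object/title pairs, which is the sort of result the study uses to probe how EDM structures Europeana's collections.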


2021, Vol. 11(5), pp. 2405
Author(s): Yuxiang Sun, Tianyi Zhao, Seulgi Yoon, Yongju Lee

The Semantic Web has recently gained traction through the use of Linked Open Data (LOD) on the Web. Although numerous state-of-the-art methodologies, standards, and technologies are applicable to the LOD cloud, many issues persist. Because the LOD cloud is based on graph-structured Resource Description Framework (RDF) triples and the SPARQL query language, traditional techniques for database management systems or distributed computing systems cannot be adopted directly. This paper addresses how the LOD cloud can be efficiently organized, retrieved, and evaluated. We propose a novel hybrid approach that combines index-based and live-exploration approaches for improved LOD join query performance. Using a two-step index structure combining a disk-based 3D R*-tree with an extended multidimensional histogram and flash-memory-based k-d trees, we can efficiently discover interlinked data distributed across multiple resources. Because this method rapidly prunes numerous false hits, the performance of join query processing improves remarkably. We also propose a hot-cold segment identification algorithm to identify regions of high interest. The proposed method is compared with existing popular methods on real RDF datasets. The results indicate that our method outperforms the existing ones because it quickly obtains target results by avoiding unnecessary data scans and reduces the amount of main memory required to load filtering results.
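The pruning idea can be illustrated in miniature. This is a generic sketch, not the paper's R*-tree / k-d tree index: triples are assumed to be mapped to 2-D points, a coarse grid histogram counts points per cell, and a join then skips cells that cannot contain matches:

```python
# A miniature, generic illustration of histogram-based pruning (not the
# paper's two-step index): points stand in for hashed triple coordinates,
# and empty grid cells are discarded before any exact join work.

def grid_cell(point, bins=4, lo=0.0, hi=1.0):
    """Map a 2-D point in [lo, hi) to its (row, col) grid cell."""
    scale = lambda v: min(int((v - lo) / (hi - lo) * bins), bins - 1)
    return (scale(point[0]), scale(point[1]))

def build_histogram(points, bins=4):
    """Count points per cell of a bins x bins grid."""
    hist = {}
    for p in points:
        cell = grid_cell(p, bins)
        hist[cell] = hist.get(cell, 0) + 1
    return hist

def prune(candidate_cells, hist):
    """Keep only candidate cells that can possibly hold join partners."""
    return [c for c in candidate_cells if hist.get(c, 0) > 0]

hist = build_histogram([(0.1, 0.1), (0.9, 0.9)])
survivors = prune([(0, 0), (1, 1), (3, 3)], hist)   # (1, 1) is pruned
```

The paper's contribution lies in doing this at scale: the histogram and tree structures are arranged across disk and flash memory so that false hits are rejected before the expensive join and memory-loading steps.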

