Link maintenance for integrity in linked open data evolution: Literature survey and open challenges

Semantic Web ◽  
2020 ◽  
pp. 1-25
Author(s):  
Andre Gomes Regino ◽  
Julio Cesar dos Reis ◽  
Rodrigo Bonacin ◽  
Ahsan Morshed ◽  
Timos Sellis

RDF data has been extensively deployed to describe various types of resources in a structured way. Links between data elements described by RDF models form the core of the Semantic Web. The rising amount of structured data published in public RDF repositories, also known as Linked Open Data, demonstrates the success of the global and unified dataset proposed by the vision of the Semantic Web. Nowadays, semi-automatic algorithms build connections among these datasets by exploring a variety of methods. Interconnected open data demands automatic methods and tools to maintain its consistency over time. Updating linked data is considered a key process because such structured datasets evolve continuously. However, operations that change the data may affect well-formed links, which makes it difficult to keep connections consistent over time. In this article, we present a thorough survey that provides a systematic review of the state of the art in link maintenance in the linked open data evolution scenario. We conduct a detailed analysis of the literature to characterise and understand the methods and algorithms responsible for detecting, fixing and updating links between RDF data. Our investigation provides a categorisation of existing approaches and describes and discusses existing studies. The results reveal an absence of comprehensive solutions suited to fully detect, warn about and automatically maintain the consistency of linked data over time.
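The detection step described in the abstract above can be illustrated with a minimal sketch. It is not one of the surveyed methods: it assumes that a link counts as broken when its target resource was described (appeared as a subject) in the old version of the target dataset but not in the new one. The `Triple` type, the function names and the `example.org` URIs are all hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    """A single RDF statement, with all terms kept as plain strings."""
    subject: str
    predicate: str
    obj: str


def described_resources(triples: Iterable[Triple]) -> Set[str]:
    """Collect every resource that appears as a subject in a dataset version."""
    return {t.subject for t in triples}


def find_broken_links(
    links: Iterable[Triple],
    target_old: Iterable[Triple],
    target_new: Iterable[Triple],
) -> List[Triple]:
    """Return the links whose target was described in the old version of the
    target dataset but is no longer described in the new version."""
    removed = described_resources(target_old) - described_resources(target_new)
    return [link for link in links if link.obj in removed]


# Hypothetical data: two versions of a target dataset and the links into it.
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

target_v1 = [
    Triple("http://example.org/b/1", LABEL, "Lisbon"),
    Triple("http://example.org/b/2", LABEL, "Porto"),
]
target_v2 = [
    Triple("http://example.org/b/1", LABEL, "Lisbon"),
]
links = [
    Triple("http://example.org/a/x", SAME_AS, "http://example.org/b/1"),
    Triple("http://example.org/a/y", SAME_AS, "http://example.org/b/2"),
]

broken = find_broken_links(links, target_v1, target_v2)
for link in broken:
    print("broken link:", link.subject, "->", link.obj)
```

A set difference only catches removed targets; links whose target still exists but has changed meaning would need a comparison of the target's description, which this sketch does not attempt.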

2019 ◽  
Vol 19 (01) ◽  
pp. e05
Author(s):  
Marcos Daniel Zarate ◽  
Carlos Buckle ◽  
Renato Mazzanti ◽  
Gustavo Samec

Scientific publication services are changing drastically: researchers demand intelligent search services to discover and relate scientific publications. Publishers need to incorporate semantic information to better organize their digital assets and make publications more discoverable. In this paper, we present the ongoing work to publish a subset of the scientific publications of CONICET Digital as Linked Open Data. The objective of this work is to improve the retrieval and reuse of data through Semantic Web technologies and Linked Data in the domain of scientific publications. To achieve these goals, Semantic Web standards and reference RDF schemas have been taken into account (Dublin Core, FOAF, VoID, etc.). The conversion and publication process is guided by the methodological guidelines for publishing government linked data. We also outline how these data can be linked to other datasets on the web of data (DBLP, Wikidata and DBpedia). Finally, we show some examples of queries that answer questions that CONICET Digital does not currently support.


Author(s):  
Caio Saraiva Coneglian ◽  
José Eduardo Santarem Segundo

The emergence of new technologies has introduced means of disseminating information and making it available more efficiently. An initiative called Europeana has been promoting this adaptation of information objects within the Web, and more specifically within Linked Data. The present study therefore aims to discuss the relationship between the Digital Humanities and Linked Open Data, as embodied by Europeana. To this end, we use an exploratory methodology that investigates questions related to the Europeana data model, EDM, by means of SPARQL. As results, we gain an understanding of the characteristics of EDM through the use of SPARQL. We also identify the importance of the concept of Digital Humanities within the context of Europeana.
Keywords: Semantic Web. Linked open data. Digital humanities. Europeana. EDM.
Link: https://periodicos.ufsc.br/index.php/eb/article/view/1518-2924.2017v22n48p88/33031


Author(s):  
Tim Berners-Lee ◽  
Kieron O’Hara

This paper discusses issues that will affect the future development of the Web, either increasing its power and utility or suppressing its development. First, it argues for the importance of the continued development of the Linked Data Web, and describes the use of linked open data as an important component of that. Second, the paper defends the Web as a read–write medium, and goes on to consider how the read–write Linked Data Web could be achieved.


2018 ◽  
Vol 52 (7) ◽  
pp. 548-564
Author(s):  
Susanne Al-Eryani ◽  
Gudrun Bucher ◽  
Stefanie Rühle

Abstract: As part of the DFG-funded project "Entwicklung von interoperablen Standards für die Kontextualisierung heterogener Objekte am Beispiel der Provenienz Asch" (Development of interoperable standards for the contextualisation of heterogeneous objects, using the Asch provenance as an example), a metadata model compatible with the Semantic Web and Linked Open Data was developed that makes it possible to contextualise cultural heritage and its provenance across institutions.


Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:
1. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
2. Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
4. Data published in structured, semantically enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines.
5. Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.
The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1,2); (3) semantic markup of the article texts in the TaxPub format facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3,4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archive and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5).
These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the groundwork for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank and OpenBiodiv, and to various end users.


2022 ◽  
Vol 59 (2(118)) ◽  
pp. 7-25
Author(s):  
Dorota Siwecka

Purpose/Thesis: This article presents the results of a survey conducted in January 2021 among employees of Polish libraries, museums, and archives, examining their awareness of open linked data technologies. The research was a pilot study, and its results will be used to improve the questionnaire and to conduct research on a wider scale. Approach/Methods: The survey method was used in the study. Results and conclusions: On the basis of the answers received, it can be concluded that open linked data is not yet well known among employees of Polish libraries, museums, and archives. Those most aware of technologies that allow machines to understand content shared on the Web are doctoral degree holders employed in research libraries. Furthermore, awareness of projects using LOD technologies does not correlate with awareness of the technologies themselves. Research limitations: The number of respondents (415) constitutes 1% of all the people employed in libraries, archives, and museums in Poland (based on data provided by the Central Statistical Office of Poland). This is not a large number, but considering the variety among the respondents, the sample can be considered representative. Originality/Value: The awareness of Linked Open Data among employees of Polish libraries, archives, and museums has not been the subject of any study so far. In fact, this type of research has not been conducted in other countries either.


Knygotyra ◽  
2013 ◽  
Vol 61 ◽  
pp. 254-277
Author(s):  
Marijana Tomić ◽  
Mirna Willer

Manuscript collections consist of manuscripts of a very diverse nature, usually defined, following Peter Beal, as texts or documents written by hand on paper or parchment. They may be family or personal papers, diaries, letters, archival collections, and so on. Medieval manuscripts (codices, maps, musical works or their fragments) constitute a special type of manuscript. Like incunabula, manuscript collections are the most valuable part of library heritage; through them a great deal of information reaches us about medieval history, culture, literature, social history and ways of life. Without these sources the information would have been lost. Research on old and rare manuscripts is important for the cultural and social history of both the individual country and Europe as a whole. From the perspective of the humanities, several factors must be singled out that have brought about significant changes in the study of manuscripts and early printed books. The most important is considered to be the impact of information technology on almost all areas of research. These changes also led to the emergence of a new discipline, the digital humanities. According to Toby Burrows, medievalists are the most advanced representatives of the application of digital technologies in humanities research. Nevertheless, Burrows also identifies several difficulties related to internet and digital library services. He points to a lack of integration and interoperability between the many different websites, and to inconsistent terminology in the application of descriptive standards. This in turn creates a problematic situation, because researchers around the world face many difficulties in finding, using and sharing knowledge about medieval manuscript collections. We fully agree with Burrows's idea that this problem can be solved by creating an international collaborative infrastructure that would make it possible to manage content and interrelated knowledge.
In our view, this infrastructure can be implemented in the technological environment of the Semantic Web and Linked Open Data. The article discusses research on medieval manuscripts, incunabula and their fragments, and the description of these sources as part of a digital humanities project that applies this new technology. It examines a research project in this field carried out by the Faculty of Information Sciences of the University of Zadar, Croatia. The aim of the project is to select the data elements needed for the accurate description of these sources and for their standardisation, using the ontologies of bibliography, codicology, palaeography and typography developed by researchers of old and rare books. The article also gives a brief introduction to the technological infrastructure of the Semantic Web and its standards. It describes in detail a methodology for publishing a selected vocabulary as one of the services of a metadata registry. An example of publishing linked open data is given: a graph representing the description of a partially reconstructed manuscript fragment. Since all the disciplines mentioned use their own vocabularies and ontologies, the authors propose focusing not on the use of a single common vocabulary but on designing mappings between the relevant terms following SKOS rules. This would lay the foundations for the future international collaborative infrastructure.


Author(s):  
Jose María Alvarez Rodríguez ◽  
Jules Clement ◽  
José Emilio Labra Gayo ◽  
Hania Farhan ◽  
Patricia Ordoñez de Pablos

This chapter introduces the promotion of statistical data to the Linked Open Data initiative in the context of the Web Index project. A framework for the publication of raw statistics and a method to convert them to Linked Data are also presented, following the W3C standards RDF, SKOS, and OWL. This case study is focused on the Web Index project; launched by the Web Foundation, the Index is the first multi-dimensional measure of the growth, utility, and impact of the Web on people and nations. Finally, an evaluation of the advantages of using Linked Data to publish statistics is presented, together with a discussion and future steps.


Author(s):  
Jose María Alvarez Rodríguez ◽  
José Emilio Labra Gayo ◽  
Patricia Ordoñez de Pablos

The aim of this chapter is to present a proposal and a case study to describe the information about organizations in a standard way using the Linked Data approach. Several models and ontologies have been provided in order to formalize the data, structure and behaviour of organizations. Nevertheless, these attempts have not been fully accepted due to several factors: (1) missing pieces to define the status of the organization; (2) tangled parts to specify the structure (concepts and relations) between the elements of the organization; (3) lack of text properties, and other factors. These divergences result in a set of incomplete approaches to formalizing data and information about organizations. Taking into account the current trends of applying semantic web technologies and linked data to formalize, aggregate, and share domain-specific information, a new model for organizations that takes advantage of these initiatives is required in order to overcome existing barriers and exploit corporate information in a standard way. This work is especially relevant in that it aims to: (1) unify existing models to provide a common specification; (2) apply semantic web technologies and the Linked Data approach; (3) provide access to the information via standard protocols, and (4) offer new services that can exploit this information to trace the evolution and behaviour of the organization over time. Finally, this work can help improve the clarity and transparency of scenarios in which organizations play a key role, such as e-procurement, e-health, or financial transactions.


Author(s):  
Axel Polleres ◽  
Simon Steyskal

The World Wide Web Consortium (W3C), as the main standardization body for Web standards, has set a particular focus on publishing and integrating Open Data. In this chapter, the authors explain various standards from the W3C's Semantic Web activity and the potential role they play in the context of Open Data: RDF, as a standard data format for publishing and consuming structured information on the Web; the Linked Data principles for interlinking RDF data published across the Web and leveraging a Web of Data; RDFS and OWL to describe vocabularies used in RDF and for describing mappings between such vocabularies. The authors conclude with a review of current deployments of these standards on the Web, particularly within public Open Data initiatives, and discuss potential risks and challenges.
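The idea of mappings between vocabularies mentioned in the abstract above can be made concrete with a small sketch. It is a simplification and not taken from the chapter: it assumes a mapping in the spirit of `owl:equivalentProperty` has already been reduced to a plain dictionary from a local property URI to a shared one, and it rewrites the predicates of triples accordingly. The `example.org` vocabulary and the function name are hypothetical; the FOAF `name` URI is the real one.

```python
from typing import Dict, List, Tuple

# An RDF statement as (subject, predicate, object), all plain strings.
Triple = Tuple[str, str, str]


def apply_property_mapping(
    triples: List[Triple], mapping: Dict[str, str]
) -> List[Triple]:
    """Rewrite each predicate that has an entry in `mapping`;
    triples with unmapped predicates are returned unchanged."""
    return [(s, mapping.get(p, p), o) for (s, p, o) in triples]


FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
# Hypothetical local vocabulary terms, used only for this illustration.
LOCAL_FULL_NAME = "http://example.org/vocab#fullName"
LOCAL_AGE = "http://example.org/vocab#age"

# Assumed equivalence: the local fullName property means the same as foaf:name.
mapping = {LOCAL_FULL_NAME: FOAF_NAME}

data = [
    ("http://example.org/person/1", LOCAL_FULL_NAME, "Ada"),
    ("http://example.org/person/1", LOCAL_AGE, "36"),
]

mapped = apply_property_mapping(data, mapping)
for triple in mapped:
    print(triple)
```

In practice such a mapping would itself be published as RDF so that consumers can discover and apply it; the dictionary stands in for that step here.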

