Semantic Metadata Interoperability and Inference-Based Querying in Digital Repositories

2009 ◽  
Vol 2 (4) ◽  
pp. 36-52 ◽  
Author(s):  
Dimitrios A. Koutsomitropoulos ◽  
Georgia D. Solomou ◽  
Andreas D. Alexopoulos ◽  
Theodore S. Papatheodorou

Metadata applications have evolved over time into highly structured “islands of information” about digital resources, often bearing a strong semantic interpretation. These semantics, however, are rarely communicated in machine-readable and machine-understandable ways. At the same time, the process for transforming the implied metadata knowledge into explicit Semantic Web descriptions can be problematic and is not always evident. In this article the authors take the well-established Dublin Core metadata standard, along with other metadata schemata that often appear in digital repository set-ups, and propose a suitable Semantic Web OWL ontology. In the process they address, in novel ways, the discrepancies and incompatibilities typical of such attempts. They also show the potential and necessity of this approach by demonstrating inferences on the resulting ontology, instantiated with actual metadata records. The authors conclude by presenting a working prototype that provides inference-based querying on top of digital repositories.
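As a rough illustration of what inference-based querying can add over plain metadata lookup, the sketch below follows property-refinement links so that a query on a general Dublin Core element also finds records described only with a refinement. It is a minimal sketch, not the authors' ontology or prototype: the triple set, the `item:` identifiers and the function names are assumptions made for the example, and a real system would use an OWL reasoner rather than this hand-written loop.

```python
# Illustrative only: a tiny in-memory triple set, not the authors' OWL ontology.
# Refinement links between Dublin Core terms, written as child -> parent.
SUBPROPERTY_OF = {
    "dcterms:created": "dc:date",
    "dcterms:issued": "dc:date",
    "dcterms:abstract": "dc:description",
}

# Hypothetical repository records as (subject, property, value) triples.
TRIPLES = {
    ("item:1", "dcterms:created", "2008-05-01"),
    ("item:2", "dc:date", "2009-01-15"),
    ("item:2", "dcterms:abstract", "A short summary."),
}


def infer(triples, subproperty_of):
    """Add the triples entailed by subproperty links until nothing new appears."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            parent = subproperty_of.get(p)
            if parent is not None and (s, parent, o) not in inferred:
                inferred.add((s, parent, o))
                changed = True
    return inferred


def query(triples, prop):
    """Return the sorted subjects that have any value for the given property."""
    return sorted({s for s, p, _ in triples if p == prop})


print(query(TRIPLES, "dc:date"))                         # ['item:2']
print(query(infer(TRIPLES, SUBPROPERTY_OF), "dc:date"))  # ['item:1', 'item:2']
```

Without inference, the record that carries only a creation date is invisible to a query on the general date element; after inference it is returned as well.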

Author(s):  
William Ulate ◽  
M. Marcela Mora

Annotation (i.e., making comments on a resource) is an important part of the vision for the Semantic Web as defined by the standards set by the World Wide Web Consortium (W3C). Its goal is to make Internet-published information and data machine-readable so that they can be put to better use. Despite the important role that annotation plays in the Semantic Web, many cultural heritage institutions have been slow to adopt it. Access to open historical biological literature hosted in digital libraries, like the Biodiversity Heritage Library (BHL), has improved the efficiency of biodiversity research, especially in the taxonomic field. This body of information has even greater potential for research if annotation capabilities are incorporated within those legacy digital repositories. As part of the project Consumers as Creators, developed by the Missouri Botanical Garden (MOBOT) with partners at Saint Louis University (SLU), the Web annotation needs of the botanical community were analyzed. Likewise, the practicality of using existing annotation tools to satisfy this community’s particular needs was assessed, including technical and operational considerations. To do so, 15 users of a botanical virtual library from five institutions were interviewed. Their answers were analyzed and classified taking into account the user role and purpose. Desirable functionalities of annotation software were classified into three orders of priority (Must, Should, and Could). Subsequently, six open-source annotation tools were evaluated (Digilib, hypothes.is, Pundit Annotator Pro, Recogito, rerum, and VGG Annotator) to explore whether they fulfilled the annotation needs of botanists. The selected annotation tools were installed (when necessary) and assessed on different functional aspects, and their advantages and disadvantages were identified. Finally, a proof-of-concept prototype was developed to exemplify how those needs could be met within a digital library platform. Botanicus, a free portal to historic botanical literature from the Peter H. Raven Library at MOBOT, and rerum, functioning as a repository of annotations, were used to explore the implementation of a minimal subset of these requirements. A summary of the results of the assessment, the lessons learned and some of the best practices recommended are presented.
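For readers unfamiliar with what a machine-readable annotation looks like, the following sketch builds a minimal comment shaped after the W3C Web Annotation Data Model and serializes it as JSON. It is only an illustration, not the project's prototype: the function name, the page URL and the creator value are hypothetical, and the annotation's own `id` is omitted on the assumption that an annotation store would assign it.

```python
import json


def make_annotation(target_url, comment, creator):
    """Build a minimal annotation dict shaped after the W3C Web Annotation Data Model."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "creator": creator,
        # The comment itself is carried as an embedded textual body.
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        # The target is the resource being commented on (hypothetical URL below).
        "target": target_url,
    }


annotation = make_annotation(
    "https://example.org/botanical-library/page/123",  # placeholder, not a real page
    "This name is now treated as a synonym.",
    "https://example.org/users/botanist-1",            # placeholder creator IRI
)
print(json.dumps(annotation, indent=2))
```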


2015 ◽  
Vol 3 (1) ◽  
pp. 106-115
Author(s):  
Karla Abad ◽  
Walter Orozco Iguasnia ◽  
Washington Torres ◽  
Alfredo González Tomalá

After examining the current state of microsites and repositories at the Universidad Estatal Península de Santa Elena (UPSE), it was found that their information lacked optimal and appropriate semantics. Under these circumstances, a need arose to create a semantic web structure model for universities, which was subsequently applied to UPSE's microsites and digital repository as a test case. Part of this project includes the installation of software modules with their respective configurations and the use of metadata standards such as Dublin Core to improve SEO (search engine optimization); this made it possible to generate standardized metadata and to create policies for uploading information. The use of metadata transforms raw data into well-organized structures that provide information and knowledge for generating results in web search engines. Upon completion of the implementation of the semantic web model, it is possible to say that the university has improved its presence and visibility on the web through the indexing of information in different search engines and through its position in the Webometrics categorization of universities and repositories (a ranking that classifies universities worldwide).
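One common way of exposing Dublin Core metadata to search engines is to embed it in a page's HTML head as `meta` elements. The sketch below generates such tags from a simple record. It is an assumption-laden illustration rather than the UPSE configuration: the record contents and the `dc_meta_tags` helper are invented for the example.

```python
from html import escape


def dc_meta_tags(record):
    """Render a {element: [values]} record as Dublin Core <meta> tags for an HTML head."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />']
    for element, values in record.items():
        for value in values:
            # Escape quotes and ampersands so the attribute value stays well-formed.
            safe = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{safe}" />')
    return "\n".join(lines)


# Hypothetical record for a repository item.
record = {
    "title": ['Coastal "reef" surveys & tides'],
    "creator": ["A. Author", "B. Author"],
    "date": ["2015"],
}
print(dc_meta_tags(record))
```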


Author(s):  
A. Iwaniak ◽  
I. Kaczmarek ◽  
J. Łukowicz ◽  
M. Strzelecki ◽  
S. Coetzee ◽  
...  

Spatial planning documents contain information about the principles and rights of land use in different zones of a local authority. They are the basis for administrative decision making in support of sustainable development. In Poland these documents are published on the Web according to a prescribed non-extendable XML schema, designed for optimum presentation to humans in HTML web pages. There is no document standard, and limited functionality exists for adding references to external resources. The text in these documents is discoverable and searchable by general-purpose web search engines, but the semantics of the content cannot be discovered or queried. The spatial information in these documents is geographically referenced but not machine-readable. Major manual efforts are required to integrate such heterogeneous spatial planning documents from various local authorities for analysis, scenario planning and decision support. This article presents results of an implementation using machine-readable semantic metadata to identify relationships among regulations in the text, spatial objects in the drawings and links to external resources. A spatial planning ontology was used to annotate different sections of spatial planning documents with semantic metadata in the Resource Description Framework in Attributes (RDFa). The semantic interpretation of the content, links between document elements and links to external resources were embedded in XHTML pages. An example and use case from the spatial planning domain in Poland is presented to evaluate its efficiency and applicability. The solution enables the automated integration of spatial planning documents from multiple local authorities to assist decision makers with understanding and interpreting spatial planning information. The approach is equally applicable to legal documents from other countries and domains, such as cultural heritage and environmental management.
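To make the idea of semantic metadata embedded in XHTML more concrete, the sketch below reads a small fragment annotated with RDFa-style `about`, `property` and `content` attributes and recovers subject, property and value statements from it. It handles only this small attribute subset and is not a conforming RDFa processor; the `plan:` vocabulary, the zone identifier and the class name are hypothetical and are not taken from the article's spatial planning ontology.

```python
from html.parser import HTMLParser


class RdfaSketch(HTMLParser):
    """Collect (subject, property, value) from a small subset of RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.subjects = []   # one entry per open element: its own or inherited subject
        self.pending = None  # (subject, property, depth) waiting for text content
        self.text = []
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        inherited = self.subjects[-1] if self.subjects else None
        subject = attributes.get("about", inherited)
        self.subjects.append(subject)
        prop = attributes.get("property")
        if prop is None:
            return
        if "content" in attributes:
            # An explicit machine-readable value overrides the displayed text.
            self.triples.append((subject, prop, attributes["content"]))
        else:
            self.pending = (subject, prop, len(self.subjects))
            self.text = []

    def handle_data(self, data):
        if self.pending is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.pending is not None and self.pending[2] == len(self.subjects):
            subject, prop, _ = self.pending
            self.triples.append((subject, prop, "".join(self.text).strip()))
            self.pending = None
        if self.subjects:
            self.subjects.pop()


def extract(xhtml):
    """Parse a fragment and return the statements found in it."""
    parser = RdfaSketch()
    parser.feed(xhtml)
    parser.close()
    return parser.triples


# Hypothetical fragment of a planning document; "plan:" is an invented prefix.
ZONE_DOC = """
<div about="#zone-MN1">
  <span property="plan:landUse">single-family housing</span>
  <span property="plan:maxHeight" content="12">up to 12 m</span>
</div>
"""
for triple in extract(ZONE_DOC):
    print(triple)
```

The second statement shows why the `content` attribute matters: the page can display "up to 12 m" to a human reader while exposing the bare value to software.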


2010 ◽  
Vol 29 (3) ◽  
pp. 104 ◽  
Author(s):  
Jung-ran Park ◽  
Yuji Tosaka

This study explores the current state of metadata-creation practices across digital repositories and collections by using data collected from a nationwide survey of mostly cataloging and metadata professionals. Results show that MARC, AACR2, and LCSH are the most widely used metadata schema, content standard, and subject-controlled vocabulary, respectively. Dublin Core (DC) is the second most widely used metadata schema, followed by EAD, MODS, VRA, and TEI. Qualified DC’s wider use vis-à-vis Unqualified DC (40.6 percent versus 25.4 percent) is noteworthy. The leading criteria in selecting metadata and controlled-vocabulary schemata are collection-specific considerations, such as the types of resources, nature of the collection, and needs of primary users and communities. Existing technological infrastructure and staff expertise also are significant factors contributing to the current use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections. Metadata interoperability remains a major challenge. There is a lack of exposure of locally created metadata and metadata guidelines beyond the local environments. Homegrown locally added metadata elements may also hinder metadata interoperability across digital repositories and collections when there is a lack of sharable mechanisms for locally defined extensions and variants.


GigaScience ◽  
2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Jon Ison ◽  
Hans Ienasescu ◽  
Emil Rydza ◽  
Piotr Chmura ◽  
Kristoffer Rapacki ◽  
...  

Background: Life scientists routinely face massive and heterogeneous data analysis tasks and must find and access the most suitable databases or software in a jungle of web-accessible resources. The diversity of information used to describe life-scientific digital resources presents an obstacle to their utilization. Although several standardization efforts are emerging, no information schema has been sufficiently detailed to enable uniform semantic and syntactic description, and cataloguing, of bioinformatics resources.

Findings: Here we describe biotoolsSchema, a formalized information model that balances the needs of conciseness for rapid adoption against the provision of rich technical information and scientific context. biotoolsSchema results from a series of community-driven workshops and is deployed in the bio.tools registry, providing the scientific community with >17,000 machine-readable and human-understandable descriptions of software and other digital life-science resources. We compare our approach to related initiatives and provide alignments to foster interoperability and reusability.

Conclusions: biotoolsSchema supports the formalized, rigorous, and consistent specification of the syntax and semantics of bioinformatics resources, and enables cataloguing efforts such as bio.tools that help scientists to find, comprehend, and compare resources. The use of biotoolsSchema in bio.tools promotes the FAIRness of research software, a key element of open and reproducible developments for data-intensive sciences.


Semantic Web ◽  
2020 ◽  
pp. 1-29
Author(s):  
Bettina Klimek ◽  
Markus Ackermann ◽  
Martin Brümmer ◽  
Sebastian Hellmann

In recent years, lexical resources have emerged rapidly in the Semantic Web. Whereas most of the linguistic information is already machine-readable, we found that morphological information is mostly absent or only contained in semi-structured strings. An integration of morphemic data has not yet been undertaken due to the lack of existing domain-specific ontologies and explicit morphemic data. In this paper, we present the Multilingual Morpheme Ontology, called MMoOn Core, which can be regarded as the first comprehensive ontology for the linguistic domain of morphological language data. We describe how crucial concepts like morphs, morphemes, word forms and meanings are represented and interrelated, and how language-specific morpheme inventories can be created as a new kind of morphological dataset. The aim of the MMoOn Core ontology is to serve as a shared semantic model for linguists and NLP researchers alike to enable the creation, conversion, exchange, reuse and enrichment of morphological language data across different data-dependent language sciences. To that end, various use cases are illustrated to draw attention to the cross-disciplinary potential which can be realized with the MMoOn Core ontology in the context of the existing Linguistic Linked Data research landscape.


Author(s):  
Andrew Iliadis ◽  
Wesley Stevens ◽  
Jean-Christophe Plantin ◽  
Amelia Acker ◽  
Huw Davies ◽  
...  

This panel focuses on the way that platforms have become key players in the representation of knowledge. Recently, there have been calls to combine infrastructure and platform-based frameworks to understand the nature of information exchange on the web through digital tools for knowledge sharing. The present panel builds on and extends work in platform and infrastructure studies on what has been referred to as “knowledge as programmable object” (Plantin et al., 2018), specifically focusing on how metadata and semantic information are shaped and exchanged in specific web contexts. As Bucher (2012; 2013) and Helmond (2015) show, data portability in the context of web platforms requires a certain level of semantic annotation. Semantic interoperability is the defining feature of so-called "Web 3.0", traditionally referred to as the semantic web (Antoniou et al., 2012; Szeredi et al., 2014). Since its inception, the semantic web has privileged the status of metadata for providing the fine-grained levels of contextual expressivity needed for machine-readable web data, and can be found in products as diverse as Google's Knowledge Graph, online research repositories like Figshare, and other sources that engage in platformizing knowledge. The first paper in this panel examines the international Schema.org collaboration. The second paper investigates the epistemological implications when platforms organize data sharing. The third paper argues for the use of patents to inform research methodologies for understanding knowledge graphs. The fourth paper discusses private platforms’ extraction and collection of user metadata and the enclosure of data access.


Author(s):  
Sunil Tyagi

This chapter defines metadata, their types and creation, and some of their important functions. It gives an overview of the basic elements of the Dublin Core metadata standard and mentions other metadata standards. The problem has been studied based on the information available in the open literature. As electronic information resources multiply and digital library initiatives gain wide acceptance, knowledge of metadata formats will help library professionals adapt their skills in cataloguing, classification, subject headings, keywording, and indexing for better inventory and exhaustive usage of electronic information. Metadata serves three general purposes. It supports resource discovery and locates the actual digital resource by inclusion of a digital identifier. As the number of electronic resources grows, metadata is used to create aggregate sites, bringing similar resources together and distinguishing dissimilar resources. The World Wide Web has created a revolution in the accessibility of digital information resources. Metadata is key to ensuring that resources will survive and continue to be accessible into the future. It can be embedded in a digital object or it can be stored separately, as in library catalogues. The Dublin Core (DC) is the most popular and widely accepted standard proposed to describe almost all categories of networked electronic resources.


2018 ◽  
pp. 2063-2085
Author(s):  
Erla M. Morales Morgado ◽  
Rosalynn A. Campos Ortuño ◽  
Ling Ling Yang ◽  
Tránsito Ferreras-Fernández

In this chapter the authors describe a project entitled “Divulgación de Recursos Educativos Digitales (DIRED)” (Dissemination of Digital Educational Resources), aimed at promoting specific educational resources and mobile apps for educational purposes and managing them through GREDOS, the institutional repository of the University of Salamanca. The authors present a proposal for describing learning objects based on pedagogical information, digital competences and learning styles. They also suggest educational information for classifying useful mobile apps. To support suitable access and retrieval, the authors focus on the use of metadata specific to learning objects in digital repositories, such as LOM (Learning Object Metadata). They study the metadata mapping needed to adapt LOM to Qualified Dublin Core, the standard used in the GREDOS repository, which is built on the DSpace platform. Finally, the authors present their implementation of Learning Object Description in the GREDOS repository.
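A metadata mapping of this kind is often expressed as a crosswalk table from source element paths to target fields. The sketch below shows the general shape under stated assumptions: the LOM paths and the DSpace-style field names in the table are illustrative choices, not the mapping documented by the authors, and `crosswalk` is a hypothetical helper.

```python
# Hypothetical crosswalk: LOM element paths -> DSpace-style qualified DC fields.
LOM_TO_QDC = {
    "general.title": "dc.title",
    "general.description": "dc.description.abstract",
    "general.language": "dc.language.iso",
    "general.keyword": "dc.subject",
    "technical.format": "dc.format.mimetype",
    "rights.description": "dc.rights",
}


def crosswalk(lom_record, mapping=LOM_TO_QDC):
    """Return (mapped DC fields, LOM elements with no target) for one record."""
    mapped, unmapped = {}, {}
    for path, value in lom_record.items():
        target = mapping.get(path)
        if target is None:
            # Keep track of what the target schema cannot express.
            unmapped[path] = value
        else:
            mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Invented learning-object record for illustration.
lom_record = {
    "general.title": "Introduction to Fractions",
    "general.language": "es",
    "educational.learningResourceType": "exercise",
}
mapped, unmapped = crosswalk(lom_record)
print(mapped)
print(unmapped)
```

Returning the unmapped elements separately makes visible which educational details would be lost unless the target schema is extended with additional fields.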


Author(s):  
Andreas D. Alexopoulos ◽  
Georgia D. Solomou ◽  
Dimitrios A. Koutsomitropoulos ◽  
Theodore Papatheodorou

In this chapter the authors present the basic characteristics of some existing educational metadata schemata and application profiles. They focus on the widely adopted IEEE LOM standard and give a brief analysis of its structure. With a view to the use of educational metadata schemata by digital repositories that preserve educational and research resources, they concentrate on DSpace, a considerably popular system for this purpose. The authors show how the IEEE LOM metadata set can be incorporated into DSpace’s default qualified Dublin Core metadata schema, introducing enhancements to the existing University of Patras live installation. To this end, they document a potential LOM to Dublin Core metadata mapping and reveal possible gains from such an attempt. Further, they propose an ontological model for the repository’s metadata that also takes into account the educational characteristics of resources. In this way, they show how a semantic level of interoperability between educational applications can be achieved.

