Creating a Metadata-Enabled Framework for Resource Discovery in Knowledge Bases

Author(s):  
Lynne C. Howarth

With the proliferation of digitized resources accessible via Internet and Intranet knowledge bases, and a pressing need to develop more sophisticated tools for the identification and retrieval of electronic resources, both general purpose and domain-specific metadata schemes have assumed a particular prominence. While recent work emanating from the World Wide Web Consortium (W3C) has focused on the Resource Description Framework (RDF), and metadata maps or “crosswalks” have been created to support the interoperability of metadata standards -- thus converting metatags from diverse domains from simply “machine-readable” to “machine-understandable” -- the next iteration, to “human-understandable,” remains a challenge. This apparent gap provides a framework for three-phase research (Howarth, 2000, 1999) to develop a tool which will provide a “human-understandable” front-end search assist to any XML-compliant metadata scheme. Findings from phase one, the analyses and mapping of seven metadata schemes, identify the particular challenges of designing a common “namespace”, populated with element tags which are appropriately descriptive, yet readily understood by a lay searcher, when there is little congruence within, and a high degree of variability across, the metadata schemes under study. Implications for the subsequent design and testing of both the proposed “metalevel ontology” (phase two), and the prototype search assist tool (phase three) are examined.
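
As a rough illustration of what a crosswalk into a common, lay-readable namespace might look like, the following minimal Python sketch re-keys records from two metadata schemes to shared plain-language labels. The scheme keys, the choice of tags, the labels, and the function name `to_common_labels` are all assumptions made for illustration; the abstract does not name the seven schemes studied or the element tags proposed.

```python
# Hypothetical crosswalk: (scheme, native tag) -> plain-language label.
# Scheme names and labels are illustrative, not those used in the study.
CROSSWALK = {
    ("dublin_core", "creator"): "Author or creator",
    ("dublin_core", "title"): "Title",
    ("marc21", "100"): "Author or creator",
    ("marc21", "245"): "Title",
}


def to_common_labels(scheme, record):
    """Re-key a record from native tags to common labels.

    Returns (common, unmapped): tags without a crosswalk entry are kept
    separately rather than silently dropped.
    """
    common = {}
    unmapped = {}
    for tag, value in record.items():
        label = CROSSWALK.get((scheme, tag))
        if label is None:
            unmapped[tag] = value
        else:
            common[label] = value
    return common, unmapped
```

Keeping the unmapped tags visible reflects the difficulty the abstract describes: where schemes show little congruence, many native elements will have no obvious common label.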

2017 ◽  
Vol 35 (1) ◽  
pp. 159-178
Author(s):  
Timothy W. Cole ◽  
Myung-Ja K. Han ◽  
Maria Janina Sarol ◽  
Monika Biel ◽  
David Maus

Purpose: Early Modern emblem books are primary sources for scholars studying the European Renaissance. Linked Open Data (LOD) is an approach for organizing and modeling information in a data-centric manner compatible with the emerging Semantic Web. The purpose of this paper is to examine ways in which LOD methods can be applied to facilitate emblem resource discovery, better reveal the structure and connectedness of digitized emblem resources, and enhance scholar interactions with digitized emblem resources. Design/methodology/approach: This research encompasses an analysis of the existing XML-based Spine (emblem-specific) metadata schema; the design of a new, domain-specific, Resource Description Framework compatible ontology; the mapping and transformation of metadata from Spine to both the new ontology and (separately) to the pre-existing Schema.org ontology; and the (experimental) modification of the Emblematica Online portal as a proof of concept to illustrate enhancements supported by LOD. Findings: LOD is viable as an approach for facilitating discovery and enhancing the value to scholars of digitized emblem books; however, metadata must first be enriched with additional uniform resource identifiers, and the workflow upgrades required to normalize and transform existing emblem metadata are substantial and still to be fully worked out. Practical implications: The research described demonstrates the feasibility of transforming existing, special collections metadata to LOD. Although considerable work and further study will be required, preliminary findings suggest potential benefits of LOD for both users and libraries. Originality/value: This research is unique in the context of emblem studies and adds to the emerging body of work examining the application of LOD best practices to library special collections.


2017 ◽  
Vol 73 (5) ◽  
pp. 803-824 ◽  
Author(s):  
Goran Sladić ◽  
Igor Cverdelj-Fogaraši ◽  
Stevan Gostojić ◽  
Goran Savić ◽  
Milan Segedinac ◽  
...  

Purpose: The purpose of this paper is to identify the benefits of an approach in which document management systems (DMSs) are based on a formal and explicit document model, primarily in terms of facilitating domain-specific customization. Design/methodology/approach: Within this paper, a generic document model is proposed. The model consists of two layers: a general purpose layer, which represents common features of documents, and a domain-specific layer, which models properties particular to an application domain. The general purpose layer is based on ISO 82045, providing a high degree of interoperability with other systems developed with respect to this set of standards. Findings: Splitting the document model into layers enables DMSs to be tailored for each particular domain of application while still depending on the general purpose layer. The existence of the domain-specific layer allows documents to be interpreted differently in different domains of application. Practical implications: In order to enable customization of a DMS for a particular domain, the implementation of the domain-specific document layer is required. Also, the proposed model does not explicitly deal with document dynamics. Originality/value: The proposed document ontology is general enough to provide a representation of documents that does not depend on a specific scope of application, yet flexible enough to enable extensions through which domain-specific document features can be expressed. The separation of the document model enables development of a core DMS offering services that rely explicitly on the general purpose layer on the one hand, as well as domain-specific customization of the DMS on the other.
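
The two-layer split can be pictured with a minimal Python sketch. The field names below (`doc_id`, `title`, `version`), the class names, and the `describe` function are hypothetical placeholders; the paper bases its general purpose layer on ISO 82045, whose actual metadata elements are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class GeneralLayer:
    # Hypothetical common fields shared by every document.
    doc_id: str
    title: str
    version: str


@dataclass
class Document:
    general: GeneralLayer
    # Domain-specific properties; their keys vary by application domain.
    domain: dict = field(default_factory=dict)


def describe(doc):
    """A core service that reads only the general purpose layer."""
    g = doc.general
    return f"{g.doc_id} v{g.version}: {g.title}"
```

Because `describe` never touches `domain`, it would work unchanged for a document from any domain, which is the kind of separation between core services and domain customization the abstract argues for.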


2001 ◽  
Vol 10 (01n02) ◽  
pp. 65-86 ◽  
Author(s):  
DAN I. MOLDOVAN ◽  
ROXANA C. GÎRJU

It is widely accepted that more knowledge means more intelligence. In many knowledge intensive applications, it is necessary to have extensive domain-specific knowledge in addition to general-purpose knowledge bases. This paper presents a methodology for discovering domain-specific concepts and relationships in an attempt to extend WordNet. The method was tested on five seed concepts selected from the financial domain: interest rate, stock market, inflation, economic growth, and employment. Queries were formed with each of these concepts and a corpus of 5000 sentences was extracted automatically from the Internet and TREC-8 corpora. On this corpus, the system discovered a total of 264 new concepts not defined in WordNet, of which 221 contain the seeds and 43 are other related concepts. The system also discovered 64 relationships that link these concepts with either WordNet concepts or with each other. The relationships were extracted with the help of 22 distinct lexico-syntactic patterns representing four semantic relations. It takes the system approximately 40 minutes per seed working in interactive mode to discover the new concepts and relationships on the 5000 sentence corpus.
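
To give a sense of how a lexico-syntactic pattern can surface a relationship between concepts, here is a minimal Python sketch built on a single "X such as Y" pattern. This pattern, the regular expression, and the function name `extract_pairs` are assumptions chosen for illustration; the abstract does not list the 22 patterns or name the four semantic relations the system uses.

```python
import re

# One illustrative pattern: "<X> such as <Y>", where X and Y are one- or
# two-word lowercase phrases. NOT taken from the paper's pattern set.
SUCH_AS = re.compile(r"([a-z]+(?: [a-z]+)?) such as ([a-z]+(?: [a-z]+)?)")


def extract_pairs(text):
    """Return (broader concept, narrower concept) pairs matched in text."""
    return [(m.group(1), m.group(2)) for m in SUCH_AS.finditer(text.lower())]
```

For example, `extract_pairs("Economic indicators such as interest rate affect markets.")` yields the pair `("economic indicators", "interest rate")`. A real system would need phrase chunking and filtering; the fixed two-word limit here is only a simplification.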


Semantic Web ◽  
2020 ◽  
pp. 1-29
Author(s):  
Bettina Klimek ◽  
Markus Ackermann ◽  
Martin Brümmer ◽  
Sebastian Hellmann

In recent years, lexical resources have emerged rapidly in the Semantic Web. Whereas most of the linguistic information is already machine-readable, we found that morphological information is mostly absent or only contained in semi-structured strings. An integration of morphemic data has not yet been undertaken due to the lack of existing domain-specific ontologies and explicit morphemic data. In this paper, we present the Multilingual Morpheme Ontology, called MMoOn Core, which can be regarded as the first comprehensive ontology for the linguistic domain of morphological language data. We describe how crucial concepts like morphs, morphemes, word forms and meanings are represented and interrelated, and how language-specific morpheme inventories can be created as a new kind of morphological dataset. The aim of the MMoOn Core ontology is to serve as a shared semantic model for linguists and NLP researchers alike to enable the creation, conversion, exchange, reuse and enrichment of morphological language data across different data-dependent language sciences. To that end, various use cases are illustrated to draw attention to the cross-disciplinary potential which can be realized with the MMoOn Core ontology in the context of the existing Linguistic Linked Data research landscape.


2021 ◽  
Vol 16 ◽  
pp. 1-10
Author(s):  
Husni Teja Sukmana ◽  
JM Muslimin ◽  
Asep Fajar Firmansyah ◽  
Lee Kyung Oh

In Indonesia, philanthropy is closely identified with Zakat. Zakat constitutes a specific domain because it has its own body of knowledge. This research develops a knowledge graph for the Zakat domain in Indonesia, called KGZ. Work in this area is still rare, and KGZ is thus the first knowledge graph for Zakat in Indonesia. It is designed to provide basic knowledge on Zakat and on managing Zakat in Indonesia. There are some issues with building KGZ. First, the existing Indonesian named entity recognition (NER) is unrestricted and general-purpose, with data obtained from general sources such as news. Second, there is no dataset for NER in the Zakat domain. We define four steps to build KGZ: data acquisition, extraction of entities and their relationships, mapping to an ontology, and deployment of knowledge graphs and visualizations. This research contributes a knowledge graph for Zakat (KGZ) and an NER model for Zakat, called KGZ-NER. We defined 17 new named entity classes related to Zakat, with 272 entities and 169 relationships, and provided labelled datasets for KGZ-NER that are publicly accessible. We applied the Indonesian-Open Domain Information Extractor framework to identify relationships between entities. We then modeled the information using the Resource Description Framework (RDF) to build the knowledge base for KGZ and stored it in GraphDB, a product from Ontotext. The NER model has a precision of 0.7641, a recall of 0.4544, and an F1-score of 0.5655. The data size of KGZ needs to increase in order to cover all of the knowledge on Zakat and on managing Zakat in Indonesia. Moreover, sufficient resources are required in future work.
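
The step from extracted entity relationships to RDF can be sketched in a few lines of Python that emit N-Triples text. The namespace `http://example.org/kgz/`, the function name `to_ntriples`, and the example names are hypothetical; the abstract does not give the ontology's IRIs, and loading the output into GraphDB is not shown.

```python
# Hypothetical namespace; not the one used by the KGZ project.
BASE = "http://example.org/kgz/"


def to_ntriples(relations):
    """Serialize (subject, predicate, object) names as N-Triples lines.

    Assumes every name is already IRI-safe (no spaces or special
    characters); a real pipeline would escape or mint identifiers.
    """
    lines = []
    for subj, pred, obj in relations:
        lines.append(f"<{BASE}{subj}> <{BASE}{pred}> <{BASE}{obj}> .")
    return "\n".join(lines)
```

Calling `to_ntriples([("ZakatAgency", "collects", "Zakat")])` produces one line in which each of the three placeholder names becomes an IRI under the assumed namespace.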


Author(s):  
David Fichtmueller ◽  
Walter G. Berendsohn ◽  
Gabriele Droege ◽  
Falko Glöckler ◽  
Anton Güntsch ◽  
...  

The TDWG standard ABCD (Access to Biological Collections Data task group 2007) was aimed at harmonizing terminologies used for modelling biological collection information and is used as a comprehensive data format for transferring collection and observation data between software components. The project ABCD 3.0 (A community platform for the development and documentation of the ABCD standard for natural history collections) was financed by the German Research Council (DFG). It addressed the transformation of ABCD into a semantic web-compliant ontology by deconstructing the XML-schema into individually addressable RDF (Resource Description Framework) resources published via the TDWG Terms Wiki (https://terms.tdwg.org/wiki/ABCD_2). In a second step, informal properties and concept-relations described by the original ABCD-schema were transformed into a machine-readable ontology and revised (Güntsch et al. 2016). The project was successfully finished in January 2019. The ABCD 3 setup allows for the creation of standard-conforming application schemas. The XML variant of ABCD 3.0 was restructured, simplified and made more consistent in terms of element names and types as compared to version 2.x. The XML elements are connected to their semantic concepts using the W3C SAWSDL (Semantic Annotation for Web Services Description Language and XML Schema) standard. The creation of specialized application schemas is encouraged; the first use case was the application schema for zoology. It will also be possible to generate application schemas that break the traditional unit-centric structure of ABCD. Further achievements of the project include creating a Wikibase instance as the editing platform, with related tools for maintenance queries, such as checking for inconsistencies in the ontology and automated export into RDF. This allows for fast iterations of new or updated versions, e.g. when additional mappings to other standards are done.
The setup is agnostic to the data standard created; it can therefore also be used to create or model other standards. Mappings to other standards like Darwin Core (https://dwc.tdwg.org/) and Audubon Core (https://tdwg.github.io/ac/) are now machine readable as well. All XPaths (XML Paths) of ABCD 3.0 XML have been mapped to all variants of ABCD 2.06 and 2.1, which will ease transition to the new standard. The ABCD 3 Ontology will also be uploaded to the GFBio Terminology Server (Karam et al. 2016), where individual concepts can be easily searched or queried, allowing for better interactive modelling of ABCD concepts. ABCD documentation now adheres to TDWG’s Standards Documentation Standard (SDS, https://www.tdwg.org/standards/sds/) and is located at https://abcd.tdwg.org/. The new site is hosted on GitHub: https://github.com/tdwg/abcd/tree/gh-pages.

