Progress in Authority Management of People Names for Collections

Author(s):  
Quentin Groom
Chloé Besombes
Josh Brown
Simon Chagnoux
Teodor Georgiev
...  

The concept of building a network of relationships between entities, a knowledge graph, is one of the most effective methods to understand the relations between data. By organizing data, we facilitate the discovery of complex patterns not otherwise evident in the raw data. Each datum at the nodes of a knowledge graph needs a persistent identifier (PID) to reference it unambiguously. In the biodiversity knowledge graph, people are key elements (Page 2016). They collect and identify specimens, they publish, observe, work with each other and they name organisms. Yet biodiversity informatics has been slow to adopt PIDs for people, who are currently represented in collection management systems as text strings in various formats. These text strings often do not separate individuals within a collecting team, and little biographical information is collected to disambiguate collectors. In March 2019 we organised an international workshop to find solutions to the problem of PIDs for people in collections, with the aim of identifying people unambiguously across the world's natural history collections in all of their various roles. Stakeholders from 11 countries were represented, including libraries, collections, publishers, developers and name registers. We want to identify people for many reasons. Cross-validating information about a specimen against biographical information about its collector can be used to clean data. Mapping specimens from individual collectors across multiple herbaria can geolocate specimens accurately. By linking literature to specimens through their authors and collectors, we can create collaboration networks, leading to a much better understanding of the scientific contribution of collectors and their institutions. For taxonomists, it will be easier to identify nomenclatural type and syntype material, essential for reliable typification.
Overall, it will mean that geographically dispersed specimens can be treated much more like a single distributed infrastructure of specimens, as is envisaged in the European Distributed Systems of Scientific Collections Infrastructure (DiSSCo). There are several person identifier systems in use. For example, the Virtual International Authority File (VIAF) is a widely used system for published authors. The International Standard Name Identifier (ISNI) has broader scope and incorporates VIAF. The ORCID identifier system provides self-registration for living researchers. Wikidata also has identifiers for people, which have the advantage of being easy to add to and correct. There are also national systems, such as the French and German authority files, and considerable sharing of identifiers between systems, particularly on Wikidata. This creates an integrated network of identifiers that could act as a brokerage system. Attendees agreed that no single identifier system should be recommended; however, some are more appropriate for particular circumstances. Some difficulties remain to be resolved before these identifier schemes can be used for biodiversity: 1) duplicate entries in the same identifier system; 2) handling collector teams and preserving the order of collectors; 3) integrating identifiers with standards such as Darwin Core and ABCD, and with the Global Biodiversity Information Facility; and 4) many living and dead collectors are known only from their specimens and so may not pass the notability standards required by many authority systems. The participants of the workshop are now working on a number of fronts to make progress on the adoption of PIDs for people in collections. This includes extending pilots that have already been trialed, working with identifier systems to make them more suitable for specimen collectors, and talking to service providers to encourage them to use ORCID iDs to identify their users.
It was concluded that resolving the problem of person identifiers for collections is largely not a matter of lacking a solution, but of implementing solutions that already exist.
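
The disambiguation step described above can be sketched in a few lines of Python. This is a minimal illustration, not any workshop participant's actual tool: the registry, names and the ORCID value are invented, and a real system would resolve names against services such as ORCID, ISNI or Wikidata rather than a local dictionary.

```python
# Minimal sketch: replacing a free-text collector-team string with
# structured agents carrying persistent identifiers. All names and the
# identifier value below are invented for illustration.

def parse_collector_team(raw, registry):
    """Split a collecting-team string on ';' and look each name up in a
    (hypothetical) local registry mapping names to PIDs. Unresolved names
    are kept as plain strings, and collector order is preserved."""
    agents = []
    for name in [n.strip() for n in raw.split(";")]:
        pid = registry.get(name)
        agents.append({"name": name, "pid": pid} if pid else {"name": name})
    return agents

registry = {"A. Example": "https://orcid.org/0000-0000-0000-0000"}  # invented
team = parse_collector_team("A. Example; B. Unknown", registry)
```

Even this toy version shows two of the workshop's concerns directly: collector order is preserved, and collectors absent from any authority file simply remain unresolved strings.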

2021, Vol 9
Author(s):  
Mariya Dimitrova
Viktor Senderov
Teodor Georgiev
Georgi Zhelezov
Lyubomir Penev

OpenBiodiv is a biodiversity knowledge graph containing a synthetic linked open dataset, OpenBiodiv-LOD, which combines knowledge extracted from academic literature with the taxonomic backbone used by the Global Biodiversity Information Facility. The linked open data is modelled according to the OpenBiodiv-O ontology, integrating semantic resource types from recognised biodiversity and publishing ontologies with OpenBiodiv-O resource types introduced to capture the semantics of resources not modelled before. We introduce a new release of OpenBiodiv-LOD, attained through information extraction and modelling of additional biodiversity entities. It was achieved through further developments to OpenBiodiv-O, to the data storage infrastructure, and to the workflow and accompanying R software packages used to transform academic literature into Resource Description Framework (RDF). We discuss how to utilise the LOD in biodiversity informatics and give examples by providing solutions to several competency questions. We investigate performance issues that arise due to the large amount of inferred statements in the graph and conclude that OWL Full inference is impractical for the project and that unnecessary inference should be avoided.
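
A competency question of the kind mentioned above can be illustrated with a toy triple store in plain Python. This is only a sketch of the idea: the resource names mimic the spirit of OpenBiodiv-O but are invented here, and the real graph is queried with SPARQL over RDF rather than with list comprehensions.

```python
# Toy linked-data graph as (subject, predicate, object) triples.
# Resource names are invented for this sketch, not actual OpenBiodiv terms.
triples = [
    ("article:1", "mentions", "taxon:Heser_stoevi"),
    ("article:1", "hasAuthor", "person:author_a"),
    ("article:2", "mentions", "taxon:Heser_stoevi"),
    ("article:2", "mentions", "taxon:Harmonia_axyridis"),
]

def query(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard,
    analogous to a variable in a SPARQL basic graph pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Competency question: which articles mention the taxon Heser stoevi?
articles = [s for s, _, _ in query(triples, p="mentions", o="taxon:Heser_stoevi")]
```

The point of the triple model is exactly this uniformity: every competency question reduces to pattern matching over one homogeneous set of statements, regardless of which entity types are involved.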


2013, Vol 680, pp. 534-539
Author(s):  
Wei Feng Ma

With the rapid expansion of campus scale and the growing number of geographically dispersed campuses, how to adopt new theories, methods and technologies to realize optimized equipment assignment and information management is a new research challenge. It is key to ensuring that national funds are used reasonably and to accelerating the healthy development of education. After analyzing related domestic and foreign research, the paper proposes using the spatial data representation and analysis capabilities of a Geographic Information System (GIS) to realize large-scale, inter-campus equipment assignment optimization and information management. It discusses the mathematical model and the system architecture. Moreover, the paper describes key implementation technologies in detail, such as spatial data mapping with MapInfo Professional 9 and the development of WebGIS functions with MapXtreme. The results show that the solution is feasible and effective.
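
The core spatial-assignment idea can be sketched with a simple nearest-campus rule. This is an illustration only, not the paper's actual mathematical model: the campus names and coordinates are invented, straight-line distance stands in for real travel cost, and a production GIS solution would work on projected map layers (e.g. via MapXtreme) rather than raw tuples.

```python
# Sketch: assign each piece of shared equipment to the campus that
# minimises Euclidean distance. All names and coordinates are invented.
import math

campuses = {"north": (0.0, 10.0), "south": (0.0, -10.0)}
equipment = {"spectrometer": (1.0, 8.0), "lathe": (2.0, -9.0)}

def nearest_campus(point, campuses):
    """Return the campus key minimising straight-line distance to point."""
    return min(campuses, key=lambda c: math.dist(point, campuses[c]))

assignment = {item: nearest_campus(pos, campuses)
              for item, pos in equipment.items()}
```

A real optimization would add capacity and demand constraints, turning this greedy rule into a constrained assignment problem, but the spatial kernel (distance between equipment and campus) stays the same.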


Author(s):  
Lyubomir Penev
Teodor Georgiev
Viktor Senderov
Mariya Dimitrova
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple-data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

1. Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
2. Data are deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
3. Integrated narrative and data publishing is realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
4. Data are published in structured, semantically enriched, full-text XML, so that several data elements can thereafter easily be harvested by machines.
5. Linked Open Data (LOD) are extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.

The above approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as the Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5).
These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the foundations of an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank and OpenBiodiv, and to various end users.
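
The EML-to-data-paper conversion (workflow 2 above) can be illustrated with the standard library. This is a simplified sketch, not ARPHA's actual converter: the EML fragment is invented and unnamespaced, whereas real EML documents are far richer and use XML namespaces.

```python
# Sketch: extract data-paper manuscript fields from a minimal,
# invented EML fragment using the standard library.
import xml.etree.ElementTree as ET

eml = """<eml>
  <dataset>
    <title>Example occurrence dataset</title>
    <abstract>Occurrence records from an illustrative survey.</abstract>
  </dataset>
</eml>"""

root = ET.fromstring(eml)
manuscript = {
    "title": root.findtext("./dataset/title"),
    "abstract": root.findtext("./dataset/abstract"),
}
```

The value of the workflow is that metadata authored once (for a repository deposit) seeds the manuscript automatically, so the data paper and the dataset description cannot drift apart.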


Author(s):  
Yunqing Li
Shivakumar Raman
Paul Cohen
Binil Starly

Knowledge graph networks powering web search and chatbot agents have become immensely popular. This paper discusses the first steps towards building a knowledge graph for manufacturing services discoverability. Because there is no unified, widely adopted schema for structured data in the manufacturing services domain, and because existing relational database schemas are limited in representing manufacturing service definitions, no unified schema connects manufacturing resources, service descriptions and actual manufacturing service business entities. This gap severely limits the automated discoverability of manufacturing service business organizations. This paper designs a knowledge graph covering over 8,000 manufacturers, the manufacturing services they provide and their linkage to manufacturing service definitions available from Wikidata. In addition, this work proposes extensions to Schema.org to help small business manufacturers embed search engine optimization (SEO) tags for search and discovery through web search engines. Such vocabulary extensions are critical to rapid identification and real-time capability assessment, particularly when the service providers themselves are responsible for updating the tags. Wider-scale, manufacturing-specific vocabulary extensions to Schema.org could tremendously benefit small and medium-scale manufacturers. The paper concludes with the additional work needed for a comprehensive manufacturing service graph that spans the entire manufacturing knowledge base.
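
The kind of embedded markup the paper proposes can be illustrated as JSON-LD built with the standard library. The company details below are invented; `LocalBusiness`, `Offer`, `Service` and `makesOffer` are existing Schema.org vocabulary, while the manufacturing-specific terms the paper proposes would extend beyond these.

```python
# Sketch: a schema.org JSON-LD description of a manufacturer, of the kind
# that could be embedded in a web page for search-engine discovery.
# All business details are invented for illustration.
import json

manufacturer = {
    "@context": "https://schema.org",
    "@type": "LocalBusiness",
    "name": "Example Precision Machining",  # invented example business
    "makesOffer": [
        {"@type": "Offer",
         "itemOffered": {"@type": "Service", "name": "CNC milling"}},
    ],
}

# Serialise for embedding in a <script type="application/ld+json"> tag.
jsonld = json.dumps(manufacturer, indent=2)
```

Because the markup lives in the provider's own page, the provider can update capability tags directly, which is the real-time self-maintenance property the paper emphasises.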


1969, Vol 17 (2)
Author(s):  
Yali Friedman

Last fall I was invited to an international workshop with the aim of helping develop a research university in Okinawa, the Okinawa Institute of Science and Technology (OIST). It was an enlightening experience to observe the creation of a new knowledge infrastructure. Although I cannot comment on the workshop discussions, I will share some of my personal thoughts and observations.

To understand the development of a research university in Okinawa, it is necessary to first understand Okinawa. Historically a separate nation, Okinawa became a prefecture of Japan in 1879. Following the Second World War, Okinawa was under United States administration until 1972, when it was transferred to Japanese administration. It comprises less than 1% of Japan's landmass, but is home to more than 75% of Japan's US military bases. While under US administration, Okinawa's economy consisted largely of direct and indirect revenues from the US military bases. Since the transfer to Japanese control, concerted efforts have been underway to diversify and develop an independent economy. The main industries are currently tourism, functional foods and information and communication industries. Okinawa's dependence on revenues from the US military bases has decreased, but unemployment remains high – twice the rate of any other prefecture – and per capita income is the lowest in Japan.

My first observation on arriving at OIST was its isolation. The institute was built into a dense forest at the top of a mountain, in wonderful harmony with nature. Yet, as I looked out at the rich forests, I wondered where all the supportive infrastructure was. Where were the office parks, incubator spaces and the cafes and restaurants where innovators could work and interact? It became immediately apparent that beyond building a state-of-the-art research institute, much effort would be needed to attract and retain complementary assets. If scientists seeking to develop innovations from OIST laboratories had to leave the area, or leave Okinawa, to develop them, then they might never return, or worse, not elect to initiate research in Okinawa.

Beyond simply having the necessary resources for development and commercialization of innovations, Okinawa and OIST also need a compelling pitch if they are to attract interest; given the numerous global locations to engage in research and development, what are compelling reasons to select Okinawa? The founders of OIST established it as an English-speaking institute – a decision which potentially places it as a gateway for Japanese seeking to reach outwards, and a gateway for foreigners seeking access to Japan's markets and minds. They have also been strongly involved in supporting local schools, helping build an innovative mindset among the next generation of Okinawans.

I feel that more aggressive tactics should also be applied. Okinawa's unique situation – the relative abundance of foreign military bases and the weak economy – enables it to make special requests of the central government. I strongly encourage OIST and Okinawa to seek special status to bolster development. Just as Puerto Rico's strategic tax abatements led it to become the dominant location for pharmaceutical manufacturing for the US market, Okinawa can employ policy measures unavailable to other prefectures to drive development. Reducing the tax burden for eligible start-ups and reducing payroll taxes for start-up employees are good ideas which have been implemented elsewhere, but Okinawa can also become a test-ground for greater innovation policies. Article 35 of Japan's Patent Law, similar to the US Bayh-Dole Act, grants ownership of employee inventions to the employer (including research institutes and universities). Although this automatic grant of ownership to universities has been successful in the leading American universities, an alternative model has been working very well in other countries. Some universities, such as Canada's University of Waterloo (home to more high-tech and knowledge-based spin-offs than any other Canadian school), opt to grant intellectual property ownership to the inventor. Although the university might lose millions of dollars in potential patent royalties, it is able to attract and retain leading researchers at lower cost and also gains all the spillover benefits from development and commercialization. By granting OIST a waiver from Article 35, the institute could attract global research leaders who seek to own their inventions. Venture capitalists and service providers could follow these researchers, helping develop a local supportive infrastructure at no direct cost.

The development of a new research university is a complex undertaking. Diverse inter-connected and mutually dependent elements must be laid down, often with external support to sustain them until they can be self-sufficient. The leadership at OIST realizes the need for long-term thinking and sustained support. I look forward to following their progress.

