Enabling Digital Specimen and Extended Specimen Concepts in Current Tools and Services

Author(s):  
Falko Glöckler

Digital specimens (Hardisty 2018, Hardisty 2020) are the cyberspace equivalent of objects in a physical, often museum-based collection. They consist of references to data and metadata related to the collection object. Through the ongoing process of digitizing legacy data, gaining knowledge from new field collections or research, and annotating and linking to related resources, a digital specimen can evolve independently from the original physical object. In particular, provenance records cannot always be assigned to the physical object when the knowledge was gained solely from the digital representation. A physical specimen can also be understood as a physical preparation (or a set of multiple preparations, e.g. DNA samples taken from a preserved organism) accompanied by related digital and non-digital data sources (e.g. images, descriptions in fieldbooks, research data) rather than as a single object. This concept of an extended specimen was described by Webster (2017) and is used in The Extended Specimen Network initiative (Lendemer et al. 2019) to enhance the access and research potential of specimens. Digital specimens need to reflect both the potential complexity of the physical object (the extended specimen) and the knowledge gained from and linked to the digital object itself. In order to provide, track and make use of digital specimens, the community of collection-holding institutions might need to think of digital specimens as standalone virtual collections that emanate from physical collections. Additionally, new versions of a digital specimen continuously derive from changes to the physical specimen as the (meta)data are updated in collection management systems to document the state and treatment of the physical objects.
Consequently, the challenge is to enable the management of both: linked digital specimens on the World Wide Web, and the local data of physical specimens in the databases of collection-holding institutions and in other tools and services. In this panel discussion, central questions about the requirements, obstacles and opportunities of implementing the concepts of digital specimens and extended specimens in software tools such as collection management systems are discussed. The aim is to identify the major tasks and priorities for transforming tools and services from multiple perspectives: local collection data management, international data infrastructures such as the Distributed System of Scientific Collections (DiSSCo) and the Global Biodiversity Information Facility (GBIF), and data usage outside of domain-specific subject areas.
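
The versioned, linked structure described above can be sketched as a simple data model. This is a hypothetical, simplified illustration only; the field names, identifier scheme, and versioning rule are assumptions, not the DiSSCo specification.

```python
# Hypothetical, simplified model of a digital specimen that extends a
# physical collection object (all field names are illustrative assumptions).
digital_specimen = {
    "id": "https://example.org/ds/ABC-123",  # persistent identifier (assumed scheme)
    "physical_specimen": {
        "catalog_number": "MfN-ZM-0001",
        # multiple preparations make this an "extended specimen"
        "preparations": ["skin", "skull", "DNA sample"],
    },
    "linked_resources": [
        {"type": "Image", "url": "https://example.org/media/0001.jpg"},
        {"type": "FieldbookPage", "url": "https://example.org/fieldbooks/p42"},
    ],
    "provenance": [
        {"action": "digitized", "date": "2020-05-01"},
        {"action": "annotated", "date": "2020-06-12"},
    ],
    "version": 2,  # incremented as (meta)data change in the collection management system
}

def new_version(specimen, change):
    """Return a new version of a digital specimen, recording the change event
    in the provenance so the digital object can evolve independently."""
    updated = dict(specimen)
    updated["version"] = specimen["version"] + 1
    updated["provenance"] = specimen["provenance"] + [change]
    return updated
```

A change in the collection management system (e.g. relinking a resource) would then yield version 3 while leaving earlier versions citable.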

2018, Vol 2, pp. e25635
Author(s):  
Mikko Heikkinen
Falko Glöckler
Markus Englund

The DINA Symposium (“DIgital information system for NAtural history data”, https://dina-project.net) ends with a plenary session involving the audience to discuss the interplay of collection management and software tools. The discussion will touch on different areas and issues, such as:

(1) Collection management using modern technology: How should and could collections be managed using current technology? What is the ultimate objective of using a new collection management system? How should traditional management processes be changed?

(2) Development and community: Why are there so many collection management systems? Why is it so difficult to create one system that fits everyone’s requirements? How could a community of developers and collection staff be built around the DINA project in the future?

(3) Features and tools: How can needs that are common to all collections be identified? What new tools and technologies could facilitate collection management? How could those tools be implemented as DINA-compliant services?

(4) Data: What data must be captured about collections and specimens? What criteria need to be applied in order to distinguish essential from “nice-to-have” information? How should established data standards (e.g. Darwin Core and ABCD (Access to Biological Collection Data)) be used to share data from rich and diverse data models?

In addition to the plenary discussion around these questions, we will agree on a streamlined format for continuing the discussion in order to write a white paper on these questions. The results and outcome of the session will constitute the basis of the paper and will be subsequently refined.
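
The tension raised in topic (4), sharing rich internal data models through established standards, can be illustrated with a minimal mapping sketch. The internal field names below are hypothetical; the target terms are standard Darwin Core.

```python
# Hypothetical internal record from a collection management system.
internal_record = {
    "catalogue_no": "B 10 0123456",
    "taxon_full_name": "Parus major Linnaeus, 1758",
    "collector_names": ["A. Example"],
    "storage_location": "Cabinet 12, Drawer 3",  # rich detail with no 1:1 Darwin Core term
}

# Mapping from (assumed) internal fields to standard Darwin Core terms.
# Fields without an obvious target are the "nice-to-have" residue that is
# lost, or squeezed into catch-all terms, when data are shared.
FIELD_MAP = {
    "catalogue_no": "catalogNumber",
    "taxon_full_name": "scientificName",
    "collector_names": "recordedBy",
}

def to_darwin_core(record):
    """Flatten a rich internal record into a Darwin Core term dictionary."""
    dwc = {}
    for source, term in FIELD_MAP.items():
        value = record.get(source)
        if isinstance(value, list):
            value = " | ".join(value)  # Darwin Core convention for multiple values
        if value is not None:
            dwc[term] = value
    return dwc
```

The unmapped `storage_location` field shows concretely why one standard rarely fits every collection's data model.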


Author(s):  
David Shorthouse

Bionomia, https://bionomia.net, previously called Bloodhound Tracker, was launched in August 2018 with the aim of illustrating the breadth and depth of expertise required to collect and identify natural history specimens represented in the Global Biodiversity Information Facility (GBIF). This required that specimens and people be uniquely identified and that a granular expression of actions (e.g. "collected", "identified") be adopted. The Darwin Core standard presently combines agents and their actions into the conflated terms recordedBy and identifiedBy, whose values are typically unresolved and unlinked text strings. Bionomia consists of tools, web services, and a responsive website, which are all used to efficiently guide users to resolve and unequivocally link people to specimens via the first-class actions collected or identified. It also shields users from the complexity of casting together and seamlessly integrating the services of four giant initiatives: ORCID, Wikidata, GBIF, and Zenodo. All of these initiatives are financially sustainable and well used by many stakeholders, well outside this narrow use case. As a result, the links between person and specimen made by users of Bionomia are given every opportunity to persist, to represent credit for effort, and to flow into collection management systems as meaningful new entries. To date, 13M links between people and specimens have been made, including 2M negative associations, on 12.5M specimen records. These links were either made by the collectors themselves or by 84 people who have attributed specimen records to their peers, mentors and others they revere.

Integration with ORCID and Wikidata

People are identified in Bionomia through synchronization with ORCID and Wikidata by reusing their unique identifiers and drawing in their metadata. ORCID identifiers are used by living researchers to link their identities to their research outputs.
ORCID services include OAuth2 pass-through authentication for use by developers and web services for programmatic access to its store of public profiles. These contain elements of metadata such as full name, aliases, keywords, countries, education, employment history, affiliations, and links to publications. Bionomia seeds its search directory of people by periodically querying ORCID for specific user-assigned keywords, as well as directly through account creation via OAuth2 authentication. Deceased people are uniquely identified in Bionomia through integration with Wikidata by caching unique 'Q' numbers (identifiers), full names and aliases, countries, occupations, as well as birth and death dates. Profiles are seeded from Wikidata through daily queries for properties that are likely to be assigned to collectors of natural history specimens, such as "Entomologists of the World ID" (= P5370) or "Harvard Index of Botanists ID" (= P6264). Because Wikidata items may be merged, Bionomia captures these merge events, re-associates previously made links to specimen records, and mirrors Wikidata's redirect behaviour. A Wikidata property called "Bionomia ID" (= P6944), whose values are either ORCID identifiers or Wikidata 'Q' numbers, helps facilitate additional integration and reuse.

Integration with GBIF

Specimen data are downloaded wholesale as Darwin Core Archives from GBIF every two weeks. The purpose of this schedule is to maintain a reasonable synchrony with source data that balances computation time with the expectations of users who desire the most up-to-date view of their specimen records. Collectors with ORCID accounts who have elected to receive notice are informed via email message when the authors of newly published papers have made use of their specimen records downloaded from GBIF.
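
The property-based Wikidata seeding described above can be sketched as a query against the public Wikidata SPARQL endpoint. This is an illustrative sketch, not Bionomia's actual implementation; the query shape and the helper names are assumptions, while the property identifiers (e.g. P5370) and the endpoint URL are as documented by Wikidata.

```python
import urllib.parse

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"  # public Wikidata endpoint

def build_collector_query(property_id, limit=100):
    """Build a SPARQL query for Wikidata items carrying a collector-related
    property, e.g. P5370 ("Entomologists of the World ID")."""
    return f"""
    SELECT ?item ?itemLabel WHERE {{
      ?item wdt:{property_id} ?value .
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
    }}
    LIMIT {limit}
    """

def query_url(property_id):
    """URL that would return the matching items as JSON when fetched."""
    params = urllib.parse.urlencode({
        "query": build_collector_query(property_id),
        "format": "json",
    })
    return f"{WIKIDATA_SPARQL}?{params}"
```

A daily job could fetch `query_url("P5370")` and cache the returned 'Q' numbers, names, and dates for each candidate collector.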
Integration with Zenodo

Finally, users of Bionomia may integrate their ORCID OAuth2 authentication with Zenodo, an industry-recognized archive for research data, which enjoys support from the Conseil Européen pour la Recherche Nucléaire (CERN). At the user's request, their specimen data, represented as CSV (comma-separated values) and JSON-LD (JavaScript Object Notation for Linked Data) documents, are pushed into Zenodo, a DataCite DOI is assigned, and a formatted citation appears on their Bionomia profile. New versions of these files are pushed to Zenodo on the user's behalf when new specimen records are linked to them. If users have configured their ORCID account to listen for new entries in DataCite, a new work entry will also be made in their ORCID profile, thus sealing a perpetual, semi-automated loop between GBIF and ORCID that tidily showcases their efforts at collecting and identifying natural history specimens.

Technologies Used

Bionomia uses Apache Spark via scripts written in Scala, a human name parser written in Ruby called dwc_agent, queues of jobs executed through Sidekiq, scores of pairwise similarities in the structure of human names stored in Neo4j, data persistence in MySQL, and a search layer in Elasticsearch. Here, I expand on lessons learned in the construction and maintenance of Bionomia, emphasize the criticality of recognizing the early efforts made by a fledgling community of enthusiasts, and describe useful tools and services that may be integrated into collection management systems to help churn strings of unresolved, unlinked collector and determiner names into actionable identifiers that are gateways to rich sources of information.
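
The core string problem, conflated recordedBy values naming several agents at once, can be shown with a deliberately naive splitter. Bionomia uses the Ruby dwc_agent gem for this; the Python function below is only a sketch of the very first step and handles none of the hard cases (initials, "et al.", transliteration) that a real name parser exists for.

```python
import re

def split_agents(recorded_by):
    """Naively split a conflated Darwin Core recordedBy string into candidate
    agent names on a few common separators. A real parser such as dwc_agent
    must handle far more variation than this illustrative sketch does."""
    parts = re.split(r"\s*(?:;|\||&| and )\s*", recorded_by)
    return [p.strip() for p in parts if p.strip()]
```

Each candidate name would then still need to be resolved to an ORCID identifier or a Wikidata 'Q' number before it becomes an actionable link.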


Author(s):  
Leonor Venceslau
Luis Lopes

Major efforts are being made to digitize natural history collections and make these data available online for retrieval and analysis (Beaman and Cellinese 2012). Georeferencing, an important part of the digitization process, consists of obtaining geographic coordinates from a locality description. Many natural history specimen records do not include the coordinates of the sampling location; rather, they contain a description of the site. Inaccurate georeferencing of sampling locations negatively impacts data quality and the accuracy of any geographic analysis of those data. In addition to latitude and longitude, it is important to define a degree of uncertainty for the coordinates, since in most cases it is impossible to pinpoint the exact location retrospectively. This is usually done by defining an uncertainty value represented as a radius around the center of the locality where the sampling took place. Georeferencing is a time-consuming process requiring manual validation; as such, a significant part of all natural history collection data available online is not georeferenced. Of the 161 million records of preserved specimens currently available in the Global Biodiversity Information Facility (GBIF), only 86 million (53.4%) include coordinates. It is therefore important to develop and optimize automatic tools that allow fast and accurate georeferencing. The objective of this work was to test existing automatic georeferencing services and evaluate their potential to accelerate the georeferencing of large collection datasets. To this end, several georeferencing services are currently available that provide an application programming interface (API) for batch georeferencing. We evaluated five services: Google Maps, MapQuest, GeoNames, OpenStreetMap, and GEOLocate.
A test dataset of 100 records (reference dataset), which had been previously georeferenced individually following Chapman and Wieczorek 2006, was randomly selected from the insect collection catalogue of the Museu Nacional de História Natural e da Ciência, Universidade de Lisboa (Lopes et al. 2016). An R (R Core Team 2018) script was used to georeference these records using the five services. In cases where multiple results were returned, only the first one was considered and compared with the manually obtained coordinates of the reference dataset. Two factors were considered in evaluating accuracy: the total number of results obtained, and the distance to the original location in the reference dataset. Of the five services tested, Google Maps yielded the most results (99) and was the most accurate, with 57 results < 1000 m from the reference location and 79 within the uncertainty radius. GEOLocate provided results for 87 locations, of which 47 were within 1000 m of the correct location and 57 were within the uncertainty radius. The other three services each had fewer than 35 results within 1000 m of the reference location and fewer than 50 results within the uncertainty radius. Google Maps and OpenStreetMap had the lowest average distance from the reference location, both around 5500 m. Google Maps has a usage limit of around 40,000 free georeferencing requests per month, beyond which the service is paid, while GEOLocate is free with no usage limit. For large collections, this may be a factor to take into account. In the future, we hope to optimize these methods and test them with larger datasets.
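
The two accuracy criteria, distance to the reference coordinates and containment within the uncertainty radius, can be computed with a standard haversine (great-circle) distance. This is a generic sketch of that evaluation step, not the authors' actual script (which was written in R); the function names are my own.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal
    degrees, using the haversine formula and a mean Earth radius."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def within_uncertainty(reference, candidate, uncertainty_m):
    """True if a service's result falls inside the reference record's
    coordinate uncertainty radius (reference and candidate are (lat, lon))."""
    return haversine_m(*reference, *candidate) <= uncertainty_m
```

Applying these two functions to each returned coordinate pair reproduces the "< 1000 m" and "within the uncertainty radius" counts reported above.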


2021, Vol 35 (1), pp. 1-20
Author(s):  
Breda M. Zimkus
Linda S. Ford
Paul J. Morris

A growing number of domestic and international legal issues are confronting biodiversity collections, which require immediate access to information documenting the legal aspects of specimen ownership and restrictions regarding use. The Nagoya Protocol, which entered into force in 2014, established a legal framework for access and benefit-sharing of genetic resources and has notable implications for collecting, researchers working with specimens, and biodiversity collections. Herein, we discuss how this international protocol mandates operating changes within US biodiversity collections. Given the new legal landscape, it is clear that digital solutions for tracking records at all stages of a specimen's life cycle are needed. We outline how the Harvard Museum of Comparative Zoology (MCZ) has made changes to its procedures and museum-wide database, MCZbase (an independent instance of the Arctos collections management system), linking legal compliance documentation to specimens and transactions (i.e., accessions, loans). We used permits, certificates, and agreements associated with MCZ specimens accessioned in 2018 as a means to assess a new module created to track compliance documentation, a controlled vocabulary categorizing these documents, and the automatic linkages established among documentation, specimens, and transactions. While the emphasis of this work was a single year test case, its successful implementation may be informative to policies and collection management systems at other institutions.
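
The pattern described here, compliance documents categorized by a controlled vocabulary and linked to transactions so that specimens inherit them, can be sketched as a minimal data model. This is a generic illustration, not the Arctos/MCZbase schema; all class names and vocabulary terms below are assumptions.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary of compliance document categories.
DOCUMENT_TYPES = {
    "collecting permit",
    "export permit",
    "import permit",
    "material transfer agreement",
    "benefit-sharing agreement",
}

@dataclass
class ComplianceDocument:
    doc_id: str
    doc_type: str

    def __post_init__(self):
        # The controlled vocabulary rejects ad hoc categories at entry time.
        if self.doc_type not in DOCUMENT_TYPES:
            raise ValueError(f"unknown document type: {self.doc_type}")

@dataclass
class Accession:
    accession_no: str
    documents: list = field(default_factory=list)

    def link(self, doc: ComplianceDocument):
        """Link a compliance document to the transaction, so every specimen
        catalogued under this accession inherits its legal documentation."""
        self.documents.append(doc)
```

Linking at the transaction level rather than per specimen is what makes the automatic documentation-to-specimen linkages cheap to maintain.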


2009, pp. 2708-2734
Author(s):  
Christine Julien ◽  
Sanem Kabadayi

Emerging pervasive computing scenarios involve client applications that dynamically collect information directly from the local environment. The sophisticated distribution and dynamics involved in these applications place an increased burden on developers that create applications for these environments. The heightened desire for rapid deployment of a wide variety of pervasive computing applications demands a new approach to application development in which domain experts with minimal programming expertise are empowered to rapidly construct and deploy domain-specific applications. This chapter introduces the DAIS (Declarative Applications in Immersive Sensor networks) middleware that abstracts a heterogeneous and dynamic pervasive computing environment into intuitive and accessible programming constructs. At the programming interface level, this requires exposing some aspects of the physical world to the developer, and DAIS accomplishes this through a suite of novel programming abstractions that enable on-demand access to dynamic local data sources. A fundamental component of the model is a hierarchical view of pervasive computing middleware that allows devices with differing capabilities to support differing amounts of functionality. This chapter reports on our design of the DAIS middleware and highlights the abstractions, the programming interface, and the reification of the middleware on a heterogeneous combination of client devices and resource-constrained sensors.
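
The declarative style DAIS advocates, letting a domain expert request data without naming devices, can be caricatured with a tiny query layer over a registry of local data sources. This is a hypothetical sketch of the programming style only, not the DAIS API; the registry and function names are invented for illustration.

```python
# Hypothetical registry of local data sources (e.g. sensors), each exposing
# a kind and a read function; the middleware would populate this dynamically.
SOURCES = [
    {"id": "temp-01", "kind": "temperature", "read": lambda: 21.5},
    {"id": "temp-02", "kind": "temperature", "read": lambda: 22.1},
    {"id": "hum-01", "kind": "humidity", "read": lambda: 0.43},
]

def query(kind, aggregate):
    """Declaratively request 'the <aggregate> of all <kind> readings' without
    addressing devices; the middleware resolves which sources can satisfy it."""
    readings = [s["read"]() for s in SOURCES if s["kind"] == kind]
    return aggregate(readings)
```

A domain expert writes `query("temperature", max)` and never learns which resource-constrained sensor answered, which is the point of the abstraction.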


2005, Vol 116 (Supplement), pp. 52
Author(s):  
James H. Wells
Tracy Hotta
Michael F. McGuire
Barbara B. Weber
Paul R. Weiss

Author(s):  
Thomas Hedberg
Allison Barnard Feeney
Moneer Helu
Jaime A. Camelio

Industry has been chasing the dream of integrating and linking data across the product lifecycle and enterprises for decades. However, it has been challenged by the fact that the context in which data are used varies based on the function/role in the product lifecycle that is interacting with the data. Holistically, the data across the product lifecycle must be considered an unstructured data set because multiple data repositories and domain-specific schemas exist in each phase of the lifecycle. This paper explores a concept called the lifecycle information framework and technology (LIFT). LIFT is a conceptual framework for lifecycle information management and the integration of emerging and existing technologies, which together form the basis of a research agenda for dynamic information modeling in support of digital-data curation and reuse in manufacturing. This paper provides a discussion of the existing technologies and activities that the LIFT concept leverages, and describes the motivation for applying such work to the domain of manufacturing. Then, the LIFT concept is discussed in detail, the underlying technologies are further examined, and a use case is detailed. Lastly, potential impacts are explored.


Author(s):  
Agnieszka Łętowska

Taxonomies are attempts to order the world within a semantic field. They are also a useful tool of knowledge organization, facilitating domain-specific information retrieval. Creating a taxonomy or thesaurus requires an extensive and thorough analytical process, drawing on information and knowledge from a variety of sources; typically it is a long-term activity. This traditional approach failed for the organization of the knowledge gathered on the Leopoldina.pl platform. The aim of the service is to present the heterogeneous and diverse digital resources of the University of Wrocław (UWr). Because a universal taxonomy could not be created within the given period of time, a bottom-up approach was proposed instead. The entire range of vocabulary indexed on the Leopoldina.pl platform was divided into discipline categories, which were used as the basis for taxonomy and thesaurus creation. In the current paper we describe this bottom-up process of taxonomy creation. The programming tools (in Python) used for creating the domain dictionaries are presented. We also provide an evaluation of external organized knowledge sources, such as Wikipedia, GBIF (the Global Biodiversity Information Facility) and other domain-specific thematic portals, for automatic knowledge handling and taxonomy creation.
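
The bottom-up step described here, dividing the indexed vocabulary into discipline categories that seed per-domain dictionaries, can be sketched in Python (the language the paper's tooling uses). The terms and discipline tags below are hypothetical examples, and the function is my own illustration, not the authors' code.

```python
from collections import defaultdict

# Hypothetical sample of indexed vocabulary with discipline tags.
indexed_terms = [
    ("herbarium sheet", "botany"),
    ("type specimen", "botany"),
    ("incunabulum", "library science"),
    ("manuscript", "library science"),
    ("mineral", "geology"),
]

def build_domain_dictionaries(terms):
    """Group the full indexed vocabulary into per-discipline dictionaries,
    each serving as the starting point for a bottom-up taxonomy."""
    domains = defaultdict(list)
    for term, discipline in terms:
        domains[discipline].append(term)
    return {discipline: sorted(ts) for discipline, ts in domains.items()}
```

Each resulting dictionary can then be enriched against external sources such as Wikipedia or GBIF before the taxonomy hierarchy is drawn.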

