Plenary Discussion - Future of Collection Management Systems

2018 ◽  
Vol 2 ◽  
pp. e25635
Author(s):  
Mikko Heikkinen ◽  
Falko Glöckler ◽  
Markus Englund

The DINA Symposium (“DIgital information system for NAtural history data”, https://dina-project.net) ends with a plenary session involving the audience to discuss the interplay of collection management and software tools. The discussion will touch on different areas and issues, such as:

(1) Collection management using modern technology: How should and could collections be managed using current technology? What is the ultimate objective of using a new collection management system? How should traditional management processes be changed?

(2) Development and community: Why are there so many collection management systems? Why is it so difficult to create one system that fits everyone’s requirements? How could a community of developers and collection staff be built around the DINA project in the future?

(3) Features and tools: How can needs that are common to all collections be identified? What new tools and technologies could facilitate collection management? How could those tools be implemented as DINA-compliant services?

(4) Data: What data must be captured about collections and specimens? What criteria need to be applied in order to distinguish essential from “nice-to-have” information? How should established data standards (e.g. Darwin Core and ABCD (Access to Biological Collection Data)) be used to share data from rich and diverse data models?

In addition to the plenary discussion around these questions, we will agree on a streamlined format for continuing the discussion in order to write a white paper on these questions. The results and outcome of the session will constitute the basis of that paper and will be subsequently refined.

Author(s):  
Vladimir Blagoderov

Most digitisation workflows are focused on legacy material, due to the sheer number of objects already collected. However, it is just as important to develop protocols for digitisation of incoming material, to reduce the accumulation of an additional backlog. This is especially crucial with the advent of molecular collections and field sequencing. In-the-field extraction and sequencing (Oxford Nanopore Technologies 2018) may lead to increasing numbers of voucher specimens without proper collection data and labels, or specimens disassociated from their data. It is easy for researchers occupied by collecting and sequencing to delay proper documentation until a later date. As a curator, I can vouch that specimens without properly recorded data (with only collecting codes, for example) are lost to science. Fortunately, a combination of the best collecting and curatorial practices, simple online and offline tools, and modern technologies makes in-the-field digitisation a reality. In the last couple of years, entomologists at the National Museums Scotland (NMS) have been testing the following workflow:

1. Collecting routes and points are recorded with ViewRanger (Augmentra Ltd 2019), available as an app for mobile phones.
2. At the moment of collecting, event data are recorded with Epicollect5 (Imperial College London 2019), available as an Android app. The software's field generator allows the creation of different scenarios, depending on the method or circumstances of collection, and records the main types of data: text, dates, times and coordinates. An individual collecting code is associated with each record.
3. Specimens collected are prepared (pinned, stored in preservative, dried, etc.) and associated with the corresponding collecting code.
4. Additional data (diary records) are recorded in a notebook with a Neo Smartpen (NEO SMARTPEN Inc. 2017) and digitised.
5. Collecting event records are imported into a collection management system (CMS), e.g. PAPIS (Pape and Ioannou 2019) or EarthCape (EarthCape 2019).
6. Specimen lots (if relevant) are sorted to a desirable level.
7. Multiple specimen or lot records are created in the CMS based on the collecting event records.
8. Data labels and UID labels are printed and physically associated with specimens or lots.
9. Additional data (a KML file of the collecting route, diary records) are imported and associated with collecting events.
Steps 1-4 and, depending on available facilities, steps 5-9 can be performed in the field, before specimens reach the depository. Alternatively, steps 5-9 should be performed immediately on returning from the field. There is no excuse for newly collected material not to be digitised before it reaches the collection. Recent entomological collecting trips by NMS yielded 7358 specimens from 72 collecting events, fully documented and digitised in a matter of hours.
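The workflow lends itself to light scripting at step 5. As a rough illustration, the sketch below pulls collecting-event records from Epicollect5's public export API and flattens them into a CSV ready for CMS import. The project slug, field names and response layout are assumptions made for illustration, not the actual NMS configuration.

```python
# Sketch: export collecting events from Epicollect5 and write a CSV
# suitable for CMS import (step 5 above). The project slug and entry
# field names below are hypothetical placeholders.
import csv
import requests

PROJECT_SLUG = "nms-collecting-events"  # hypothetical project
URL = f"https://five.epicollect.net/api/export/entries/{PROJECT_SLUG}"

def fetch_entries():
    """Yield entries across all pages of the export API."""
    page = 1
    while True:
        resp = requests.get(URL, params={"page": page, "per_page": 50})
        resp.raise_for_status()
        entries = resp.json()["data"]["entries"]
        if not entries:
            return
        yield from entries
        page += 1

with open("collecting_events.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["collecting_code", "date", "lat", "lon", "method"]
    )
    writer.writeheader()
    for e in fetch_entries():
        writer.writerow({
            "collecting_code": e.get("collecting_code"),
            "date": e.get("date"),
            "lat": (e.get("location") or {}).get("latitude"),
            "lon": (e.get("location") or {}).get("longitude"),
            "method": e.get("method"),
        })
```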


Author(s):  
David Shorthouse

Bionomia (https://bionomia.net), previously called Bloodhound Tracker, was launched in August 2018 with the aim of illustrating the breadth and depth of expertise required to collect and identify natural history specimens represented in the Global Biodiversity Information Facility (GBIF). This required that specimens and people be uniquely identified and that a granular expression of actions (e.g. "collected", "identified") be adopted. The Darwin Core standard presently combines agents and their actions into the conflated terms recordedBy and identifiedBy, whose values are typically unresolved and unlinked text strings. Bionomia consists of tools, web services, and a responsive website, which are all used to efficiently guide users to resolve and unequivocally link people to specimens via the first-class actions collected or identified. It also shields users from the complexity of integrating the services of four giant initiatives: ORCID, Wikidata, GBIF, and Zenodo. All of these initiatives are financially sustainable and well used by many stakeholders well outside this narrow use-case. As a result, the links between person and specimen made by users of Bionomia are given every opportunity to persist, to represent credit for effort, and to flow into collection management systems as meaningful new entries. To date, 13M links between people and specimens have been made, including 2M negative associations, on 12.5M specimen records. These links were made either by the collectors themselves or by 84 people who have attributed specimen records to their peers, mentors and others they revere.

Integration with ORCID and Wikidata
People are identified in Bionomia through synchronization with ORCID and Wikidata, by reusing their unique identifiers and drawing in their metadata. ORCID identifiers are used by living researchers to link their identities to their research outputs. ORCID services include OAuth2 pass-through authentication for use by developers and web services for programmatic access to its store of public profiles. These contain elements of metadata such as full name, aliases, keywords, countries, education, employment history, affiliations, and links to publications. Bionomia seeds its search directory of people by periodically querying ORCID for specific user-assigned keywords, as well as directly through account creation via OAuth2 authentication. Deceased people are uniquely identified in Bionomia through integration with Wikidata, by caching unique 'Q' numbers (identifiers), full names and aliases, countries, occupations, as well as birth and death dates. Profiles are seeded from Wikidata through daily queries for properties that are likely to be assigned to collectors of natural history specimens, such as "Entomologists of the World ID" (= P5370) or "Harvard Index of Botanists ID" (= P6264). Because Wikidata items may be merged, Bionomia captures these merge events, re-associates previously made links to specimen records, and mirrors Wikidata's redirect behaviour. A Wikidata property called "Bionomia ID" (= P6944), whose values are either ORCID identifiers or Wikidata 'Q' numbers, helps facilitate additional integration and reuse.

Integration with GBIF
Specimen data are downloaded wholesale as Darwin Core Archives from GBIF every two weeks. This schedule maintains a reasonable synchrony with source data, balancing computation time against the expectations of users who desire the most up-to-date view of their specimen records.
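As a rough sketch of what processing such a download involves, the snippet below opens a Darwin Core Archive and tallies the verbatim recordedBy and identifiedBy strings that Bionomia-style resolution would turn into identifiers. It assumes the occurrence core carries a header row of Darwin Core term names, as GBIF downloads typically do; the archive filename is a placeholder.

```python
# Sketch: tally unresolved agent name strings in a GBIF Darwin Core
# Archive (a zip whose occurrence core is a tab-delimited text file).
import csv
import io
import zipfile
from collections import Counter

names = Counter()
with zipfile.ZipFile("gbif-download.zip") as archive:  # placeholder filename
    with archive.open("occurrence.txt") as core:
        reader = csv.DictReader(
            io.TextIOWrapper(core, encoding="utf-8"), delimiter="\t"
        )
        for row in reader:
            for term in ("recordedBy", "identifiedBy"):
                value = (row.get(term) or "").strip()
                if value:
                    names[value] += 1

# The most frequent verbatim strings are good candidates for resolution
# against ORCID or Wikidata identifiers.
for value, count in names.most_common(10):
    print(count, value)
```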
Collectors with ORCID accounts who have elected to receive notice are informed via email message when the authors of newly published papers have made use of their specimen records downloaded from GBIF.

Integration with Zenodo
Finally, users of Bionomia may integrate their ORCID OAuth2 authentication with Zenodo, an industry-recognized archive for research data, which enjoys support from the Conseil Européen pour la Recherche Nucléaire (CERN). At the user's request, their specimen data, represented as CSV (comma-separated values) and JSON-LD (JavaScript Object Notation for Linked Data) documents, are pushed into Zenodo, a DataCite DOI is assigned, and a formatted citation appears on their Bionomia profile. New versions of these files are pushed to Zenodo on the user's behalf when new specimen records are linked to them. If users have configured their ORCID account to listen for new entries in DataCite, a new work entry will also be made in their ORCID profile, thus sealing a perpetual, semi-automated loop between GBIF and ORCID that tidily showcases their efforts at collecting and identifying natural history specimens.

Technologies Used
Bionomia uses Apache Spark via scripts written in Scala, a human name parser written in Ruby called dwc_agent, queues of jobs executed through Sidekiq, scores of pairwise similarities in the structure of human names stored in Neo4j, data persistence in MySQL, and a search layer in Elasticsearch. Here, I expand on lessons learned in the construction and maintenance of Bionomia, emphasize the criticality of recognizing the early efforts made by a fledgling community of enthusiasts, and describe useful tools and services that may be integrated into collection management systems to help turn strings of unresolved, unlinked collector and determiner names into actionable identifiers that are gateways to rich sources of information.
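The Wikidata seeding described above can be approximated with a query against the public Wikidata Query Service. The sketch below selects people bearing the properties named in this abstract (P5370, P6264), together with birth and death dates (P569/P570); it is an illustrative stand-in, not Bionomia's actual Scala/Spark pipeline.

```python
# Sketch: seed candidate collector profiles from Wikidata via SPARQL.
import requests

SPARQL = """
SELECT ?person ?personLabel ?birth ?death WHERE {
  { ?person wdt:P5370 [] } UNION { ?person wdt:P6264 [] }
  OPTIONAL { ?person wdt:P569 ?birth . }
  OPTIONAL { ?person wdt:P570 ?death . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": SPARQL, "format": "json"},
    headers={"User-Agent": "profile-seeding-sketch/0.1"},  # WDQS asks for one
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    qid = row["person"]["value"].rsplit("/", 1)[-1]  # e.g. 'Q123456'
    print(qid, row["personLabel"]["value"],
          row.get("birth", {}).get("value", ""),
          row.get("death", {}).get("value", ""))
```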


2018 ◽  
Vol 2 ◽  
pp. e25579
Author(s):  
Falko Glöckler ◽  
Markus Englund

The DINA system (“DIgital information system for NAtural history data”, https://dina-project.net) consists of several web-based services that fulfill specific tasks. Most of the existing services cover single core features of the collection management system and can be used either as integrated components in the DINA environment or as stand-alone services. In this presentation, individual services will be highlighted, as they represent technically interesting approaches and practical solutions for daily challenges in collection management, data curation and migration workflows. The focus will be on the following topics: (1) a generic reporting and label printing service, (2) practical decisions on taxonomic references in collection data, and (3) the generic management and referencing of related research data and metadata.

Reporting, as presented in this context, is defined as an extraction and subsequent compilation of information from the collection management system, rather than just summarizing statistics. With this quite broad understanding of the term, the DINA Reports & Labels Service (Museum für Naturkunde Berlin 2018) can assist in several different collection workflows, such as generating labels, barcodes, specimen lists, vouchers, paper loan forms etc. As it is based on customizable HTML templates, it can even be used for creating customized web forms for any kind of interaction (e.g. annotations).

Many collection management systems try to cope with taxonomic issues, because in practice taxonomy is used not only for determinations, but also for organizing the collections and categorizing storage units (e.g. a “Coleoptera hall”). Addressing taxonomic challenges in a collection management system can slow down development and add complexity for the users. The DINA system decouples these issues in a simple taxonomic service for the sole assignment of names to specimens, for example determinations. This draws a clear line between collection management and taxonomic research, of which the latter can be supported in a separate service.

As the digitization of collection data and workflows proceeds, linking related data is essential for data management and enrichment. In many institutions, research data are disconnected from the collection specimen data because their type and structure cannot easily be included in the collection management databases. With the DINA Generic Data Module (Museum für Naturkunde Berlin 2017), a service exists that allows any relational data structures to be attached to the DINA system. It can also be used as a standalone service that accommodates structured data within a DINA-compliant interface for data management.
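To illustrate the template-based approach of the Reports & Labels Service, the sketch below renders a specimen label from an HTML template. It uses Python's string.Template as a stand-in for the service's own templating; the field names and sample record are invented for illustration.

```python
# Sketch: template-driven label generation, in the spirit of a service
# built on customizable HTML templates. Field names are hypothetical.
from string import Template

LABEL_TEMPLATE = Template("""\
<div class="label">
  <strong>$scientific_name</strong><br>
  $locality, $country<br>
  $collection_date leg. $collector<br>
  <code>$catalog_number</code>
</div>
""")

specimens = [
    {
        "scientific_name": "Tipula maxima",
        "locality": "Glen Affric",
        "country": "United Kingdom",
        "collection_date": "2019-06-14",
        "collector": "J. Smith",
        "catalog_number": "EX.2019.001",  # placeholder catalogue number
    },
]

# Render one label per specimen record; the resulting HTML can be sent
# to a browser or an HTML-to-PDF tool for printing.
html = "\n".join(LABEL_TEMPLATE.substitute(s) for s in specimens)
print(html)
```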


Author(s):  
Nelson Rios ◽  
Sharif Islam ◽  
James Macklin ◽  
Andrew Bentley

Technological innovations over the past two decades have given rise to the online availability of more than 150 million specimen and species-lot records from biological collections around the world through large-scale biodiversity data-aggregator networks. In the present landscape of biodiversity informatics, collections data are captured and managed locally in a wide variety of databases and collection management systems and then shared online as point-in-time Darwin Core Archive snapshots. Data providers may publish periodic revisions to these data files, which are retrieved, processed and re-indexed by data aggregators. This workflow has resulted in data latencies and lags of months to years for some data providers. The Darwin Core standard (Wieczorek et al. 2012) provides guidelines for representing biodiversity information digitally, yet varying institutional practices and a lack of interoperability between collection management systems continue to limit semantic uniformity, particularly with regard to the actual content of data within each field. Although some initiatives have begun to link data elements, our ability to comprehensively link all of the extended data associated with a specimen, or related specimens, is still limited due to the low uptake and usage of persistent identifiers. The concept now under consideration is to create a Digital Extended Specimen (DES): the cumulative digital representation of all data, derivatives and products associated with a physical specimen, individually distinguished and linked by persistent identifiers on the Internet to create a web of knowledge, and adhering to the Findable, Accessible, Interoperable and Reusable (FAIR) principles of data management and stewardship. Biodiversity data aggregators that mobilize data across multiple institutions routinely perform data transformations in an attempt to provide a clean and consistent interpretation of the data. These aggregators are typically unable to interact directly with institutional data repositories, thereby limiting potentially fruitful opportunities for annotation, versioning, and repatriation. The ability to track such data transactions and satisfy the accompanying legal implications (e.g. the Nagoya Protocol) is becoming a necessary component of data publication, which existing standards do not adequately address. Furthermore, no mechanisms exist to assess the “trustworthiness” of data, critical to scientific integrity and reproducibility, or to provide attribution metrics for collections to advocate for their contribution or effectiveness in supporting such research. Since the introduction of Darwin Core Archives (Wieczorek et al. 2012), little has changed in the underlying mechanisms for publishing natural science collections data, and we are now at a point where new innovations are required to meet current demand for continued digitization, access, research and management. One solution may involve changing the biodiversity data publication paradigm to one based on the atomized transactions relevant to each individual data record. These transactions, when summed over time, allow us to realize the most recently accepted revision as well as historical and alternative perspectives.
In order to realize the Digital Extended Specimen ideals and the linking of data elements, this transactional model, combined with open and FAIR data protocols, application programming interfaces (APIs), repositories, and workflow engines, can provide the building blocks for the next generation of natural science collections and biodiversity data infrastructures and services. These and other related topics have been the focus of phase 2 of the global consultation on converging Digital Specimens and Extended Specimens. Based on these discussions, this presentation will explore a conceptual solution leveraging elements from distributed version control, cryptographic ledgers and shared redundant storage to overcome many of the shortcomings of contemporary approaches.
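As a toy illustration of this transactional, ledger-style publication model, the following sketch hash-chains atomized edits to a single specimen record: replaying the chain yields the currently accepted view while earlier transactions preserve historical perspectives. All identifiers and field values are invented.

```python
# Sketch: an append-only, hash-chained log of atomized record edits.
import hashlib
import json
import time

def transaction(prev_hash: str, record_id: str, field: str, value: str) -> dict:
    """One atomized edit to one field of one specimen record."""
    body = {
        "prev": prev_hash,  # links this transaction to the previous one
        "record": record_id,
        "field": field,
        "value": value,
        "timestamp": time.time(),
    }
    digest = hashlib.sha256(
        json.dumps(body, sort_keys=True).encode()
    ).hexdigest()
    return {"hash": digest, **body}

ledger = []
prev = "0" * 64  # genesis hash
for field, value in [
    ("scientificName", "Tipula maxima"),
    ("recordedBy", "https://orcid.org/0000-0000-0000-0000"),  # placeholder
    ("scientificName", "Tipula maxima Poda, 1761"),           # a later revision
]:
    tx = transaction(prev, "urn:uuid:example-specimen", field, value)
    ledger.append(tx)
    prev = tx["hash"]

# Replaying the ledger realizes the most recently accepted revision.
current = {}
for tx in ledger:
    current[tx["field"]] = tx["value"]
print(current)
```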


Author(s):  
Falko Glöckler

Digital specimens (Hardisty 2018, Hardisty 2020) are the cyberspace equivalent of objects in a physical, often museum-based collection. They consist of references to data and metadata related to the collection object. Through the ongoing process of digitizing legacy data, gaining knowledge from new field collections or research, and annotating and linking to related resources, a digital specimen can evolve independently from the original physical object. In particular, provenance records cannot always be assigned to the physical object when the knowledge was gained solely from the digital representation. A physical specimen can also be understood as a physical preparation (or a set of multiple preparations, e.g. DNA samples taken from a preserved organism) accompanied by related digital and non-digital data sources (e.g. images, descriptions in fieldbooks, research data), rather than just a single object. This concept of an extended specimen has been described by Webster (2017) and is used in the Extended Specimen Network initiative (Lendemer et al. 2019) to enhance the access and research potential of specimens. Digital specimens need to reflect both the potential complexity of the physical object (the extended specimen) and the knowledge gained from and linked to the digital object itself. In order to provide, track and make use of digital specimens, the community of collection-holding institutions might need to think of digital specimens as standalone virtual collections that emanate from physical collections. Additionally, new versions of a digital specimen continuously derive from changes to the physical specimen, as the (meta)data are updated in collection management systems to document the state and treatment of the physical objects. Consequently, there is a challenge in enabling the management of both: linked digital specimens in the World Wide Web, and the local data of physical specimens in databases of collection-holding institutions and other tools and services. In this panel discussion, central questions about the requirements, obstacles and opportunities of implementing the concepts of digital specimens and extended specimens in software tools like collection management systems are discussed. The aim is to identify the major tasks and priorities regarding the transformation of tools and services from multiple perspectives: local collection data management, international data infrastructures like the Distributed System of Scientific Collections (DiSSCo) and the Global Biodiversity Information Facility (GBIF), and data usage outside of domain-specific subject areas.
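One minimal way to picture a digital specimen that versions independently of its physical counterpart is sketched below; the identifiers and field names are invented and do not reflect a DiSSCo or GBIF schema.

```python
# Sketch: a digital specimen as a versioned bundle of persistent
# identifiers referencing a physical object and its extended resources.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DigitalSpecimen:
    pid: str                    # persistent identifier of the digital specimen
    physical_specimen_id: str   # e.g. an institutional catalogue number
    version: int = 1
    linked_resources: List[str] = field(default_factory=list)  # PIDs/URLs

    def annotate(self, resource_pid: str) -> "DigitalSpecimen":
        """Linking a new resource yields a new version: the digital
        specimen evolves even though the physical object is unchanged."""
        return DigitalSpecimen(
            pid=self.pid,
            physical_specimen_id=self.physical_specimen_id,
            version=self.version + 1,
            linked_resources=[*self.linked_resources, resource_pid],
        )

ds = DigitalSpecimen("hdl:example/ds-0001", "EX.2019.001")  # placeholders
ds2 = ds.annotate("https://doi.org/10.5281/zenodo.0000000")  # placeholder DOI
print(ds2.version, ds2.linked_resources)
```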


2021 ◽  
Vol 35 (1) ◽  
pp. 1-20
Author(s):  
Breda M. Zimkus ◽  
Linda S. Ford ◽  
Paul J. Morris

A growing number of domestic and international legal issues are confronting biodiversity collections, which require immediate access to information documenting the legal aspects of specimen ownership and restrictions regarding use. The Nagoya Protocol, which entered into force in 2014, established a legal framework for access and benefit-sharing of genetic resources and has notable implications for collecting, researchers working with specimens, and biodiversity collections. Herein, we discuss how this international protocol mandates operating changes within US biodiversity collections. Given the new legal landscape, it is clear that digital solutions for tracking records at all stages of a specimen's life cycle are needed. We outline how the Harvard Museum of Comparative Zoology (MCZ) has made changes to its procedures and museum-wide database, MCZbase (an independent instance of the Arctos collections management system), linking legal compliance documentation to specimens and transactions (i.e., accessions, loans). We used permits, certificates, and agreements associated with MCZ specimens accessioned in 2018 as a means to assess a new module created to track compliance documentation, a controlled vocabulary categorizing these documents, and the automatic linkages established among documentation, specimens, and transactions. While the emphasis of this work was a single year test case, its successful implementation may be informative to policies and collection management systems at other institutions.
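As a generic illustration (not the MCZbase/Arctos schema), the sketch below shows one relational shape for such linkages: compliance documents categorized by a small controlled vocabulary and joined to accessions, so that every specimen in an accession inherits its paperwork. All table and column names are invented.

```python
# Sketch: linking compliance documents to transactions and specimens.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE document (
    id INTEGER PRIMARY KEY,
    category TEXT CHECK (category IN ('permit', 'certificate', 'agreement')),
    identifier TEXT NOT NULL      -- e.g. the issuing authority's number
);
CREATE TABLE accession (
    id INTEGER PRIMARY KEY,
    received DATE NOT NULL
);
CREATE TABLE specimen (
    id INTEGER PRIMARY KEY,
    accession_id INTEGER REFERENCES accession(id)
);
-- Linking documents at the accession level lets every specimen in the
-- accession inherit the compliance documentation automatically.
CREATE TABLE accession_document (
    accession_id INTEGER REFERENCES accession(id),
    document_id INTEGER REFERENCES document(id)
);
""")

db.execute("INSERT INTO accession VALUES (1, '2018-03-02')")
db.execute("INSERT INTO specimen VALUES (100, 1)")
db.execute("INSERT INTO document VALUES (1, 'permit', 'ABS-2018-0042')")
db.execute("INSERT INTO accession_document VALUES (1, 1)")

# All documents governing a given specimen, resolved via its accession:
rows = db.execute("""
    SELECT d.category, d.identifier FROM specimen s
    JOIN accession_document ad ON ad.accession_id = s.accession_id
    JOIN document d ON d.id = ad.document_id
    WHERE s.id = ?
""", (100,)).fetchall()
print(rows)  # [('permit', 'ABS-2018-0042')]
```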


2018 ◽  
Vol 64 (No. 2) ◽  
pp. 61-73
Author(s):  
Zeman Karel ◽  
Hron Jan

The article's objective is to identify the causes of the very poor management of the administration of these state assets, to present the possibilities of a long-tested experimental model at the Land Fund of the Czech Republic, and to draw attention to this model's potential for possible implementation across the entire management complex of these state assets in the Czech Republic. The authors first deal with the theoretical aspects of the given issue and then present an analysis of the original debt collection management "system". This is logically followed by an analysis of the experimental model's efficiency, rounded off with its conclusions. The final chapter contains, in compressed form, the results of the research into the current level of knowledge of the examined issue, the outcomes of the research concerned with the original, unsystematic debt collection management, the results of the implementation of the experimental model, and an assessment of the significance of those results for the entire national economy of the Czech Republic.

