Utilising the Crowd to Unlock the Data on Herbarium Specimens at the Royal Botanic Garden Edinburgh

Author(s):  
Sally King ◽  
Juliette Pinon ◽  
Robyn Drinkwater

Digitisation of specimens at the Royal Botanic Garden Edinburgh (RBGE) has created nearly half a million imaged specimens. With data entry from the specimen labels on herbarium sheets identified as the rate-limiting step in the digitisation workflow, the majority of specimens are databased with minimal data (filing name and geographical region), leaving a need to add further label data (collector, collecting locality, collection date etc.) to make the specimens research ready. We are exploring a number of different ways to complete data entry for specimens that have been imaged. These have included Optical Character Recognition (OCR) to identify meaningful specimen groupings to increase the speed of data entry and, more recently, citizen science platforms to provide accurate crowd-sourced transcriptions of specimen label data. We sent specimen images of the Australian flowering plants held at RBGE herbarium to DigiVol (https://volunteer.ala.org.au/institution/index/21309224), the citizen science platform developed alongside The Atlas of Living Australia. In 29 expeditions, 156 citizen scientists completed collection label data entry for RBGE’s 41,000 specimens of Australian flowering plants. We found that 95% of the transcriptions were completed by fewer than a third (27%) of the volunteers. Of the four volunteer experience levels in DigiVol we found that the middle two, Collection Managers and Scientists, transcribed fewer specimens, but also made fewer mistakes. We found that removing the filing name from the information provided with the expedition decreased the number of errors in the Museum Details section of the transcription, as the filing name was often added as the label name, regardless of whether this was the case. The feedback we provided for each expedition was used to highlight common errors to try to reduce their occurrence, as well as to inform the volunteers of what their transcriptions had revealed about this part of the collection.
We explore the citizen science transcription workflow, its rate-limiting steps and how we have worked to include the citizen science and OCR data on our online herbarium catalogue.
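
The concentration figure quoted above (95% of transcriptions from 27% of volunteers) is the kind of statistic that can be derived from per-volunteer transcription counts. The following is a minimal sketch of such a calculation, not the authors' method; the function name, the volunteer identifiers and the counts are all hypothetical.

```python
from typing import Dict


def share_of_volunteers_for(counts: Dict[str, int], target: float = 0.95) -> float:
    """Return the smallest fraction of volunteers whose combined
    transcriptions reach `target` of the total, counting the most
    active volunteers first."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no transcriptions to analyse")
    running = 0
    for rank, n in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += n
        if running >= target * total:
            return rank / len(counts)
    return 1.0


# Hypothetical counts, for illustration only (not RBGE or DigiVol data).
example = {"vol_a": 900, "vol_b": 60, "vol_c": 20, "vol_d": 10, "vol_e": 10}
print(share_of_volunteers_for(example))
```

Sorting by descending contribution before accumulating is what makes the result the smallest possible share of volunteers for the chosen threshold.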

2014 ◽  
Vol 71 (3) ◽  
pp. 385-406
Author(s):  
D. W. Braidwood ◽  
V. Morales ◽  
M. F. Gardner

The Erich Werdermann collection ‘Plantae Chilenses’ held at the Royal Botanic Garden Edinburgh constitutes an important set of herbarium specimens from the Chilean flora, and represents over 10% of preserved specimens from Chile in the herbarium. Duplicate sets of specimens were distributed from the Botanischer Garten und Botanisches Museum Berlin-Dahlem to a further 15 major international herbaria. Here we provide a description of this collection, highlighting aspects of Werdermann’s journey in Chile. Included are his itinerary and maps showing where the specimens were collected. An important aspect of the paper is to clarify ambiguities concerning label data in order to provide more accurate detail for researchers using Werdermann’s specimens.


2018 ◽  
Vol 2 ◽  
pp. e25415
Author(s):  
Fabian Reimeier ◽  
Dominik Röpert ◽  
Anton Güntsch ◽  
Agnes Kirchhoff ◽  
Walter G. Berendsohn

On herbarium sheets, data elements such as plant name, collection site, collector, barcode and accession number are found mostly on labels glued to the sheet. The data are thus visible on specimen images. With continuously improving technologies for collection mass-digitisation it has become easier and easier to produce high quality images of herbarium sheets and in the last few years herbarium collections worldwide have started to digitize specimens on an industrial scale (Tegelberg et al. 2014). To use the label data contained in these massive numbers of images, they have to be captured and databased. Currently, manual data entry prevails and forms the principal cost and time limitation in the digitization process. The StanDAP-Herb Project has developed a standard process for (semi-) automatic detection of data on herbarium sheets. This is a formal extensible workflow integrating a wide range of automated specimen image analysis services, used to replace time-consuming manual data input as far as possible. We have created web-services for OCR (Optical Character Recognition); for identifying regions of interest in specimen images and for the context-sensitive extraction of information from text recognized by OCR. We implemented the workflow as an extension of the OpenRefine platform (Verborgh and De Wilde 2013).
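
To illustrate what extracting information from OCR-recognised label text can involve, here is a minimal sketch using regular expressions. It is not the StanDAP-Herb web services or their API; the patterns, field names and sample label text are assumptions made for illustration, and real labels would need far more robust handling.

```python
import re
from typing import Dict, Optional

# Assumed pattern: day, then a numeric or spelled-out month, then a four-digit year.
DATE_RE = re.compile(r"\b(\d{1,2})[./ ](\d{1,2}|[A-Za-z]{3,9})[./ ](\d{4})\b")
# Assumed cue words that often precede a collector name on labels.
COLLECTOR_RE = re.compile(r"(?:leg\.|coll\.)\s*([^\n,;]+)", re.IGNORECASE)


def extract_fields(ocr_text: str) -> Dict[str, Optional[str]]:
    """Pull a collection date and collector name out of raw OCR text,
    returning None for any field that cannot be found."""
    date = DATE_RE.search(ocr_text)
    collector = COLLECTOR_RE.search(ocr_text)
    return {
        "collection_date": date.group(0) if date else None,
        "collector": collector.group(1).strip() if collector else None,
    }


# Hypothetical OCR output, not taken from any real specimen.
sample = "Flora of Exampleland\nLeg. A. Botanist\n12 Nov 1925\nNo. 123"
print(extract_fields(sample))
```

Fields that cannot be recognised are returned as None so that they can be routed to manual data entry rather than guessed.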


Author(s):  
Gunnar Ovstebo

Spores sourced from historic herbarium specimens have been used to introduce wild-collected material to the Royal Botanic Garden Edinburgh (RBGE) living plant collection. The ability of dry habitat ferns to maintain spore viability for prolonged periods makes it possible to grow plants from the historically important RBGE herbarium collections. The factors that affect the ability of spores to germinate from herbarium collections are described. Three fern species from the Pteridaceae – Actiniopteris semiflabellata, Anogramma leptophylla and Aleuritopteris scioana – which were not previously in cultivation at RBGE were germinated from herbarium material of different ages. Germination was observed from all three species. Plants produced in this experiment were accessed into the RBGE living plant collection for future horticultural research and germination trials.


Radiocarbon ◽  
1983 ◽  
Vol 25 (2) ◽  
pp. 661-666 ◽  
Author(s):  
Steinar Gulliksen

Computer storage and surveys of large sets of data should be an attractive technique for users of 14C dates. Our pilot project demonstrates the effectiveness of a text retrieval system, NOVA STATUS. A small database comprising ca 100 dates, selected from results of the Trondheim 14C laboratory, is generated. Data entry to the computer is made by feeding typewritten forms through a document reader capable of optical character recognition. A text retrieval system allows data input to be in a flexible format. Program systems for text retrieval are in common use and easily implemented for a 14C database.


2015 ◽  
Vol 2 ◽  
pp. 1-19
Author(s):  
Gunnar Thorvaldsen ◽  
Joana Maria Pujadas-Mora ◽  
Trygve Andersen ◽  
Line Eikvil ◽  
Josep Lladós ◽  
...  

This article explains how two projects implement semi-automated transcription routines: for census sheets in Norway and marriage protocols from Barcelona. The Spanish system was created to transcribe the marriage license books from 1451 to 1905 for the Barcelona area, one of the world’s longest series of preserved vital records. Thus, in the Project “Five Centuries of Marriages” (5CofM) at the Autonomous University of Barcelona’s Center for Demographic Studies, the Barcelona Historical Marriage Database has been built. More than 600,000 records were transcribed by 150 transcribers working online. The Norwegian material is cross-sectional as it is the 1891 census, recorded on one sheet per person. This format and the underlining of keywords for several variables made it more feasible to semi-automate data entry than when many persons are listed on the same page. While Optical Character Recognition (OCR) for printed text is scientifically mature, computer vision research is now focused on more difficult problems such as handwriting recognition. In the marriage project, document analysis methods have been proposed to automatically recognize the marriage licenses. Fully automatic recognition is still a challenge, but some promising results have been obtained. In Spain, Norway and elsewhere the source material is available as scanned pictures on the Internet, opening up the possibility for further international cooperation concerning automating the transcription of historic source materials. As in projects to digitize printed materials, the optimal solution for hand-written sources is likely to be a combination of manual transcription and machine-assisted recognition.


1997 ◽  
Vol 9 (1-3) ◽  
pp. 1-16
Author(s):  
Tim Coles ◽  
Andrew Alexander ◽  
Gareth Shaw

Directories are a universal data source widely used in urban historical research. This paper reports on a series of experiments to explore the applicability of Optical Character Recognition (OCR) technology as a means of mass directory data entry.


As regards the collection of plants, totalling about 3000 numbers, most of the flowering plants and ferns have been identified by the staff of the Royal Botanic Garden at Kew, mainly by Mr Forman and Professor Holttum. We collected, whenever possible, ten to twelve duplicates and most of these are being distributed to the main herbaria of the world. There are still, nevertheless, many specimens which need monographic revision to establish their true identity. It is impossible, yet, to say how many are new. Professor J. L. Harrison, at the University of Singapore, is still at work on his account of the small mammals and their parasites. Mr Askew is at work on the soil samples. For my part, I have studied the fig collections, and there is nowhere in the world, that I know of, with such a rich fig flora as Kinabalu. It has 78 species (15 endemic), and our expedition discovered 2 new species and 4 new varieties, which fit neatly into gaps in the classification which I have been making. The fig insects are being studied by Dr Wiebes, at the National Museum in Leiden, in our joint effort to write the zoo-botany of Ficus. Already, Dr Wiebes has been able to publish a revision of the insect genus solen which inhabits Ficus sect. Sycocarpus; he recognizes 32 species of which 23 are new, including 10 from our collections on Kinabalu. I am also at work on the fungi, which have to be collated with my earlier Malayan collections. This work, however, means almost monographic treatment of every group. With the great help of Dr Bas, at the National Herbarium in Leiden, an illustrated account of the genus Amanita in Malaya and Borneo has recently been published. We recognize 22 new species out of a total of 30, and this proportion shows the difficulty of pursuing mycology where there are so few names.


Author(s):  
Ann Bogaerts ◽  
Sofie De Smedt ◽  
Sofie Meeus ◽  
Quentin Groom

When researchers and managers are asked to rank the issues that prevent adequate control of invasive species, lack of public awareness is at the top of the list (Dehnen‐Schmutz et al. 2018). It is therefore imperative to raise the general public's awareness of the potential risks of introducing alien species into the wild. Green Pioneers, a citizen science project funded by the Flemish Government (Fig. 1), aims to address this issue in Belgium, across age groups. The project aims to create awareness of invasive species, highlighting how invasions can be avoided and how to mitigate their impact; to improve communication between citizens and scientists on conservation and invasive plant species; and to augment the quality and quantity of data on invasive species. The project is developing three kinds of activity, specifically to attract a broad demographic: 'Young Pioneers', by developing tools for teachers of science, technology, engineering and mathematics for school children aged 12 to 15; 'Online Pioneers', through our online citizen science platform DoeDat.be, by helping with the transcription of label information on herbarium specimens; and 'Visiteers', by inviting companies and working-age people to help us in the collection and to inform them about invasive species. Finally, we will be organizing a BioBlitz in spring 2020 at Meise Botanic Garden where we will celebrate plants and all our Green Pioneers, while also spreading the message of invasive plant awareness. During our 48-hour BioBlitz, scientists, volunteers and citizens will work together to survey the biodiversity of our Botanic Garden.
Ultimately, Green Pioneers aims to encourage recording of alien species by amateur botanists and create a generation of responsible gardeners who understand the consequences of releasing invasive alien plants into the wild.


Author(s):  
Laurence Livermore ◽  
Robert Cubey

Capturing data from specimen images is the most viable way of enriching specimen metadata cheaply and quickly compared to traditional digitisation. Advances in machine learning and computer vision-based tools, and their increasing accessibility and affordability, are greatly increasing the potential to take automated measurements and capture other data from specimens themselves, as well as to transcribe label data. More sophisticated segmentation of images allows us to find parts of interest: particular labels; individual specimens on a slide; or barcodes. Following segmentation, there is the potential to use colour analysis of specimens to perform conditional checking, such as looking for bad cases of verdigris in pinned insects or discoloration of gum-chloral mountant. Automating measurements and landmark analysis of specimens can be used to create trait datasets, all of which will enrich our knowledge of specimens. Segmentation of labels can allow us to cluster similar labels based on their visual properties including colour, shape and patterns—this in turn can be used to make optical character recognition, handwriting recognition and manual transcription much more efficient. Atomising, validating and resolving label data will create structured label data that can be more easily stored, searched and linked to other datasets. We present a landscape analysis on the approaches, summarising previous work, and outline our plan to build future tools and systems in the SYNTHESYS+ Project as part of the Specimen Data Refinery. This will cover the sharing of tools, reducing barriers to access, integrating workflow engines into a software architecture that allows the components to be re-used and re-purposed with provenance data for repeatability, and conforms with the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles (Wilkinson et al. 2016).

