Generating Masks for Image Segmentation in Digitized Herbarium Specimens
Digitized herbarium images contain complex information unrelated to the shape and color of the specimens they depict. If an image is used as a proxy for the pattern, shape, or color of a specimen, this extraneous information can contribute a substantial amount of noise. Image segmentation, in which the specimen material is partitioned from the background (e.g., herbarium sheet, label, color ramp), offers one possible solution, yet training data for segmenting herbarium specimens are nonexistent. We present a pipeline for generating training data for image segmentation tasks, along with a novel dataset of high-resolution image masks that segment plant material from background noise. This dataset can be used to train neural networks to segment plant material in herbarium sheets more generally, and our method is applicable to other museum data sources where masking may be useful for quantitative analysis of patterns and shapes.
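To illustrate why a mask reduces background noise, here is a minimal sketch, not the authors' pipeline. It assumes a mask is a binary grid the same size as the image, where `True` marks plant material. The function names `mean_masked_color` and `masked_area` and the toy 2×2 "image" are hypothetical. The sketch compares the specimen's average color computed from masked pixels alone with the average over the whole image, where the pale sheet pixels bias the result.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), 0-255
Image = List[List[Pixel]]
Mask = List[List[bool]]       # hypothetical convention: True = plant material


def mean_masked_color(image: Image, mask: Mask) -> Tuple[float, float, float]:
    """Average RGB over pixels where the mask is True."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    total = [0, 0, 0]
    count = 0
    for img_row, mask_row in zip(image, mask):
        for (r, g, b), is_plant in zip(img_row, mask_row):
            if is_plant:
                total[0] += r
                total[1] += g
                total[2] += b
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return (total[0] / count, total[1] / count, total[2] / count)


def masked_area(mask: Mask) -> int:
    """Number of pixels labeled as plant material (a simple shape/size measure)."""
    return sum(sum(row) for row in mask)


# Toy example: two green "leaf" pixels on a pale herbarium sheet.
image = [
    [(40, 120, 30), (240, 235, 220)],
    [(240, 235, 220), (60, 140, 50)],
]
mask = [
    [True, False],
    [False, True],
]
all_pixels = [[True] * len(row) for row in image]

print(mean_masked_color(image, mask))        # (50.0, 130.0, 40.0): specimen only
print(mean_masked_color(image, all_pixels))  # (145.0, 182.5, 130.0): biased by the sheet
print(masked_area(mask))                     # 2
```

Restricting the statistic to masked pixels keeps the background out of the color estimate. In practice, the masks would come from a trained segmentation model or from a dataset like the one described above, rather than being written by hand.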