Liberating the Richness of Facts Implicit in Taxonomic Publication: The Plazi Workflow

Author(s):  
Donat Agosti ◽  
Marcus Guidoti ◽  
Guido Sautter

The growing corpus of hundreds of millions of pages of taxonomic literature reporting research results based on specimens is very rich in facts. To make these facts reusable, Plazi, Pensoft and Zenodo are building and maintaining the Biodiversity Literature Repository, which includes a workflow to discover, describe and store them, making them open access and findable, accessible, interoperable and reusable (FAIR). Currently, 43,000 articles contain 406,000 material citations, and around 50% of the species newly described each year are made accessible and immediately reused by the Global Biodiversity Information Facility (GBIF). All the images are deposited at the Biodiversity Literature Repository (BLR), as are the taxonomic treatments. Each of these deposits receives enriched metadata and a minted Digital Object Identifier (DOI). Through this process, Plazi is the single largest data set provider to GBIF and continues to provide ca. 45,000 unique taxonomic names to GBIF. The workflow is optimized for born-digital publications in portable document format (PDF), but other formats can also be ingested, including TaxPub, a taxonomy-specific version of the Journal Article Tag Suite (JATS) XML. After ingestion, the PDF is converted to Plazi's open, compressed Image Markup File (IMF) format, which bundles the enhanced information contained in the PDF, with figures and tables properly extracted. The IMFs are then housed at TreatmentBank, together with exported files, including a Darwin Core Archive (DwC-A) for each parent article and its taxonomic treatments, treatment XMLs, and GBIF datasets of the parent articles. Taxonomic treatments, in addition to figures and the original PDFs, are also deposited on Zenodo, where a DOI is minted if none is already available. 
These Zenodo deposits include, in their metadata, links back to the different data and file formats, including the treatment XMLs, keeping the system connected and up to date. Third-party players, such as GBIF, Global Biotic Interactions (GloBI), Ocellus, OpenBiodiv and Synospecies, are continuously fed through system hooks, which guarantees data consistency after further edits. The PDF-to-IMF conversion and data enhancement are performed with Plazi's open-source software GoldenGate Imagine. Ingested XMLs, which are validated against the TaxPub schema, follow a similar path into the system and the many third-party applications. This operation is supported by the Arcadia Fund as well as by service contracts with publishers to disseminate their data. In addition, as part of the CETAF COVID-19 task force, the workflow has been contributing treatments and images from numerous publications relevant to understanding virus spillover. In this lecture the workflow is described and explained, together with the associated infrastructure, its ongoing changes and upcoming development steps.
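The deposits described above are retrievable through Zenodo's public search API, restricted to the Biodiversity Literature Repository community. The sketch below builds such a query and extracts the minted DOIs from a search response; the community slug `biosyslit` and the standard Zenodo hit envelope are taken as given, while the search string and response shape shown are illustrative assumptions.

```python
# Sketch: querying the Biodiversity Literature Repository (BLR) on Zenodo
# via the public /api/records search endpoint.
from urllib.parse import urlencode

ZENODO_API = "https://zenodo.org/api/records"

def blr_query_url(search: str, page: int = 1, size: int = 10) -> str:
    """Build a Zenodo search URL restricted to the BLR community."""
    params = {"communities": "biosyslit", "q": search,
              "page": page, "size": size}
    return f"{ZENODO_API}?{urlencode(params)}"

def extract_dois(response_json: dict) -> list[str]:
    """Pull the minted DOIs out of a Zenodo search response."""
    return [hit["doi"] for hit in
            response_json.get("hits", {}).get("hits", [])]
```

Fetching `blr_query_url("Formicidae")` with any HTTP client and feeding the JSON body to `extract_dois` would then yield the DOIs of matching treatments and figures.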

Author(s):  
Donat Agosti ◽  
Marcus Guidoti ◽  
Terry Catapano ◽  
Alexandros Ioannidis-Pantopikos ◽  
Guido Sautter

As part of the CETAF COVID-19 task force, Plazi liberated taxonomic treatments, figures, observation records, biotic interactions, taxonomic names, and collection and specimen codes involving bats and viruses from scholarly publications, in order to create open access, findable, accessible, interoperable and reusable (FAIR) data. The data is accessible via TreatmentBank and the Biodiversity Literature Repository (BLR) and is continually harvested and reused by the Global Biodiversity Information Facility (GBIF) and Global Biotic Interactions (GloBI). The data was processed, enhanced and liberated by the Plazi workflow, which relies on a dedicated infrastructure including a desktop application (GoldenGate Imagine) that converts portable document format (PDF) files into a dedicated open compressed file format, the Image Markup File (IMF), in which the data enhancement takes place. To enhance the data contained in the publications, including the biological interactions, a series of standards and vocabularies are used. With the exception of TaxPub, a taxonomy-specific extension of the U.S. National Center for Biotechnology Information's (NCBI) Journal Article Tag Suite (JATS), all the other vocabularies used were previously established, in line with Plazi's mission to reuse existing standards wherever available. The following standards and vocabularies are used: the Metadata Object Description Schema (MODS) to model article metadata in Plazi's XMLs; Darwin Core for taxonomic ranks and materials-citation data; and the Open Biological and Biomedical Ontology (OBO) Relations Ontology for biological interactions between organisms. The latter two are also used in the custom metadata of the Biodiversity Literature Repository at Zenodo. In this presentation we will provide an overview of the different types of data, followed by the standards or vocabularies applied to each of them and their parts. 
The goal is to provide context on how the data liberated by Plazi, which is extensively reused by third-party applications such as GBIF and GloBI, is described. The use of these standards enables fully automated, daily data ingestion by GBIF.
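To make the vocabulary reuse concrete, the sketch below maps a parsed materials citation onto Darwin Core term names. The term names (`dwc:basisOfRecord`, `dwc:scientificName`, `dwc:recordedBy`, `dwc:country`) are real Darwin Core terms, and `MaterialCitation` is an established basis-of-record value; the internal field names, the example values and the flat dict layout are illustrative assumptions, not Plazi's actual export format.

```python
# Sketch: expressing a materials citation with Darwin Core term names.
def to_darwin_core(parsed: dict) -> dict:
    """Map internally parsed citation fields onto Darwin Core terms."""
    mapping = {
        "name": "dwc:scientificName",
        "collector": "dwc:recordedBy",
        "country": "dwc:country",
    }
    # Every materials citation carries the same basis-of-record value.
    record = {"dwc:basisOfRecord": "MaterialCitation"}
    record.update({mapping[k]: v for k, v in parsed.items() if k in mapping})
    return record

example = to_darwin_core({"name": "Eurygyrus peloponnesius",
                          "country": "Greece"})
```

A record shaped this way slots directly into a Darwin Core Archive row, which is what enables the automated GBIF ingestion mentioned above.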


2021 ◽  
Author(s):  
Tahereh Dehdarirad ◽  
Kalle Karlsson

Abstract In this study we investigated whether open access could assist the broader dissemination of scientific research in Climate Action (Sustainable Development Goal 13) via news outlets. We did this by comparing (i) the share of open access and non-open access documents in different Climate Action topics, and their news counts, and (ii) the mean news counts for open access and non-open access documents. The data set of this study comprised 70,206 articles and reviews in Sustainable Development Goal 13, published during 2014–2018 and retrieved from SciVal. The number of news mentions for each document was obtained from the Altmetric Details Page API using DOIs, whereas the open access statuses were obtained using Unpaywall.org. The analysis in this paper combined topic modelling (latent Dirichlet allocation), descriptive statistics and regression analysis. The covariates included in the regression analysis were features related to authors, country, journal, institution, funding, readability, news source category and topic. Using topic modelling, we identified 10 topics, with topics 4 (meteorology) [21%], 5 (adaptation, mitigation and legislation) [18%] and 8 (ecosystems and biodiversity) [14%] accounting for 53% of the research in Sustainable Development Goal 13. Additionally, the regression analysis showed that, holding all other variables in the model constant, open access papers in Climate Action had a news count advantage (8.8%) over non-open access papers. Our findings also showed that while a higher share of open access documents in topics such as topic 9 (human vulnerability to risks) might not assist its broader dissemination, in others, such as topic 5 (adaptation, mitigation and legislation), even a lower share of open access documents might accelerate broad communication via news outlets.
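A percentage advantage like the 8.8% reported here typically comes from a log-link count model (e.g., Poisson or negative binomial regression), where a coefficient β on the binary open-access indicator implies a multiplicative effect of exp(β) on the expected news count. The sketch below shows that conversion; the coefficient value 0.0843 is back-computed for illustration only and is not taken from the paper.

```python
# Sketch: converting a log-link regression coefficient into the implied
# percent change in expected news counts, holding other covariates fixed.
import math

def pct_change_from_coef(beta: float) -> float:
    """Percent change (as a fraction) implied by coefficient beta."""
    return math.exp(beta) - 1.0
```

With the hypothetical coefficient 0.0843, `pct_change_from_coef(0.0843)` comes out at roughly 0.088, i.e., the 8.8% advantage reported for open access papers.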


2021 ◽  
Vol 31 (6) ◽  
pp. 230-233
Author(s):  
Veronica Phillips ◽  
Eleanor Barker

This article provides an overview of writing for publication in peer-reviewed journals. While the main focus is on writing a research article, it also provides guidance on factors influencing journal selection, including journal scope, intended audience for the findings, open access requirements, and journal citation metrics. Finally, it covers the standard content of a scientific journal article, providing general advice and guidance regarding the information researchers would typically include in their published papers.


2019 ◽  
Vol 9 (2) ◽  
pp. 72-75
Author(s):  
Subhash Khode

The concept of open access has gained momentum in recent years around the world, and India is actively contributing to the open access movement. e-LIS, an international open repository in the field of library and information science, was established in 2003 and today contains 21,123 documents of various types. The basic aim of this study is to analyse the Indian contribution to the open access movement, in particular the documents submitted to e-LIS (E-prints in Library and Information Science). This study analyses 1,090 documents of various types submitted to e-LIS from India as of 30 January 2019. It found that India ranks first among Asian countries in the number of documents submitted to e-LIS. The largest number of documents (432) were submitted as "Journal Article (Print and Online)", the largest number of documents (72) were published in 2006, and the largest number of submitted articles (35) appeared in "Annals of Library and Information Studies".


2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Susan R. McCouch ◽  
Mark H. Wright ◽  
Chih-Wei Tung ◽  
Lyza G. Maron ◽  
Kenneth L. McNally ◽  
...  

Abstract Increasing food production is essential to meet the demands of a growing human population, with its rising income levels and nutritional expectations. To address the demand, plant breeders seek new sources of genetic variation to enhance the productivity, sustainability and resilience of crop varieties. Here we launch a high-resolution, open-access research platform to facilitate genome-wide association mapping in rice, a staple food crop. The platform provides an immortal collection of diverse germplasm, a high-density single-nucleotide polymorphism data set tailored for gene discovery, well-documented analytical strategies, and a suite of bioinformatics resources to facilitate biological interpretation. Using grain length, we demonstrate the power and resolution of our new high-density rice array, the accompanying genotypic data set, and an expanded diversity panel for detecting major and minor effect QTLs and subpopulation-specific alleles, with immediate implications for rice improvement.


2020 ◽  
Vol 10 (1) ◽  
pp. 59-69
Author(s):  
Edmund C. Levin

Background: Screening adolescents for depression has recently been advocated by two major national organizations. However, this practice is not without controversy. Objective: To review diagnostic, clinical, and conflict of interest issues associated with the calls for routine depression screening in adolescents. Method: The evaluation of depression screening by the US Preventive Services Task Force is compared and contrasted with those of comparable agencies in the UK and Canada, and articles arguing for and against screening are reviewed. Internal pharmaceutical industry documents declassified through litigation are examined for conflicts of interest. A case is presented that illustrates the substantial diagnostic limitations of self-administered mental health screening tools. Discussion: The value of screening adolescents for psychiatric illness is questionable, as is the validity of the screening tools that have been developed for this purpose. Furthermore, many of those advocating depression screening are key opinion leaders, who are in effect acting as third-party advocates for the pharmaceutical industry. The evidence suggests that a commitment to marketing rather than to science is behind their recommendations, although their conflicts of interest are hidden in what seem to be impartial third-party recommendations.


Author(s):  
B. Meguenni ◽  
M. A. Hafid

<p><strong>Abstract.</strong> OpenStreetMap (OSM) is a collaborative project, distributed under the Open Database License, that collects a rich set of vector data provided by volunteers. It is a global collection of mapping data that can be used for a wide variety of purposes: many third-party online maps are based on OpenStreetMap data, and more and more large organizations are choosing OSM for their maps.</p> <p>Analyses of the spatial quality of OSM data show, however, that particular care must be taken. Several methods exist for assessing the quality of OSM data by comparing it to an authoritative dataset, and in this context it is essential to develop an automatic procedure to improve its spatial quality.</p> <p>This work proposes a quantitative method for comparing the quality of OSM and an authoritative data set for urban networks in the city of Oran (Algeria). The procedure is based on Python modules in a GIS environment and provides measurements of the spatial accuracy and completeness of the OSM road network. The method is applied to assess the quality of the Oran OSM road network data set through a comparison with the official Algerian dataset. The results show that the Oran OSM road network is very complete, but has low spatial accuracy.</p>
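One common completeness measure of the kind this abstract describes is the ratio of total OSM road length to the total length of the authoritative network. The abstract does not give the authors' exact formulas, so the sketch below is a minimal stand-in: polylines as sequences of (lon, lat) points, segment lengths by the haversine formula, and completeness as a length ratio.

```python
# Sketch: length-based completeness of one road network against another.
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def network_length_m(lines):
    """Total length of a network given as a list of (lon, lat) polylines."""
    return sum(haversine_m(a, b)
               for line in lines for a, b in zip(line, line[1:]))

def completeness(osm_lines, ref_lines):
    """OSM completeness as a length ratio against the reference network."""
    return network_length_m(osm_lines) / network_length_m(ref_lines)
```

A ratio near 1.0 indicates a network as long as the reference, matching the "very complete" finding; positional (spatial accuracy) assessment would need a separate measure, such as buffer overlap or nearest-feature distances.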


Author(s):  
Philippe Henry

In the present research, I used an open access data set (Medicinal Genomics) consisting of nearly 200,000 genome-wide single nucleotide polymorphisms (SNPs) typed in 28 cannabis accessions to shed light on the plant's underlying genetic structure. Genome-wide loadings were used to sequentially cull less informative markers, reducing the number of SNPs to 100K, 10K, 1K and 100, until I identified the set of 42 highly informative SNPs presented here. The first two principal components encompass over three quarters of the genetic variation in the dataset (PC1 = 48.6%, PC2 = 26.3%). This set of diagnostic SNPs is then used to identify the clusters into which cannabis accessions segregate. I identified three clear and consistent clusters, reflective of the ancient domestication trilogy of the genus Cannabis.
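The loading-based culling step can be illustrated on a toy scale: compute the leading principal component of a centred genotype matrix and keep the SNPs with the largest absolute loadings. The sketch below uses power iteration in plain Python on a tiny 0/1/2 genotype matrix; the actual study's pipeline, thresholds and software are not specified in the abstract, so this is a minimal stand-in for the idea, not the authors' method.

```python
# Sketch: culling SNPs by the magnitude of their PC1 loadings.
import math
import random

def center_columns(X):
    """Subtract each column's mean (rows = accessions, cols = SNPs)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def leading_loadings(X, iters=200, seed=0):
    """PC1 loadings via power iteration on X^T X (X column-centred)."""
    rng = random.Random(seed)
    p = len(X[0])
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        w = [sum(x * vj for x, vj in zip(row, v)) for row in X]   # X v
        v = [sum(X[i][j] * w[i] for i in range(len(X)))           # X^T w
             for j in range(p)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
    return v

def cull(X, keep):
    """Indices of the `keep` SNPs with the largest |PC1 loading|."""
    load = leading_loadings(center_columns(X))
    return sorted(range(len(load)), key=lambda j: -abs(load[j]))[:keep]
```

Applied iteratively (100K → 10K → 1K → 100 → 42), this kind of ranking retains the markers that carry most of the variation captured by the top components.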


Author(s):  
Alexandros Ioannidis-Pantopikos ◽  
Donat Agosti

In the landscape of general-purpose repositories, Zenodo was built at the European Laboratory for Particle Physics (CERN) data center to facilitate the sharing and preservation of the long tail of research across all disciplines and scientific domains. Despite Zenodo’s long tradition of making research artifacts FAIR (Findable, Accessible, Interoperable, and Reusable), challenges remain in applying these principles effectively to the needs of specific research domains. Plazi’s biodiversity taxonomic literature processing pipeline liberates data from publications and makes it FAIR via extensive metadata, the minting of a DataCite Digital Object Identifier (DOI), a licence, and the human- and machine-readable output provided by Zenodo; the data is accessible via the Biodiversity Literature Repository community at Zenodo. The deposits (e.g., taxonomic treatments, figures) are an example of how local networks of information can be formally linked to explicit resources in the broader context of other platforms such as GBIF (Global Biodiversity Information Facility). In the context of biodiversity taxonomic literature data workflows, a general-purpose repository’s traditional submission approach is not enough to preserve rich metadata and to capture highly interlinked objects, such as taxonomic treatments and digital specimens. As a prerequisite to serving these use cases and ensuring that the artifacts remain FAIR, Zenodo introduced the concept of custom metadata, which allows enhancing submissions such as figures or taxonomic treatments (see as an example the treatment of Eurygyrus peloponnesius) with custom keywords based on terms from common biodiversity vocabularies like Darwin Core and Audubon Core, each with an explicit link to the respective vocabulary term. 
The aforementioned pipelines and features are designed to be served first and foremost through public Representational State Transfer Application Programming Interfaces (REST APIs) and open web technologies such as webhooks. This approach allows researchers and platforms to integrate existing and new automated workflows into Zenodo, and thus empowers research communities to create self-sustained cross-platform ecosystems. The BiCIKL project (Biodiversity Community Integrated Knowledge Library) exemplifies how repositories and tools can become building blocks for broader adoption of the FAIR principles. Starting with the literature processing pipeline above, we will explain the underlying concepts and the resulting FAIR data, with a focus on the custom metadata used to enhance the deposits.
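To give a feel for what such a custom-metadata deposit might carry, the sketch below assembles an illustrative JSON payload for a taxonomic-treatment upload with Darwin Core keywords. The `dwc:` term names are real Darwin Core terms and `biosyslit` is the BLR community identifier, but the exact shape of Zenodo's custom-metadata schema and the field values shown here are assumptions for illustration, not a verbatim API contract.

```python
# Sketch: an illustrative Zenodo deposit payload carrying custom
# Darwin Core keywords for a taxonomic treatment.
import json

deposit_metadata = {
    "metadata": {
        "title": "Treatment of Eurygyrus peloponnesius",
        "upload_type": "publication",
        # Deposit into the Biodiversity Literature Repository community.
        "communities": [{"identifier": "biosyslit"}],
        # Custom keywords drawn from Darwin Core terms (values illustrative).
        "custom": {
            "dwc:kingdom": ["Animalia"],
            "dwc:genus": ["Eurygyrus"],
        },
    }
}

# Serialize as it would be sent in a REST request body.
payload = json.dumps(deposit_metadata)
```

Because the keywords are explicit vocabulary terms rather than free text, harvesters such as GBIF or GloBI can interpret them unambiguously, which is the point of the custom-metadata feature described above.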

