Liberating the Richness of Facts Implicit in Taxonomic Publication: The Plazi Workflow

Author(s):  
Donat Agosti ◽  
Marcus Guidoti ◽  
Guido Sautter

The growing corpus of hundreds of millions of pages of taxonomic literature reporting research results based on specimens is very rich in facts. To make these facts reusable, Plazi, Pensoft and Zenodo are building and maintaining the Biodiversity Literature Repository, which includes a workflow to discover, describe and store them, making them open access and findable, accessible, interoperable and reusable (FAIR). Currently, 43,000 articles contain 406,000 material citations, and around 50% of the species newly described each year are made accessible and immediately reused by the Global Biodiversity Information Facility (GBIF). All the images are deposited at the Biodiversity Literature Repository (BLR), as are the taxonomic treatments. Each of these deposits receives enriched metadata and a minted Digital Object Identifier (DOI). Through this process, Plazi is the single largest data set provider to GBIF and continues to provide ca. 45,000 unique taxonomic names to GBIF. The workflow is optimized for born-digital publications in portable document format (PDF), but other formats can also be ingested, including TaxPub, a taxonomy-specific version of the Journal Article Tag Suite (JATS) XML. After ingestion, the PDF is converted to Plazi's open, compressed Image Markup File (IMF) format, which bundles the enhanced information contained in the PDF, with figures and tables properly extracted. The IMFs are then housed at TreatmentBank, together with exported files, including a Darwin Core Archive (DwC-A) for each parent article and its taxonomic treatments, treatment XMLs, and GBIF datasets of the parent articles. Taxonomic treatments, in addition to figures and the original PDFs, are also deposited on Zenodo, where a DOI is minted if none is already available. 
These Zenodo deposits include, in their metadata, links back to the different data and file formats, including the treatment XMLs, keeping the system connected and up to date. Third-party players, such as GBIF, Global Biotic Interactions (GloBI), Ocellus, OpenBiodiv and Synospecies, are continuously fed through system hooks, which guarantees data consistency after further edits. The PDF-to-IMF conversion and data enhancement are performed with Plazi's open-source software GoldenGate Imagine. Ingested XMLs, which are validated against the TaxPub schema, follow a similar path into the system and the many third-party applications. This operation is supported by the Arcadia Fund as well as by service contracts with publishers to disseminate their data. In addition, as part of the CETAF COVID-19 task force, the workflow has been contributing treatments and images from numerous publications relevant to understanding virus spillover. In this lecture the workflow is described and explained, together with the associated infrastructure, its ongoing changes and upcoming development steps.
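The deposits described above are retrievable through Zenodo's public search API, restricted to the Biodiversity Literature Repository community. The sketch below builds such a query and extracts the minted DOIs from a search response; the community slug `biosyslit` and the standard Zenodo hit envelope are taken as given, while the search string and response shape shown are illustrative assumptions.

```python
# Sketch: querying the Biodiversity Literature Repository (BLR) on Zenodo
# via the public /api/records search endpoint.
from urllib.parse import urlencode

ZENODO_API = "https://zenodo.org/api/records"

def blr_query_url(search: str, page: int = 1, size: int = 10) -> str:
    """Build a Zenodo search URL restricted to the BLR community."""
    params = {"communities": "biosyslit", "q": search,
              "page": page, "size": size}
    return f"{ZENODO_API}?{urlencode(params)}"

def extract_dois(response_json: dict) -> list[str]:
    """Pull the minted DOIs out of a Zenodo search response."""
    return [hit["doi"] for hit in
            response_json.get("hits", {}).get("hits", [])]
```

Fetching `blr_query_url("Formicidae")` with any HTTP client and feeding the JSON body to `extract_dois` would then yield the DOIs of matching treatments and figures.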

Author(s):  
Donat Agosti ◽  
Marcus Guidoti ◽  
Terry Catapano ◽  
Alexandros Ioannidis-Pantopikos ◽  
Guido Sautter

As part of the CETAF COVID-19 task force, Plazi liberated taxonomic treatments, figures, observation records, biotic interactions, taxonomic names, and collection and specimen codes involving bats and viruses from scholarly publications, in order to create open access, findable, accessible, interoperable and reusable (FAIR) data. The data is accessible via TreatmentBank and the Biodiversity Literature Repository (BLR) and is continually harvested and reused by the Global Biodiversity Information Facility (GBIF) and Global Biotic Interactions (GloBI). The data was processed, enhanced and liberated by the Plazi workflow, which relies on a dedicated infrastructure including a desktop application (GoldenGate Imagine) that converts portable document format (PDF) files into a dedicated open compressed file format, the Image Markup File (IMF), in which the data enhancement takes place. To enhance the data contained in the publications, including the biological interactions, a series of standards and vocabularies are used. With the exception of TaxPub, a taxonomy-specific extension of the U.S. National Center for Biotechnology Information's (NCBI) Journal Article Tag Suite (JATS), all the other vocabularies used were previously established, in line with Plazi's mission to reuse existing standards wherever available. The following standards and vocabularies are used: the Metadata Object Description Schema (MODS) to model article metadata in Plazi's XMLs; Darwin Core for taxonomic ranks and materials-citation data; and the Open Biological and Biomedical Ontology (OBO) Relations Ontology for biological interactions between organisms. The latter two are also used in the custom metadata of the Biodiversity Literature Repository at Zenodo. In this presentation we will provide an overview of the different types of data, followed by the standards or vocabularies applied to each of them and their parts. 
The goal is to provide context on how the data liberated by Plazi, which is extensively reused by third-party applications such as GBIF and GloBI, is described. The use of these standards enables fully automated, daily data ingestion by GBIF.
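To make the vocabulary reuse concrete, the sketch below maps a parsed materials citation onto Darwin Core term names. The term names (`dwc:basisOfRecord`, `dwc:scientificName`, `dwc:recordedBy`, `dwc:country`) are real Darwin Core terms, and `MaterialCitation` is an established basis-of-record value; the internal field names, the example values and the flat dict layout are illustrative assumptions, not Plazi's actual export format.

```python
# Sketch: expressing a materials citation with Darwin Core term names.
def to_darwin_core(parsed: dict) -> dict:
    """Map internally parsed citation fields onto Darwin Core terms."""
    mapping = {
        "name": "dwc:scientificName",
        "collector": "dwc:recordedBy",
        "country": "dwc:country",
    }
    # Every materials citation carries the same basis-of-record value.
    record = {"dwc:basisOfRecord": "MaterialCitation"}
    record.update({mapping[k]: v for k, v in parsed.items() if k in mapping})
    return record

example = to_darwin_core({"name": "Eurygyrus peloponnesius",
                          "country": "Greece"})
```

A record shaped this way slots directly into a Darwin Core Archive row, which is what enables the automated GBIF ingestion mentioned above.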


2021 ◽  
Author(s):  
Tahereh Dehdarirad ◽  
Kalle Karlsson

Abstract In this study we investigated whether open access could assist the broader dissemination of scientific research in Climate Action (Sustainable Development Goal 13) via news outlets. We did this by comparing (i) the share of open access and non-open access documents in different Climate Action topics, and their news counts, and (ii) the mean news counts for open access and non-open access documents. The data set of this study comprised 70,206 articles and reviews in Sustainable Development Goal 13, published during 2014–2018 and retrieved from SciVal. The number of news mentions for each document was obtained from the Altmetric Details Page API using DOIs, whereas the open access statuses were obtained using Unpaywall.org. The analysis in this paper combined topic modelling (latent Dirichlet allocation), descriptive statistics and regression analysis. The covariates included in the regression analysis were features related to authors, country, journal, institution, funding, readability, news source category and topic. Using topic modelling, we identified 10 topics, with topics 4 (meteorology) [21%], 5 (adaptation, mitigation and legislation) [18%] and 8 (ecosystems and biodiversity) [14%] accounting for 53% of the research in Sustainable Development Goal 13. Additionally, the regression analysis showed that, holding all other variables in the model constant, open access papers in Climate Action had a news count advantage (8.8%) over non-open access papers. Our findings also showed that while a higher share of open access documents in topics such as topic 9 (human vulnerability to risks) might not assist its broader dissemination, in others, such as topic 5 (adaptation, mitigation and legislation), even a lower share of open access documents might accelerate broad communication via news outlets.
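A percentage advantage like the 8.8% reported here typically comes from a log-link count model (e.g., Poisson or negative binomial regression), where a coefficient β on the binary open-access indicator implies a multiplicative effect of exp(β) on the expected news count. The sketch below shows that conversion; the coefficient value 0.0843 is back-computed for illustration only and is not taken from the paper.

```python
# Sketch: converting a log-link regression coefficient into the implied
# percent change in expected news counts, holding other covariates fixed.
import math

def pct_change_from_coef(beta: float) -> float:
    """Percent change (as a fraction) implied by coefficient beta."""
    return math.exp(beta) - 1.0
```

With the hypothetical coefficient 0.0843, `pct_change_from_coef(0.0843)` comes out at roughly 0.088, i.e., the 8.8% advantage reported for open access papers.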


2021 ◽  
Vol 31 (6) ◽  
pp. 230-233
Author(s):  
Veronica Phillips ◽  
Eleanor Barker

This article provides an overview of writing for publication in peer-reviewed journals. While the main focus is on writing a research article, it also provides guidance on factors influencing journal selection, including journal scope, intended audience for the findings, open access requirements, and journal citation metrics. Finally, it covers the standard content of a scientific journal article, providing general advice and guidance regarding the information researchers would typically include in their published papers.


2019 ◽  
Vol 9 (2) ◽  
pp. 72-75
Author(s):  
Subhash Khode

The concept of open access has gained momentum in recent years around the world, and India is actively contributing to the open access movement. e-LIS, an international open repository in the field of library and information science, was established in 2003 and today contains 21,123 documents of various types. The basic aim of this study is to analyse the Indian contribution to the open access movement, in particular the documents submitted to e-LIS (E-prints in Library and Information Science). This study analyses 1,090 documents of various types submitted to e-LIS from India as of 30 January 2019. It found that India ranks first among Asian countries in the number of documents submitted to e-LIS. The largest number of documents (432) were submitted as "Journal Article (Print and Online)", the largest number of documents (72) were published in 2006, and the largest number of submitted articles (35) appeared in "Annals of Library and Information Studies".


2016 ◽  
Vol 7 (1) ◽  
Author(s):  
Susan R. McCouch ◽  
Mark H. Wright ◽  
Chih-Wei Tung ◽  
Lyza G. Maron ◽  
Kenneth L. McNally ◽  
...  

Abstract Increasing food production is essential to meet the demands of a growing human population, with its rising income levels and nutritional expectations. To address the demand, plant breeders seek new sources of genetic variation to enhance the productivity, sustainability and resilience of crop varieties. Here we launch a high-resolution, open-access research platform to facilitate genome-wide association mapping in rice, a staple food crop. The platform provides an immortal collection of diverse germplasm, a high-density single-nucleotide polymorphism data set tailored for gene discovery, well-documented analytical strategies, and a suite of bioinformatics resources to facilitate biological interpretation. Using grain length, we demonstrate the power and resolution of our new high-density rice array, the accompanying genotypic data set, and an expanded diversity panel for detecting major and minor effect QTLs and subpopulation-specific alleles, with immediate implications for rice improvement.


2020 ◽  
Vol 10 (1) ◽  
pp. 59-69
Author(s):  
Edmund C. Levin

Background: Screening adolescents for depression has recently been advocated by two major national organizations. However, this practice is not without controversy. Objective: To review diagnostic, clinical, and conflict of interest issues associated with the calls for routine depression screening in adolescents. Method: The evaluation of depression screening by the US Preventive Services Task Force is compared and contrasted with those of comparable agencies in the UK and Canada, and articles arguing for and against screening are reviewed. Internal pharmaceutical industry documents declassified through litigation are examined for conflicts of interest. A case is presented that illustrates the substantial diagnostic limitations of self-administered mental health screening tools. Discussion: The value of screening adolescents for psychiatric illness is questionable, as is the validity of the screening tools that have been developed for this purpose. Furthermore, many of those advocating depression screening are key opinion leaders, who are in effect acting as third-party advocates for the pharmaceutical industry. The evidence suggests that a commitment to marketing rather than to science is behind their recommendations, although their conflicts of interest are hidden in what seem to be impartial third-party recommendations.


Author(s):  
B. Meguenni ◽  
M. A. Hafid

<p><strong>Abstract.</strong> OpenStreetMap (OSM) is a collaborative project, distributed under the Open Database License, that collects a rich set of vector data provided by volunteers. It is a global collection of mapping data that can be used for a wide variety of purposes: many third-party online maps are based on OpenStreetMap data, and more and more large organizations are choosing OSM for their maps.</p> <p>Analyses of the spatial quality of OSM data show, however, that particular care must be taken. Several methods exist for assessing the quality of OSM data by comparing it to an authoritative dataset, and in this context it is essential to develop an automatic procedure to improve its spatial quality.</p> <p>This work proposes a quantitative method for comparing the quality of OSM and an authoritative data set for urban networks in the city of Oran (Algeria). The procedure is based on Python modules in a GIS environment and provides measurements of the spatial accuracy and completeness of the OSM road network. The method is applied to assess the quality of the Oran OSM road network data set through a comparison with the official Algerian dataset. The results show that the Oran OSM road network is very complete, but has low spatial accuracy.</p>
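One common completeness measure of the kind this abstract describes is the ratio of total OSM road length to the total length of the authoritative network. The abstract does not give the authors' exact formulas, so the sketch below is a minimal stand-in: polylines as sequences of (lon, lat) points, segment lengths by the haversine formula, and completeness as a length ratio.

```python
# Sketch: length-based completeness of one road network against another.
import math

EARTH_RADIUS_M = 6371000.0

def haversine_m(p, q):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2)
         * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def network_length_m(lines):
    """Total length of a network given as a list of (lon, lat) polylines."""
    return sum(haversine_m(a, b)
               for line in lines for a, b in zip(line, line[1:]))

def completeness(osm_lines, ref_lines):
    """OSM completeness as a length ratio against the reference network."""
    return network_length_m(osm_lines) / network_length_m(ref_lines)
```

A ratio near 1.0 indicates a network as long as the reference, matching the "very complete" finding; positional (spatial accuracy) assessment would need a separate measure, such as buffer overlap or nearest-feature distances.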


Author(s):  
Philippe Henry

In the present research, I used an open access data set (Medicinal Genomics) consisting of nearly 200,000 genome-wide single nucleotide polymorphisms (SNPs) typed in 28 cannabis accessions to shed light on the plant's underlying genetic structure. Genome-wide loadings were used to sequentially cull less informative markers, reducing the number of SNPs to 100K, 10K, 1K and 100, until I identified the set of 42 highly informative SNPs presented here. The first two principal components encompass over three quarters of the genetic variation in the dataset (PC1 = 48.6%, PC2 = 26.3%). This set of diagnostic SNPs is then used to identify the clusters into which cannabis accessions segregate. I identified three clear and consistent clusters, reflective of the ancient domestication trilogy of the genus Cannabis.
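The loading-based culling step can be illustrated on a toy scale: compute the leading principal component of a centred genotype matrix and keep the SNPs with the largest absolute loadings. The sketch below uses power iteration in plain Python on a tiny 0/1/2 genotype matrix; the actual study's pipeline, thresholds and software are not specified in the abstract, so this is a minimal stand-in for the idea, not the authors' method.

```python
# Sketch: culling SNPs by the magnitude of their PC1 loadings.
import math
import random

def center_columns(X):
    """Subtract each column's mean (rows = accessions, cols = SNPs)."""
    n = len(X)
    means = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in X]

def leading_loadings(X, iters=200, seed=0):
    """PC1 loadings via power iteration on X^T X (X column-centred)."""
    rng = random.Random(seed)
    p = len(X[0])
    v = [rng.random() for _ in range(p)]
    for _ in range(iters):
        w = [sum(x * vj for x, vj in zip(row, v)) for row in X]   # X v
        v = [sum(X[i][j] * w[i] for i in range(len(X)))           # X^T w
             for j in range(p)]
        norm = math.sqrt(sum(c * c for c in v)) or 1.0
        v = [c / norm for c in v]
    return v

def cull(X, keep):
    """Indices of the `keep` SNPs with the largest |PC1 loading|."""
    load = leading_loadings(center_columns(X))
    return sorted(range(len(load)), key=lambda j: -abs(load[j]))[:keep]
```

Applied iteratively (100K → 10K → 1K → 100 → 42), this kind of ranking retains the markers that carry most of the variation captured by the top components.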


Author(s):  
Alexandros Ioannidis-Pantopikos ◽  
Donat Agosti

In the landscape of general-purpose repositories, Zenodo was built at the European Laboratory for Particle Physics (CERN) data center to facilitate the sharing and preservation of the long tail of research across all disciplines and scientific domains. Despite Zenodo’s long tradition of making research artifacts FAIR (Findable, Accessible, Interoperable, and Reusable), challenges remain in applying these principles effectively to the needs of specific research domains. Plazi’s biodiversity taxonomic literature processing pipeline liberates data from publications and makes it FAIR via extensive metadata, the minting of a DataCite Digital Object Identifier (DOI), a licence, and the human- and machine-readable output provided by Zenodo; the data is accessible via the Biodiversity Literature Repository community at Zenodo. The deposits (e.g., taxonomic treatments, figures) are an example of how local networks of information can be formally linked to explicit resources in the broader context of other platforms such as GBIF (Global Biodiversity Information Facility). In the context of biodiversity taxonomic literature data workflows, a general-purpose repository’s traditional submission approach is not enough to preserve rich metadata and to capture highly interlinked objects, such as taxonomic treatments and digital specimens. As a prerequisite to serving these use cases and ensuring that the artifacts remain FAIR, Zenodo introduced the concept of custom metadata, which allows enhancing submissions such as figures or taxonomic treatments (see as an example the treatment of Eurygyrus peloponnesius) with custom keywords based on terms from common biodiversity vocabularies like Darwin Core and Audubon Core, each with an explicit link to the respective vocabulary term. 
The aforementioned pipelines and features are designed to be served first and foremost through public Representational State Transfer Application Programming Interfaces (REST APIs) and open web technologies such as webhooks. This approach allows researchers and platforms to integrate existing and new automated workflows into Zenodo, and thus empowers research communities to create self-sustained cross-platform ecosystems. The BiCIKL project (Biodiversity Community Integrated Knowledge Library) exemplifies how repositories and tools can become building blocks for broader adoption of the FAIR principles. Starting with the literature processing pipeline above, we will explain the underlying concepts and the resulting FAIR data, with a focus on the custom metadata used to enhance the deposits.
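To give a feel for what such a custom-metadata deposit might carry, the sketch below assembles an illustrative JSON payload for a taxonomic-treatment upload with Darwin Core keywords. The `dwc:` term names are real Darwin Core terms and `biosyslit` is the BLR community identifier, but the exact shape of Zenodo's custom-metadata schema and the field values shown here are assumptions for illustration, not a verbatim API contract.

```python
# Sketch: an illustrative Zenodo deposit payload carrying custom
# Darwin Core keywords for a taxonomic treatment.
import json

deposit_metadata = {
    "metadata": {
        "title": "Treatment of Eurygyrus peloponnesius",
        "upload_type": "publication",
        # Deposit into the Biodiversity Literature Repository community.
        "communities": [{"identifier": "biosyslit"}],
        # Custom keywords drawn from Darwin Core terms (values illustrative).
        "custom": {
            "dwc:kingdom": ["Animalia"],
            "dwc:genus": ["Eurygyrus"],
        },
    }
}

# Serialize as it would be sent in a REST request body.
payload = json.dumps(deposit_metadata)
```

Because the keywords are explicit vocabulary terms rather than free text, harvesters such as GBIF or GloBI can interpret them unambiguously, which is the point of the custom-metadata feature described above.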

