Ozymandias: a biodiversity knowledge graph

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6739 ◽  
Author(s):  
Roderic D.M. Page

Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge I explore the feasibility of constructing a “biodiversity knowledge graph” for the Australian fauna. The data cleaning and reconciliation steps involved in constructing the knowledge graph are described in detail. Examples are given of its application to understanding changes in patterns of taxonomic publication over time. A web interface to the knowledge graph (called “Ozymandias”) is available at https://ozymandias-demo.herokuapp.com.
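The reconciliation step described above — mapping local, database-specific identifiers to shared global identifiers, then linking entities through them — can be sketched in a few lines. This is a minimal illustration only; the identifier values and the predicate are invented placeholders, not taken from the Ozymandias dataset.

```python
# Hypothetical local-to-global identifier mapping: a local publication key
# reconciled to a DOI, a local author key reconciled to an ORCID iD.
local_to_global = {
    ("publication", "pub-123"): "https://doi.org/10.0000/example",
    ("person", "author-42"): "https://orcid.org/0000-0000-0000-0000",
}

def reconcile(entity_type, local_id):
    """Return the shared global identifier for a local key, if known."""
    return local_to_global.get((entity_type, local_id))

# Build triples that link entities via their global identifiers, so records
# from different silos referring to the same entity converge on one node.
triples = []
pub = reconcile("publication", "pub-123")
author = reconcile("person", "author-42")
if pub and author:
    triples.append((pub, "http://schema.org/creator", author))

print(triples)
```

Once every silo's local keys are mapped this way, each new dataset adds edges to the same shared graph instead of creating a disconnected island.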

2018 ◽  
Author(s):  
Roderic D. M. Page

Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge I explore the feasibility of constructing a “biodiversity knowledge graph” for the Australian fauna. The steps involved in constructing the graph are described, and examples of its application are discussed. A web interface to the knowledge graph (called “Ozymandias”) is available at https://ozymandias-demo.herokuapp.com.


Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
Data published in structured, semantically enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines.
Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.

The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5).
These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv and various end users.
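The LOD workflow above ends with data expressed as interoperable RDF triples. As a rough sketch of what that output looks like, the snippet below serializes a couple of statements in N-Triples syntax. The subject URI and the `mentionsTaxon` predicate are placeholders for illustration, not actual OpenBiodiv-O terms.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize one triple; URIs are angle-bracketed, literals quoted."""
    o = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
    return f"<{subject}> <{predicate}> {o} ."

# Two illustrative statements about a hypothetical taxonomic treatment.
triples = [
    to_ntriple("http://example.org/treatment/1",
               "http://example.org/mentionsTaxon",
               "http://example.org/taxon/Aus_bus"),
    to_ntriple("http://example.org/treatment/1",
               "http://purl.org/dc/terms/title",
               "Example treatment"),
]
for t in triples:
    print(t)
```

Because N-Triples is a line-oriented subset of RDF, output like this can be bulk-loaded into any triple store that backs a knowledge graph.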


2019 ◽  
Vol 5 ◽  
Author(s):  
Joel Sachs ◽  
Roderic Page ◽  
Steven J Baskauf ◽  
Jocelyn Pender ◽  
Beatriz Lujan-Toro ◽  
...  

Knowledge graphs have the potential to unite disconnected digitized biodiversity data, and there are a number of efforts underway to build biodiversity knowledge graphs. More generally, the recent popularity of knowledge graphs, driven in part by the advent and success of the Google Knowledge Graph, has breathed life into the ongoing development of semantic web infrastructure and prototypes in the biodiversity informatics community. We describe a one-week training event and hackathon that focused on applying three specific knowledge graph technologies – the Neptune graph database, Metaphactory, and Wikidata – to a diverse set of biodiversity use cases. We give an overview of the training, the projects that were advanced throughout the week, and the critical discussions that emerged. We believe that the main barriers towards adoption of biodiversity knowledge graphs are the lack of understanding of knowledge graphs and the lack of adoption of shared unique identifiers. Furthermore, we believe an important advancement in the outlook of knowledge graph development is the emergence of Wikidata as an identifier broker and as a scoping tool. To remedy the current barriers towards biodiversity knowledge graph development, we recommend continued discussions at workshops and at conferences, which we expect to increase awareness and adoption of knowledge graph technologies.
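The "identifier broker" role attributed to Wikidata above can be sketched simply: one item carries claims linking many external identifiers for the same entity, so any one identifier can be translated into the others. The item key and claim values below are invented placeholders, not real Wikidata data.

```python
# A toy broker table: one shared item holds several external-ID claims
# for the same (hypothetical) taxon.
items = {
    "Q-EXAMPLE": {
        "GBIF taxon ID": "5219404",
        "NCBI taxonomy ID": "9689",
    },
}

def broker(scheme, value, target_scheme):
    """Translate an identifier from one scheme to another via a shared item."""
    for qid, claims in items.items():
        if claims.get(scheme) == value:
            return claims.get(target_scheme)
    return None

print(broker("GBIF taxon ID", "5219404", "NCBI taxonomy ID"))
```

In practice such lookups run as SPARQL queries against the Wikidata query service; the point is that the shared item, not any one database, anchors the identity.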


2018 ◽  
Vol 2 ◽  
pp. e25564 ◽ 
Author(s):  
Tomer Gueta ◽  
Vijay Barve ◽  
Thiloshon Nagarajah ◽  
Ashwin Agrawal ◽  
Yohay Carmel

A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning; from data quality explorations, to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade. New features will be highlighted in the demonstration.
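'bdclean' itself is an R package; as a language-neutral illustration of the "explore → clean → report" pipeline the abstract describes, here is a minimal sketch in Python. The checks, field names, and records are invented for illustration and are not part of the bdclean API.

```python
# Toy occurrence records with two deliberate quality problems.
records = [
    {"species": "Aus bus", "lat": -33.9, "lon": 151.2},
    {"species": "", "lat": -33.9, "lon": 151.2},          # missing name
    {"species": "Cus dus", "lat": 999.0, "lon": 151.2},   # impossible latitude
]

def check(record):
    """Exploration step: return a list of quality issues in one record."""
    issues = []
    if not record["species"]:
        issues.append("missing species name")
    if not -90 <= record["lat"] <= 90:
        issues.append("latitude out of range")
    return issues

# Cleaning step: keep only issue-free records.
clean = [r for r in records if not check(r)]
# Reporting step: tally every issue found across the dataset.
report = [issue for r in records for issue in check(r)]

print(len(clean), report)
```

A modular design like this — independent check functions feeding both the filter and the report — mirrors the extensibility the package aims for.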


2013 ◽  
pp. 1188-1203
Author(s):  
Ricardo Queirós ◽  
Mário Pinto

Recent studies of mobile Web trends show the continued explosion of mobile-friendly content. However, the wide number and heterogeneity of mobile devices pose several challenges for Web programmers, who want automatic detection of the device context and adaptation of content to mobile devices. Hence, the device detection phase assumes an important role in this process. In this chapter, the authors compare the most widely used approaches for mobile device detection. Based on this study, they present an architecture for detecting and delivering uniform m-Learning content to students in a Higher School. The authors focus mainly on the XML device capabilities repository and on the REST API Web Service for dealing with device data. In the former, the authors detail the respective capabilities schema and present a new caching approach. In the latter, they present an extension of the current API for dealing with device data. Finally, the authors validate their approach by presenting the overall data and statistics collected through the Google Analytics service, in order to better understand the adherence to the mobile Web interface, its evolution over time, and the main weaknesses.


2018 ◽  
Author(s):  
Donna M Gibbs ◽  
Charles J Gibbs ◽  
Jessica A Schultz

Since 2014, divers from Ocean Wise have been SCUBA diving in the Cambridge Bay, Nunavut area, collecting data on fishes, invertebrates and marine plants at numerous sites. For each dive a file is created that catalogues the species found and a rough abundance of each species. These files accumulate over time and are searchable by location, year, year and month, month, species and a number of other criteria, with custom software created for this purpose. Relationships between species are identified automatically by the searches. In addition to the species catalogue that began in 2014, data have been scrounged from previous collecting trips by staff and from personal dive logs before 2014, allowing for comparison between Pond Inlet, Resolute and Cambridge Bay. We were able to flag a potential decline in one species in 2017 thanks to our previous data. Our goal is to work cooperatively with others diving in the Arctic to grow this database through photography and dive records. At this point we have 149 dives/records and 279 species recorded. The database is used to support the Nearshore Ecological Surveys and the Arctic Marine Ecological Benchmarking Program reports. In addition to biodiversity data, temperature, salinity, pH and dissolved oxygen are also collected while in the area.
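The multi-criteria search the custom software supports can be sketched as a filter over dive records. The field names, locations, and species below are invented for illustration; they are not records from the Ocean Wise database.

```python
# Toy dive records: each dive file lists where, when, and what was seen.
dives = [
    {"location": "Cambridge Bay", "year": 2016, "species": ["Gymnocanthus tricuspis"]},
    {"location": "Pond Inlet", "year": 2012, "species": ["Boltenia ovifera"]},
    {"location": "Cambridge Bay", "year": 2017, "species": ["Boltenia ovifera"]},
]

def search(location=None, year=None, species=None):
    """Return dives matching all supplied criteria; omitted criteria match all."""
    hits = []
    for d in dives:
        if location and d["location"] != location:
            continue
        if year and d["year"] != year:
            continue
        if species and species not in d["species"]:
            continue
        hits.append(d)
    return hits

print(len(search(location="Cambridge Bay", species="Boltenia ovifera")))
```

Combining criteria this way is what makes year-over-year comparisons per site possible, such as flagging a species seen at a location in earlier years but absent later.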


Author(s):  
Eliana Alcaraz ◽  
Daniela Centrón ◽  
Gabriela Camicia ◽  
María Paula Quiroga ◽  
José Di Conza ◽  
...  

Introduction. Stenotrophomonas maltophilia has emerged as one of the most common multi-drug-resistant pathogens isolated from people with cystic fibrosis (CF). However, its adaptation over time to CF lungs has not been fully established. Hypothesis. Sequential isolates of S. maltophilia from a Brazilian adult patient are clonally related and show a pattern of adaptation by loss of virulence factors. Aim. To investigate antimicrobial susceptibility, clonal relatedness, mutation frequency, quorum sensing (QS) and selected virulence factors in sequential S. maltophilia isolates from a Brazilian adult patient attending a CF referral centre in Buenos Aires, Argentina, between May 2014 and May 2018. Methodology. The antibiotic resistance of 11 S. maltophilia isolates recovered from expectorations of an adult female with CF was determined. Clonal relatedness, mutation frequency, QS variants (RpfC–RpfF), QS autoinducer (DSF) and virulence factors were investigated in eight viable isolates. Results. Seven S. maltophilia isolates were resistant to trimethoprim–sulfamethoxazole and five to levofloxacin. All isolates were susceptible to minocycline. Strong, weak and normomutators were detected, with a tendency to decreased mutation rate over time. XbaI PFGE revealed that seven isolates belong to two related clones. All isolates were RpfC–RpfF1 variants and DSF producers. Only two isolates produced weak biofilms, but none displayed swimming or twitching motility. Four isolates showed proteolytic activity and amplified the stmPr1 and stmPr2 genes. Only the first three isolates were siderophore producers. Four isolates showed high resistance to oxidative stress, while the last four showed moderate resistance. Conclusion. The present study shows the long-term persistence of two related S. maltophilia clones in an adult female with CF.
During the adaptation of the prevalent clones to the CF lungs over time, we identified a gradual loss of virulence factors that could be associated with the high amounts of DSF produced by the evolved isolates. Further, a decreased mutation rate was observed in the late isolates. The role of all these adaptations over time remains to be elucidated from a clinical perspective, probably focusing on the damage they can cause to CF lungs.


2019 ◽  
Vol 2 ◽  
Author(s):  
Lyubomir Penev

"Data ownership" is actually an oxymoron, because there could not be a copyright (ownership) on facts or ideas, hence no data onwership rights and law exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is witholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, the access to and the re-use of data, and biodiversuty data in particular, is hampered by technical, economic, sociological, legal and other factors, although there should be no formal legal provisions related to copyright that may prevent anyone who needs to use them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish these so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows: Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). 
Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 
2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph In combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, these approaches show different angles to the future of biodiversity data publishing and, lay the foundations of an entire data publishing ecosystem in the field, while also supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as Global Biodiversity Information Facility (GBIF), Biodiversity Literature Repository (BLR), Plazi TreatmentBank, OpenBiodiv, as well as to various end users.
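The EML-to-manuscript conversion mentioned above can be sketched with the standard library: dataset metadata in Ecological Metadata Language is parsed and its title and abstract pulled into a draft data-paper skeleton. The EML snippet is deliberately minimal and the output format is invented; the actual ARPHA converter handles far richer metadata.

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative EML fragment (real EML documents carry many more
# elements: creators, coverage, methods, etc.).
eml = """<eml><dataset>
  <title>Example occurrence dataset</title>
  <abstract><para>Records of example taxa.</para></abstract>
</dataset></eml>"""

root = ET.fromstring(eml)
title = root.findtext("./dataset/title")
abstract = root.findtext("./dataset/abstract/para")

# Assemble a draft manuscript skeleton from the harvested metadata.
manuscript = f"# {title}\n\nAbstract: {abstract}\n"
print(manuscript)
```

Because EML is the metadata format GBIF datasets already ship with, a converter along these lines turns existing repository metadata into a starting point for a citable data paper with no re-keying.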


2020 ◽  
Vol 10 (5) ◽  
pp. 1521-1539 ◽  
Author(s):  
Daniel R. McHugh ◽  
Elena Koumis ◽  
Paul Jacob ◽  
Jennifer Goldfarb ◽  
Michelle Schlaubitz-Garcia ◽  
...  

Aging is accompanied by a progressive decline in immune function termed “immunosenescence”. Deficient surveillance coupled with the impaired function of immune cells compromises host defense in older animals. The dynamic activity of regulatory modules that control immunity appears to underlie age-dependent modifications to the immune system. In the roundworm Caenorhabditis elegans, levels of PMK-1 p38 MAP kinase diminish over time, reducing the expression of immune effectors that clear bacterial pathogens. Along with the PMK-1 pathway, innate immunity in C. elegans is regulated by the insulin signaling pathway. Here we asked whether DAF-16, a Forkhead box (FOXO) transcription factor whose activity is inhibited by insulin signaling, plays a role in host defense later in life. While in younger C. elegans DAF-16 is inactive unless stimulated by environmental insults, we found that even in the absence of acute stress the transcriptional activity of DAF-16 increases in an age-dependent manner. Beginning in the reproductive phase of adulthood, DAF-16 upregulates a subset of its transcriptional targets, including genes required to kill ingested microbes. Accordingly, DAF-16 has little to no role in larval immunity, but functions specifically during adulthood to confer resistance to bacterial pathogens. We found that DAF-16-mediated immunity in adults requires SMK-1, a regulatory subunit of the PP4 protein phosphatase complex. Our data suggest that as the function of one branch of the innate immune system of C. elegans (PMK-1) declines over time, DAF-16-mediated immunity ramps up to become the predominant means of protecting adults from infection, thus reconfiguring immunity later in life.

