Ozymandias: a biodiversity knowledge graph

PeerJ ◽  
2019 ◽  
Vol 7 ◽  
pp. e6739 ◽  
Author(s):  
Roderic D.M. Page

Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge I explore the feasibility of constructing a “biodiversity knowledge graph” for the Australian fauna. The data cleaning and reconciliation steps involved in constructing the knowledge graph are described in detail. Examples are given of its application to understanding changes in patterns of taxonomic publication over time. A web interface to the knowledge graph (called “Ozymandias”) is available at https://ozymandias-demo.herokuapp.com.
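The reconciliation step described above — mapping local, database-specific identifiers to shared global identifiers, then linking entities through them — can be sketched in a few lines. This is a minimal illustration only; the identifier values and the predicate are invented placeholders, not taken from the Ozymandias dataset.

```python
# Hypothetical local-to-global identifier mapping: a local publication key
# reconciled to a DOI, a local author key reconciled to an ORCID iD.
local_to_global = {
    ("publication", "pub-123"): "https://doi.org/10.0000/example",
    ("person", "author-42"): "https://orcid.org/0000-0000-0000-0000",
}

def reconcile(entity_type, local_id):
    """Return the shared global identifier for a local key, if known."""
    return local_to_global.get((entity_type, local_id))

# Build triples that link entities via their global identifiers, so records
# from different silos referring to the same entity converge on one node.
triples = []
pub = reconcile("publication", "pub-123")
author = reconcile("person", "author-42")
if pub and author:
    triples.append((pub, "http://schema.org/creator", author))

print(triples)
```

Once every silo's local keys are mapped this way, each new dataset adds edges to the same shared graph instead of creating a disconnected island.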

2018 ◽  
Author(s):  
Roderic D. M. Page

Enormous quantities of biodiversity data are being made available online, but much of this data remains isolated in silos. One approach to breaking these silos is to map local, often database-specific identifiers to shared global identifiers. This mapping can then be used to construct a knowledge graph, where entities such as taxa, publications, people, places, specimens, sequences, and institutions are all part of a single, shared knowledge space. Motivated by the 2018 GBIF Ebbe Nielsen Challenge I explore the feasibility of constructing a “biodiversity knowledge graph” for the Australian fauna. The steps involved in constructing the graph are described, and examples of its application are discussed. A web interface to the knowledge graph (called “Ozymandias”) is available at https://ozymandias-demo.herokuapp.com.


Author(s):  
Lyubomir Penev ◽  
Teodor Georgiev ◽  
Viktor Senderov ◽  
Mariya Dimitrova ◽  
Pavel Stoev

As one of the first advocates of open access and open data in the field of biodiversity publishing, Pensoft has adopted a multiple data publishing model, resulting in the ARPHA-BioDiv toolbox (Penev et al. 2017). ARPHA-BioDiv consists of several data publishing workflows and tools described in the Strategies and Guidelines for Publishing of Biodiversity Data and elsewhere:

Data underlying research results are deposited in an external repository and/or published as supplementary file(s) to the article and then linked/cited in the article text; supplementary files are published under their own DOIs and bear their own citation details.
Data deposited in trusted repositories and/or supplementary files and described in data papers; data papers may be submitted in text format or converted into manuscripts from Ecological Metadata Language (EML) metadata.
Integrated narrative and data publishing realised by the Biodiversity Data Journal, where structured data are imported into the article text from tables or via web services and downloaded/distributed from the published article.
Data published in structured, semantically enriched, full-text XMLs, so that several data elements can thereafter easily be harvested by machines.
Linked Open Data (LOD) extracted from literature, converted into interoperable RDF triples in accordance with the OpenBiodiv-O ontology (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph.

The above-mentioned approaches are supported by a whole ecosystem of additional workflows and tools, for example: (1) pre-publication data auditing, involving both human and machine data quality checks (workflow 2); (2) web-service integration with data repositories and data centres, such as Global Biodiversity Information Facility (GBIF), Barcode of Life Data Systems (BOLD), Integrated Digitized Biocollections (iDigBio), Data Observation Network for Earth (DataONE), Long Term Ecological Research (LTER), PlutoF, Dryad, and others (workflows 1, 2); (3) semantic markup of the article texts in the TaxPub format, facilitating further extraction, distribution and re-use of sub-article elements and data (workflows 3, 4); (4) server-to-server import of specimen data from GBIF, BOLD, iDigBio and PlutoF into manuscript text (workflow 3); (5) automated conversion of EML metadata into data paper manuscripts (workflow 2); (6) export of Darwin Core Archives and automated deposition in GBIF (workflow 3); (7) submission of individual images and supplementary data under their own DOIs to the Biodiversity Literature Repository, BLR (workflows 1-3); (8) conversion of key data elements from TaxPub articles and taxonomic treatments extracted by Plazi into RDF handled by OpenBiodiv (workflow 5).
These approaches represent different aspects of the prospective scholarly publishing of biodiversity data, which, in combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, lay the ground for an entire data publishing ecosystem for biodiversity, supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as GBIF, BLR, Plazi TreatmentBank, OpenBiodiv and various end users.
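The LOD workflow above ends with data expressed as interoperable RDF triples. As a rough sketch of what that output looks like, the snippet below serializes a couple of statements in N-Triples syntax. The subject URI and the `mentionsTaxon` predicate are placeholders for illustration, not actual OpenBiodiv-O terms.

```python
def to_ntriple(subject, predicate, obj):
    """Serialize one triple; URIs are angle-bracketed, literals quoted."""
    o = f"<{obj}>" if obj.startswith("http") else f'"{obj}"'
    return f"<{subject}> <{predicate}> {o} ."

# Two illustrative statements about a hypothetical taxonomic treatment.
triples = [
    to_ntriple("http://example.org/treatment/1",
               "http://example.org/mentionsTaxon",
               "http://example.org/taxon/Aus_bus"),
    to_ntriple("http://example.org/treatment/1",
               "http://purl.org/dc/terms/title",
               "Example treatment"),
]
for t in triples:
    print(t)
```

Because N-Triples is a line-oriented subset of RDF, output like this can be bulk-loaded into any triple store that backs a knowledge graph.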


2019 ◽  
Vol 5 ◽  
Author(s):  
Joel Sachs ◽  
Roderic Page ◽  
Steven J Baskauf ◽  
Jocelyn Pender ◽  
Beatriz Lujan-Toro ◽  
...  

Knowledge graphs have the potential to unite disconnected digitized biodiversity data, and there are a number of efforts underway to build biodiversity knowledge graphs. More generally, the recent popularity of knowledge graphs, driven in part by the advent and success of the Google Knowledge Graph, has breathed life into the ongoing development of semantic web infrastructure and prototypes in the biodiversity informatics community. We describe a one-week training event and hackathon that focused on applying three specific knowledge graph technologies – the Neptune graph database, Metaphactory, and Wikidata – to a diverse set of biodiversity use cases. We give an overview of the training, the projects that were advanced throughout the week, and the critical discussions that emerged. We believe that the main barriers towards adoption of biodiversity knowledge graphs are the lack of understanding of knowledge graphs and the lack of adoption of shared unique identifiers. Furthermore, we believe an important advancement in the outlook of knowledge graph development is the emergence of Wikidata as an identifier broker and as a scoping tool. To remedy the current barriers towards biodiversity knowledge graph development, we recommend continued discussions at workshops and at conferences, which we expect to increase awareness and adoption of knowledge graph technologies.
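The "identifier broker" role attributed to Wikidata above can be sketched simply: one item carries claims linking many external identifiers for the same entity, so any one identifier can be translated into the others. The item key and claim values below are invented placeholders, not real Wikidata data.

```python
# A toy broker table: one shared item holds several external-ID claims
# for the same (hypothetical) taxon.
items = {
    "Q-EXAMPLE": {
        "GBIF taxon ID": "5219404",
        "NCBI taxonomy ID": "9689",
    },
}

def broker(scheme, value, target_scheme):
    """Translate an identifier from one scheme to another via a shared item."""
    for qid, claims in items.items():
        if claims.get(scheme) == value:
            return claims.get(target_scheme)
    return None

print(broker("GBIF taxon ID", "5219404", "NCBI taxonomy ID"))
```

In practice such lookups run as SPARQL queries against the Wikidata query service; the point is that the shared item, not any one database, anchors the identity.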


2018 ◽  
Vol 2 ◽  
pp. e25564 ◽ 
Author(s):  
Tomer Gueta ◽  
Vijay Barve ◽  
Thiloshon Nagarajah ◽  
Ashwin Agrawal ◽  
Yohay Carmel

A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning; from data quality explorations, to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade. New features will be highlighted in the demonstration.
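'bdclean' itself is an R package; as a language-neutral illustration of the "explore → clean → report" pipeline the abstract describes, here is a minimal sketch in Python. The checks, field names, and records are invented for illustration and are not part of the bdclean API.

```python
# Toy occurrence records with two deliberate quality problems.
records = [
    {"species": "Aus bus", "lat": -33.9, "lon": 151.2},
    {"species": "", "lat": -33.9, "lon": 151.2},          # missing name
    {"species": "Cus dus", "lat": 999.0, "lon": 151.2},   # impossible latitude
]

def check(record):
    """Exploration step: return a list of quality issues in one record."""
    issues = []
    if not record["species"]:
        issues.append("missing species name")
    if not -90 <= record["lat"] <= 90:
        issues.append("latitude out of range")
    return issues

# Cleaning step: keep only issue-free records.
clean = [r for r in records if not check(r)]
# Reporting step: tally every issue found across the dataset.
report = [issue for r in records for issue in check(r)]

print(len(clean), report)
```

A modular design like this — independent check functions feeding both the filter and the report — mirrors the extensibility the package aims for.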


2013 ◽  
pp. 1188-1203
Author(s):  
Ricardo Queirós ◽  
Mário Pinto

Recent studies of mobile Web trends show the continued explosion of mobile-friendly content. However, the wide number and heterogeneity of mobile devices pose several challenges for Web programmers, who want automatic detection of the device context and adaptation of content to mobile devices. Hence, the device detection phase assumes an important role in this process. In this chapter, the authors compare the most widely used approaches for mobile device detection. Based on this study, they present an architecture for detecting and delivering uniform m-Learning content to students in a Higher School. The authors focus mainly on the XML device capabilities repository and on the REST API Web Service for dealing with device data. In the former, the authors detail the respective capabilities schema and present a new caching approach. In the latter, they present an extension of the current API for dealing with device data. Finally, the authors validate their approach by presenting the overall data and statistics collected through the Google Analytics service, in order to better understand the adherence to the mobile Web interface, its evolution over time, and the main weaknesses.


2018 ◽  
Author(s):  
Donna M Gibbs ◽  
Charles J Gibbs ◽  
Jessica A Schultz

Since 2014, divers from Ocean Wise have been SCUBA diving in the Cambridge Bay, Nunavut area, collecting data on fishes, invertebrates and marine plants at numerous sites. For each dive a file is created that catalogues the species found and a rough abundance of each species. These files accumulate over time and are searchable by location, year, year and month, month, species and a number of other criteria, with custom software created for this purpose. Relationships between species are identified automatically by the searches. In addition to the species catalogue that began in 2014, data have been scrounged from previous collecting trips by staff and from personal dive logs before 2014, allowing for comparison between Pond Inlet, Resolute and Cambridge Bay. We were able to flag a potential decline in one species in 2017 thanks to our previous data. Our goal is to work cooperatively with others diving in the Arctic to grow this database through photography and dive records. At this point we have 149 dives/records and 279 species recorded. The database is used to support the Nearshore Ecological Surveys and the Arctic Marine Ecological Benchmarking Program reports. In addition to biodiversity data, temperature, salinity, pH and dissolved oxygen are also collected while in the area.
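The multi-criteria search the custom software supports can be sketched as a filter over dive records. The field names, locations, and species below are invented for illustration; they are not records from the Ocean Wise database.

```python
# Toy dive records: each dive file lists where, when, and what was seen.
dives = [
    {"location": "Cambridge Bay", "year": 2016, "species": ["Gymnocanthus tricuspis"]},
    {"location": "Pond Inlet", "year": 2012, "species": ["Boltenia ovifera"]},
    {"location": "Cambridge Bay", "year": 2017, "species": ["Boltenia ovifera"]},
]

def search(location=None, year=None, species=None):
    """Return dives matching all supplied criteria; omitted criteria match all."""
    hits = []
    for d in dives:
        if location and d["location"] != location:
            continue
        if year and d["year"] != year:
            continue
        if species and species not in d["species"]:
            continue
        hits.append(d)
    return hits

print(len(search(location="Cambridge Bay", species="Boltenia ovifera")))
```

Combining criteria this way is what makes year-over-year comparisons per site possible, such as flagging a species seen at a location in earlier years but absent later.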


Author(s):  
Eliana Alcaraz ◽  
Daniela Centrón ◽  
Gabriela Camicia ◽  
María Paula Quiroga ◽  
José Di Conza ◽  
...  

Introduction. Stenotrophomonas maltophilia has emerged as one of the most common multi-drug-resistant pathogens isolated from people with cystic fibrosis (CF). However, its adaptation over time to CF lungs has not been fully established. Hypothesis. Sequential isolates of S. maltophilia from a Brazilian adult patient are clonally related and show a pattern of adaptation by loss of virulence factors. Aim. To investigate antimicrobial susceptibility, clonal relatedness, mutation frequency, quorum sensing (QS) and selected virulence factors in sequential S. maltophilia isolates from a Brazilian adult patient attending a CF referral centre in Buenos Aires, Argentina, between May 2014 and May 2018. Methodology. The antibiotic resistance of 11 S. maltophilia isolates recovered from expectorations of an adult female with CF was determined. Clonal relatedness, mutation frequency, QS variants (RpfC–RpfF), QS autoinducer (DSF) and virulence factors were investigated in eight viable isolates. Results. Seven S. maltophilia isolates were resistant to trimethoprim–sulfamethoxazole and five to levofloxacin. All isolates were susceptible to minocycline. Strong, weak and normomutators were detected, with a tendency to decreased mutation rate over time. XbaI PFGE revealed that seven isolates belong to two related clones. All isolates were RpfC–RpfF1 variants and DSF producers. Only two isolates produced weak biofilms, but none displayed swimming or twitching motility. Four isolates showed proteolytic activity and amplified the stmPr1 and stmPr2 genes. Only the first three isolates were siderophore producers. Four isolates showed high resistance to oxidative stress, while the last four showed moderate resistance. Conclusion. The present study shows the long-term persistence of two related S. maltophilia clones in an adult female with CF.
During the adaptation of the prevalent clones to the CF lungs over time, we identified a gradual loss of virulence factors that could be associated with the high amounts of DSF produced by the evolved isolates. Further, a decreased mutation rate was observed in the late isolates. The role of all these adaptations over time remains to be elucidated from a clinical perspective, probably focusing on the damage they can cause to CF lungs.


2019 ◽  
Vol 2 ◽  
Author(s):  
Lyubomir Penev

"Data ownership" is actually an oxymoron, because there could not be a copyright (ownership) on facts or ideas, hence no data onwership rights and law exist. The term refers to various kinds of data protection instruments: Intellectual Property Rights (IPR) (mostly copyright) asserted to indicate some kind of data ownership, confidentiality clauses/rules, database right protection (in the European Union only), or personal data protection (GDPR) (Scassa 2018). Data protection is often realised via different mechanisms of "data hoarding", that is witholding access to data for various reasons (Sieber 1989). Data hoarding, however, does not put the data into someone's ownership. Nonetheless, the access to and the re-use of data, and biodiversuty data in particular, is hampered by technical, economic, sociological, legal and other factors, although there should be no formal legal provisions related to copyright that may prevent anyone who needs to use them (Egloff et al. 2014, Egloff et al. 2017, see also the Bouchout Declaration). One of the best ways to provide access to data is to publish these so that the data creators and holders are credited for their efforts. As one of the pioneers in biodiversity data publishing, Pensoft has adopted a multiple-approach data publishing model, resulting in the ARPHA-BioDiv toolbox and in extensive Strategies and Guidelines for Publishing of Biodiversity Data (Penev et al. 2017a, Penev et al. 2017b). ARPHA-BioDiv consists of several data publishing workflows: Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). 
Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph Deposition of underlying data in an external repository and/or its publication as supplementary file(s) to the related article which are then linked and/or cited in-tex. Supplementary files are published under their own DOIs to increase citability). Description of data in data papers after they have been deposited in trusted repositories and/or as supplementary files; the systme allows for data papers to be submitted both as plain text or converted into manuscripts from Ecological Metadata Language (EML) metadata. Import of structured data into the article text from tables or via web services and their susequent download/distribution from the published article as part of the integrated narrative and data publishing workflow realised by the Biodiversity Data Journal. Publication of data in structured, semanticaly enriched, full-text XMLs where data elements are machine-readable and easy-to-harvest. Extraction of Linked Open Data (LOD) from literature, which is then converted into interoperable RDF triples (in accordance with the OpenBiodiv-O ontology) (Senderov et al. 
2018) and stored in the OpenBiodiv Biodiversity Knowledge Graph In combination with text and data mining (TDM) technologies for legacy literature (PDF) developed by Plazi, these approaches show different angles to the future of biodiversity data publishing and, lay the foundations of an entire data publishing ecosystem in the field, while also supplying FAIR (Findable, Accessible, Interoperable and Reusable) data to several interoperable overarching infrastructures, such as Global Biodiversity Information Facility (GBIF), Biodiversity Literature Repository (BLR), Plazi TreatmentBank, OpenBiodiv, as well as to various end users.
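The EML-to-manuscript conversion mentioned above can be sketched with the standard library: dataset metadata in Ecological Metadata Language is parsed and its title and abstract pulled into a draft data-paper skeleton. The EML snippet is deliberately minimal and the output format is invented; the actual ARPHA converter handles far richer metadata.

```python
import xml.etree.ElementTree as ET

# A minimal, illustrative EML fragment (real EML documents carry many more
# elements: creators, coverage, methods, etc.).
eml = """<eml><dataset>
  <title>Example occurrence dataset</title>
  <abstract><para>Records of example taxa.</para></abstract>
</dataset></eml>"""

root = ET.fromstring(eml)
title = root.findtext("./dataset/title")
abstract = root.findtext("./dataset/abstract/para")

# Assemble a draft manuscript skeleton from the harvested metadata.
manuscript = f"# {title}\n\nAbstract: {abstract}\n"
print(manuscript)
```

Because EML is the metadata format GBIF datasets already ship with, a converter along these lines turns existing repository metadata into a starting point for a citable data paper with no re-keying.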


2020 ◽  
Vol 10 (5) ◽  
pp. 1521-1539 ◽  
Author(s):  
Daniel R. McHugh ◽  
Elena Koumis ◽  
Paul Jacob ◽  
Jennifer Goldfarb ◽  
Michelle Schlaubitz-Garcia ◽  
...  

Aging is accompanied by a progressive decline in immune function termed “immunosenescence”. Deficient surveillance coupled with the impaired function of immune cells compromises host defense in older animals. The dynamic activity of regulatory modules that control immunity appears to underlie age-dependent modifications to the immune system. In the roundworm Caenorhabditis elegans, levels of PMK-1 p38 MAP kinase diminish over time, reducing the expression of immune effectors that clear bacterial pathogens. Along with the PMK-1 pathway, innate immunity in C. elegans is regulated by the insulin signaling pathway. Here we asked whether DAF-16, a Forkhead box (FOXO) transcription factor whose activity is inhibited by insulin signaling, plays a role in host defense later in life. While in younger C. elegans DAF-16 is inactive unless stimulated by environmental insults, we found that even in the absence of acute stress the transcriptional activity of DAF-16 increases in an age-dependent manner. Beginning in the reproductive phase of adulthood, DAF-16 upregulates a subset of its transcriptional targets, including genes required to kill ingested microbes. Accordingly, DAF-16 has little to no role in larval immunity, but functions specifically during adulthood to confer resistance to bacterial pathogens. We found that DAF-16-mediated immunity in adults requires SMK-1, a regulatory subunit of the PP4 protein phosphatase complex. Our data suggest that as the function of one branch of the innate immune system of C. elegans (PMK-1) declines over time, DAF-16-mediated immunity ramps up to become the predominant means of protecting adults from infection, thus reconfiguring immunity later in life.

