Game of Tops: Trends in GBIF’s Community of Users

Biodiversity Information Science and Standards

10.3897/biss.3.37187

2019

Vol 3

Author(s):

Nora Escribano

David Galicia

Arturo H. Ariño

Keyword(s):

Information Exchange

Full Range

Open Data

Biodiversity Informatics

Biodiversity Data

Global Biodiversity Information Facility

Research Areas

Scientific Papers

Opening Up

Biodiversity Information

Building on the development of Biodiversity Informatics, the Global Biodiversity Information Facility (GBIF) undertook the task of enabling access to the world’s wealth of biodiversity data via the Internet. To date, GBIF has become, in many respects, the most extensive biodiversity information exchange infrastructure in the world, opening up a full range of possibilities for science. Research areas ranging from the effects of environmental change on biodiversity to the spread of invasive species, among many others, have benefited from such access to biodiversity data. As of this writing, more than 7,000 published items (scientific papers, reviews, conference proceedings) have been indexed in the GBIF Secretariat’s literature tracking programme. On the basis of this database, we will represent trends over time in the behaviour of GBIF users regarding openness, social structure, and other features associated with this scientific production: What is the measurable impact of research using GBIF data? How is the GBIF community of users growing? Is the science made with, and enabled by, open data actually open? Mapping GBIF users’ choices will show how biodiversity research is evolving through time, synthesising past and current priorities of this community in an attempt to forecast whether summer, or winter, is coming.

Completeness of Digital Accessible Knowledge of the Plants of Ghana

Biodiversity Informatics

10.17161/bi.v11i0.5860

2016

Vol 11

Cited By ~ 7

Author(s):

Alex Asase

A. Townsend Peterson

Keyword(s):

Geographic Distance

Northern Ghana

Biodiversity Informatics

Primary Research

Biodiversity Data

Global Biodiversity Information Facility

Herbarium Data

Research Grade

Biodiversity Information

Providing comprehensive, informative, primary, research-grade biodiversity information is an important focus of biodiversity informatics initiatives. Recent efforts within Ghana have digitized >90% of the primary biodiversity data records associated with specimen sheets in Ghanaian herbaria. Additional herbarium data are available from other institutions via biodiversity informatics initiatives such as the Global Biodiversity Information Facility. However, data on the plants of Ghana have not yet been integrated and assessed to establish how complete site inventories are, so that appropriate levels of confidence can be applied. In this study, we assessed inventory completeness and identified gaps in the current Digital Accessible Knowledge (DAK) of the plants of Ghana, in order to prioritize areas for future surveys and inventories. We evaluated the completeness of inventories at ½° spatial resolution using statistics that summarize inventory completeness. We also characterized gaps in coverage in terms of geographic distance and climatic difference from well-documented sites across the country. The southwestern and southeastern parts of the country held many well-known grid cells, while the largest spatial gaps were found in central and northern parts of the country. Climatic difference showed contrasting patterns, with a dramatic gap in coverage in central-northern Ghana. This study provides a detailed case study of how to prioritize new botanical surveys and inventories based on existing DAK.
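
The abstract does not name its completeness statistic. The following is a minimal sketch that rests on two assumptions. First, it assumes records arrive as (latitude, longitude, species) tuples. Second, it assumes completeness is measured as observed species richness divided by a bias-corrected Chao1 richness estimate. That is one common choice, not necessarily the authors' own. The sketch bins records into ½° cells and scores each cell. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

CELL_SIZE_DEG = 0.5  # ½° grid, as in the abstract


def cell_of(lat: float, lon: float, size: float = CELL_SIZE_DEG) -> tuple[int, int]:
    """Return the (row, col) index of the grid cell containing a point."""
    return (math.floor(lat / size), math.floor(lon / size))


def chao1(counts: Counter) -> float:
    """Bias-corrected Chao1 estimate of total species richness (assumed estimator).

    S_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1)), where n is the number
    of records and f1, f2 are the numbers of species recorded once and twice.
    """
    s_obs = len(counts)
    n = sum(counts.values())
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return s_obs + ((n - 1) / n) * f1 * (f1 - 1) / (2 * (f2 + 1))


def completeness_by_cell(records):
    """Map each occupied cell to observed / estimated richness, a value in (0, 1]."""
    per_cell: dict[tuple[int, int], Counter] = defaultdict(Counter)
    for lat, lon, species in records:
        per_cell[cell_of(lat, lon)][species] += 1
    # Chao1 is never below S_obs, so every ratio is at most 1.
    return {cell: len(counts) / chao1(counts) for cell, counts in per_cell.items()}
```

Under this assumption, cells scoring near 1 would count as well inventoried. Low scores would flag candidate gaps for new surveys.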

Data integration enables global biodiversity synthesis

Proceedings of the National Academy of Sciences

10.1073/pnas.2018093118

2021

Vol 118 (6)

pp. e2018093118

Author(s):

J. Mason Heberling

Joseph T. Miller

Daniel Noesgaard

Scott B. Weingart

Dmitry Schigel

Keyword(s):

Data Integration

Species Interactions

Large Scale

Data Use

Biodiversity Data

Global Biodiversity Information Facility

Research Areas

Global Biodiversity

Biodiversity Information

Global Data

The accessibility of global biodiversity information has surged in the past two decades, notably through widespread funding initiatives for museum specimen digitization and the emergence of large-scale public participation in community science. Effective use of these data requires the integration of disconnected datasets, but the scientific impacts of consolidated biodiversity data networks have not yet been quantified. To determine whether data integration enables novel research, we carried out a quantitative text analysis and bibliographic synthesis of >4,000 studies published from 2003 to 2019. All of these studies use data mediated by the world’s largest biodiversity data network, the Global Biodiversity Information Facility (GBIF). Data available through GBIF have increased 12-fold since 2007. Global data use has matched this trend, with roughly two publications using GBIF-mediated data per day in 2019. Data-use patterns varied by authorship, geographic extent, taxonomic group, and dataset type. Although GBIF facilitates global authorship, legacies of colonial science remain. Studies involving species distribution modeling were most prevalent (31% of literature surveyed), but their focus has recently shifted from theory to application. Topic prevalence was stable across the 17-year period for some research areas (e.g., macroecology), yet other topics proportionately declined (e.g., taxonomy) or increased (e.g., species interactions, disease). Although centered on biological subfields, GBIF-enabled research extends surprisingly across all major scientific disciplines. Biodiversity data mobilization through global data aggregation has enabled basic and applied research at temporal, spatial, and taxonomic scales otherwise not possible, launching biodiversity sciences into a new era.

Options to Apply the IGSN Model to Biodiversity Data

Biodiversity Information Science and Standards

10.3897/biss.2.27087

2018

Vol 2

pp. e27087

Author(s):

Donald Hobern

Andrea Hahn

Tim Robertson

Keyword(s):

Open Data

Herbarium Specimens

Biodiversity Informatics

Biodiversity Data

Global Biodiversity Information Facility

Sample Number

Standard Format

Darwin Core

Data Objects

Treat All

For more than a decade, the biodiversity informatics community has recognised the importance of stable, resolvable identifiers. Such identifiers enable unambiguous references to data objects and the associated concepts and entities. These include museum/herbarium specimens and, more broadly, all records serving as evidence of species occurrence in time and space. Early efforts built on the Darwin Core institutionCode, collectionCode and catalogNumber terms, treated as a triple that was expected to identify a specimen uniquely. Following a review of current technologies for globally unique identifiers, TDWG adopted Life Science Identifiers (LSIDs) (Pereira et al. 2009). Unfortunately, the key stakeholders in the LSID consortium soon withdrew support for the technology, leaving TDWG committed to a moribund technology. Subsequently, publishers of biodiversity data have adopted a range of technologies to provide unique identifiers. These include (among others) HTTP Uniform Resource Identifiers (URIs), Universally Unique Identifiers (UUIDs), Archival Resource Keys (ARKs), and Handles. Each of these technologies has merit, but they do not provide consistent guarantees of persistence or resolvability. More importantly, the heterogeneity of these solutions hampers the delivery of services that can treat all of these data objects as part of a consistent linked-open-data domain. The geoscience community has established the System for Earth Sample Registration (SESAR). SESAR enables collections to publish standard metadata records for their samples and to associate each of these with an International Geo Sample Number (IGSN; http://www.geosamples.org/igsnabout). IGSNs follow a standard format, distribute responsibility for uniqueness between SESAR and the publishing collections, and support resolution via HTTP URIs or Handles. Each IGSN resolves to a standard metadata page, roughly equivalent in detail to a Darwin Core specimen record.
The standardisation of identifiers has allowed the geoscience community to secure support from some journal publishers for the promotion and use of IGSNs within articles. The biodiversity informatics community encompasses a much larger number of publishers and greater pre-existing variation in identifier formats. Nevertheless, it would be possible to deliver a shared global identifier scheme with the same features as IGSNs by building on the aggregation services offered by the Global Biodiversity Information Facility (GBIF). The GBIF data index includes normalised Darwin Core metadata for all data records from registered data sources. It could therefore serve as a platform for resolving HTTP URIs and/or Handles for all specimens and all occurrence records. The most significant trade-off to consider would be between two goods. One is the autonomy of collections and other publishers in how they format identifiers within their own data. The other is the benefit that may arise from greater consistency and predictability in the form of resolvable identifiers.
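
The sketch below contrasts the identifier styles discussed above. It renders a Darwin Core triple using the urn:catalog: convention, a widely used pattern that is assumed here rather than taken from the abstract. It then mints an opaque UUID and wraps either identifier in an HTTP URI under a shared resolver base. RESOLVER_BASE and the institution and collection codes are hypothetical. The abstract only proposes that GBIF could host such a resolver.

```python
import uuid
from urllib.parse import quote

# Hypothetical shared resolver; the abstract only suggests GBIF could play this role.
RESOLVER_BASE = "https://resolver.example.org/occurrence/"


def triplet_id(institution_code: str, collection_code: str, catalog_number: str) -> str:
    """Render a Darwin Core triple with the 'urn:catalog:' convention.

    Its uniqueness depends on every publisher keeping its codes stable.
    """
    return f"urn:catalog:{institution_code}:{collection_code}:{catalog_number}"


def mint_opaque_id() -> str:
    """Mint a random (version 4) UUID that carries no institutional meaning."""
    return str(uuid.uuid4())


def resolvable_uri(identifier: str) -> str:
    """Wrap any identifier in an HTTP URI under the shared resolver, percent-encoding it."""
    return RESOLVER_BASE + quote(identifier, safe="")


if __name__ == "__main__":
    legacy = triplet_id("INST", "COLL", "12345")  # hypothetical codes
    print(legacy)
    print(resolvable_uri(legacy))
    print(resolvable_uri(mint_opaque_id()))
```

Both kinds of local identifier end up behind one predictable URI pattern. This is the kind of consistency the abstract weighs against publisher autonomy.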

DiSSCo, iDigBio and the Future of Global Collaboration

Biodiversity Information Science and Standards

10.3897/biss.3.37896

2019

Vol 3

Cited By ~ 1

Author(s):

Gil Nelson

Deborah L Paul

Keyword(s):

Working Group

Application Programming Interface

The United States

Common Source

Data Generation

Biodiversity Data

Global Biodiversity Information Facility

Data Accessibility

The US

Biodiversity Information

Integrated Digitized Biocollections (iDigBio) is the United States’ (US) national resource and coordinating center for biodiversity specimen digitization and mobilization. It was established in 2011 through the US National Science Foundation’s (NSF) Advancing Digitization of Biodiversity Collections (ADBC) program. ADBC grew from a working group of museum-based and other biocollections professionals working in concert with NSF to make collections' specimen data accessible for science, education, and public consumption. The working group, Network Integrated Biocollections Alliance (NIBA), released two reports (Beach et al. 2010, American Institute of Biological Sciences 2013) that provided the foundation for iDigBio and ADBC. iDigBio is restricted in focus to the ingestion of data generated by public, non-federal museum and academic collections, and it concentrates on specimen-based (as opposed to observational) occurrence records. iDigBio currently serves about 118 million transcribed specimen-based records and 29 million specimen-based media records from approximately 1,600 datasets. These digital objects have been contributed by about 700 collections representing nearly 400 institutions, making iDigBio the most comprehensive biodiversity data aggregator in the US. iDigBio, DiSSCo (Distributed System of Scientific Collections), GBIF (Global Biodiversity Information Facility), and the Atlas of Living Australia (ALA) are currently collaborating on a global framework to harmonize technologies. The aim is to standardize and synchronize ingestion strategies, data models and standards, cyberinfrastructure, application programming interfaces (APIs), specimen record identifiers, and more. This work serves a developing consolidated global data product that can provide a common source for the world’s digital biodiversity data.
The collaboration strives to harness and combine the unique strengths of its partners in ways that meet the individual needs of each partner’s constituencies. It also aims to design pathways for accommodating existing and emerging aggregators, to strengthen and enhance access to the world’s biodiversity data, and to underscore the scope and importance of worldwide biodiversity informatics activities. Collaborators will share technology strategies and outputs, align conceptual understandings, and establish and draw from an international knowledge base. These collaborators, along with Biodiversity Information Standards (TDWG), will join iDigBio and the Smithsonian National Museum of Natural History as they host Biodiversity 2020 in Washington, DC. Biodiversity 2020 will combine an international celebration of the worldwide progress made in biodiversity data accessibility in the 21st century with a biodiversity data conference that extends the life of Biodiversity Next. It will provide a venue for the GBIF governing board meeting, the TDWG annual meeting, and the annual iDigBio Summit. It will also host three days of plenary and concurrent sessions focused on the present and future of biodiversity data generation, mobilization, and use.

Diversity and Distribution Patterns of Geometrid Moths (Geometridae, Lepidoptera) in Mongolia

Diversity

10.3390/d12050186

2020

Vol 12 (5)

pp. 186

Author(s):

Khishigdelger Enkhtur

Bazartseren Boldgiv

Martin Pfeiffer

Keyword(s):

Species Richness

Environmental Variables

Environmental Changes

Distribution Patterns

Maximum Temperature

Forest Steppe

Global Biodiversity Information Facility

Scientific Papers

Geometrid Moth

Biodiversity Information

Geometrids are a species-rich group of moths that serve as reliable indicators of environmental change. Little is known about the Mongolian moth fauna, and there is no comprehensive review of the species richness, diversity, and distribution patterns of geometrid moths in the country. Our study aims to review the existing knowledge on geometrid moths in Mongolia. We compiled geometrid moth records from published scientific papers, our own research, and the Global Biodiversity Information Facility (GBIF) to produce a checklist of the geometrid moths of Mongolia. Additionally, we analyzed spatial patterns, species richness, and diversity of geometrid moths within 14 ecoregions of Mongolia and evaluated the environmental variables associated with their distribution. In total, we compiled 1,973 point records of 388 geometrid species. The most species-rich ecoregion in Mongolia was the Daurian Forest Steppe, with 142 species. We analyzed geometrid assemblages of different ecoregions in Mongolia using non-metric multidimensional scaling (NMDS). Annual precipitation and maximum temperature of the warmest month were the environmental variables most strongly correlated with the NMDS axes.
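
The compilation step described above has two parts. First, literature, field and GBIF records are merged. Second, species are counted per ecoregion. The following is a minimal sketch of that step, built on two illustrative assumptions that are not the authors' stated method. Each record is assumed to already carry an ecoregion label. A real workflow would assign that label by intersecting coordinates with ecoregion polygons. Records with the same species and coordinates, rounded to four decimals, are treated as duplicates across sources.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class PointRecord(NamedTuple):
    species: str
    lat: float
    lon: float
    ecoregion: str  # assumed pre-assigned (e.g. by a point-in-polygon step not shown)


def merge_sources(*sources: Iterable[PointRecord]) -> list[PointRecord]:
    """Concatenate record sources, keeping the first copy of each species/location pair."""
    seen: set[tuple[str, float, float]] = set()
    merged: list[PointRecord] = []
    for source in sources:
        for rec in source:
            key = (rec.species, round(rec.lat, 4), round(rec.lon, 4))
            if key not in seen:
                seen.add(key)
                merged.append(rec)
    return merged


def richness_by_ecoregion(records: Iterable[PointRecord]) -> dict[str, int]:
    """Count distinct species recorded in each ecoregion."""
    species_sets: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        species_sets[rec.ecoregion].add(rec.species)
    return {eco: len(names) for eco, names in species_sets.items()}
```

Counting distinct names per ecoregion, rather than records, keeps heavily sampled sites from inflating richness. Duplicate removal keeps one point from counting twice when it appears in several sources.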

African Biodiversity Challenge: Integrating Freshwater Biodiversity Information to Guide Informed Decision-Making in Rwanda

Biodiversity Information Science and Standards

10.3897/biss.2.26367

2018

Vol 2

pp. e26367

Author(s):

Yvette Umurungi

Samuel Kanyamibwa

Faustin Gashakamba

Beth Kaplin

Keyword(s):

Decision Making

Natural Resources

Economic Transformation

Freshwater Ecosystems

Data Availability

Biodiversity Informatics

Freshwater Biodiversity

Biodiversity Data

Albertine Rift

Biodiversity Information

Freshwater biodiversity is critically understudied in Rwanda. To date there has been no efficient mechanism to integrate freshwater biodiversity information or to make it accessible to the decision-makers, researchers, private sector and communities who need it. They need it for planning, for management, and for implementing the National Biodiversity Strategy and Action Plan (NBSAP). A framework to capture and distribute freshwater biodiversity data is crucial to understanding how economic transformation and environmental change are affecting freshwater biodiversity and the resulting ecosystem services. To optimize conservation efforts for freshwater ecosystems, detailed information is needed on current and historical species distributions and abundances across the landscape. From these data, specific conservation concerns can be identified, analyzed and prioritized. The purpose of this project is to establish and implement a long-term strategy for freshwater biodiversity data mobilization, sharing, processing and reporting in Rwanda. The project is expected to support the mandates of two bodies. The first is the Rwanda Environment Management Authority (REMA), the national agency in charge of environmental monitoring and the implementation of Rwanda’s NBSAP. The second is the Center of Excellence in Biodiversity and Natural Resources Management (CoEB). The project also aligns with the mission of the Albertine Rift Conservation Society (ARCOS) to enhance sustainable management of natural resources in the Albertine Rift region. Specifically, the organizational structure, technology platforms, and workflows for biodiversity data capture and mobilization are being enhanced to promote data availability and accessibility. This in turn aims to improve Rwanda’s NBSAP and support other decision-making processes.
The project is building the capacity of technical staff in biodiversity informatics, drawing on relevant government and non-government institutions. In doing so, it strengthens CoEB's ability to achieve its mission as the Rwandan national biodiversity knowledge management center. Twelve institutions have been identified as data holders, and the digitization of their data using Darwin Core standards is in progress. Data cleaning is also under way in preparation for publication through the ARCOS Biodiversity Information System (http://arbmis.arcosnetwork.org/). The release of the first national State of Freshwater Biodiversity Report is the next step. CoEB is a registered publisher with the Global Biodiversity Information Facility (GBIF) and holds an Integrated Publishing Toolkit (IPT) account on the ARCOS portal. This project was developed for the African Biodiversity Challenge, a competition coordinated by the South African National Biodiversity Institute (SANBI). The competition is funded by the JRS Biodiversity Foundation, which supports ongoing efforts to enhance the biodiversity information management activities of the GBIF Africa network. This project also aligns with SANBI’s Regional Engagement Strategy. It endeavors to strengthen both emerging biodiversity informatics networks and data management capacity on the continent in support of sustainable development.

Making Biodiversity Data Social, Shareable, and Scalable: Reflections on iNaturalist & citizen science

Biodiversity Information Science and Standards

10.3897/biss.3.46670

2019

Vol 3

Author(s):

Carrie Seltzer

Keyword(s):

Social Interaction

Citizen Science

Biodiversity Informatics

Strategic Decisions

Biodiversity Data

Helping Others

Advance Research

Biodiversity Information

Since 2008, iNaturalist has been crowdsourcing identifications for biodiversity observations collected by citizen scientists. Today iNaturalist has over 25 million records of wild biodiversity with photo or audio evidence, from every country, representing more than 230,000 species, collected by over 700,000 people, and with 90,000 people helping others with identifications. Hundreds of publications have used iNaturalist data to advance research, conservation, and policy. There are three key themes that iNaturalist has embraced: social interaction; shareability of data, tools, and code; and scalability of the platform and community. The keynote will share reflections on what has (and has not) worked for iNaturalist while drawing on other examples from biodiversity informatics and citizen science. Insights about user motivations, synergistic collaborations, and strategic decisions about scaling offer some transferable approaches to address the broadly applicable questions: Which species is represented? How do we make the best use of the available biodiversity information? And how do we build something viable and enduring in the process?

Towards a Post-Graduate Level Curriculum for Biodiversity Informatics. Perspectives from the Global Biodiversity Information Facility (GBIF) Community

Biodiversity Data Journal

10.3897/bdj.9.e68010

2021

Vol 9

Author(s):

Fatima Parker-Allie

Francisco Pando

Anders Telenius

Jean Ganglo

Danny Vélez

...

Keyword(s):

Biological Data

Initial Assessment

Biodiversity Informatics

Global Biodiversity Information Facility

Policy Makers

Academic Teaching

E-Learning

Learning Platforms

Global Biodiversity

Biodiversity Information

Biodiversity informatics is a new and evolving field that requires efforts to develop capacity and a curriculum. The main objective was to summarise the level of activity and the efforts towards developing biodiversity informatics curricula, for work-based training and/or academic teaching at universities. The scope covered activity taking place within Global Biodiversity Information Facility (GBIF) countries and the associated network. A survey approach was used to identify existing capacities and resources within the network. Most GBIF Node survey respondents (80%) are engaged in onsite training activities, with work-based training aimed mostly at researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use. The goal is to make accessible the analogue and digital biological data that currently reside in scattered datasets. An initial assessment of academic teaching activities showed that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics. These include programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool to help build capacity in many countries. Regarding potential capacity within the Nodes network, 60% of respondents expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network have been highlighted, and a working curriculum framework has been defined.

BIDDSAT: visualizing the content of biodiversity data publishers in the Global Biodiversity Information Facility network

Bioinformatics

10.1093/bioinformatics/bts359

2012

Vol 28 (16)

pp. 2207-2208

Cited By ~ 6

Author(s):

J. Otegui

A. H. Ariño

Keyword(s):

Biodiversity Data

Global Biodiversity Information Facility

Global Biodiversity

Biodiversity Information

A Google Sheet Add-on for Biodiversity Data Standardization and Sharing

Biodiversity Information Science and Standards

10.3897/biss.4.59228

2020

Vol 4

Author(s):

José Augusto Salim

Antonio Saraiva

Keyword(s):

Information Retrieval

Data Sharing

Information Science

Data Sets

Biodiversity Informatics

Biodiversity Data

Data Standardization

Darwin Core

REST API

Biodiversity Information

Some biologists and biodiversity data managers are unfamiliar with the data-standardization practices of information science. For them, the complex software used to create standardized datasets can be a barrier to sharing data. Biodiversity Information Standards (TDWG) ratified the Darwin Core Standard (DwC) (Darwin Core Task Group 2009) in 2009. Since then, many datasets have been published and shared through a variety of data portals. In the early stages of biodiversity data sharing, several protocols were introduced for the discovery, search and retrieval of distributed data, simplifying data exchange between information systems. The first was the Distributed Generic Information Retrieval (DiGIR) protocol, progenitor of DwC. Later came the BioCASe and TDWG Access Protocol for Information Retrieval (TAPIR) protocols (De Giovanni et al. 2010). Although these protocols are still in use, they are known to be inefficient for transferring large amounts of data (GBIF 2017). For that reason, in 2011 the Global Biodiversity Information Facility (GBIF) introduced the Darwin Core Archive (DwC-A). DwC-A allows more efficient data transfer and has become the preferred format for publishing data in the GBIF network. A DwC-A is a structured collection of text files that uses DwC terms to produce a single, self-contained dataset. Many tools for assisting data sharing using DwC-A have been introduced, such as the Integrated Publishing Toolkit (IPT) (Robertson et al. 2014), the Darwin Core Archive Assistant (GBIF 2010) and the Darwin Core Archive Validator. Although these tools promote and facilitate data sharing, many users have difficulty using them. The main reason is the lack of training in information science in the biodiversity curriculum (Convention on Biological Diversity 2012, Enke et al. 2012).
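
A DwC-A is just a set of text files plus a descriptor, so a minimal archive can be assembled with the Python standard library. The sketch below makes three assumptions: a single Occurrence core, the first listed term used as the record id, and no EML metadata file. It writes a tab-separated occurrence.txt and a meta.xml that maps each column to its Darwin Core term URI, zipped together. The function name write_dwca is hypothetical, and the sketch is not one of the tools named above.

```python
import csv
import io
import zipfile

DWC_NS = "http://rs.tdwg.org/dwc/terms/"


def write_dwca(target, rows: list[dict], terms: list[str]) -> None:
    """Write a minimal Darwin Core Archive (occurrence core only, no EML).

    target may be a filesystem path or a binary file-like object.
    rows are dicts keyed by Darwin Core term names; terms[0] (e.g. occurrenceID)
    is declared as the record id.
    """
    # Tab-separated data file with a header row; csv quotes fields only when needed.
    data = io.StringIO()
    writer = csv.writer(data, delimiter="\t", lineterminator="\n")
    writer.writerow(terms)
    for row in rows:
        writer.writerow([row.get(term, "") for term in terms])

    # meta.xml tells consumers how to parse the file and which DwC term each column holds.
    fields = "\n".join(
        f'    <field index="{i}" term="{DWC_NS}{term}"/>' for i, term in enumerate(terms)
    )
    meta = (
        '<archive xmlns="http://rs.tdwg.org/dwc/text/">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"'
        f' fieldsEnclosedBy="&quot;" ignoreHeaderLines="1" rowType="{DWC_NS}Occurrence">\n'
        "    <files><location>occurrence.txt</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )

    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", data.getvalue())
        archive.writestr("meta.xml", meta)
```

Real publishing tools such as the IPT also add an EML metadata document and support extension files, which this sketch leaves out.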
However, most users are very familiar with using spreadsheets to store and organize their data. Adopting the available solutions, by contrast, requires data transformation and training in information science and, more specifically, biodiversity informatics. For an example of how spreadsheets can simplify data sharing, see Stoev et al. (2016). To provide a more "familiar" approach to data sharing using DwC-A, we introduce a new tool as a Google Sheet Add-on. The Add-on, called Darwin Core Archive Assistant Add-on, can be installed in the user's Google Account from the G Suite Marketplace and used in conjunction with the Google Sheets application. The Add-on assists in mapping spreadsheet columns/fields to DwC terms (Fig. 1), similar to IPT. Its advantage is that the user does not have to export the spreadsheet and import it into other software. Additionally, the Add-on facilitates the creation of a star schema in accordance with DwC-A by defining a "CORE_ID" field (e.g. occurrenceID, eventID, taxonID) that links the sheets of a document (Fig. 2). The Add-on also provides an Ecological Metadata Language (EML) (Jones et al. 2019) editor (Fig. 3) with a minimal set of fields to fill in (i.e., the mandatory fields required by IPT). It helps users to generate and share DwC-Archives stored in their Google Drive. These archives can be downloaded as a DwC-A or uploaded automatically to another public storage resource, such as the user's Zenodo account (Fig. 4). We expect that the Google Sheet Add-on introduced here, in conjunction with IPT, will promote biodiversity data sharing in a standardized format. It requires minimal training and simplifies the process of data sharing from the user's perspective. This applies mainly to users who are not familiar with IPT but have historically worked with spreadsheets.
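
The two Add-on steps described above can be sketched in plain Python. The first maps spreadsheet columns to DwC terms. The second links sheets through a shared CORE_ID. The column headers in COLUMN_MAP and the function names are hypothetical illustrations, not the Add-on's actual implementation. Here eventID plays the CORE_ID role, joining an event sheet (the core) to an occurrence sheet (an extension).

```python
# Hypothetical spreadsheet headers mapped to Darwin Core terms.
COLUMN_MAP = {
    "Sample": "eventID",
    "Date": "eventDate",
    "Site": "locality",
    "Specimen": "occurrenceID",
    "Species": "scientificName",
}


def map_row(row: dict[str, str], column_map: dict[str, str]) -> dict[str, str]:
    """Rename mapped columns to DwC terms; unmapped columns are dropped."""
    return {column_map[col]: value for col, value in row.items() if col in column_map}


def star_schema(core_rows: list[dict], extension_rows: list[dict], core_id: str) -> dict:
    """Attach each extension row to the core row that shares its CORE_ID value."""
    linked = {row[core_id]: {"core": row, "extension": []} for row in core_rows}
    for ext in extension_rows:
        key = ext.get(core_id)
        if key not in linked:
            # An extension row pointing at no core row would break the archive's star schema.
            raise ValueError(f"extension row references unknown {core_id}: {key!r}")
        linked[key]["extension"].append(ext)
    return linked


if __name__ == "__main__":
    events = [map_row({"Sample": "ev-1", "Date": "2020-03-01", "Site": "Pond A"}, COLUMN_MAP)]
    occurrences = [
        map_row({"Sample": "ev-1", "Specimen": "occ-1", "Species": "Rana sp."}, COLUMN_MAP),
        map_row({"Sample": "ev-1", "Specimen": "occ-2", "Species": "Bufo sp."}, COLUMN_MAP),
    ]
    print(star_schema(events, occurrences, "eventID"))
```

Raising an error on an unmatched CORE_ID reflects the star-schema rule that every extension row must point to an existing core record. A real tool might instead report such rows to the user for correction.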
The DwC-A generated by the Add-on still needs to be published using IPT. Even so, the Add-on offers a simpler interface than IPT (i.e., a spreadsheet) for mapping datasets to DwC. IPT includes many more features than the Darwin Core Archive Assistant Add-on. Still, we expect that the Add-on can be a "starting point" for users unfamiliar with biodiversity informatics before they move on to more advanced data publishing tools. On the other hand, Zenodo integration allows users to share and cite their standardized datasets without publishing them via IPT. This can be useful for users without access to an IPT installation. Additionally, we are working on new features. Future releases will include the automatic generation of Globally Unique Identifiers for shared records, the option to add further data standards and DwC extensions, and integration with the GBIF REST API and the IPT REST API.
