Geocoding genomic databases using GBIF

2018 ◽  
Author(s):  
Roderic D. M. Page

Abstract: Many nucleotide sequences in the publicly available genomics databases lack spatial information, such as the latitude and longitude coordinates of the locality where the sample for sequencing was taken. In this note I discuss several approaches to geocoding sequence records. The first method uses the Global Biodiversity Information Facility (GBIF: https://gbif.org) as a gazetteer. The availability of a simple full-text search across GBIF data makes it possible to geocode locality information rapidly by searching for matching records within GBIF. Hence, if a sequence lacks coordinates but has some locality information, it can be geocoded quickly. The second method matches the voucher specimen codes of sequences with the corresponding specimen records in GBIF, which may be geocoded even if the sequence obtained from that specimen is not. Lastly, there will be cases where sequence records lack both locality and specimen information, but that information is available elsewhere, such as in the published literature or in supplementary data files. The possibility of publishing geocoded sequence records using GitHub is discussed.
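The first two methods can be sketched against the public GBIF occurrence search API. This is a minimal illustration, not the author's implementation: the function names are hypothetical, and using the median of the matching coordinates as the summary is an assumption made here for simplicity (it would misbehave for localities that straddle the antimeridian). The `q`, `catalogNumber` and `hasCoordinate` parameters belong to the GBIF occurrence search endpoint.

```python
import json
from statistics import median
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(locality=None, catalog_number=None, limit=20):
    """Build an occurrence search URL restricted to georeferenced records.

    locality       -> free-text search (method 1: GBIF as a gazetteer)
    catalog_number -> voucher specimen code (method 2: specimen matching)
    """
    params = {"hasCoordinate": "true", "limit": limit}
    if locality:
        params["q"] = locality
    if catalog_number:
        params["catalogNumber"] = catalog_number
    return GBIF_SEARCH + "?" + urlencode(params)


def summarise_coordinates(response):
    """Return (median latitude, median longitude) of the results, or None.

    The median is an illustrative choice (an assumption), picked because it
    is less sensitive to a few mismatched records than the mean.
    """
    points = [
        (r["decimalLatitude"], r["decimalLongitude"])
        for r in response.get("results", [])
        if r.get("decimalLatitude") is not None
        and r.get("decimalLongitude") is not None
    ]
    if not points:
        return None
    lats, lons = zip(*points)
    return (median(lats), median(lons))


def geocode(locality):
    """Query GBIF over the network and summarise the matching coordinates."""
    with urlopen(build_search_url(locality=locality)) as handle:
        return summarise_coordinates(json.load(handle))
```

Calling `geocode("Mount Kinabalu")` would perform a live request; in practice the matching records should be inspected before a coordinate is accepted, since a free-text match does not guarantee that a record refers to the same locality.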

2020 ◽  
Vol 8 ◽  
Author(s):  
Sonia Ferreira ◽  
Rui Andrade ◽  
Ana Gonçalves ◽  
Pedro Sousa ◽  
Joana Paupério ◽  
...  

The InBIO Barcoding Initiative (IBI) Diptera 01 dataset contains records of 203 specimens of Diptera. All specimens have been morphologically identified to species level, and belong to 154 species in total. The species represented in this dataset correspond to about 10% of continental Portugal dipteran species diversity. All specimens were collected north of the Tagus river in Portugal. Sampling took place from 2014 to 2018, and specimens are deposited in the IBI collection at CIBIO, Research Center in Biodiversity and Genetic Resources. This dataset contributes to the knowledge on the DNA barcodes and distribution of 154 species of Diptera from Portugal and is the first of the planned IBI database public releases, which will make available genetic and distribution data for a series of taxa. All specimens have their DNA barcodes made publicly available in the Barcode of Life Data System (BOLD) online database and the distribution dataset can be freely accessed through the Global Biodiversity Information Facility (GBIF).


Author(s):  
Amy Davis ◽  
Tim Adriaens ◽  
Rozemien De Troch ◽  
Peter Desmet ◽  
Quentin Groom ◽  
...  

To support invasive alien species risk assessments, the Tracking Invasive Alien Species (TrIAS) project has developed an automated, open workflow incorporating state-of-the-art species distribution modelling practices to create risk maps using the open source language R. It is based on Global Biodiversity Information Facility (GBIF) data and openly published environmental data layers characterizing climate and land cover. Our workflow requires only a species name and generates an ensemble of machine-learning algorithms (Random Forest, Boosted Regression Trees, K-Nearest Neighbors and AdaBoost) stacked together as a meta-model to produce the final risk map at 1 km² resolution (Fig. 1). Risk maps are generated automatically for standard Intergovernmental Panel on Climate Change (IPCC) greenhouse gas emission scenarios and are accompanied by maps illustrating the confidence of each individual prediction, enabling intuitive visualization of how the confidence of the model varies across space and scenario (Fig. 2). The effects of sampling bias are accounted for by providing options to use the sampling effort of the higher taxon to which the modelled species belongs (e.g., vascular plants) and to thin species occurrences. The risk maps generated by our workflow are defensible and repeatable and provide forecasts of alien species distributions under further climate change scenarios. They can be used to support risk assessments and guide surveillance efforts on alien species in Europe. The detailed modelling framework and code are available on GitHub: https://github.com/trias-project.
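Thinning of occurrences can take several forms, and the abstract does not say which one the TrIAS workflow uses. As a rough illustration only, the sketch below keeps one occurrence per grid cell, which is one common way to reduce spatial clustering. It is written in Python rather than the workflow's R, and the function name and the default cell size of 0.01 degrees (very roughly 1 km in latitude) are assumptions made for this example.

```python
import math


def thin_occurrences(points, cell_size_deg=0.01):
    """Keep the first (lat, lon) point seen in each grid cell.

    points        -- iterable of (latitude, longitude) in decimal degrees
    cell_size_deg -- grid cell size in degrees (hypothetical default)
    """
    seen_cells = set()
    kept = []
    for lat, lon in points:
        # Identify the grid cell by the integer index of its lower-left corner.
        cell = (math.floor(lat / cell_size_deg), math.floor(lon / cell_size_deg))
        if cell not in seen_cells:
            seen_cells.add(cell)
            kept.append((lat, lon))
    return kept
```

Keeping the first point per cell makes the result depend on input order; a workflow that needs reproducible output independent of ordering might sort the points first or pick a point per cell at random with a fixed seed.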


Author(s):  
Martin R. Kalfatovic ◽  
Constance Rinaldo

Data contained in the Biodiversity Heritage Library (BHL) describe collections held in the world's major museums. Finding those collections data, however, remains a challenge: a literal needle in a Festuca stack, as some have noted. BHL is actively incorporating tools (including Digital Object Identifiers (DOIs) and the recently launched full-text search) to make it easier to find and link to collection specimen information. Still, it is not easy to find specific collections information in the non-semantically tagged BHL content. This session will call for ideas on how to locate this content. BHL is an international consortium, making research literature openly available to the world as part of a global biodiversity community. The BHL was created in 2006 as a direct response to the needs of the taxonomic community for access to early literature. The original BHL organizational model, based on United States and United Kingdom partners, provided a template for what is now over 80 global partners. Through this extensive network of Members, Affiliates, and partners, over 56 million pages of biodiversity literature are available through the BHL portal. BHL changes the lives of researchers and assists the work of collections managers. By enhancing daily research at the Smithsonian and Harvard, BHL provides a global network of researchers with an easy-to-use digital library of content and services.


2013 ◽  
Vol 64 (2) ◽  
Author(s):  
Shakina Mohd Talkah ◽  
Iylia Zulkiflee ◽  
Mohd Shahir Shamsir

Currently, the ethnobotanical, phytochemical and pharmaceutical information of South East Asia is scattered over many different publications, depositories and databases in various digital and analogue formats. Although there are taxonomic databases of medicinal plants, they are not linked to phytochemical and pharmaceutical information, which often resides in the scientific literature. We present Phyknome, an ethnobotanical and phytochemical database with more than 22,000 species of ethnoflora of Asia. This database will enable a biotechnology researcher to seek and identify ethnobotanical information based on a species’ scientific name, description and phytochemical information. It is constructed using a digitization pipeline that allows high-throughput digitization of archival data, an automated dataminer to mine for information on pharmaceutical compounds, and an online database to integrate this information. The main functions include an automated taxonomy, a bibliography and an API interface with primary databases such as the Global Biodiversity Information Facility (GBIF). We believe that Phyknome will contribute to the digital knowledge ecosystem by improving access and providing tools for ethnobotanical research, and will contribute to the management, assessment and stewardship of biodiversity. The database is available at http://mapping.fbb.utm.my/phyknome/.

