Mainstreaming Molecular Biodiversity: A call for a unified and interoperable framework

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37338 ◽

2019 ◽

Vol 3 ◽

Author(s):

Pier Luigi Buttigieg ◽

Jerry Lanfear ◽

Frank Oliver Glöckner ◽

James Macklin

Keyword(s):

Amplicon Sequencing ◽

Molecular Data ◽

Biological Data ◽

Molecular Methods ◽

Global Ocean ◽

Marker Genes ◽

Global Biodiversity Information Facility ◽

Data Resource ◽

Molecular Biodiversity ◽

Biodiversity Information

Over the past 20 years, immense progress has been made in enhancing the effectiveness, affordability, and deployability of molecular methods for biodiversity assessment and monitoring. From the micro- to macroscopic scale, methods such as amplicon sequencing of phylogenetic marker genes, metagenomics, and metatranscriptomics have greatly impacted biology and ecology, and are steadily being integrated into national and international biodiversity policy. Over the next decade, technologies such as miniaturised and autonomous DNA sequencing platforms will amplify this momentum, ushering in an unprecedented volume of deeply minable biodiversity information. While production-grade resources exist to standardise, archive, and exchange raw molecular data (e.g. the resources of the International Nucleotide Sequence Database Collaboration (INSDC) for DNA and RNA sequences), there are still no equivalent frameworks for biodiversity information derived from molecular methods. Research infrastructures in both the biodiversity and molecular biology domains must fill this gap with great urgency to channel molecular advances into efforts to understand and sustain Earth's imperilled biosphere. This session seeks to accelerate the implementation of global standards to link molecular biodiversity data to taxonomy-based systems. Only with these in place can we realise a robust, distributed, yet fully interoperating, network of infrastructures, projects, and researchers addressing molecular biodiversity. This introductory series of flash talks will present the rationale and goals of the session, alongside a joint vision from representatives of several convening stakeholders. A contribution from ELIXIR, an intergovernmental organisation of distributed infrastructures for biological data, will demonstrate the high readiness of biological data resources such as the European Nucleotide Archive (ENA) to mobilise molecular data along new standards. 
An intervention from the SILVA rRNA database project - itself an ELIXIR Core Data Resource - will note the actionability of interfacing molecular-based phylogenies with Linnaean systems hosted by partners such as the Global Biodiversity Information Facility (GBIF). Two more contributions will emphasise the essential role (and thus critical need) of molecular biodiversity standards in bridging research and operations. The first will focus on the nation-scale Metagenomics-Based Ecosystem Biomonitoring (EcoBiomics) project in Canada, which is using 'omic approaches to better assess, monitor, and remediate microbial and invertebrate biodiversity in soil and aquatic ecosystems, thus sustaining ecosystem resilience and service provision upon which society and economies depend. The second will underscore the need for international and stable standards to advance the long-term mission of the Global Omics Observatory Network (GLOMICON), and its contribution to the Global Ocean Observing System's Essential Ocean Variables (GOOS EOVs) under the Intergovernmental Oceanographic Commission of the United Nations Educational, Scientific, and Cultural Organization (IOC-UNESCO). Collectively, these contributions will make the case for a concerted effort to expedite the principled creation of operational information standards in molecular biodiversity. We invite all stakeholders to join us in implementing these standards in the coming years.

Towards a Post-Graduate Level Curriculum for Biodiversity Informatics. Perspectives from the Global Biodiversity Information Facility (GBIF) Community

Biodiversity Data Journal ◽

10.3897/bdj.9.e68010 ◽

2021 ◽

Vol 9 ◽

Author(s):

Fatima Parker-Allie ◽

Francisco Pando ◽

Anders Telenius ◽

Jean Ganglo ◽

Danny Vélez ◽

...

Keyword(s):

Biological Data ◽

Initial Assessment ◽

Biodiversity Informatics ◽

Global Biodiversity Information Facility ◽

Policy Makers ◽

Academic Teaching ◽

E Learning ◽

Learning Platforms ◽

Global Biodiversity ◽

Biodiversity Information

Biodiversity informatics is a new and evolving field, requiring efforts to develop capacity and a curriculum for this field of science. The main objective of this study was to summarise the level of activity and the efforts towards developing biodiversity informatics curricula, for work-based training and/or academic teaching at universities, taking place within the Global Biodiversity Information Facility (GBIF) countries and its associated network. A survey approach was used to identify existing capacities and resources within the network. Most GBIF Nodes survey respondents (80%) are engaged in onsite training activities, with a focus on work-based professionals, mostly researchers, policy-makers and students. Training topics include data mobilisation, digitisation, management, publishing, analysis and use, to enable the accessibility of analogue and digital biological data that currently reside as scattered datasets. An initial assessment of academic teaching activities highlighted that countries in most regions, to varying degrees, were already engaged in the conceptualisation, development and/or implementation of formal academic programmes in biodiversity informatics, including programmes in Benin, Colombia, Costa Rica, Finland, France, India, Norway, South Africa, Sweden, Taiwan and Togo. Digital e-learning platforms were an important tool to help build capacity in many countries. In terms of the potential in the Nodes network, 60% expressed willingness to be recruited or commissioned for capacity enhancement purposes. Contributions and activities of various country nodes across the network have been highlighted and a working curriculum framework has been defined.

ITIS and the Global Taxonomic Backbone

Biodiversity Information Science and Standards ◽

10.3897/biss.5.75471 ◽

2021 ◽

Vol 5 ◽

Author(s):

David Mitchell ◽

Thomas Orrell

Keyword(s):

Scientific Data ◽

Biological Data ◽

Global Biodiversity Information Facility ◽

Global Database ◽

The Public ◽

Taxonomic Information ◽

The World ◽

Biological Data Management ◽

Biodiversity Information ◽

Scientific Name

The Integrated Taxonomic Information System (ITIS) provides a regularly updated, global database that currently contains over 868,000 scientific names and their hierarchy. The program exists to communicate a comprehensive taxonomy of global species across 7 kingdoms that enables biodiversity information to be discovered, indexed, and connected across all human endeavors. ITIS partners with taxonomists and experts across the world to assemble scientific names and their taxonomic relationships, and then distributes that data through publicly available software. A single taxon may be represented by multiple scientific names, so ITIS makes it a priority to provide synonymy. Linking valid or accepted names with their subjective and objective synonyms is a key component of name translation and increases the precision of searches and organization of information. ITIS and its partner Species2000 create the Catalogue of Life (CoL) checklist that provides quality scientific name data for over 2.2M species. The CoL is the Global Biodiversity Information Facility (GBIF) taxonomic backbone. Providing automated open access to complete, current, literature-referenced, and expert-validated taxonomic information enables biological data management systems, and is elemental to enhancing the utility of the amassed scientific data across the world. Fully leveraging this information for the public good is crucial for empowering the global digital society to confront the most pressing social and environmental challenges.

Towards Linked Open Molecular Data: Recommendations for researchers, collections, infrastructures and publishers

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37204 ◽

2019 ◽

Vol 3 ◽

Author(s):

Gabriele Droege ◽

Ilene Karsch-Mizrachi ◽

Katharine Barker ◽

Jonathan Coddington ◽

Ole Seberg

Keyword(s):

Natural History ◽

Dna Sequencing ◽

Biological Material ◽

Sequence Data ◽

Open Data ◽

Molecular Data ◽

Biological Data ◽

Global Biodiversity Information Facility ◽

Natural History Collections ◽

Long Run

The variety of molecular methods used to analyze biosamples is continuously increasing, as is the need for the standardized deposition, documentation and citation of both the samples and the methods applied to them. Global initiatives such as the International Nucleotide Sequence Database Collaboration (INSDC, http://www.insdc.org), Barcode of Life Data System (BOLD, http://www.boldsystems.org), the Global Biodiversity Information Facility (GBIF, http://www.gbif.org) and the Global Genome Biodiversity Network (GGBN, http://www.ggbn.org), in addition to many others, have been working towards standardized access to biological data for many years. Collectively, these biodiversity data management platforms provide a considerable and indispensable infrastructure to the research community. However, cross-linking the massive amounts of protein and DNA sequence data submitted to these databases every year with standardized records of the underlying biological material remains challenging. Best practices for standardized data submissions and data citations are urgently needed. In the long run, two goals should be achieved above all else: all sequence data should be linked to natural history collections, and biological material that was used for molecular research, especially DNA sequencing, should be deposited and, thus, made accessible in public, well-curated collections. Here we will provide recommendations for both researchers and collections on how to cite underlying biological material at INSDC and in publications in a standardized way, working towards Linked Open Data. We will also address how the global infrastructures and publishers can improve their interoperability.

Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature or gene selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in specific epidemiology for a particular population. Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem. To that end, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space with no or minimal compromise to sample classification efficiency, and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion: A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The results indicate the superiority of the proposed method. Reported results are also validated through a biological significance test and heatmap plotting.

The Global Biodiversity Information Facility (GBIF)

Systematics Association Special Volumes - Biodiversity Databases ◽

10.1201/9781439832547.ch1 ◽

2007 ◽

pp. 1-4 ◽

Cited By ~ 5

Author(s):

Meredith Lane ◽

James Edwards

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information

International Infrastructure for Enabling the New Taxonomy: The Role of the Global Biodiversity Information Facility (GBIF)

The New Taxonomy - Systematics Association Special Volumes ◽

10.1201/9781420008562.ch6 ◽

2008 ◽

pp. 87-94

Author(s):

James Edwards ◽

Larry Speers

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information

International Infrastructure for Enabling the New Taxonomy: The Role of the Global Biodiversity Information Facility (GBIF)

The New Taxonomy ◽

10.1201/9781420008562-10 ◽

2008 ◽

pp. 99-106

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information

Furthering Genomic Research Infrastructures: The Global Genome Biodiversity Network

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37155 ◽

2019 ◽

Vol 3 ◽

Author(s):

Katharine Barker ◽

Jonas Astrin ◽

Gabriele Droege ◽

Jonathan Coddington ◽

Ole Seberg

Keyword(s):

Natural History ◽

Best Practices ◽

Genomic Research ◽

Benefit Sharing ◽

Global Biodiversity Information Facility ◽

Access And Benefit Sharing ◽

Culture Collections ◽

Data Standard ◽

Research Infrastructures ◽

Biodiversity Information

Most successful research programs depend on easily accessible and standardized research infrastructures. Until recently, access to tissue or DNA samples with standardized metadata and of a sufficiently high quality has been a major bottleneck for genomic research. The Global Genome Biodiversity Network (GGBN) fills this critical gap by offering standardized, legal access to samples. Presently, GGBN’s core activity is enabling access to searchable DNA and tissue collections across natural history museums and botanic gardens. Activities are gradually being expanded to encompass all kinds of biodiversity biobanks such as culture collections, zoological gardens, aquaria, arboreta, and environmental biobanks. Broadly speaking, these collections all provide long-term storage and standardized public access to samples useful for molecular research. GGBN facilitates sample search and discovery for its distributed member collections through a single entry point. It stores standardized information on mostly geo-referenced, vouchered samples, their physical location, availability, quality, and the necessary legal information on over 50,000 species of Earth’s biodiversity, from unicellular to multicellular organisms. The GGBN Data Portal and the GGBN Data Standard are complementary to existing infrastructures such as the Global Biodiversity Information Facility (GBIF) and the International Nucleotide Sequence Database Collaboration (INSDC). Today, many well-known open-source collection management databases, such as Arctos, Specify, and Symbiota, are implementing the GGBN Data Standard. GGBN continues to increase its collections strategically, based on the needs of the research community, adding over 1.3 million online records in 2018 alone, and today two million sample records are available through GGBN.
Together with the Consortium of European Taxonomic Facilities (CETAF), the Society for the Preservation of Natural History Collections (SPNHC), Biodiversity Information Standards (TDWG), and Synthesis of Systematic Resources (SYNTHESYS+), GGBN provides best practices for biorepositories on meeting the requirements of the Nagoya Protocol on Access and Benefit Sharing (ABS). In collaboration with the Biodiversity Heritage Library (BHL), GGBN is exploring options for tagging publications that reference GGBN collections and associated specimens, made searchable through GGBN’s document library. Through its collaborative efforts, standards, and best practices, GGBN aims to facilitate trust and transparency in the use of genetic resources.

Data framework for efficient management of sequence and microsatellite data in biodiversity studies

Archives Animal Breeding ◽

10.7482/0003-9438-56-006 ◽

2013 ◽

Vol 56 (1) ◽

pp. 50-64 ◽

Cited By ~ 1

Author(s):

C. V. C. Truong ◽

Z. Duchev ◽

E. Groeneveld

Keyword(s):

Information System ◽

Molecular Data ◽

General Information ◽

Biological Data ◽

Efficient Management ◽

Data Framework ◽

Wide Range ◽

Uniform Solution ◽

General Data ◽

General Data Model

In recent years, software packages for the management of biological data have developed rapidly. However, there is currently no general information system available for managing molecular data derived from both Sanger sequencing and microsatellite genotyping projects. A prerequisite to implementing such a system is to design a general data model which can be deployed to a wide range of labs without modification or customization. Thus, this paper aims to (1) suggest a uniform solution to efficiently store data items required in different labs, (2) describe procedures for representing data streams and data items, and (3) construct a formalized data framework. As a result, the data framework has been used to develop an integrated information system for small labs conducting biodiversity studies.

The Global Biodiversity Information Facility (GBIF) and the Japan’s activities

Journal of Information Processing and Management ◽

10.1241/johokanri.46.389 ◽

2003 ◽

Vol 46 (6) ◽

pp. 389-393

Author(s):

Shun’ichi KIKUCHI

Keyword(s):

Global Biodiversity Information Facility ◽

Global Biodiversity ◽

Biodiversity Information
