Mission: Implausible — Revealing rogue marine species in records across biodiversity data platforms

Biodiversity Information Science and Standards ◽

10.3897/biss.3.36002 ◽

2019 ◽

Vol 3 ◽

Author(s):

Claude Nozères ◽

Mary Kennedy

Keyword(s):

Rare Species ◽

Marine Species ◽

Original Data ◽

Data Systems ◽

Global Biodiversity Information Facility ◽

Software Packages ◽

Ongoing Work ◽

Data Source ◽

Types Of Information ◽

Specimen Record

Online biodiversity platforms publish datasets with graphic tools to help with quality control of submitted records, but more could be done to make the data robust for ecological analyses. Attention has focused mostly on automating tools for obvious errors, including misspelled names and synonyms, dates, or coordinates. However, a manual review of species identifications and distributions may uncover improbable records, such as a species reported in an area far from its usual range, or a rare species found in an area that has many more records of a related species. Examples are shown by constructing checklists in the Northwest Atlantic, using information from the World Register of Marine Species (WoRMS, http://www.marinespecies.org) and the Ocean Biogeographic Information System (OBIS, https://obis.org). Reviewing rare species records revealed some misidentifications, but in other instances, the rare species was valid while it was the commonly reported species that needed correction. Confirmations were obtained by comparing records from different regions, but also across platforms, including photos from observers on iNaturalist Canada (https://inaturalist.ca), genetic analyses on Barcode of Life Data systems (BOLD, http://www.boldsystems.org), and literature in the Biodiversity Heritage Library (BHL, https://www.biodiversitylibrary.org). While this exercise succeeded in validating the marine taxa of a region, it is an obvious candidate for automation in three areas: 1) flagging records of improbable taxa in a region, 2) comparing records with different types of information (e.g., specimen photos, genetic groupings, or literature records), and 3) updating users and providers when records get flagged as unusual or are modified. The first approach could be explored using online graphics tools or R software packages (rOpenSci, https://ropensci.org). 
The second toolset, comparing records across platforms, is partially realized with some linkages already operating between WoRMS, OBIS, BOLD, BHL, iNaturalist, and the Global Biodiversity Information Facility (GBIF, https://www.gbif.org). The third target will be the most difficult to implement, requiring reliable platform cross-linkages and specimen record identifiers to send notifications of changed status of records to both users and the original data source. Ongoing work is discussed on communicating the need to review records across platforms, with the hope that toolsets will be developed to make this task easier.
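The first automation target, flagging improbable taxa in a region, might be prototyped along the following lines. This is only a minimal sketch under stated assumptions: the record fields (`genus`, `species`, `region`), the function name `flag_improbable`, and both thresholds are hypothetical and are not taken from OBIS, WoRMS, or any R package. It flags a species when it makes up a very small share of its genus's records in a region. A flag means only that the record is a candidate for manual review, since the abstract notes that the rare species was sometimes the valid one.

```python
from collections import Counter


def flag_improbable(records, region, max_share=0.05, min_genus_records=20):
    """Return (species, n_records, n_genus_records) for species that are rare
    relative to congeners in `region`. Thresholds are illustrative assumptions."""
    in_region = [r for r in records if r["region"] == region]
    species_counts = Counter((r["genus"], r["species"]) for r in in_region)
    genus_counts = Counter(r["genus"] for r in in_region)

    flagged = []
    for (genus, species), n in species_counts.items():
        total = genus_counts[genus]
        # Only judge rarity when the genus is well sampled in this region.
        if total >= min_genus_records and n / total <= max_share:
            flagged.append((species, n, total))
    return sorted(flagged)
```

Requiring a minimum number of genus records keeps poorly sampled genera from being flagged on the basis of one or two occurrences.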


A Recommender for Choosing Data Systems based on Application Profiling and Benchmarking

10.5753/sbbd.2021.17883 ◽

2021 ◽

Author(s):

Elton Figueiredo de Souza Soares ◽

Renan Souza ◽

Raphael Melo Thiago ◽

Marcelo de Oliveira Costa Machado ◽

Leonardo Guerreiro Azevedo

Keyword(s):

Empirical Evidence ◽

Informed Decision ◽

Data System ◽

Data Driven ◽

Data Systems ◽

Ongoing Work ◽

Wide Range

In our data-driven society, there are hundreds of data systems on the market, each with a wide range of configuration parameters, which makes it very hard for enterprises and users to choose the most suitable one. There is a lack of representative empirical evidence to help users make an informed decision. Using benchmark results is a widely adopted practice, but just as there are many data systems, there are many benchmarks. This ongoing work presents the architecture and methods of a system that supports recommending the most suitable data system for an application. We also illustrate how the recommendation would work in a fictitious scenario.


Recreational Use, Valuation, and Management, of Killer Whales (Orcinus orca) on Canada's Pacific Coast

Environmental Conservation ◽

10.1017/s0376892900037656 ◽

1993 ◽

Vol 20 (2) ◽

pp. 149-156 ◽

Cited By ~ 49

Author(s):

David A. Duffus ◽

Philip Dearden

Keyword(s):

Resource Management ◽

Marine Species ◽

Orcinus Orca ◽

Killer Whales ◽

Public Attention ◽

Management Options ◽

Current Resource ◽

Types Of Information ◽

Biological Uncertainty ◽

Recreational Use

The management of many ocean wildlife species is left in an institutional void, yet certain species command considerable public attention and have burgeoning management problems. In this paper the non-consumptive recreational use of Killer Whales (Orcinus orca) on Canada's Pacific Ocean coast is used as an example of management difficulties that are associated with oceanic species. Problems associated with jurisdiction and institutional arrangements are coupled to significant levels of biological uncertainty and restricted management options, as well as to management concerns associated with the human domain. The case is conceptualized as an interaction between the human and more general ecological spheres, mediated by the history of the relationship between humans and the species in question. Two routes to regulation are presented, dealing respectively with the human and ecological aspects. Of particular significance is the idea that both types of information are necessary to maximize utility to both the human user and the Whales. Results from an ongoing study of recreational use are presented to indicate some of the variables that have emerged. These are to be interpreted within current resource management infrastructure to create a tenuous situation. The unfortunate logic that results from this study is that if Killer Whales (a high-profile species) in Canada (a well-endowed nation) have not warranted more substantial protection, then the outlook for less well-known marine species in areas of the world where resource management priorities involve more direct survival concerns, is not optimistic.


Construction of Hangzhou Silk Cognitive Evaluation System Based on the Grounded Theory

Asian Social Science ◽

10.5539/ass.v14n7p92 ◽

2018 ◽

Vol 14 (7) ◽

pp. 92

Author(s):

Aijuan Cao

Keyword(s):

Grounded Theory ◽

Evaluation System ◽

Original Data ◽

Analysis Software ◽

Logical Relationship ◽

Systematic Analysis ◽

Grounded Theory Research ◽

Cognitive Evaluation ◽

Data Source ◽

Theory Research

The article adopts the grounded theory research method, taking “how people perceive and evaluate Hangzhou silk” as the research subject and using data obtained through interviews and investigations as the data source. Through systematic analysis of the original data with the qualitative analysis software NVivo 11.0, this paper gradually extracts and summarizes the content dimensions and evaluation results of consumers' cognitive evaluation of Hangzhou silk. Finally, based on the eight dimensions identified in this analysis, the study sorts out and analyzes the logical relationships between them and constructs the cognitive evaluation system of Hangzhou silk. The research conclusion enriches and expands the scope of research on silk cognition.


The OXL format for the exchange of integrated datasets

Journal of Integrative Bioinformatics ◽

10.1515/jib-2007-62 ◽

2007 ◽

Vol 4 (3) ◽

pp. 27-40 ◽

Cited By ~ 5

Author(s):

Jan Taubert ◽

Klaus Peter Sieren ◽

Matthew Hindle ◽

Berend Hoekman ◽

Rainer Winnenburg ◽

...

Keyword(s):

Life Science ◽

Original Data ◽

Biological Information ◽

Data Sets ◽

Complex Data ◽

Scientific Publications ◽

Mining System ◽

Biological Domain ◽

Data Source ◽

Text Mining System

A prerequisite for systems biology is the integration and analysis of heterogeneous experimental data stored in hundreds of life-science databases and millions of scientific publications. Several standardised formats for the exchange of specific kinds of biological information exist. Such exchange languages facilitate the integration process; however they are not designed to transport integrated datasets. A format for exchanging integrated datasets needs to i) cover data from a broad range of application domains, ii) be flexible and extensible to combine many different complex data structures, iii) include metadata and semantic definitions, iv) include inferred information, v) identify the original data source for integrated entities and vi) transport large integrated datasets. Unfortunately, none of the exchange formats from the biological domain (e.g. BioPAX, MAGE-ML, PSI-MI, SBML) or the generic approaches (RDF, OWL) fulfil these requirements in a systematic way. We present OXL, a format for the exchange of integrated data sets, and detail how the aforementioned requirements are met within the OXL format. OXL is the native format within the data integration and text mining system ONDEX. Although OXL was developed with the ONDEX system in mind, it also has the potential to be used in several other biological and non-biological applications described in this paper. Availability: The OXL format is an integral part of the ONDEX system which is freely available under the GPL at http://ondex.sourceforge.net/. Sample files can be found at http://prdownloads.sourceforge.net/ondex/ and the XML Schema at http://ondex.svn.sf.net/viewvc/*checkout*/ondex/trunk/backend/data/xml/ondex.xsd.


Research and Implementation of Intrusion Detection Based Genetic Algorithm

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.631-632.946 ◽

2014 ◽

Vol 631-632 ◽

pp. 946-951 ◽

Cited By ~ 1

Author(s):

Guang Cai Cui ◽

Bai Tong Liu

Keyword(s):

Genetic Algorithm ◽

Intrusion Detection ◽

Real Time ◽

Original Data ◽

Network Data ◽

Initial Population ◽

Classification Rules ◽

Detection Technology ◽

Network Intrusions ◽

Data Source

Traditional intrusion detection technology increasingly shows a lack of intelligence and self-adaptation when coping with unknown attacks. A method based on a genetic algorithm is presented for discovering and learning intrusion detection rules. The algorithm uses network data packets as the original data source; after pretreatment, they are used to initialize the population of the genetic algorithm, from which the classification rules are then derived. These rules are used to detect or classify network intrusions in a real-time network environment by selecting the intrusion packets. The experiment demonstrates the efficiency of the presented method.
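The general idea of evolving detection rules from packets can be illustrated with a small sketch. It is not the authors' implementation: the packet encoding (a tuple of already-discretized fields), the wildcard representation (`None`), the fitness function (true positives minus false positives), and all parameter values are assumptions made for illustration. As in the abstract, the initial population is drawn from the pretreated packets themselves.

```python
import random


def matches(rule, packet):
    """A rule matches when every non-wildcard field equals the packet field."""
    return all(r is None or r == f for r, f in zip(rule, packet))


def fitness(rule, data):
    """Assumed fitness: attacks matched minus normal packets matched."""
    tp = sum(1 for pkt, is_attack in data if is_attack and matches(rule, pkt))
    fp = sum(1 for pkt, is_attack in data if not is_attack and matches(rule, pkt))
    return tp - fp


def evolve(data, generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    attacks = [pkt for pkt, is_attack in data if is_attack]
    # Initial population: attack packets used directly as fully specific rules.
    population = [rng.choice(attacks) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda r: fitness(r, data), reverse=True)
        parents = population[: pop_size // 2]
        children = [parents[0]]  # elitism: keep the current best rule
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each field comes from one of the two parents.
            child = tuple(rng.choice(pair) for pair in zip(a, b))
            if rng.random() < mutation_rate:
                # Mutation: generalize one field to a wildcard.
                i = rng.randrange(len(child))
                child = child[:i] + (None,) + child[i + 1:]
            children.append(child)
        population = children
    return max(population, key=lambda r: fitness(r, data))
```

Because the best rule is carried over unchanged each generation, the returned rule is never worse, by this fitness measure, than the best packet in the initial population.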


A taxonomically harmonized and temporally standardized fossil pollen dataset from Siberia covering the last 40 kyr

Earth System Science Data ◽

10.5194/essd-12-119-2020 ◽

2020 ◽

Vol 12 (1) ◽

pp. 119-135

Author(s):

Xianyong Cao ◽

Fang Tian ◽

Andrei Andreev ◽

Patricia M. Anderson ◽

Anatoly V. Lozhkin ◽

...

Keyword(s):

Original Data ◽

Data Mapping ◽

Pollen Data ◽

Last Glacial Period ◽

Pollen Counts ◽

Data Source ◽

Palynological Records ◽

Percentage Data ◽

Pollen Records ◽

The Last Glacial

Abstract. Pollen records from Siberia are mostly absent in global or Northern Hemisphere synthesis works. Here we present a taxonomically harmonized and temporally standardized pollen dataset that was synthesized using 173 palynological records from Siberia and adjacent areas (northeastern Asia, 42–75° N, 50–180° E). Pollen data were taxonomically harmonized, i.e. the original 437 taxa were assigned to 106 combined pollen taxa. Age–depth models for all records were revised by applying a constant Bayesian age–depth modelling routine. The pollen dataset is available as count data and percentage data in a table format (taxa vs. samples), with age information for each sample. The dataset has relatively few sites covering the last glacial period between 40 and 11.5 ka (calibrated thousands of years before 1950 CE), particularly from the central and western part of the study area. In the Holocene period, the dataset has many sites from most of the area, with the exception of the central part of Siberia. Of the 173 pollen records, 81 % of pollen counts were downloaded from open databases (GPD, EPD, PANGAEA) and 10 % were contributions by the original data gatherers, while a few were digitized from publications. Most of the pollen records originate from peatlands (48 %) and lake sediments (33 %). Most of the records (83 %) have ≥3 dates, allowing the establishment of reliable chronologies. The dataset can be used for various purposes, including pollen data mapping (example maps for Larix at selected time slices are shown) as well as quantitative climate and vegetation reconstructions. The datasets for pollen counts and pollen percentages are available at https://doi.org/10.1594/PANGAEA.898616 (Cao et al., 2019a), also including the site information, data source, original publication, dating data, and the plant functional type for each pollen taxon.
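The two table-level steps mentioned above, assigning original taxa to combined taxa and deriving percentage data from count data, can be sketched as follows. This is an illustrative sketch only: the function names, the example taxon names, and the mapping are hypothetical, and the actual harmonization table is the one distributed with the dataset.

```python
def harmonize_counts(sample_counts, taxon_map):
    """Sum the counts of original taxa that map to the same combined taxon.
    Taxa missing from `taxon_map` are kept under their original name."""
    merged = {}
    for taxon, count in sample_counts.items():
        combined = taxon_map.get(taxon, taxon)
        merged[combined] = merged.get(combined, 0) + count
    return merged


def to_percentages(counts):
    """Convert one sample's counts to percentages of that sample's pollen sum."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical sample and mapping, for illustration only.
sample = {"Larix": 10, "Larix-type": 30, "Betula sect. Nanae": 60}
mapping = {"Larix-type": "Larix", "Betula sect. Nanae": "Betula"}
percentages = to_percentages(harmonize_counts(sample, mapping))
```

Harmonizing before computing percentages matters, because the percentage of a combined taxon is taken against the pollen sum of the whole sample.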


Hybrid Query Execution on Linked Data With Complete Results

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2021010102 ◽

2021 ◽

Vol 17 (1) ◽

pp. 25-49

Author(s):

Samita Bai ◽

Shakeel A. Khoja

Keyword(s):

Linked Data ◽

Real Data ◽

Original Data ◽

Query Execution ◽

Data Set ◽

Source Selection ◽

Start Up ◽

Query Logs ◽

Data Source ◽

Query Patterns

Link traversal strategies for querying Linked Data over the WWW can retrieve up-to-date results using a recursive URI lookup process in real time. The downside of this approach arises with query patterns whose subject is unbound (i.e. ?S rdf:type:Class). Such queries fail to start up the traversal process because RDF pages are subject-centric in nature. Thus, zero-knowledge link traversal leads to empty query results for these queries. In this paper, the authors analyze a large corpus of real-world SPARQL query logs and identify the Most Frequent Predicates (MFPs) occurring in these queries. The knowledge of these MFPs helps in finding and indexing a limited number of triples from the original data set. Additionally, the authors propose a Hybrid Query Execution (HQE) approach that executes the queries over this index for initial data source selection, followed by a link traversal process to fetch complete results. The evaluation of HQE on the latest real data benchmarks reveals that it retrieves at least five times more results than the existing approaches.
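The two-phase idea can be illustrated with a toy sketch. It is not the authors' HQE implementation: the in-memory `documents` dictionary stands in for URI dereferencing on the Web, all function names are hypothetical, and only a single lookup per seed is performed where real link traversal would recurse. A small index over frequent predicates supplies seed subjects for a subject-unbound pattern, and each seed is then dereferenced to complete the result.

```python
def build_predicate_index(triples, frequent_predicates):
    """Index subjects by (predicate, object), for frequent predicates only."""
    index = {}
    for s, p, o in triples:
        if p in frequent_predicates:
            index.setdefault((p, o), set()).add(s)
    return index


def hybrid_query(index, dereference, seed_pattern, follow_predicate):
    """Answer `?s <p> <o> . ?s <follow_predicate> ?x` in two phases:
    1) source selection: look up seed subjects for (p, o) in the index;
    2) traversal: dereference each seed URI and read its outgoing links."""
    seeds = index.get(seed_pattern, set())
    results = set()
    for subject in sorted(seeds):
        for s, p, o in dereference(subject):
            if s == subject and p == follow_predicate:
                results.add((subject, o))
    return results


# Toy subject-centric "Web": one document of triples per URI (hypothetical data).
documents = {
    "ex:Paris": [("ex:Paris", "rdf:type", "ex:City"),
                 ("ex:Paris", "ex:locatedIn", "ex:France")],
    "ex:Lyon": [("ex:Lyon", "rdf:type", "ex:City"),
                ("ex:Lyon", "ex:locatedIn", "ex:France")],
    "ex:France": [("ex:France", "rdf:type", "ex:Country")],
}
all_triples = [t for doc in documents.values() for t in doc]
index = build_predicate_index(all_triples, {"rdf:type"})
cities = hybrid_query(index, lambda uri: documents.get(uri, []),
                      ("rdf:type", "ex:City"), "ex:locatedIn")
```

Without the index, a traversal-only engine has no URI to start from for this pattern, which is the start-up failure the abstract describes.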


Birding trip reports as a data source for monitoring rare species

Animal Conservation ◽

10.1111/acv.12258 ◽

2016 ◽

Vol 19 (5) ◽

pp. 430-435 ◽

Cited By ~ 3

Author(s):

C. Camacho

Keyword(s):

Rare Species ◽

Data Source


Geological Modeling System Based on Flash3D Technology

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.256-259.2285 ◽

2012 ◽

Vol 256-259 ◽

pp. 2285-2292

Author(s):

Mu Huang ◽

Xiao Li Rong

Keyword(s):

Computer Technology ◽

Hot Spot ◽

Original Data ◽

Geological Modeling ◽

Borehole Data ◽

3D Geological Modeling ◽

Modeling Software ◽

Data Source ◽

Modeling Data ◽

Geological Research

3D geological modeling is an interdisciplinary subject that applies computer technology to geological research. With the development of the Internet, Web-based 3D geological modeling has become a hot spot. In this article, we apply Flash3D technology to Web-based 3D geological modeling. We designed a process that uses borehole data as the original data source, automatically interpolates and identifies strata from it, and then calculates the modeling data. We developed a 3D geological and geographical system based on the Flex and Java platforms using Flash3D engine technology. The system is true 3D geological modeling software in B/S mode.


Using the Taxonomic Backbone(s): The challenge of selecting a taxonomic resource and integrating it with a collection management solution

Biodiversity Information Science and Standards ◽

10.3897/biss.5.74115 ◽

2021 ◽

Vol 5 ◽

Author(s):

Teresa Mayfield-Meyer ◽

Phyllis Sharp ◽

Dusty McDonald

Keyword(s):

Marine Invertebrate ◽

Marine Species ◽

Collection Management ◽

Global Biodiversity Information Facility ◽

The World ◽

Name Matching ◽

Research Grade ◽

Global Biodiversity ◽

Biodiversity Information ◽

Insight Into

The reality is that there is no single “taxonomic backbone”; there are many: the Global Biodiversity Information Facility (GBIF) Backbone Taxonomy, the World Register of Marine Species (WoRMS) and MolluscaBase, to name a few. We could view each one of these as a vertebra on the taxonomic backbone, but even that isn’t quite correct as some of these are nested within others (MolluscaBase contributes to WoRMS, which contributes to Catalogue of Life, which contributes to the GBIF Backbone Taxonomy). How is a collection manager with no expertise in a given set of taxa and only a limited amount of time to devote to finding the “most current” taxonomy supposed to maintain research grade identifications when there are so many seemingly authoritative taxonomic resources? And once a resource is chosen, how can they seamlessly use the information in that resource? This presentation will document how the Arctos community’s use of the taxon name matching service Global Names Architecture (GNA) led one volunteer team leader in a marine invertebrate collection to attempt to make use of WoRMS taxonomy and how her persistence brought better identifications and classifications to a community of collections. It will also provide insight into some of the technical and curatorial challenges involved in using an outside resource as well as the ongoing struggle to keep up with changes as they occur in the curated resource.
