Introducing bdclean: a user friendly biodiversity data cleaning pipeline

Introducing ‘The bdverse’: a family of R packages for biodiversity data

Biodiversity Information Science and Standards ◽

10.3897/biss.3.37643 ◽

2019 ◽

Vol 3 ◽

Author(s):

Tomer Gueta ◽

Vijay Barve ◽

Thiloshon Nagarajah ◽

Povilas Gibas ◽

Yohay Carmel

Keyword(s):

Data Cleaning ◽

Biodiversity Data ◽

User Input ◽

Biodiversity Science ◽

Darwin Core ◽

Shiny App ◽

Cleaning Procedures ◽

Core Requirements ◽

User Friendly ◽

Hierarchical Structure

The bdverse is a collection of packages that form a general framework for facilitating biodiversity science in R. We built it to serve as a sustainable and agile infrastructure that enhances the value of biodiversity data by allowing users to conveniently employ R for data exploration, quality assessment, data cleaning, and standardization. The bdverse supports users with and without programming capabilities. It includes six unique packages in a hierarchical structure representing different functionality levels (Fig. 1). Major features of three core packages will be highlighted and demonstrated: (i) bdDwC provides an interactive Shiny app and a set of functions for standardizing field names in compliance with the Darwin Core (DwC) format; (ii) bdchecks is an infrastructure for performing, filtering and managing various biodiversity data checks; (iii) bdclean is a user-friendly data cleaning Shiny app for the inexperienced R user. It provides features to manage the complete workflow for biodiversity data cleaning, including data upload; user input to adjust cleaning procedures; data cleaning; and, finally, generation of various reports and versions of the data. We are now working on submitting the bdverse packages to rOpenSci software review, and as soon as the packages meet core requirements, we will officially release the bdverse. The bdverse project won 2nd prize in the 2018 Ebbe Nielsen Challenge.
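As a rough illustration of what standardizing field names to Darwin Core can look like, the Python sketch below renames user-supplied column names to DwC terms. It is not bdDwC's implementation: the synonym table and the function name `standardize_fields` are hypothetical, chosen only for this example, while the target names (`scientificName`, `decimalLatitude`, `decimalLongitude`) are existing Darwin Core terms.

```python
# Hypothetical synonym table: lower-cased user field name -> Darwin Core term.
# The entries are illustrative assumptions, not taken from bdDwC.
DWC_SYNONYMS = {
    "species": "scientificName",
    "scientific_name": "scientificName",
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "longitude": "decimalLongitude",
}


def standardize_fields(record):
    """Return a copy of record with recognized keys renamed to Darwin Core terms.

    Keys are matched case-insensitively after trimming whitespace;
    unrecognized keys are kept unchanged.
    """
    renamed = {}
    for key, value in record.items():
        renamed[DWC_SYNONYMS.get(key.strip().lower(), key)] = value
    return renamed


raw = {"Species": "Puma concolor", "Lat": 32.1, "lon": 35.0, "collector": "A. N. Other"}
print(standardize_fields(raw))
```

A lookup table keeps the mapping easy to inspect and extend, which matters when users need to review or override the suggested names.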


speciesgeocodeR: An R package for linking species occurrences, user-defined regions and phylogenetic trees for biogeography, ecology and evolution

10.1101/032755 ◽

2015 ◽

Cited By ~ 6

Author(s):

Alexander Zizka ◽

Alexandre Antonelli

Keyword(s):

Data Quality ◽

Phylogenetic Trees ◽

Large Scale ◽

Data Cleaning ◽

R Package ◽

Species Occurrence ◽

Occurrence Data ◽

User Friendly ◽

Species Occurrences

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality. 2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and linking them to analyses invoking phylogenetic trees. 3. The three core functions of the package are (i) automated and reproducible data cleaning, (ii) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and (iii) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. speciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.
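The classification of point occurrences into discrete operational areas can be sketched with a standard ray-casting point-in-polygon test. The Python example below is a minimal illustration only, not speciesgeocodeR's code: the function names, the area names and the coordinates are all hypothetical, and the coordinates are treated as planar, which ignores the geodesic effects a real tool would need to consider.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    polygon is a list of (lon, lat) vertices; the last vertex connects
    back to the first. Coordinates are treated as planar (an assumption).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify_occurrences(occurrences, areas):
    """Pair each (species, lon, lat) record with the first area containing it, or None."""
    result = []
    for species, lon, lat in occurrences:
        area = next(
            (name for name, poly in areas.items() if point_in_polygon(lon, lat, poly)),
            None,
        )
        result.append((species, area))
    return result


# Hypothetical operational areas and occurrence records.
areas = {
    "west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "east": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
occurrences = [("sp_a", 5, 5), ("sp_b", 15, 5), ("sp_c", 30, 5)]
print(classify_occurrences(occurrences, areas))
# [('sp_a', 'west'), ('sp_b', 'east'), ('sp_c', None)]
```

Records that fall outside every area are kept and marked `None` rather than dropped, so they can be reviewed during data cleaning.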


rotl, an R package to interact with the Open Tree of Life data

10.7287/peerj.preprints.1471 ◽

2016 ◽

Cited By ~ 1

Author(s):

François Michonneau ◽

Joseph W. Brown ◽

David Winter

Keyword(s):

Data Structures ◽

R Package ◽

Tree Of Life ◽

Biodiversity Data ◽

Life Data ◽

Taxonomic Information ◽

Source Data ◽

R Packages ◽

The Rich ◽

Tools And Methods

1. While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. 2. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. 3. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies. Here, and in the vignettes accompanying the package, we demonstrate how rotl can be used with other R packages to analyze biodiversity data. 4. As phylogenies are being used in a growing number of applications, rotl facilitates access to phylogenetic data, and allows their integration with statistical methods and data sources available in R.


rotl, an R package to interact with the Open Tree of Life data

10.7287/peerj.preprints.1471v3 ◽

2016 ◽

Cited By ~ 1

Author(s):

François Michonneau ◽

Joseph W. Brown ◽

David Winter

Keyword(s):

Data Structures ◽

R Package ◽

Tree Of Life ◽

Biodiversity Data ◽

Life Data ◽

Taxonomic Information ◽

Source Data ◽

R Packages ◽

The Rich ◽

Tools And Methods

1. While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. 2. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. 3. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies. Here, and in the vignettes accompanying the package, we demonstrate how rotl can be used with other R packages to analyze biodiversity data. 4. As phylogenies are being used in a growing number of applications, rotl facilitates access to phylogenetic data, and allows their integration with statistical methods and data sources available in R.


webMCP-counter: a web interface for transcriptomics-based quantification of immune and stromal cells in heterogeneous human or murine samples

10.1101/2020.12.03.400754 ◽

2020 ◽

Author(s):

Maxime Meylan ◽

Etienne Becht ◽

Catherine Sautès-Fridman ◽

Aurélien de Reyniès ◽

Wolf H. Fridman ◽

...

Keyword(s):

Stromal Cells ◽

R Package ◽

Web Interface ◽

Transcriptomic Data ◽

R Programming Language ◽

Link Type ◽

Precise Estimation ◽

R Packages ◽

R Programming ◽

User Friendly

Summary: We previously reported MCP-counter and mMCP-counter, methods that allow precise estimation of the immune and stromal composition of human and murine samples from bulk transcriptomic data, but they were only distributed as R packages. Here, we report webMCP-counter, a user-friendly web interface to allow all users to use these methods, regardless of their proficiency in the R programming language. Availability and Implementation: Freely available from http://134.157.229.105:3838/webMCP/. Website developed with the R package shiny. Source code available from GitHub: https://github.com/FPetitprez/webMCP-counter.


A cloud-based toolbox for the versatile environmental annotation of biodiversity data

PLoS Biology ◽

10.1371/journal.pbio.3001460 ◽

2021 ◽

Vol 19 (11) ◽

pp. e3001460

Author(s):

Richard Li ◽

Ajay Ranipeta ◽

John Wilshire ◽

Jeremy Malczyk ◽

Michelle Duong ◽

...

Keyword(s):

R Package ◽

Scale Dependence ◽

Temporal Scale ◽

Environmental Data ◽

Task Management ◽

Biodiversity Data ◽

Web Based ◽

Continental Scale ◽

User Friendly ◽

Vast Range

A vast range of research applications in biodiversity sciences requires integrating primary species, genetic, or ecosystem data with other environmental data. This integration requires a consideration of the spatial and temporal scale appropriate for the data and processes in question. But a versatile and scale flexible environmental annotation of biodiversity data remains constrained by technical hurdles. Existing tools have streamlined the intersection of occurrence records with gridded environmental data but have remained limited in their ability to address a range of spatial and temporal grains, especially for large datasets. We present the Spatiotemporal Observation Annotation Tool (STOAT), a cloud-based toolbox for flexible biodiversity–environment annotations. STOAT is optimized for large biodiversity datasets and allows user-specified spatial and temporal resolution and buffering in support of environmental characterizations that account for the uncertainty and scale of data and of relevant processes. The tool offers these services for a growing set of near global, remotely sensed, or modeled environmental data, including Landsat, MODIS, EarthEnv, and CHELSA. STOAT includes a user-friendly, web-based dashboard that provides tools for annotation task management and result visualization, linked to Map of Life, and a dedicated R package (rstoat) for programmatic access. We demonstrate STOAT functionality with several examples that illustrate phenological variation and spatial and temporal scale dependence of environmental characteristics of birds at a continental scale. We expect STOAT to facilitate broader exploration and assessment of the scale dependence of observations and processes in ecology.


BITE: an R package for biodiversity analyses

10.1101/181610 ◽

2017 ◽

Cited By ~ 11

Author(s):

Marco Milanesi ◽

Stefano Capomaccio ◽

Elia Vajana ◽

Lorenzo Bomba ◽

José Fernando Garcia ◽

...

Keyword(s):

Data Analysis ◽

R Package ◽

Molecular Data ◽

Third Party ◽

Biodiversity Data ◽

Representative Sampling ◽

Link Type ◽

Data Analyses ◽

Snp Data ◽

User Friendly

Nowadays, molecular data analyses for biodiversity studies often require advanced bioinformatics skills, preventing many life scientists from analyzing their own data autonomously. The BITE R package provides complete and user-friendly functions to handle SNP data and third-party software results (i.e. Admixture, TreeMix), facilitating their visualization, interpretation and use. Furthermore, BITE implements additional useful procedures, such as representative sampling and bootstrap for TreeMix, filling the gap in existing biodiversity data analysis tools. Availability: https://github.com/marcomilanesi/BITE


ECX: An R Package for Studying Sensitivity of Antimicrobial Substances Using Spiral Plating Technology

Plant Health Progress ◽

10.1094/php-rs-16-0020 ◽

2016 ◽

Vol 17 (3) ◽

pp. 188-194 ◽

Cited By ~ 4

Author(s):

Gabriel Andrés Torres-Londoño ◽

Mary Hausbeck ◽

Jianjun Hao

Keyword(s):

R Package ◽

Large Datasets ◽

Graphical Interface ◽

Commercial Software ◽

Web Based ◽

Plating Technique ◽

Antimicrobial Substances ◽

Sensitivity Studies ◽

R Packages ◽

User Friendly

The spiral plating technique is reliable, repeatable, and more efficient than dilution plating methods in studying the efficacy of antimicrobial products. In this method, the concentration of chemicals can be varied at different positions on agar plates, but calculating it requires commercial software. To establish a user-friendly and cost-free platform, the R package ECX was developed to calculate chemical concentrations in the spiral plating technique. Mathematical models were established for calculating dispensed volume on agar plates using variables (molecular weight and agar height) that affect diffusion. In addition to the R packages, the web-based Shiny extensions ECX, multi, and ppm were developed to provide a graphical interface for calculating individual concentrations, multiple concentrations, and stock concentrations, respectively. No significant differences were observed (P > 0.05) when ECX was compared with the commercial software. The ability to import and process large datasets makes the ECX package a better option for spiral plating technique studies. Furthermore, the multiplatform nature of the ECX package overcomes limitations presented in other software. Therefore, these ECX characteristics can increase the use of the spiral plating technique for sensitivity studies. Accepted for publication 21 June 2016.


Rtpca: an R package for differential thermal proximity coaggregation analysis

Bioinformatics ◽

10.1093/bioinformatics/btaa682 ◽

2020 ◽

Cited By ~ 1

Author(s):

Nils Kurzawa ◽

André Mateus ◽

Mikhail M Savitski

Keyword(s):

Protein Interactions ◽

Predictive Performance ◽

R Package ◽

Supplementary Information ◽

Supplementary Data ◽

Protein Protein Interactions ◽

Proteome Profiling ◽

R Packages ◽

User Friendly

Summary: Rtpca is an R package implementing methods for inferring protein–protein interactions (PPIs) based on thermal proteome profiling experiments of a single condition or in a differential setting via an approach called thermal proximity coaggregation. It offers user-friendly tools to explore datasets for their PPI predictive performance and easily integrates with available R packages. Availability and implementation: Rtpca is available from Bioconductor (https://bioconductor.org/packages/Rtpca). Supplementary information: Supplementary data are available at Bioinformatics online.


Mitigating Inconsistencies by Coupling Data Cleaning, Filtering, and Contextual Data Validation in Wireless Sensor Networks

10.25148/etd.fi09082408 ◽

2009 ◽

Author(s):

Qutub A Bakhtiar

Keyword(s):

Wireless Sensor Networks ◽

Sensor Networks ◽

Data Cleaning ◽

Wireless Sensor ◽

Data Validation ◽

Contextual Data
