Introducing bdclean: a user friendly biodiversity data cleaning pipeline

2018 ◽  
Vol 2 ◽  
pp. e25564
Author(s):  
Tomer Gueta ◽  
Vijay Barve ◽  
Thiloshon Nagarajah ◽  
Ashwin Agrawal ◽  
Yohay Carmel

A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning, from data quality exploration to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade. New features will be highlighted in the demonstration.

Author(s):  
Tomer Gueta ◽  
Vijay Barve ◽  
Thiloshon Nagarajah ◽  
Povilas Gibas ◽  
Yohay Carmel

The bdverse is a collection of packages that form a general framework for facilitating biodiversity science in R. We built it to serve as a sustainable and agile infrastructure that enhances the value of biodiversity data by allowing users to conveniently employ R for data exploration, quality assessment, data cleaning, and standardization. The bdverse supports users with and without programming capabilities. It includes six unique packages in a hierarchical structure representing different functionality levels (Fig. 1). Major features of three core packages will be highlighted and demonstrated: (i) bdDwC provides an interactive Shiny app and a set of functions for standardizing field names in compliance with the Darwin Core (DwC) format; (ii) bdchecks is an infrastructure for performing, filtering and managing various biodiversity data checks; (iii) bdclean is a user-friendly data cleaning Shiny app for the inexperienced R user. It provides features to manage a complete workflow for biodiversity data cleaning, including data upload; user input to adjust cleaning procedures; data cleaning; and finally, generation of various reports and versions of the data. We are now working on submitting the bdverse packages to rOpenSci software review, and as soon as the packages meet core requirements, we will officially release the bdverse. The bdverse project won the 2nd prize in the 2018 Ebbe Nielsen Challenge.
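
The idea of standardizing field names to Darwin Core can be illustrated with a minimal Python sketch. It is not the bdDwC implementation or API: the synonym dictionary and the function name `standardize_field_names` are hypothetical, and only the target terms (`decimalLatitude`, `decimalLongitude`, `scientificName`, `eventDate`) are real Darwin Core terms.

```python
# Hypothetical synonym dictionary: lowercased user field names -> Darwin Core terms.
# This mapping is an assumption for illustration, not the dictionary bdDwC ships.
DWC_SYNONYMS = {
    "lat": "decimalLatitude",
    "latitude": "decimalLatitude",
    "lon": "decimalLongitude",
    "long": "decimalLongitude",
    "longitude": "decimalLongitude",
    "species": "scientificName",
    "sciname": "scientificName",
    "date": "eventDate",
}


def standardize_field_names(record):
    """Return a copy of `record` with recognized keys renamed to Darwin Core terms.

    Keys without a known synonym are kept unchanged.
    """
    standardized = {}
    for name, value in record.items():
        key = name.strip().lower()
        standardized[DWC_SYNONYMS.get(key, name)] = value
    return standardized


raw = {"Species": "Gazella gazella", "Lat": 32.1, "Long": 35.0, "collector": "T. G."}
clean = standardize_field_names(raw)
```

Unrecognized fields such as `collector` pass through untouched, so a user can review them manually, which is roughly the role an interactive app would play.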


2015 ◽  
Author(s):  
Alexander Zizka ◽  
Alexandre Antonelli

1. Large-scale species occurrence data from geo-referenced observations and collected specimens are crucial for analyses in ecology, evolution and biogeography. Despite the rapidly growing availability of such data, their use in evolutionary analyses is often hampered by tedious manual classification of point occurrences into operational areas, leading to a lack of reproducibility and concerns regarding data quality. 2. Here we present speciesgeocodeR, a user-friendly R package for data cleaning, data exploration and data visualization of species point occurrences using discrete operational areas, and linking them to analyses invoking phylogenetic trees. 3. The three core functions of the package are 1) automated and reproducible data cleaning, 2) rapid and reproducible classification of point occurrences into discrete operational areas in an adequate format for subsequent biogeographic analyses, and 3) a comprehensive summary and visualization of species distributions to explore large datasets and ensure data quality. In addition, speciesgeocodeR facilitates the access and analysis of publicly available species occurrence data, widely used operational areas and elevation ranges. Other functionalities include the implementation of minimum occurrence thresholds and the visualization of coexistence patterns and range sizes. speciesgeocodeR is accompanied by a richly illustrated and easy-to-follow tutorial and help functions.
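
Classifying point occurrences into discrete operational areas amounts to a point-in-polygon test per record. The Python sketch below shows that idea under simplifying assumptions: planar coordinates, simple polygons given as vertex lists, and a standard ray-casting test. The function names and the toy areas are hypothetical and do not reflect the speciesgeocodeR interface.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon` (a list of (x, y) vertices)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def classify_occurrences(occurrences, areas):
    """Map each species to the set of area names that contain at least one of its records.

    `occurrences` is an iterable of (species, lon, lat); `areas` maps name -> polygon.
    Records falling outside every area are ignored.
    """
    result = {}
    for species, lon, lat in occurrences:
        for area_name, polygon in areas.items():
            if point_in_polygon(lon, lat, polygon):
                result.setdefault(species, set()).add(area_name)
    return result


# Toy operational areas (two adjacent squares) and occurrence records.
areas = {
    "west": [(0, 0), (10, 0), (10, 10), (0, 10)],
    "east": [(10, 0), (20, 0), (20, 10), (10, 10)],
}
occurrences = [("sp_a", 5, 5), ("sp_a", 15, 5), ("sp_b", 12, 3), ("sp_b", 30, 30)]
classified = classify_occurrences(occurrences, areas)
```

A species-by-area mapping of this kind is the sort of table that biogeographic analyses typically take as input; real data would also need a geographic library to handle projections and polygons with holes.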


Author(s):  
François Michonneau ◽  
Joseph W. Brown ◽  
David Winter

1. While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. 2. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. 3. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies. Here, and in the vignettes accompanying the package, we demonstrate how rotl can be used with other R packages to analyze biodiversity data. 4. As phylogenies are being used in a growing number of applications, rotl facilitates access to phylogenetic data, and allows their integration with statistical methods and data sources available in R.


2020 ◽  
Author(s):  
Maxime Meylan ◽  
Etienne Becht ◽  
Catherine Sautès-Fridman ◽  
Aurélien de Reyniès ◽  
Wolf H. Fridman ◽  
...  

Summary: We previously reported MCP-counter and mMCP-counter, methods that allow precise estimation of the immune and stromal composition of human and murine samples from bulk transcriptomic data, but they were only distributed as R packages. Here, we report webMCP-counter, a user-friendly web interface to allow all users to use these methods, regardless of their proficiency in the R programming language. Availability and Implementation: Freely available from http://134.157.229.105:3838/webMCP/. Website developed with the R package shiny. Source code available from GitHub: https://github.com/FPetitprez/webMCP-counter.


PLoS Biology ◽  
2021 ◽  
Vol 19 (11) ◽  
pp. e3001460
Author(s):  
Richard Li ◽  
Ajay Ranipeta ◽  
John Wilshire ◽  
Jeremy Malczyk ◽  
Michelle Duong ◽  
...  

A vast range of research applications in biodiversity sciences requires integrating primary species, genetic, or ecosystem data with other environmental data. This integration requires a consideration of the spatial and temporal scale appropriate for the data and processes in question. But a versatile and scale flexible environmental annotation of biodiversity data remains constrained by technical hurdles. Existing tools have streamlined the intersection of occurrence records with gridded environmental data but have remained limited in their ability to address a range of spatial and temporal grains, especially for large datasets. We present the Spatiotemporal Observation Annotation Tool (STOAT), a cloud-based toolbox for flexible biodiversity–environment annotations. STOAT is optimized for large biodiversity datasets and allows user-specified spatial and temporal resolution and buffering in support of environmental characterizations that account for the uncertainty and scale of data and of relevant processes. The tool offers these services for a growing set of near global, remotely sensed, or modeled environmental data, including Landsat, MODIS, EarthEnv, and CHELSA. STOAT includes a user-friendly, web-based dashboard that provides tools for annotation task management and result visualization, linked to Map of Life, and a dedicated R package (rstoat) for programmatic access. We demonstrate STOAT functionality with several examples that illustrate phenological variation and spatial and temporal scale dependence of environmental characteristics of birds at a continental scale. We expect STOAT to facilitate broader exploration and assessment of the scale dependence of observations and processes in ecology.


2017 ◽  
Author(s):  
Marco Milanesi ◽  
Stefano Capomaccio ◽  
Elia Vajana ◽  
Lorenzo Bomba ◽  
José Fernando Garcia ◽  
...  

Nowadays, molecular data analyses for biodiversity studies often require advanced bioinformatics skills, preventing many life scientists from analyzing their own data autonomously. The BITE R package provides complete and user-friendly functions to handle SNP data and third-party software results (i.e. Admixture, TreeMix), facilitating their visualization, interpretation and use. Furthermore, BITE implements additional useful procedures, such as representative sampling and bootstrap for TreeMix, filling the gap in existing biodiversity data analysis tools. Availability: https://github.com/marcomilanesi/BITE


2016 ◽  
Vol 17 (3) ◽  
pp. 188-194 ◽  
Author(s):  
Gabriel Andrés Torres-Londoño ◽  
Mary Hausbeck ◽  
Jianjun Hao

The spiral plating technique is reliable, repeatable, and more efficient than dilution plating methods in studying the efficacy of antimicrobial products. In this method, the concentration of chemicals can be varied at different positions on agar plates, but its calculation requires commercial software. To establish a user-friendly and cost-free platform, the R package ECX was developed to calculate chemical concentrations in the spiral plating technique. Mathematical models were established for calculating dispensed volume on agar plates using variables (molecular weight and agar height) that affect diffusion. In addition to the R package, the web-based Shiny extensions ECX, multi, and ppm were developed to provide a graphical interface for calculating individual concentrations, multiple concentrations, and stock concentrations, respectively. No significant differences were observed (P > 0.05) when ECX was compared with the commercial software. The ability to import and process large datasets makes the ECX package a better option for spiral plating technique studies. Furthermore, the multiplatform nature of the ECX package overcomes limitations presented in other software. Therefore, these ECX characteristics can increase the use of the spiral plating technique for sensitivity studies. Accepted for publication 21 June 2016.


Author(s):  
Nils Kurzawa ◽  
André Mateus ◽  
Mikhail M Savitski

Summary: Rtpca is an R package implementing methods for inferring protein–protein interactions (PPIs) based on thermal proteome profiling experiments of a single condition or in a differential setting via an approach called thermal proximity coaggregation. It offers user-friendly tools to explore datasets for their PPI predictive performance and easily integrates with available R packages. Availability and implementation: Rtpca is available from Bioconductor (https://bioconductor.org/packages/Rtpca). Supplementary information: Supplementary data are available at Bioinformatics online.

