rotl, an R package to interact with the Open Tree of Life data

Author(s):  
François Michonneau ◽  
Joseph W. Brown ◽  
David Winter

1. While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. 2. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. 3. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies. Here, and in the vignettes accompanying the package, we demonstrate how rotl can be used with other R packages to analyze biodiversity data. 4. As phylogenies are being used in a growing number of applications, rotl facilitates access to phylogenetic data, and allows their integration with statistical methods and data sources available in R.


2015 ◽  
Author(s):  
François Michonneau ◽  
Joseph W. Brown ◽  
David Winter

While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies.


2016 ◽  
Author(s):  
François Michonneau ◽  
Joseph W. Brown ◽  
David Winter

While phylogenies have been getting easier to build, it has been difficult to re-use, combine, and synthesize the information they provide because published trees are often only available as image files, and taxonomic information is not standardized across studies. The Open Tree of Life (OTL) project addresses these issues by providing a digital tree that encompasses all organisms, built by combining taxonomic information and published phylogenies. The project also provides tools and services to query and download parts of this synthetic tree, as well as the source data used to build it. Here, we present rotl, an R package to search and download data from the Open Tree of Life directly in R. rotl uses common data structures allowing researchers to take advantage of the rich set of tools and methods that are available in R to manipulate, analyze, and visualize phylogenies.


2018 ◽  
Vol 2 ◽  
pp. e25564
Author(s):  
Tomer Gueta ◽  
Vijay Barve ◽  
Thiloshon Nagarajah ◽  
Ashwin Agrawal ◽  
Yohay Carmel

A new R package for biodiversity data cleaning, 'bdclean', was initiated in the Google Summer of Code (GSoC) 2017 and is available on GitHub. Several R packages have great data validation and cleaning functions, but 'bdclean' provides features to manage a complete pipeline for biodiversity data cleaning, from data quality explorations to cleaning procedures and reporting. Users are able to go through the quality control process in a very structured, intuitive, and effective way. A modular approach to data cleaning functionality should make this package extensible for many biodiversity data cleaning needs. Under GSoC 2018, 'bdclean' will go through a comprehensive upgrade. New features will be highlighted in the demonstration.


2021 ◽  
Author(s):  
Thomas R Etherington ◽  
O. Pascal Omondiagbe

Computational geometry algorithms and data structures are widely applied across numerous scientific domains, and there are a variety of R packages that implement computational geometry functionality. However, these packages often work in specific numbers of dimensions, do not have directly compatible data structures, and include additional non-computational geometry functionality that can be domain specific. Our objective in developing the compGeometeR package is to implement in a generic and consistent framework the most commonly used combinatorial computational geometry algorithms so that they can be easily combined and integrated into domain specific scientific workflows. We briefly explain the discrete and digital combinatorial computational geometry algorithms available in compGeometeR, and identify priorities for future development.


2016 ◽  
Vol 7 (12) ◽  
pp. 1476-1481 ◽  
Author(s):  
François Michonneau ◽  
Joseph W. Brown ◽  
David J. Winter

2021 ◽  
pp. 014662162110131
Author(s):  
S. W. Choi ◽  
S. Lim ◽  
B. D. Schalet ◽  
A. J. Kaat ◽  
D. Cella

A common problem when using a variety of patient-reported outcomes (PROs) for diverse populations and subgroups is establishing a harmonized scale for the incommensurate outcomes. The lack of comparability in metrics (e.g., raw summed scores vs. scaled scores) among different PROs poses practical challenges in studies comparing effects across studies and samples. Linking has long been used for practical benefit in educational testing. Applying various linking techniques to PRO data has a relatively short history; however, in recent years, there has been a surge of published studies on linking PROs and other health outcomes, owing in part to concerted efforts such as the Patient-Reported Outcomes Measurement Information System (PROMIS®) project and the PRO Rosetta Stone (PROsetta Stone®) project (www.prosettastone.org). Many R packages have been developed for linking in educational settings; however, they are not tailored for linking PROs where harmonization of data across clinical studies or settings serves as the main objective. We created the PROsetta package to fill this gap and disseminate a protocol that has been established as a standard practice for linking PROs.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Charlie M. Carpenter ◽  
Daniel N. Frank ◽  
Kayla Williamson ◽  
Jaron Arbet ◽  
Brandie D. Wagner ◽  
...  

Background: The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities typically are of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results: We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools) supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots). This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in analysis workflow. Conclusions: tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions on standard analyses to improve the interpretability of results while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.


2017 ◽  
Author(s):  
Josine Min ◽  
Gibran Hemani ◽  
George Davey Smith ◽  
Caroline Relton ◽  
Matthew Suderman

Background: Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods: We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results: A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls functional normalization is unable to prevent spurious associations. Conclusions: meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1774 ◽  
Author(s):  
Julia A. Gustavsen ◽  
Shraddha Pai ◽  
Ruth Isserlin ◽  
Barry Demchak ◽  
Alexander R. Pico

RCy3 is an R package in Bioconductor that communicates with Cytoscape via its REST API, providing access to the full feature set of Cytoscape from within the R programming environment. RCy3 has been redesigned to streamline its usage and future development as part of a broader Cytoscape Automation effort. Over 100 new functions have been added, including dozens of helper functions specifically for intuitive data overlay operations. Over 40 Cytoscape apps have implemented automation support so far, making hundreds of additional operations accessible via RCy3. Two-way conversion with networks from igraph and graph ensures interoperability with existing network biology workflows and dozens of other Bioconductor packages. These capabilities are demonstrated in a series of use cases involving public databases, enrichment analysis pipelines, shortest path algorithms and more. With RCy3, bioinformaticians will be able to quickly deliver reproducible network biology workflows as integrations of Cytoscape functions, complex custom analyses and other R packages.

