Metacoder: An R Package for Visualization and Manipulation of Community Taxonomic Diversity Data

Mapping Intimacies ◽

10.1101/071019 ◽

2016 ◽

Cited By ~ 4

Author(s):

Zachary S. L. Foster ◽

Thomas J. Sharpton ◽

Niklaus J. Grünwald

Keyword(s):

Geographic Location ◽

Community Analysis ◽

Taxonomic Diversity ◽

R Package ◽

Digital Pcr ◽

Location Data ◽

Level Data ◽

The Hierarchical Structure ◽

Hierarchical Nature ◽

Tree Visualization

Abstract Community-level data, the type generated by an increasing number of metabarcoding studies, is often graphed as stacked bar charts or pie graphs that use color to represent taxa. These graph types do not convey the hierarchical structure of taxonomic classifications and are limited by the use of color for categories. As an alternative, we developed metacoder, an R package for easily parsing, manipulating, and graphing publication-ready plots of hierarchical data. Metacoder includes a dynamic and flexible function that can parse most text-based formats that contain taxonomic classifications, taxon names, taxon identifiers, or sequence identifiers. Metacoder can then subset, sample, and order this parsed data using a set of intuitive functions that take into account the hierarchical nature of the data. Finally, an extremely flexible plotting function enables quantitative representation of up to 4 arbitrary statistics simultaneously in a tree format by mapping statistics to the color and size of tree nodes and edges. Metacoder also allows exploration of barcode primer bias by integrating functions to run digital PCR. Although it has been designed for data from metabarcoding research, metacoder can easily be applied to any data that has a hierarchical component such as gene ontology or geographic location data. Our package complements currently available tools for community analysis and is provided open source with an extensive online user manual.

Note: This article was previously submitted as a pre-print: Zachary S. L. Foster, Thomas J. Sharpton, Niklaus J. Grünwald. 2016. Metacoder: An R package for manipulation and heat tree visualization of community taxonomic data from metabarcoding. BioRxiv 071019; doi: http://dx.doi.org/10.1101/071019.
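The hierarchical idea in this abstract can be illustrated with a small sketch. The Python below is not metacoder's API (metacoder is an R package); it assumes a hypothetical input of semicolon-delimited lineage strings with read counts and shows how counts can be rolled up so that every node of the taxonomy carries a statistic that a tree plot could map to node size or color. The function name `rollup_counts` and the example data are invented for illustration.

```python
from collections import defaultdict


def rollup_counts(observations):
    """Sum read counts for every taxon along each lineage (hypothetical helper)."""
    totals = defaultdict(int)
    for lineage, count in observations:
        ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
        # Credit the count to every ancestor path, from the root down to the tip.
        for depth in range(1, len(ranks) + 1):
            totals[tuple(ranks[:depth])] += count
    return dict(totals)


# Invented example data, not taken from the paper.
observations = [
    ("Bacteria;Proteobacteria;Gammaproteobacteria", 120),
    ("Bacteria;Proteobacteria;Alphaproteobacteria", 30),
    ("Bacteria;Firmicutes;Bacilli", 50),
]
node_totals = rollup_counts(observations)
```

Keying each node by its full path, rather than by its name alone, keeps taxa with identical names in different branches separate, which is one reason a tree layout can carry more information than a flat color legend.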

Phylogeographic Estimation and Simulation of Global Diffusive Dispersal

Systematic Biology ◽

10.1093/sysbio/syaa061 ◽

2020 ◽

Author(s):

Stilianos Louca

Keyword(s):

Brownian Motion ◽

Geographic Location ◽

R Package ◽

Estimation Methods ◽

Influenza B ◽

Independent Contrasts ◽

Time Resolved ◽

Location Data ◽

Brownian Motion Model ◽

Random Walk Simulation

Abstract The analysis of time-resolved phylogenies (timetrees) and geographic location data allows estimation of dispersal rates, for example, for invasive species and infectious diseases. Many estimation methods are based on the Brownian Motion model for diffusive dispersal on a 2D plane; however, the accuracy of these methods deteriorates substantially when dispersal occurs at global scales because spherical Brownian motion (SBM) differs from planar Brownian motion. No statistical method exists for estimating SBM diffusion coefficients from a given timetree and tip coordinates, and no method exists for simulating SBM along a given timetree. Here, I present new methods for simulating SBM along a given timetree, and for estimating SBM diffusivity from a given timetree and tip coordinates using a modification of Felsenstein’s independent contrasts and maximum likelihood. My simulation and fitting methods can accommodate arbitrary time-dependent diffusivities and scale efficiently to trees with millions of tips, thus enabling new analyses even in cases where planar BM would be a sufficient approximation. I demonstrate these methods using a timetree of marine and terrestrial Cyanobacterial genomes, as well as timetrees of two globally circulating Influenza B clades. My methods are implemented in the R package “castor.” [Independent contrasts; phylogenetic; random walk; simulation; spherical Brownian motion.]
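To make the planar versus spherical distinction concrete, here is a minimal sketch of a random walk constrained to the unit sphere. It is not the castor implementation and does not follow a timetree; it assumes a simple small-step approximation in which a Gaussian displacement is projected onto the tangent plane and the point is renormalized, with diffusivity expressed in squared radians per unit time. All function names are hypothetical.

```python
import math
import random


def sbm_step(point, diffusivity, dt, rng):
    """One small-step approximation of Brownian motion on the unit sphere."""
    x, y, z = point
    sigma = math.sqrt(2.0 * diffusivity * dt)  # per-axis standard deviation
    dx, dy, dz = (rng.gauss(0.0, sigma) for _ in range(3))
    # Remove the radial component so the displacement lies in the tangent plane.
    radial = dx * x + dy * y + dz * z
    dx, dy, dz = dx - radial * x, dy - radial * y, dz - radial * z
    nx, ny, nz = x + dx, y + dy, z + dz
    # Project the displaced point back onto the sphere.
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)


def geodesic_angle(p, q):
    """Great-circle distance (in radians) between two unit vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))


def simulate(n_steps, diffusivity, dt, seed=0):
    """Walk from the north pole and return the final geodesic displacement."""
    rng = random.Random(seed)
    start = (0.0, 0.0, 1.0)
    point = start
    for _ in range(n_steps):
        point = sbm_step(point, diffusivity, dt, rng)
    return geodesic_angle(start, point)


displacement = simulate(n_steps=1000, diffusivity=0.01, dt=0.01)
```

On a sphere the geodesic displacement can never exceed pi radians, whereas a planar walk can drift arbitrarily far. That bound is one intuition for why planar approximations break down once dispersal reaches global scales.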

GPSeqClus: an r package for sequential clustering of animal location data for model building, model application, and field site investigations

Methods in Ecology and Evolution ◽

10.1111/2041-210x.13572 ◽

2021 ◽

Author(s):

Justin G. Clapp ◽

Joseph D. Holbrook ◽

Daniel J. Thompson

Keyword(s):

Model Building ◽

R Package ◽

Field Site ◽

Model Application ◽

Location Data ◽

Building Model ◽

Site Investigations ◽

Sequential Clustering

Meffil: efficient normalisation and analysis of very large DNA methylation samples

10.1101/125963 ◽

2017 ◽

Cited By ~ 17

Author(s):

Josine Min ◽

Gibran Hemani ◽

George Davey Smith ◽

Caroline Relton ◽

Matthew Suderman

Keyword(s):

Dna Methylation ◽

Association Studies ◽

R Package ◽

Individual Level ◽

Technological Advances ◽

Level Data ◽

Fixed And Random Effects ◽

R Packages ◽

Meta Analyses ◽

Dramatic Growth

Abstract Background Technological advances in high throughput DNA methylation microarrays have allowed dramatic growth of a new branch of epigenetic epidemiology. DNA methylation datasets are growing ever larger in terms of the number of samples profiled, the extent of genome coverage, and the number of studies being meta-analysed. Novel computational solutions are required to efficiently handle these data. Methods We have developed meffil, an R package designed to quality control, normalize and perform epigenome-wide association studies (EWAS) efficiently on large samples of Illumina Infinium HumanMethylation450 and MethylationEPIC BeadChip microarrays. We tested meffil by applying it to 6000 450k microarrays generated from blood collected for two different datasets, Accessible Resource for Integrative Epigenomic Studies (ARIES) and The Genetics of Overweight Young Adults (GOYA) study. Results A complete reimplementation of functional normalization minimizes computational memory requirements to 5% of that required by other R packages, without increasing running time. Incorporating fixed and random effects alongside functional normalization, and automated estimation of functional normalisation parameters reduces technical variation in DNA methylation levels, thus reducing false positive associations and improving power. We also demonstrate that the ability to normalize datasets distributed across physically different locations without sharing any biologically-based individual-level data may reduce heterogeneity in meta-analyses of epigenome-wide association studies. However, we show that when batch is perfectly confounded with cases and controls functional normalization is unable to prevent spurious associations. Conclusions meffil is available online (https://github.com/perishky/meffil/) along with tutorials covering typical use cases.

Diat.barcode: a DNA tool to decipher diatom communities for the evaluation environmental pressures

ARPHA Conference Abstracts ◽

10.3897/aca.4.e64940 ◽

2021 ◽

Vol 4 ◽

Cited By ~ 1

Author(s):

Frederic Rimet ◽

Teofana Chonova ◽

Gilles Gassiole ◽

Maria Kahlert ◽

François Keck ◽

...

Keyword(s):

Taxonomic Diversity ◽

R Package ◽

Life Forms ◽

Reference Database ◽

Sequencing Data ◽

Culture Collections ◽

Environmental Pressures ◽

Ecological Features ◽

Dna Metabarcoding ◽

Environmental Sequences

Diatoms (Bacillariophyta) are ubiquitous microalgae with a huge taxonomic diversity that changes in correlation with environmental conditions. This makes them excellent ecological indicators for various ecosystems and ecological questions (ecotoxicology, biomonitoring, paleo-environmental reconstruction, etc.). Current standardized methodologies for diatoms are based on microscopic determination, which is time consuming and prone to identification uncertainties. DNA metabarcoding has been proposed as a way to avoid these flaws, enabling the sequencing of a large quantity of barcodes from natural samples. A taxonomic identity is given to these barcodes by comparing their sequences to a barcoding reference library. However, to identify environmental sequences correctly, the reference database should contain a representative number of reference sequences to ensure good coverage of diatom diversity. Moreover, the reference database needs to be carefully taxonomically curated by experts, as its content has an obvious impact on species detection. Diat.barcode is an open-access library for diatoms linking diatom taxonomic identities to rbcL barcode sequences (a chloroplast marker suitable for species-level identification of diatoms), which has been maintained since 2012. Data are accumulated from three sources: (1) the NCBI nucleotide database, (2) unpublished sequencing data of culture collections and, more recently, (3) environmental sequences. Since 2017, an international network of experts in diatom taxonomy has curated this library. The latest version of the database (version 9.2) includes 8066 entries that correspond to more than 280 different genera and 1490 different species. In addition to the taxonomic information, morphological features (e.g. biovolumes, chloroplasts, etc.), life-forms (mobility, colony-type) and ecological features (taxa preferences to pollution) are given.
The database can be downloaded from the website (www6.inrae.fr/carrtel-collection/Barcoding-database/) or directly through the R package diatbarcode. Ready-to-use files for commonly used metabarcoding pipelines (Mothur and DADA2) are also available.

Vertical and horizontal integration of multi-omics data with miodin

BMC Bioinformatics ◽

10.1186/s12859-019-3224-4 ◽

2019 ◽

Vol 20 (1) ◽

Cited By ~ 13

Author(s):

Benjamin Ulfenborg

Keyword(s):

Data Analysis ◽

R Package ◽

Heterogeneous Data ◽

Omics Data ◽

Technical Expertise ◽

Horizontal Integration ◽

Level Data ◽

Genomics And Proteomics ◽

Health And Disease ◽

Molecular Layers

Abstract Background Studies on multiple modalities of omics data such as transcriptomics, genomics and proteomics are growing in popularity, since they allow us to investigate complex mechanisms across molecular layers. It is widely recognized that integrative omics analysis holds the promise to unlock novel and actionable biological insights into health and disease. Integration of multi-omics data remains challenging, however, and requires combination of several software tools and extensive technical expertise to account for the properties of heterogeneous data. Results This paper presents the miodin R package, which provides a streamlined workflow-based syntax for multi-omics data analysis. The package allows users to perform analysis of omics data either across experiments on the same samples (vertical integration), or across studies on the same variables (horizontal integration). Workflows have been designed to promote transparent data analysis and reduce the technical expertise required to perform low-level data import and processing. Conclusions The miodin package is implemented in R and is freely available for use and extension under the GPL-3 license. Package source, reference documentation and user manual are available at https://gitlab.com/algoromics/miodin.
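The two integration modes named in this abstract can be sketched with plain dictionaries. The Python below is not the miodin API (miodin is an R package with its own workflow syntax); it assumes a hypothetical layout in which each dataset maps sample IDs to a dictionary of measured variables, and all function names and values are invented for illustration.

```python
def integrate_vertical(*layers):
    """Join omics layers measured on the same samples, keyed by sample ID."""
    shared_samples = set(layers[0])
    for layer in layers[1:]:
        shared_samples &= set(layer)
    merged = {}
    for sample in sorted(shared_samples):
        merged[sample] = {}
        for layer in layers:
            # Each layer contributes its own variables for this sample.
            merged[sample].update(layer[sample])
    return merged


def integrate_horizontal(*studies):
    """Stack samples from studies, keeping only variables measured in all of them."""
    variable_sets = [set(values) for study in studies for values in study.values()]
    shared_variables = set.intersection(*variable_sets)
    stacked = {}
    for index, study in enumerate(studies):
        for sample, values in study.items():
            # Prefix the sample ID so samples from different studies cannot collide.
            key = f"study{index + 1}:{sample}"
            stacked[key] = {name: values[name] for name in sorted(shared_variables)}
    return stacked


# Invented example data.
rna = {"s1": {"geneA": 5.1}, "s2": {"geneA": 4.2}}
protein = {"s1": {"protA": 0.8}, "s2": {"protA": 1.1}, "s3": {"protA": 0.9}}
vertical = integrate_vertical(rna, protein)

study_a = {"a1": {"geneA": 5.1, "geneB": 2.0}}
study_b = {"b1": {"geneA": 4.7, "geneC": 1.3}}
horizontal = integrate_horizontal(study_a, study_b)
```

Vertical integration widens the table (more variables per sample), while horizontal integration lengthens it (more samples per variable). Real workflows also have to handle normalization and batch effects, which this sketch ignores.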

An open source R package for Droplet Digital PCR analysis

European Journal of Cancer ◽

10.1016/s0959-8049(16)61656-8 ◽

2016 ◽

Vol 61 ◽

pp. S185

Author(s):

A. Chiu ◽

G. Brady ◽

M. Ayub ◽

C. Dive ◽

C. Miller

Keyword(s):

Open Source ◽

R Package ◽

Digital Pcr ◽

Droplet Digital Pcr ◽

Pcr Analysis

TreeMap: a structured approach to fine mapping of eQTL variants

Bioinformatics ◽

10.1093/bioinformatics/btaa927 ◽

2020 ◽

Author(s):

Li Liu ◽

Pramod Chandrashekar ◽

Biao Zeng ◽

Maxwell D Sanderford ◽

Sudhir Kumar ◽

...

Keyword(s):

Long Range ◽

Fine Mapping ◽

Regulation Of Gene Expression ◽

R Package ◽

Supplementary Information ◽

Cis Acting ◽

Regulatory Variants ◽

Daunting Task ◽

The Hierarchical Structure ◽

Causal Variants

Abstract Motivation Expression quantitative trait loci (eQTL) harbor genetic variants modulating gene transcription. Fine mapping of regulatory variants at these loci is a daunting task due to the juxtaposition of causal and linked variants at a locus as well as the likelihood of interactions among multiple variants. This problem is exacerbated in genes with multiple cis-acting eQTL, where superimposed effects of adjacent loci further distort the association signals. Results We developed a novel algorithm, TreeMap, that identifies putative causal variants in cis-eQTL accounting for multisite effects and genetic linkage at a locus. Guided by the hierarchical structure of linkage disequilibrium, TreeMap performs an organized search for individual and multiple causal variants. Via extensive simulations, we show that TreeMap detects co-regulating variants more accurately than current methods. Furthermore, its high computational efficiency enables genome-wide analysis of long-range eQTL. We applied TreeMap to GTEx data of brain hippocampus samples and transverse colon samples to search for eQTL in gene bodies and in 4 Mbps gene-flanking regions, discovering numerous distal eQTL. Furthermore, we found concordant distal eQTL that were present in both brain and colon samples, implying long-range regulation of gene expression. Availability and implementation TreeMap is available as an R package enabled for parallel processing at https://github.com/liliulab/treemap. Supplementary information Supplementary data are available at Bioinformatics online.

Perspective for Aquaponic Systems: “Omic” Technologies for Microbial Community Analysis

BioMed Research International ◽

10.1155/2015/480386 ◽

2015 ◽

Vol 2015 ◽

pp. 1-10 ◽

Cited By ~ 25

Author(s):

Perla Munguia-Fragozo ◽

Oscar Alatorre-Jacome ◽

Enrique Rico-Garcia ◽

Irineo Torres-Pacheco ◽

Andres Cruz-Hernandez ◽

...

Keyword(s):

Microbial Community ◽

Nutrient Dynamics ◽

Microbial Community Composition ◽

Chemical Species ◽

Community Analysis ◽

Taxonomic Diversity ◽

Microbial Community Analysis ◽

Emergent Properties ◽

Microbial Profile ◽

Set Up

Aquaponics is the combined production of aquaculture and hydroponics, connected by a water recirculation system. In this production system, the microbial community is responsible for carrying out the nutrient dynamics between the components. These nutrient transformations consist mainly of converting chemical species from toxic compounds into available nutrients. In microbial research on such systems, “Omic” technologies will allow a broader range of studies of the current microbial profile inside the aquaponic community, including species that are currently unculturable. This approach can also be useful for understanding complex interactions among the living components of the system. Until now, analogous studies have characterized the microbial communities of recirculation aquaculture systems (RAS). However, the microbial community composition of aquaponic systems is still unknown. “Omic” technologies such as metagenomics can help to reveal taxonomic diversity. A further perspective is to begin the first attempts to sketch the functional diversity inside aquaponic systems and its ecological relationships. Knowledge of the emergent properties of the microbial community, together with an understanding of its biosynthesis pathways, could lead to future biotechnological applications. Thus, the aim of this review is to show potential applications of current “Omic” tools for characterizing the microbial community in aquaponic systems.

Integrating GIScience and Crop Science datasets: a study involving genetic, geographic and environmental data

10.7287/peerj.preprints.2248 ◽

2017 ◽

Author(s):

Roberto Santos ◽

Adam Algar ◽

Richard Field ◽

Sean Mayes

Keyword(s):

Information Science ◽

Geographic Location ◽

Environmental Data ◽

Anthropogenic Factors ◽

Geospatial Information ◽

Location Data ◽

Body Of Knowledge ◽

Crop Science ◽

Combining Data ◽

Individual Body

Sharing and reusing data in research is a welcome and encouraged practice since it maximises the scientific outcomes given limited financial, material and human resources. Interdisciplinary research is considered to benefit from this practice, uniting researchers and data from two or more disciplines to advance fundamental understanding or tackle problems whose solution is beyond the limit of an individual body of knowledge. Here we discuss the challenges of combining data across disciplines, focusing in particular on associating geographic location data with genetic data in the context of a project involving Crop Science and Geospatial Information Science disciplines. This project aims to improve understanding of how geographical, environmental and anthropogenic factors affect the genetic variation in a neglected and underutilised crop called Bambara groundnut.

ProMetIS, deep phenotyping of mouse models by combined proteomics and metabolomics analysis

Scientific Data ◽

10.1038/s41597-021-01095-3 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Alyssa Imbert ◽

Magali Rompais ◽

Mohammed Selloum ◽

Florence Castelli ◽

Emmanuelle Mouton-Barbosa ◽

...

Keyword(s):

Mouse Models ◽

Physiological Role ◽

R Package ◽

Level Data ◽

Molecular Information ◽

Number Of Individuals ◽

Deep Phenotyping ◽

Omics Data Integration

Abstract Genes are pleiotropic and getting a better knowledge of their function requires a comprehensive characterization of their mutants. Here, we generated multi-level data combining phenomic, proteomic and metabolomic acquisitions from plasma and liver tissues of two C57BL/6 N mouse models lacking the Lat (linker for activation of T cells) and the Mx2 (MX dynamin-like GTPase 2) genes, respectively. Our dataset consists of 9 assays (1 preclinical, 2 proteomics and 6 metabolomics) generated with a fully non-targeted and standardized approach. The data and processing code are publicly available in the ProMetIS R package to ensure accessibility, interoperability, and reusability. The dataset thus provides unique molecular information about the physiological role of the Lat and Mx2 genes. Furthermore, the protocols described herein can be easily extended to a larger number of individuals and tissues. Finally, this resource will be of great interest to develop new bioinformatic and biostatistic methods for multi-omics data integration.
