Anvi’o: An advanced analysis and visualization platform for ‘omics data

10.7287/peerj.preprints.1275v1 ◽

2015 ◽

Cited By ~ 2

Author(s):

A. Murat Eren ◽

Özcan C Esen ◽

Christopher Quince ◽

Joseph H Vineis ◽

Mitchell L Sogin ◽

...

Keyword(s):

Time Series Data ◽

De Novo ◽

Draft Genome ◽

Series Data ◽

Omics Data ◽

Modular Architecture ◽

Sequence Contigs ◽

Genomic Changes ◽

Advanced Analysis ◽

Functional Features

Comprehensive analyses of shotgun metagenomic assemblies have revolutionized molecular microbial ecology, but few microbiologists command the full suite of bioinformatics skills necessary to process, interact with, organize and visualize overlapping DNA sequence contigs. Here we introduce anvi’o, an advanced analysis and visualization platform for ‘omics data, and its assembly-based metagenomic workflow. Anvi’o’s interactive interface facilitates the management of contigs and associated metadata for automatic or human-guided identification of genome bins, and their curation. Its extensible visualization approach distills multiple dimensions of information about each contig into a single, intuitive display, offering a dynamic and unified work environment for data exploration, manipulation and reporting. Beyond its easy-to-use interface, the advanced modular architecture of anvi’o as a platform allows users with programming skills to implement and test novel ideas with minimal effort. To demonstrate anvi’o’s capabilities, we re-analyzed a metagenomic time-series dataset from an infant gut microbiome. Through the anvi’o interface we identified near-complete draft genomes, and explored temporal genomic changes within the abundant microbial populations through de novo characterization of subtle nucleotide variations. We also used anvi’o to re-analyze a collection of datasets from multiple investigators who studied microbial responses to the Deepwater Horizon oil spill. We linked metagenomic, metatranscriptomic, and single-cell genomic data from the water plume, and used the holistic perspective anvi’o provides to identify the draft genome of a previously uncharacterized, active population of Oceanospirillales. We also linked environmental isolates with metagenomes recovered from an oil-contaminated beach, and identified 56 near-complete draft genomes including abundant oil degraders whose functional features suggested an oceanic origin.

Anvi’o: an advanced analysis and visualization platform for ‘omics data

PeerJ ◽

10.7717/peerj.1319 ◽

2015 ◽

Vol 3 ◽

pp. e1319 ◽

Cited By ~ 579

Author(s):

A. Murat Eren ◽

Özcan C. Esen ◽

Christopher Quince ◽

Joseph H. Vineis ◽

Hilary G. Morrison ◽

...

Keyword(s):

High Throughput Sequencing ◽

De Novo ◽

Complex Data ◽

Omics Data ◽

Multiple Sources ◽

Genomic Changes ◽

Naturally Occurring ◽

Multiple Dimensions ◽

Advanced Analysis

Advances in high-throughput sequencing and ‘omics technologies are revolutionizing studies of naturally occurring microbial communities. Comprehensive investigations of microbial lifestyles require the ability to interactively organize and visualize genetic information and to incorporate subtle differences that enable greater resolution of complex data. Here we introduce anvi’o, an advanced analysis and visualization platform that offers automated and human-guided characterization of microbial genomes in metagenomic assemblies, with interactive interfaces that can link ‘omics data from multiple sources into a single, intuitive display. Its extensible visualization approach distills multiple dimensions of information about each contig, offering a dynamic and unified work environment for data exploration, manipulation, and reporting. Using anvi’o, we re-analyzed publicly available datasets and explored temporal genomic changes within naturally occurring microbial populations through de novo characterization of single nucleotide variations, and linked cultivar and single-cell genomes with metagenomic and metatranscriptomic data. Anvi’o is an open-source platform that empowers researchers without extensive bioinformatics skills to perform and communicate in-depth analyses on large ‘omics datasets.

Comparative High-Density Microarray Analysis of Gene Expression during Growth of Lactobacillus helveticus in Milk versus Rich Culture Medium

Applied and Environmental Microbiology ◽

10.1128/aem.00005-07 ◽

2007 ◽

Vol 73 (8) ◽

pp. 2661-2672 ◽

Cited By ~ 67

Author(s):

Vladimir V. Smeianov ◽

Patrick Wechter ◽

Jeffery R. Broadbent ◽

Joanne E. Hughes ◽

Beatriz T. Rodríguez ◽

...

Keyword(s):

Genome Sequence ◽

De Novo ◽

Cell Envelope ◽

Draft Genome ◽

Skim Milk ◽

Defined Medium ◽

Lactobacillus Helveticus ◽

Draft Genome Sequence ◽

Sequence Contigs ◽

Cheese Flavor

ABSTRACT Lactobacillus helveticus CNRZ32 is used by the dairy industry to modulate cheese flavor. The compilation of a draft genome sequence for this strain allowed us to identify and completely sequence 168 genes potentially important for the growth of this organism in milk or for cheese flavor development. The primary aim of this study was to investigate the expression of these genes during growth in milk and MRS medium by using microarrays. Oligonucleotide probes against each of the completely sequenced genes were compiled on maskless photolithography-based DNA microarrays. Additionally, the entire draft genome sequence was used to produce tiled microarrays in which noninterrupted sequence contigs were covered by consecutive 24-mer probes and associated mismatch probe sets. Total RNA isolated from cells grown in skim milk or in MRS to mid-log phase was used as a template to synthesize cDNA, followed by Cy3 labeling and hybridization. An analysis of data from annotated gene probes identified 42 genes that were upregulated during the growth of CNRZ32 in milk (P < 0.05), and 25 of these genes showed upregulation after applying Bonferroni's adjustment. The tiled microarrays identified numerous additional genes that were upregulated in milk versus MRS. Collectively, array data showed that the growth of CNRZ32 in milk induced genes encoding cell-envelope proteinases, oligopeptide transporters, and endopeptidases, as well as enzymes for lactose and cysteine pathways, de novo synthesis and/or salvage pathways for purines and pyrimidines, and other functions. Genes for a hypothetical phosphoserine utilization pathway were also differentially expressed. Preliminary experiments indicate that cheese-derived, phosphoserine-containing peptides increase growth rates of CNRZ32 in a chemically defined medium. These results suggest that phosphoserine is used as an energy source during the growth of L. helveticus CNRZ32.
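
The drop from 42 genes at P < 0.05 to 25 genes after Bonferroni's adjustment reflects the stricter per-test threshold that the adjustment imposes. A minimal sketch in Python, assuming the standard Bonferroni rule (each p-value is compared against alpha divided by the number of tests); the function name and the p-values are hypothetical and are not taken from the study:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the indices of tests that stay significant after Bonferroni adjustment."""
    n_tests = len(p_values)
    # The family-wise level alpha is split evenly across all tests.
    threshold = alpha / n_tests
    return [i for i, p in enumerate(p_values) if p <= threshold]


# Hypothetical p-values for four genes (illustration only).
example_p_values = [0.0001, 0.004, 0.03, 0.2]
significant = bonferroni_significant(example_p_values)
print(significant)  # indices of genes passing the adjusted threshold of 0.05 / 4
```

If one test were run for each of the 168 annotated genes (an assumption; the abstract does not state the number of tests), the per-test threshold would be 0.05 / 168, roughly 0.0003.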

E²GAN: End-to-End Generative Adversarial Network for Multivariate Time Series Imputation

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/429 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yonghong Luo ◽

Ying Zhang ◽

Xiangrui Cai ◽

Xiaojie Yuan

Keyword(s):

Time Series ◽

Missing Values ◽

Time Series Data ◽

Multivariate Time Series ◽

Imputation Accuracy ◽

Series Data ◽

Generative Adversarial Network ◽

Multi Stage ◽

Advanced Analysis ◽

End To End

Missing values, which appear in most multivariate time series, prevent advanced analysis of multivariate time series data. Existing imputation approaches try to deal with missing values by deletion, statistical imputation, machine learning based imputation and generative imputation. However, these methods are either incapable of dealing with temporal information or are multi-stage. This paper proposes an end-to-end generative model, E²GAN, to impute missing values in multivariate time series. With the help of the discriminative loss and the squared error loss, E²GAN can impute the incomplete time series by the nearest generated complete time series in one stage. Experiments on multiple real-world datasets show that our model outperforms the baselines on imputation accuracy and achieves state-of-the-art classification/regression results on the downstream applications. Additionally, our method also gains better time efficiency than multi-stage methods in the training of neural networks.
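
The squared error term mentioned above can be illustrated with a small Python sketch. It assumes, as is common in generative imputation, that the error is computed only on observed entries and that missing entries are then filled from the generated series; this is an illustration under those assumptions, not the authors' implementation, and the discriminative (adversarial) loss is omitted. All names and values are hypothetical.

```python
def masked_squared_error(observed, generated, mask):
    """Squared error between observed and generated values, counted only where mask == 1."""
    return sum(m * (x - g) ** 2 for x, g, m in zip(observed, generated, mask))


def impute(observed, generated, mask):
    """Keep observed values; fill missing positions (mask == 0) from the generated series."""
    return [x if m else g for x, g, m in zip(observed, generated, mask)]


# Hypothetical one-variable series; the middle value is missing (mask == 0).
observed = [1.0, 0.0, 3.0]   # the 0.0 is only a placeholder for the missing entry
mask = [1, 0, 1]
generated = [1.5, 2.0, 3.0]  # stand-in for a generator's output

loss = masked_squared_error(observed, generated, mask)
completed = impute(observed, generated, mask)
print(loss, completed)
```

The mask keeps the placeholder at the missing position from contributing to the loss, so the generated series is pulled toward the observed values only.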

TimeNexus: A Novel Cytoscape App to Analyze Time-Series Data Using Temporal MultiLayer Networks (tMLNs)

10.21203/rs.3.rs-133258/v1 ◽

2020 ◽

Author(s):

Michaël Pierrelée ◽

Ana Reynders ◽

Fabrice Lopez ◽

Aziz Moqrich ◽

Laurent Tichit ◽

...

Keyword(s):

Cell Cycle ◽

Time Series ◽

Network Structure ◽

Time Series Data ◽

Series Data ◽

Omics Data ◽

Expression Data ◽

Temporal Expression ◽

Multilayer Networks ◽

Multilayer Network

Abstract Integrating -omics data with biological networks such as protein-protein interaction networks is a popular and useful approach to interpret expression changes of genes in changing conditions, and to identify relevant cellular pathways, active subnetworks or network communities. Yet, most -omics data integration tools are restricted to static networks and therefore cannot easily be used for analyzing time-series data. Determining regulations or exploring the network structure over time requires time-dependent networks which incorporate time as one component in their structure. Here, we present a method to project time-series data on sequential layers of a multilayer network, thus creating a temporal multilayer network (tMLN). We implemented this method as a Cytoscape app we named TimeNexus. TimeNexus allows users to easily create, manage and visualize temporal multilayer networks starting from a combination of node and edge tables carrying the information on the temporal network structure. To allow further analysis of the tMLN, TimeNexus creates and passes on regular Cytoscape networks in the form of static versions of the tMLN in three different ways: i) over the entire set of layers, ii) over two consecutive layers at a time, or iii) on one single layer at a time. We combined TimeNexus with the Cytoscape apps PathLinker and AnatApp/ANAT to extract active subnetworks from tMLNs. To test the usability of our app, we applied TimeNexus together with PathLinker or ANAT on temporal expression data of the yeast cell cycle and were able to identify active subnetworks relevant for different cell cycle phases. We furthermore used TimeNexus on our own temporal expression data from a mouse pain assay inducing hindpaw inflammation and detected active subnetworks relevant for an inflammatory response to injury, including immune response, cell stress response and regulation of apoptosis.
TimeNexus is freely available from the Cytoscape app store at https://apps.cytoscape.org/apps/TimeNexus.
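
The idea of projecting time-series data onto sequential layers, and of extracting a static view over two consecutive layers, can be sketched in a few lines of Python. This is a toy illustration, not the TimeNexus implementation: the rule that each node is coupled to its own copy in the next layer is an assumption, and all function and variable names are hypothetical.

```python
def build_tmln(nodes, edges, n_layers):
    """Copy a static network onto n_layers time layers and couple consecutive layers."""
    layer_nodes = [(n, t) for t in range(n_layers) for n in nodes]
    # Intra-layer edges: the static edges repeated inside every layer.
    intra = [((u, t), (v, t)) for t in range(n_layers) for u, v in edges]
    # Inter-layer edges (assumed rule): each node links to itself in the next layer.
    inter = [((n, t), (n, t + 1)) for t in range(n_layers - 1) for n in nodes]
    return layer_nodes, intra, inter


def consecutive_pair_view(intra, inter, t):
    """Static edge list restricted to layers t and t + 1."""
    kept_intra = [e for e in intra if e[0][1] in (t, t + 1)]
    kept_inter = [e for e in inter if e[0][1] == t]
    return kept_intra + kept_inter


# Hypothetical two-gene network observed at three time points.
layer_nodes, intra, inter = build_tmln(["A", "B"], [("A", "B")], n_layers=3)
pair_view = consecutive_pair_view(intra, inter, t=0)
print(len(layer_nodes), len(intra), len(inter), len(pair_view))
```

Time-dependent node or edge weights (for example, expression values per time point) would be attached to the `(node, layer)` pairs; they are left out here to keep the structure visible.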

Introducing the novel Cytoscape app TimeNexus to analyze time-series data using temporal MultiLayer Networks (tMLNs)

Scientific Reports ◽

10.1038/s41598-021-93128-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Michaël Pierrelée ◽

Ana Reynders ◽

Fabrice Lopez ◽

Aziz Moqrich ◽

Laurent Tichit ◽

...

Keyword(s):

Cell Cycle ◽

Time Series ◽

Network Structure ◽

Time Series Data ◽

Series Data ◽

Omics Data ◽

Expression Data ◽

Temporal Expression ◽

Multilayer Networks ◽

Multilayer Network

Abstract Integrating -omics data with biological networks such as protein–protein interaction networks is a popular and useful approach to interpret expression changes of genes in changing conditions, and to identify relevant cellular pathways, active subnetworks or network communities. Yet, most -omics data integration tools are restricted to static networks and therefore cannot easily be used for analyzing time-series data. Determining regulations or exploring the network structure over time requires time-dependent networks which incorporate time as one component in their structure. Here, we present a method to project time-series data on sequential layers of a multilayer network, thus creating a temporal multilayer network (tMLN). We implemented this method as a Cytoscape app we named TimeNexus. TimeNexus allows users to easily create, manage and visualize temporal multilayer networks starting from a combination of node and edge tables carrying the information on the temporal network structure. To allow further analysis of the tMLN, TimeNexus creates and passes on regular Cytoscape networks in the form of static versions of the tMLN in three different ways: (i) over the entire set of layers, (ii) over two consecutive layers at a time, or (iii) on one single layer at a time. We combined TimeNexus with the Cytoscape apps PathLinker and AnatApp/ANAT to extract active subnetworks from tMLNs. To test the usability of our app, we applied TimeNexus together with PathLinker or ANAT on temporal expression data of the yeast cell cycle and were able to identify active subnetworks relevant for different cell cycle phases. We furthermore used TimeNexus on our own temporal expression data from a mouse pain assay inducing hindpaw inflammation and detected active subnetworks relevant for an inflammatory response to injury, including immune response, cell stress response and regulation of apoptosis.
TimeNexus is freely available from the Cytoscape app store at https://apps.cytoscape.org/apps/TimeNexus.

De novo reconstruction of gene regulatory networks from time series data, an approach based on formal methods

Methods ◽

10.1016/j.ymeth.2014.06.005 ◽

2014 ◽

Vol 69 (3) ◽

pp. 298-305 ◽

Cited By ~ 36

Author(s):

Michele Ceccarelli ◽

Luigi Cerulo ◽

Antonella Santone

Keyword(s):

Time Series ◽

Formal Methods ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Time Series Data ◽

De Novo ◽

Series Data ◽

Gene Regulatory

KOPTIC: A novel approach for in silico prediction of enzyme kinetics and regulation

10.1101/807628 ◽

2019 ◽

Author(s):

Wheaton L. Schroeder ◽

Rajib Saha

Keyword(s):

In Silico ◽

Time Series Data ◽

Metabolic Model ◽

Accurate Method ◽

Growth Stress ◽

Series Data ◽

Omics Data ◽

Novel Approach ◽

Transcriptional Regulatory Mechanisms ◽

Genetic Interventions

Abstract Kinetic models of metabolism (kMMs) provide not only a more accurate method for designing novel biological systems but also characterization of system regulations; however, the multi-‘omics’ data required is prohibitive to their development and widespread use. Here, we introduce a new approach named Kinetic OPTimization using Integer Conditions (KOPTIC), which can circumvent the ‘omics’ data requirement and semi-automate kMM construction using in silico reaction flux data and metabolite concentration estimates derived from a metabolic network model to return plausible reaction mechanisms, regulations, and kinetic parameters (defined as ‘reactomics’) using an optimization-based approach. As a benchmark for the performance of KOPTIC, a previously published, four-tissue (leaf, root, seed, and stem) metabolic model of Arabidopsis thaliana was used, consisting of major primary carbon metabolism pathways, named p-ath780 (1015 reactions, 901 metabolites, and 780 genes). Data required for KOPTIC was derived from an Arabidopsis’ lifecycle of 61 days. Nine separate regulator restriction sets (allowing multiple solutions) defining KOPTIC runs hypothesized 3577 total regulatory interactions involving metabolic, allosteric, and transcriptional regulatory mechanisms (with nearly 40 verified by existing literature) with a median fit error of 13.44%. Flux rates of most KOPTIC fits were found to be significantly correlated with (93.6% with p < 0.05) and approximately 1:1 (r = 0.775, p ≪ 0.001) to the input time-series data. Thus, KOPTIC can hypothesize maps of the regulatory landscape for a specific reaction, out of which the most relevant regulatory interaction(s) can be defined by the desired growth/stress conditions or the desired genetic interventions for use in the creation of kMMs.

How to predict relapse in leukemia using time series data: A comparative in silico study

PLoS ONE ◽

10.1371/journal.pone.0256585 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0256585

Author(s):

Helene Hoffmann ◽

Christoph Baldow ◽

Thomas Zerjatke ◽

Andrea Gottschalk ◽

Sebastian Wagner ◽

...

Keyword(s):

Data Quality ◽

Computational Methods ◽

Prediction Accuracy ◽

Time Course ◽

Time Series Data ◽

Reference Data ◽

De Novo ◽

Specific Treatment ◽

Series Data ◽

Patient Specific

Risk stratification and treatment decisions for leukemia patients are regularly based on clinical markers determined at diagnosis, while measurements on system dynamics are often neglected. However, there is increasing evidence that linking quantitative time-course information to disease outcomes can improve the predictions for patient-specific treatment responses. We designed a synthetic experiment simulating response kinetics of 5,000 patients to compare different computational methods with respect to their ability to accurately predict relapse for chronic and acute myeloid leukemia treatment. Technically, we used clinical reference data to first fit a model and then generate de novo model simulations of individual patients’ time courses for which we can systematically tune data quality (i.e. measurement error) and quantity (i.e. number of measurements). Based hereon, we compared the prediction accuracy of three different computational methods, namely mechanistic models, generalized linear models, and deep neural networks that have been fitted to the reference data. Reaching prediction accuracies between 60 and close to 100%, our results indicate that data quality has a higher impact on prediction accuracy than the specific choice of the particular method. We further show that adapted treatment and measurement schemes can considerably improve the prediction accuracy by 10 to 20%. Our proof-of-principle study highlights how computational methods and optimized data acquisition strategies can improve risk assessment and treatment of leukemia patients.
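
The two knobs described above, data quality (measurement error) and data quantity (number of measurements), can be illustrated with a toy simulator in Python. The mono-exponential decline, the parameter names and the default values are assumptions made purely for illustration; they are not the mechanistic model or the reference data used in the study.

```python
import random


def simulate_time_course(n_measurements, noise_sd, decay_rate=0.5, seed=0):
    """Simulate one patient's log-scale marker level at equally spaced time points.

    n_measurements controls data quantity; noise_sd controls data quality.
    The underlying decline (-decay_rate * t) is a hypothetical toy model.
    """
    rng = random.Random(seed)  # seeded so that a simulated patient is reproducible
    times = [float(i) for i in range(n_measurements)]
    values = [-decay_rate * t + rng.gauss(0.0, noise_sd) for t in times]
    return times, values


# Same toy patient, sampled sparsely with heavy noise and densely with light noise.
sparse_noisy = simulate_time_course(n_measurements=4, noise_sd=1.0)
dense_clean = simulate_time_course(n_measurements=12, noise_sd=0.1)
print(len(sparse_noisy[1]), len(dense_clean[1]))
```

Generating many such patients over a grid of `noise_sd` and `n_measurements` values is one way a comparison of prediction methods across data quality and quantity could be set up.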

De novo homology assessment from landmark data: A workflow to identify and track segmented structures in plant time series images

10.1101/2021.02.21.432162 ◽

2021 ◽

Author(s):

John G. Hodge ◽

Qing Li ◽

Andrew N. Doust

Keyword(s):

Time Series ◽

Time Series Data ◽

De Novo ◽

Control Function ◽

Basic Research ◽

Series Data ◽

Differential Growth ◽

Landmark Data ◽

Clustering Approach ◽

Time Series Images

Abstract Assessing the phenotypes underlying plant growth and development is integral to exploring the development, genetics, and evolution of morphology and plays an essential role in agronomic and basic research studies. Although various automated or semi-automated phenomic approaches have recently been developed, assessing differential growth of plant organs remains a key topic of interest, but one which is often difficult to analyze due to the requirements of segmenting and annotating specific structures or positions in the plant body in time-series data. To address this gap, we have developed a generalized workflow linking our previously published function, acute, with a companion function, homology, in the PlantCV environment. The homology function uses a generalized strategy of dimensionality reduction via starscape followed by hierarchical clustering through constella to identify ‘constellations’ of segments in eigenspace that represent the same landmark in consecutive images of a time-series. We devised a quality control function, constellaQC, that can test the accuracy of the clustering approach, and we use it to show that the approach accurately clustered the pseudo-landmarks derived from acute, although with several sources of error. We discuss the reasons for and consequences of these errors in automated workflows, and suggest how to develop these functions so that they can easily be repurposed for other phenomics datasets that may vary in dimensional complexity.
