FALCONET: an R package to accelerate automatic visualisation of genome scale metabolic models

Mapping Intimacies ◽

10.1101/662056 ◽

2019 ◽

Author(s):

Hongzhong Lu ◽

Zhengming Zhu ◽

Eduard J Kerkhoven ◽

Jens Nielsen

Keyword(s):

Metabolic Networks ◽

R Package ◽

Network Size ◽

Research Community ◽

Supplementary Information ◽

Large Network ◽

Strain Design ◽

Scale Models ◽

Genome Scale ◽

Integrative Omics

Abstract Summary: FALCONET (FAst visuaLisation of COmputational NETworks) enables the automatic formation and visualisation of metabolic maps from genome-scale models with R and CellDesigner, readily facilitating the visualisation of multi-layer omics datasets in the context of metabolic networks. Motivation: Numerous genome-scale metabolic models (GEMs) have been reconstructed and used as scaffolds to conduct integrative omics analysis and in silico strain design. Due to the large network size of GEMs, it is challenging to produce and visualize these networks as metabolic maps for further in-depth analyses. Results: Here, we present the R package FALCONET, which facilitates drawing and visualizing metabolic maps automatically. This package will benefit the research community by allowing a wider use of GEMs in systems biology. Availability and implementation: FALCONET is available at https://github.com/SysBioChalmers/FALCONET and released under the MIT License. Contact: [email protected] Supplementary information: Supplementary data are available online.


CPS analysis: self-contained validation of biomedical data clustering

Bioinformatics ◽

10.1093/bioinformatics/btaa165 ◽

2020 ◽

Vol 36 (11) ◽

pp. 3516-3521 ◽

Cited By ~ 1

Author(s):

Lixiang Zhang ◽

Lin Lin ◽

Jia Li

Keyword(s):

Data Clustering ◽

State Of The Art ◽

R Package ◽

Research Community ◽

Supplementary Information ◽

Biomedical Data ◽

Data Generation ◽

Supplementary Data ◽

Point Set ◽

Class Labels

Abstract Motivation Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem barely addressed by the research community. Results We have developed a toolkit called covering point set (CPS) analysis to quantify uncertainty at the levels of individual clusters and overall partitions. Functions have been developed to effectively visualize the inherent variation in any cluster for high-dimensional data, and to provide a more comprehensive view of potentially interesting subgroups in the data. Applying it to three usage scenarios for biomedical data, we demonstrate that CPS analysis is more effective for evaluating the uncertainty of clusters than state-of-the-art measurements. We also showcase how to use CPS analysis to select data generation technologies or visualization methods. Availability and implementation The method is implemented in an R package called OTclust, available on CRAN. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


Metabolic network-guided binning of metagenomic sequence fragments

Bioinformatics ◽

10.1093/bioinformatics/btv671 ◽

2015 ◽

Vol 32 (6) ◽

pp. 867-874 ◽

Cited By ~ 4

Author(s):

Matthew B. Biggs ◽

Jason A. Papin

Keyword(s):

Metabolic Network ◽

Dna Sequences ◽

Metabolic Networks ◽

Human Microbiome ◽

Environmental Dna ◽

Human Microbiome Project ◽

Supplementary Information ◽

Metagenomic Sequence ◽

Connectivity Score ◽

Genome Scale

Abstract Motivation: Most microbes on Earth have never been grown in a laboratory, and can only be studied through DNA sequences. Environmental DNA sequence samples are complex mixtures of fragments from many different species, often unknown. There is a pressing need for methods that can reliably reconstruct genomes from complex metagenomic samples in order to address questions in ecology, bioremediation, and human health. Results: We present the SOrting by NEtwork Completion (SONEC) approach for assigning reactions to incomplete metabolic networks based on a metabolite connectivity score. We successfully demonstrate proof of concept in a set of 100 genome-scale metabolic network reconstructions, and delineate the variables that impact reaction assignment accuracy. We further demonstrate the integration of SONEC with existing approaches (such as cross-sample scaffold abundance profile clustering) on a set of 94 metagenomic samples from the Human Microbiome Project. We show that not only does SONEC aid in reconstructing species-level genomes, but it also improves functional predictions made with the resulting metabolic networks. Availability and implementation: The datasets and code presented in this work are available at: https://bitbucket.org/mattbiggs/sorting_by_network_completion/. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.


redLips: a comprehensive mechanistic model of the lipid metabolic network of yeast

FEMS Yeast Research ◽

10.1093/femsyr/foaa006 ◽

2020 ◽

Vol 20 (2) ◽

Author(s):

S Tsouka ◽

V Hatzimanikatis

Keyword(s):

Lipid Metabolism ◽

Metabolic Networks ◽

Ad Hoc ◽

Mechanistic Model ◽

Model Organism ◽

Metabolic Model ◽

Lipid Biochemistry ◽

Large Size ◽

Scale Models ◽

Genome Scale

ABSTRACT Over the last decades, yeast has become a key model organism for the study of lipid biochemistry. Because the regulation of lipids has been closely linked to various physiopathologies, the study of these biomolecules could lead to new diagnostics and treatments. Before the field can reach this point, however, sufficient tools for integrating and analyzing the ever-growing volume of lipidomics data will need to be developed. To this end, genome-scale models (GEMs) of metabolic networks are useful tools, though their large size and complexity introduce too much uncertainty in the accuracy of predicted outcomes. Ideally, therefore, a model for studying lipids would contain only the pathways required for the proper analysis of these biomolecules, but would not be an ad hoc reduction. We hereby present a metabolic model that focuses on lipid metabolism, constructed through the integration of detailed lipid pathways into an already existing GEM of Saccharomyces cerevisiae. Our model was then systematically reduced around the subsystems defined by these pathways to provide a more manageable model size for complex studies. We show that this model is as consistent and inclusive as other yeast GEMs regarding the focus and detail on lipid metabolism, and can be used as a scaffold for integrating lipidomics data to improve predictions in studies of lipid-related biological functions.


Boosting the extraction of elementary flux modes in genome-scale metabolic networks using the linear programming approach

Bioinformatics ◽

10.1093/bioinformatics/btaa280 ◽

2020 ◽

Vol 36 (14) ◽

pp. 4163-4170

Author(s):

Francisco Guil ◽

José F Hidalgo ◽

José M García

Keyword(s):

Linear Programming ◽

Metabolic Networks ◽

Supplementary Information ◽

Programming Approach ◽

Supplementary Data ◽

Main Interest ◽

Elementary Flux Modes ◽

Order Of Magnitude ◽

Genome Scale ◽

Efficiency Rate

Abstract Motivation Elementary flux modes (EFMs) are a key tool for analyzing genome-scale metabolic networks, and several methods have been proposed to compute them. Among them, those based on solving linear programming (LP) problems are known to be very efficient if the main interest lies in computing large enough sets of EFMs. Results Here, we propose a new method called EFM-Ta that boosts the efficiency rate by analyzing the information provided by the LP solver. We base our method on a further study of the final tableau of the simplex method. By performing additional elementary steps and avoiding trivial solutions consisting of two cycles, we obtain many more EFMs for each LP problem posed, improving the efficiency rate of previously proposed methods by more than one order of magnitude. Availability and implementation Software is freely available at https://github.com/biogacop/Boost_LP_EFM. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
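To make the notion of an elementary flux mode concrete, the following is a minimal sketch and not the EFM-Ta method itself. It assumes a hypothetical five-reaction toy network with irreversible reactions, and it only checks the two defining properties on a hand-supplied list of candidate vectors: steady state (S v = 0 with v non-negative and non-zero) and minimal support relative to the other candidates. The names `is_flux_mode`, `support` and `elementary_among` are illustrative.

```python
from fractions import Fraction

# Hypothetical toy network (not taken from the paper).
# Rows: internal metabolites A, B, C. Columns: irreversible reactions R1..R5.
# R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
S = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
]

def is_flux_mode(S, v):
    """True if v is non-zero, non-negative and satisfies S v = 0."""
    if all(x == 0 for x in v) or any(x < 0 for x in v):
        return False
    return all(sum(Fraction(c) * x for c, x in zip(row, v)) == 0 for row in S)

def support(v):
    """Indices of the reactions that carry flux in v."""
    return frozenset(i for i, x in enumerate(v) if x != 0)

def elementary_among(S, candidates):
    """Keep the steady-state candidates whose support contains no smaller
    support of another steady-state candidate. This tests minimality only
    against the supplied candidates; it does not enumerate all EFMs."""
    modes = [v for v in candidates if is_flux_mode(S, v)]
    return [v for v in modes
            if not any(support(w) < support(v) for w in modes)]

CANDIDATES = [
    [1, 1, 0, 1, 0],   # route through B
    [1, 0, 1, 0, 1],   # route through C
    [2, 1, 1, 1, 1],   # sum of the two routes: steady state, not elementary
    [1, 1, 0, 0, 0],   # B accumulates: not a steady state
]
EFMS = elementary_among(S, CANDIDATES)
```

In this toy case only the two single-route vectors survive, because the combined vector uses a strict superset of each route's reactions.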


RAVEN 2.0: a versatile toolbox for metabolic network reconstruction and a case study on Streptomyces coelicolor

10.1101/321067 ◽

2018 ◽

Cited By ~ 2

Author(s):

Hao Wang ◽

Simonas Marcišauskas ◽

Benjamín J Sánchez ◽

Iván Domenzain ◽

Daniel Hermansson ◽

...

Keyword(s):

Metabolic Networks ◽

De Novo ◽

Antibiotic Production ◽

Streptomyces Coelicolor ◽

Metabolic Model ◽

Primary And Secondary Metabolism ◽

Scale Models ◽

Improved Performance ◽

Metabolites Production ◽

Genome Scale

Abstract RAVEN is a commonly used MATLAB toolbox for genome-scale metabolic model (GEM) reconstruction, curation and constraint-based modelling and simulation. Here we present RAVEN Toolbox 2.0 with major enhancements, including: (i) de novo reconstruction of GEMs based on the MetaCyc pathway database; (ii) a redesigned KEGG-based reconstruction pipeline; (iii) convergence of reconstructions from various sources; (iv) improved performance, usability, and compatibility with the COBRA Toolbox. Capabilities of RAVEN 2.0 are here illustrated through de novo reconstruction of GEMs for the antibiotic-producing bacterium Streptomyces coelicolor. Comparison of the automated de novo reconstructions with the iMK1208 model, a previously published high-quality S. coelicolor GEM, shows that RAVEN 2.0 can capture most of the manually curated model. The generated de novo reconstruction is subsequently used to curate iMK1208, resulting in Sco4, the most comprehensive GEM of S. coelicolor, with increased coverage of both primary and secondary metabolism. This increased coverage allows the use of Sco4 to predict novel genome editing targets for optimized secondary metabolite production. As such, we demonstrate that RAVEN 2.0 can be used not only for de novo GEM reconstruction, but also for curating existing models based on up-to-date databases. Both RAVEN 2.0 and Sco4 are distributed through GitHub to facilitate usage and further development by the community. Author summary Cellular metabolism is a large and complex network. Hence, investigations of metabolic networks are aided by in silico modelling and simulations. Metabolic networks can be derived from whole-genome sequences by identifying which enzymes are present and connecting these to formalized chemical reactions. To facilitate the reconstruction of genome-scale models of metabolism (GEMs), we have developed RAVEN 2.0. This versatile toolbox can reconstruct GEMs quickly, either from the metabolic pathway databases KEGG and MetaCyc or from homology with an existing GEM. We demonstrate RAVEN’s functionality through generation of a metabolic model of Streptomyces coelicolor, an antibiotic-producing bacterium. Comparison of this de novo generated GEM with a previously manually curated model demonstrates that RAVEN captures most of the previous model, and we subsequently reconstructed an updated model of S. coelicolor: Sco4. We then used Sco4 to predict promising targets for genetic engineering, which can be used to increase antibiotic production.


Converting networks to predictive logic models from perturbation signalling data with CellNOpt

10.1101/2020.03.04.976852 ◽

2020 ◽

Cited By ~ 1

Author(s):

Enio Gjerga ◽

Panuwat Trairatphisan ◽

Attila Gabor ◽

Hermann Koch ◽

Celine Chevalier ◽

...

Keyword(s):

Prior Knowledge ◽

Dynamic Models ◽

Large Data ◽

R Package ◽

Network Size ◽

Knowledge Network ◽

Supplementary Information ◽

Probabilistic Logic ◽

Boolean Models ◽

Logic Models

Abstract Motivation: The molecular changes induced by perturbations such as drugs and ligands are highly informative of the intracellular wiring. Our capacity to generate large data-sets is increasing steadily as new experimental approaches are developed. A useful way to extract mechanistic insight from the data is by integrating them with a prior knowledge network of signalling to obtain dynamic models. Logic models scale better with network size than alternative kinetic models, while keeping the interpretation of the model simple, making them particularly suitable for large datasets. Results: CellNOpt is a collection of Bioconductor R packages for building logic models from perturbation data and prior knowledge of signalling networks. We have recently developed new components and refined the existing ones. These updates include (i) an Integer Linear Programming (ILP) formulation which guarantees efficient optimisation for Boolean models, (ii) a probabilistic logic implementation for semi-quantitative datasets and (iii) the integration of MaBoSS, a stochastic Boolean simulator. Furthermore, we introduce Dynamic-Feeder, a tool to identify missing links not present in the prior knowledge. We have also implemented systematic post-hoc analyses to highlight the key components and parameters of our models. Finally, we provide an R-Shiny tool to run CellNOpt interactively. Availability: R-package(s): https://github.com/saezlab/cellnopt Contact: julio.saez@bioquant.uni-heidelberg.de Supplementary information: Supplemental Text.
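The idea of scoring a Boolean logic model against perturbation data can be illustrated with a small sketch. This is not the CellNOpt API or its optimisation procedure. It assumes a hypothetical three-node cascade (`ligand`, `raf`, `erk`) with one `inhibitor`, synchronous updates until a fixed point, and a plain mismatch count as the fit measure. All names and data are made up for illustration.

```python
def simulate(rules, inputs, max_steps=50):
    """Synchronously update all rule-driven nodes until nothing changes."""
    state = {node: False for node in rules}
    state.update(inputs)
    for _ in range(max_steps):
        updated = dict(state)
        for node, rule in rules.items():
            updated[node] = bool(rule(state))
        if updated == state:
            return state
        state = updated
    raise RuntimeError("no fixed point reached within max_steps")

def mismatches(rules, experiments):
    """Count measured node values that the model fails to reproduce."""
    errors = 0
    for inputs, measured in experiments:
        final = simulate(rules, inputs)
        errors += sum(final[node] != value for node, value in measured.items())
    return errors

# Hypothetical candidate model: the inhibitor blocks erk activation.
MODEL_WITH_INHIBITION = {
    "raf": lambda s: s["ligand"],
    "erk": lambda s: s["raf"] and not s["inhibitor"],
}
# Hypothetical alternative model that ignores the inhibitor.
MODEL_WITHOUT_INHIBITION = {
    "raf": lambda s: s["ligand"],
    "erk": lambda s: s["raf"],
}
# Made-up perturbation experiments: (stimuli and inhibitors, readouts).
EXPERIMENTS = [
    ({"ligand": True, "inhibitor": False}, {"erk": True}),
    ({"ligand": True, "inhibitor": True}, {"erk": False}),
    ({"ligand": False, "inhibitor": False}, {"erk": False}),
]
```

Comparing `mismatches(MODEL_WITH_INHIBITION, EXPERIMENTS)` with `mismatches(MODEL_WITHOUT_INHIBITION, EXPERIMENTS)` shows how candidate wirings can be ranked against data. A real training procedure would search over many candidate models rather than two hand-written ones.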


ssbio: a Python framework for structural systems biology

Bioinformatics ◽

10.1093/bioinformatics/bty077 ◽

2018 ◽

Vol 34 (12) ◽

pp. 2155-2157 ◽

Cited By ~ 15

Author(s):

Nathan Mih ◽

Elizabeth Brunk ◽

Ke Chen ◽

Edward Catoiu ◽

Anand Sastry ◽

...

Keyword(s):

Structural Information ◽

Protein Structures ◽

Structural Data ◽

Third Party ◽

Supplementary Information ◽

Scale Models ◽

Protein Properties ◽

Scale Network ◽

Structural Systems Biology ◽

Genome Scale

Abstract Summary Working with protein structures at the genome-scale has been challenging in a variety of ways. Here, we present ssbio, a Python package that provides a framework to easily work with structural information in the context of genome-scale network reconstructions, which can contain thousands of individual proteins. The ssbio package provides an automated pipeline to construct high quality genome-scale models with protein structures (GEM-PROs), wrappers to popular third-party programs to compute associated protein properties, and methods to visualize and annotate structures directly in Jupyter notebooks, thus lowering the barrier of linking 3D structural data with established systems workflows. Availability and implementation ssbio is implemented in Python and available to download under the MIT license at http://github.com/SBRG/ssbio. Documentation and Jupyter notebook tutorials are available at http://ssbio.readthedocs.io/en/latest/. Interactive notebooks can be launched using Binder at https://mybinder.org/v2/gh/SBRG/ssbio/master?filepath=Binder.ipynb. Supplementary information Supplementary data are available at Bioinformatics online.


ErrorTracer: an algorithm for identifying the origins of inconsistencies in genome-scale metabolic models

Bioinformatics ◽

10.1093/bioinformatics/btz761 ◽

2019 ◽

Author(s):

Nikolay Martyushenko ◽

Eivind Almaas

Keyword(s):

Large Scale ◽

Work Flow ◽

Supplementary Information ◽

Scale Models ◽

Model Size ◽

Metabolic Models ◽

Model Publication ◽

Genome Scale ◽

Model Visualization ◽

Community Standard

Abstract Motivation The number and complexity of genome-scale metabolic models are steadily increasing, empowered by automated model-generation algorithms. The quality control of the models, however, has always remained a significant challenge, the most fundamental being reactions incapable of carrying flux. Numerous automated gap-filling algorithms try to address this problem, but can rarely resolve all of a model’s inconsistencies. The need for fast inconsistency-checking algorithms has also been emphasized by the recent community push for automated model validation before model publication. Previously, we wrote graphical software to allow the modeller to solve the remaining errors manually. Nevertheless, model size and complexity remained a hindrance to efficiently tracking origins of inconsistency. Results We developed the ErrorTracer algorithm in order to address the shortcomings of existing approaches: ErrorTracer searches for inconsistencies, classifies them and identifies their origins. The algorithm is ∼2 orders of magnitude faster than current community standard methods, using only seconds even for large-scale models. This allows for interactive exploration in direct combination with model visualization, markedly simplifying the whole error-identification and correction workflow. Availability and implementation Windows and Linux executables and source code are available under the EPL 2.0 Licence at https://github.com/TheAngryFox/ModelExplorer and https://www.ntnu.edu/almaaslab/downloads. Supplementary information Supplementary data are available at Bioinformatics online.
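One simple source of reactions that cannot carry flux is a dead-end metabolite, that is, one that is only produced or only consumed. The sketch below is not the ErrorTracer algorithm. It is a basic dead-end propagation, assuming all reactions are irreversible and a hypothetical toy model in which exchange reactions are written with a single internal metabolite. The function name `blocked_by_dead_ends` and the reaction names are illustrative.

```python
def blocked_by_dead_ends(reactions):
    """Return reactions that cannot carry steady-state flux because they
    touch a metabolite that is only produced or only consumed. Removing a
    reaction can create new dead ends, so the check repeats until stable.

    reactions: dict name -> dict metabolite -> stoichiometric coefficient
    (negative = consumed, positive = produced; all reactions irreversible).
    """
    active = dict(reactions)
    blocked = set()
    changed = True
    while changed:
        changed = False
        producers, consumers = {}, {}
        for name, stoich in active.items():
            for met, coef in stoich.items():
                if coef > 0:
                    producers.setdefault(met, set()).add(name)
                elif coef < 0:
                    consumers.setdefault(met, set()).add(name)
        for met in set(producers) | set(consumers):
            if met not in producers or met not in consumers:
                for name in producers.get(met, set()) | consumers.get(met, set()):
                    if name in active:
                        del active[name]
                        blocked.add(name)
                        changed = True
    return blocked

# Hypothetical toy model: A is taken up, converted to B then C, C is secreted.
TOY_MODEL = {
    "EX_A": {"A": 1},              # uptake of A
    "R1": {"A": -1, "B": 1},
    "R2": {"B": -1, "C": 1},
    "EX_C": {"C": -1},             # secretion of C
    "R3": {"B": -1, "D": 1},       # D feeds only R5
    "R5": {"D": -1, "F": 1},       # F is never consumed
    "R4": {"E": -1, "A": 1},       # E is never produced
}
BLOCKED = blocked_by_dead_ends(TOY_MODEL)
```

Here `R5` is blocked first because `F` is never consumed, which in turn leaves `D` without a consumer and blocks `R3`. Approaches based on linear programming can detect further blocked reactions that this purely topological check misses.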


MODELING CANCER: INTEGRATION OF "OMICS" INFORMATION IN DYNAMIC SYSTEMS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720007002990 ◽

2007 ◽

Vol 05 (04) ◽

pp. 977-986 ◽

Cited By ~ 9

Author(s):

BEATRIZ STRANSKY ◽

JUNIOR BARRERA ◽

LUCILA OHNO-MACHADO ◽

SANDRO J. DE SOUZA

Keyword(s):

Dynamic Systems ◽

Research Community ◽

Theoretical Knowledge ◽

Data Sets ◽

Detailed Mechanism ◽

Multi Scale ◽

Scale Models ◽

Genome Scale ◽

Effective Integration ◽

Scale Data

The last 10 years have seen the rise of many technologies that produce an unprecedented amount of genome-scale data from many organisms. Although the research community has been successful in exploring these data, many challenges still persist. One of them is the effective integration of such data sets directly into approaches based on mathematical modeling of biological systems. Applications in cancer are a good example. The bridge between information and modeling in cancer can be achieved by two major types of complementary strategies. First, there is a bottom–up approach, in which data generates information about structure and relationship between components of a given system. In addition, there is a top–down approach, where cybernetic and systems–theoretical knowledge are used to create models that describe mechanisms and dynamics of the system. These approaches can also be linked to yield multi-scale models combining detailed mechanism and wide biological scope. Here we give an overall picture of this field and discuss possible strategies to approach the major challenges ahead.


maxnodf: an R package for fair and fast comparisons of nestedness between networks

10.1101/2020.03.20.000612 ◽

2020 ◽

Author(s):

Christoph Hoeppke ◽

Benno I. Simmons

Keyword(s):

Computation Time ◽

R Package ◽

Accurate Estimate ◽

Network Size ◽

List Type ◽

Large Network ◽

Ecological Communities ◽

Large Networks ◽

Comparative Performance ◽

Performance Benchmarking

Abstract Nestedness is a widespread pattern in mutualistic networks that has high ecological and evolutionary importance due to its role in enhancing species persistence and community stability. Nestedness measures tend to be correlated with fundamental properties of networks, such as size and connectance, and so nestedness values must be normalised to enable fair comparisons between different ecological communities. Current approaches, such as using null-corrected nestedness values and z-scores, suffer from extensive statistical issues. Thus, a new approach called NODFc was recently proposed, where nestedness is expressed relative to network size, connectance and the maximum nestedness that could be achieved in a particular network. While this approach is demonstrably effective in overcoming the issues of collinearity with basic network properties, it is computationally intensive to calculate, and current approaches are too slow to be practical for many types of analysis, or for analysing large networks. We developed three highly-optimised algorithms, based on greedy, hillclimbing and simulated annealing approaches, for calculation of NODFc, spread along a speed-quality continuum. Users thus have the choice between a fast algorithm with a less accurate estimate, a slower algorithm with a more accurate estimate, and an intermediate option. We outline the package and its implementation, and provide comparative performance benchmarking and two example analyses. We show that maxnodf is hundreds of times faster than existing approaches, with large networks seeing the biggest improvements. For example, for a large network with 3000 links, computation time was reduced from 50 minutes using the fastest existing algorithm to 11 seconds using maxnodf. maxnodf makes correctly-normalised nestedness measures feasible for complex analyses of even large networks. Analyses that would previously take weeks to complete can now be finished in hours or even seconds. Given evidence that correctly normalising nestedness values can significantly change the conclusions of ecological studies, we believe this package will usher in necessary widespread use of appropriate comparative nestedness statistics.
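For readers unfamiliar with the underlying metric, the sketch below computes raw NODF (nestedness based on overlap and decreasing fill) for a binary interaction matrix. It is written in Python for illustration and is not the maxnodf package or its API. It assumes the standard pairwise definition: a pair of rows (or columns) contributes the percentage of the less-connected one's links that are shared with the more-connected one, and contributes zero when both have the same number of links. The function name `nodf` is illustrative.

```python
from itertools import combinations

def nodf(matrix):
    """Raw NODF (0 to 100) of a binary matrix given as a list of rows."""
    rows = [tuple(r) for r in matrix]
    cols = list(zip(*rows))

    def paired_sum(vectors):
        total = 0.0
        for a, b in combinations(vectors, 2):
            da, db = sum(a), sum(b)
            # No decreasing fill (equal degrees) or an empty vector: score 0.
            if da == db or min(da, db) == 0:
                continue
            larger, smaller = (a, b) if da > db else (b, a)
            shared = sum(1 for x, y in zip(larger, smaller) if x and y)
            total += 100.0 * shared / sum(smaller)
        return total

    n_pairs = len(rows) * (len(rows) - 1) / 2 + len(cols) * (len(cols) - 1) / 2
    if n_pairs == 0:
        return 0.0
    return (paired_sum(rows) + paired_sum(cols)) / n_pairs

# A perfectly nested 3 x 3 network and a fully modular 2 x 2 network.
NESTED = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
MODULAR = [[1, 0],
           [0, 1]]
```

The normalised NODFc described in the abstract additionally requires the maximum NODF attainable for a network of the same size and number of links, which is the expensive quantity the package's algorithms estimate. That step is not attempted here.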
