Inferring biochemical reactions and metabolite structures to cope with metabolic pathway drift

Mapping Intimacies ◽

10.1101/462556 ◽

2018 ◽

Cited By ~ 1

Author(s):

Arnaud Belcour ◽

Jean Girard ◽

Méziane Aite ◽

Ludovic Delage ◽

Camille Trottier ◽

...

Keyword(s):

Metabolic Pathway ◽

Metabolic Pathways ◽

Logical Reasoning ◽

Model Organisms ◽

Enzymatic Reactions ◽

Proof Of Concept ◽

Biochemical Reactions ◽

Metabolomic Data ◽

Metabolic Models ◽

Genome Scale

Abstract Inferring genome-scale metabolic networks in emerging model organisms is challenging because of incomplete biochemical knowledge and incomplete conservation of biochemical pathways during evolution. This limits the possibility of automatically transferring knowledge from well-established model organisms. Therefore, specific bioinformatic tools are necessary to infer new biochemical reactions and new metabolic structures that can be checked experimentally. Using an integrative approach combining genomic and metabolomic data in the red algal model Chondrus crispus, we show that even metabolic pathways considered conserved, such as the sterol and mycosporine-like amino acid (MAA) synthesis pathways, undergo substantial turnover. This phenomenon, which we formally define as “metabolic pathway drift”, is consistent with findings from other areas of evolutionary biology, indicating that a given phenotype can be conserved even if the underlying molecular mechanisms are changing. We present a proof of concept with a new methodological approach to formalize the logical reasoning necessary to infer new reactions and new molecular structures, based on previous biochemical knowledge. We use this approach to infer previously unknown reactions in the sterol and MAA pathways.

Author summary

Genome-scale metabolic models describe our current understanding of all metabolic pathways occurring in a given organism. For emerging model species, where few biochemical data are available about the enzymatic activities that actually occur, such metabolic models are mainly built by transferring knowledge from better-studied species, on the assumption that the same genes have the same function in the compared species. However, integrating metabolomic data into genome-scale metabolic models leads to situations where gaps in pathways cannot be filled by known enzymatic reactions from existing databases. This is due to structural variation in metabolic pathways across evolutionary time. In such cases, complementary approaches are needed to infer new reactions and new metabolic intermediates by logical reasoning from the available partial biochemical knowledge. Here we present a proof of concept that this is feasible and leads to hypotheses precise enough to serve as a starting point for new experimental work.
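
The "gaps that cannot be filled by known reactions" can be made concrete with a simple reachability check. The sketch below is not the authors' method; it assumes a toy representation in which each reaction is just a set of substrates and a set of products (stoichiometry, cofactors and reversibility ignored), and the function names and the mini-pathway are hypothetical. It computes which metabolites the known reactions can produce from a seed set (network expansion) and reports measured metabolites that remain unexplained, i.e. candidate places where new reactions or intermediates would have to be inferred.

```python
from typing import Dict, FrozenSet, Set, Tuple

# Toy reaction model: (substrates, products). Stoichiometry, cofactors and
# reversibility are deliberately ignored in this illustration.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def network_expansion(reactions: Dict[str, Reaction], seeds: Set[str]) -> Set[str]:
    """Return every metabolite reachable from the seeds by repeatedly firing
    reactions whose substrates are all already available."""
    scope = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= scope and not products <= scope:
                scope |= products
                changed = True
    return scope


def unexplained_metabolites(reactions: Dict[str, Reaction], seeds: Set[str],
                            observed: Set[str]) -> Set[str]:
    """Observed metabolites that no chain of known reactions can produce:
    candidate pathway gaps (hypothetical helper, for illustration only)."""
    return set(observed) - network_expansion(reactions, seeds)


# Hypothetical mini-pathway: A -> B -> C is known, but D is measured and unreachable.
known = {
    "R1": (frozenset({"A"}), frozenset({"B"})),
    "R2": (frozenset({"B"}), frozenset({"C"})),
}
print(sorted(unexplained_metabolites(known, {"A"}, {"B", "C", "D"})))
```

Running it prints `['D']`: metabolite D is detected but no known reaction leads to it, which is the kind of situation the abstract describes.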

Consistency, Inconsistency and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome Scale Metabolic Modelling

10.1101/503664 ◽

2018 ◽

Cited By ~ 1

Author(s):

Nhung Pham ◽

Ruben Van Heck ◽

Jesse van Dam ◽

Peter Schaap ◽

Edoardo Saccenti ◽

...

Keyword(s):

Systems Medicine ◽

Biochemical Reactions ◽

Metabolic Modelling ◽

Research Areas ◽

Limit Model ◽

Metabolic Models ◽

Genome Scale ◽

Manual Verification

Genome-scale metabolic models (GEMs) are manually curated repositories describing the metabolic capabilities of an organism. GEMs have been successfully used in different research areas, ranging from systems medicine to biotechnology. However, the different naming conventions (namespaces) of databases used to build GEMs limit model reusability and prevent the integration of existing models. This problem is known in the GEM community, but its extent has not been analyzed in depth. In this study, we investigate the name ambiguity and the multiplicity of non-systematic identifiers, and we highlight the (in)consistency in their use in eleven databases of biochemical reactions and the problems that arise when mapping between different namespaces and databases. We found that such inconsistencies can be as high as 83.1%, thus emphasizing the need for strategies to deal with these issues. Currently, manual verification of the mappings appears to be the only solution to remove inconsistencies when combining models. Finally, we discuss several possible approaches to facilitate (future) unambiguous mapping.
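
To illustrate what "name ambiguity" means in practice, here is a minimal sketch, assuming each database is available as (database, identifier, name) records. It flags a normalised name as ambiguous when it labels more than one identifier inside the same database. The record values and helper names are hypothetical, and this is not the measure the authors used to obtain the 83.1% figure.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (database, identifier, metabolite name)


def ambiguous_names(records: Iterable[Record]) -> Dict[str, Set[Tuple[str, str]]]:
    """Return normalised names that point at more than one identifier within a
    single database, together with all (database, identifier) pairs they label."""
    by_name: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for database, identifier, name in records:
        by_name[name.strip().lower()].add((database, identifier))
    result = {}
    for name, hits in by_name.items():
        per_db: Dict[str, Set[str]] = defaultdict(set)
        for database, identifier in hits:
            per_db[database].add(identifier)
        if any(len(ids) > 1 for ids in per_db.values()):
            result[name] = hits
    return result


def ambiguity_rate(records: Iterable[Record]) -> float:
    """Fraction of distinct normalised names that are ambiguous."""
    records = list(records)
    names = {name.strip().lower() for _, _, name in records}
    return len(ambiguous_names(records)) / len(names) if names else 0.0


# Hypothetical records: "glucose" labels two identifiers in dbX.
records = [
    ("dbX", "X001", "Glucose"),
    ("dbX", "X002", "glucose"),
    ("dbX", "X003", "pyruvate"),
    ("dbY", "Y010", "Pyruvate"),
]
print(sorted(ambiguous_names(records)))  # ['glucose']
print(ambiguity_rate(records))           # 0.5
```

Matching across databases (e.g. whether "pyruvate" in dbX and dbY is the same compound) is a separate mapping problem that this sketch does not attempt.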

The ModelSEED Database for the integration of metabolic annotations and the reconstruction, comparison, and analysis of metabolic models for plants, fungi, and microbes

10.1101/2020.03.31.018663 ◽

2020 ◽

Cited By ~ 1

Author(s):

Samuel M. D. Seaver ◽

Filipe Liu ◽

Qizhi Zhang ◽

James Jeffryes ◽

José P. Faria ◽

...

Keyword(s):

Metabolic Pathways ◽

Draft Genome ◽

Biochemical Network ◽

Plant Genomes ◽

Flux Balance ◽

Rosetta Stone ◽

Balance Analysis ◽

Comparison And Analysis ◽

Metabolic Models ◽

Genome Scale

Abstract For over ten years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources: (i) it includes compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) it is extensible by the user community, with all data stored in GitHub; and (iii) it is designed as a biochemical “Rosetta Stone” to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies, and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org and KBase.
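
Why charged molecules and proton balancing matter can be shown with a small mass- and charge-balance check. The sketch below is not ModelSEED code; it assumes each reaction participant is given as (coefficient, molecular formula, formal charge), with negative coefficients for substrates, and only handles simple formulas without parentheses. The example uses the hexokinase reaction with ionic forms typical near neutral pH (ATP4-, ADP3-, glucose 6-phosphate2-), treated here as illustrative values.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# (stoichiometric coefficient, molecular formula, formal charge);
# negative coefficients are substrates, positive coefficients are products.
Participant = Tuple[int, str, int]

FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses)."""
    atoms: Counter = Counter()
    for element, count in FORMULA_TOKEN.findall(formula):
        atoms[element] += int(count) if count else 1
    return atoms


def imbalance(participants: List[Participant]) -> Tuple[Dict[str, int], int]:
    """Return (per-element atom imbalance, net charge imbalance).
    A balanced reaction gives ({}, 0)."""
    atoms: Counter = Counter()
    charge = 0
    for coeff, formula, z in participants:
        for element, n in parse_formula(formula).items():
            atoms[element] += coeff * n
        charge += coeff * z
    return {e: n for e, n in atoms.items() if n != 0}, charge


# glucose + ATP -> glucose 6-phosphate + ADP + H+
hexokinase = [
    (-1, "C6H12O6", 0),
    (-1, "C10H12N5O13P3", -4),
    (1, "C6H11O9P", -2),
    (1, "C10H12N5O10P2", -3),
    (1, "H", 1),
]
print(imbalance(hexokinase))       # ({}, 0)
print(imbalance(hexokinase[:-1]))  # ({'H': -1}, -1)
```

Dropping the released proton leaves the reaction one hydrogen atom and one unit of charge short, which is the kind of inconsistency that proton balancing is meant to remove before a network is used for flux balance analysis.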

Fast automated reconstruction of genome-scale metabolic models for microbial species and communities

10.1101/223198 ◽

2018 ◽

Cited By ~ 2

Author(s):

Daniel Machado ◽

Sergej Andrejev ◽

Melanie Tramontano ◽

Kiran Raosaheb Patil

Keyword(s):

Microbial Communities ◽

Single Species ◽

Model Organisms ◽

Universal Model ◽

Microbial Species ◽

Scale Models ◽

Metabolic Models ◽

User Friendly ◽

Genome Scale ◽

Automated Tool

Abstract Genome-scale metabolic models are instrumental in uncovering operating principles of cellular metabolism and model-guided re-engineering. Recent applications of metabolic models have also demonstrated their usefulness in unraveling cross-feeding within microbial communities. Yet, the application of genome-scale models, especially to microbial communities, is lagging far behind the availability of sequenced genomes. This is largely due to the time-consuming steps of manual curation required to obtain good quality models and thus physiologically meaningful simulation results. Here, we present an automated tool, CarveMe, for reconstruction of species and community level metabolic models. We introduce the concept of a universal model, which is manually curated and simulation-ready. Starting with this universal model and annotated genome sequences, CarveMe uses a top-down approach to build single-species and community models in a fast and scalable manner. We build reconstructions for two model organisms, Escherichia coli and Bacillus subtilis, as well as a collection of human gut bacteria, and show that CarveMe models perform similarly to manually curated models in reproducing experimental phenotypes. Finally, we demonstrate the scalability of CarveMe by reconstructing 5,587 bacterial models. Overall, CarveMe provides an open-source and user-friendly tool towards broadening the use of metabolic modeling in studying microbial species and communities.
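
The top-down idea (start from a universal model, then keep only what the genome supports) can be sketched as a filter over reaction scores. This is a deliberately simplified illustration, not CarveMe's algorithm: CarveMe solves an optimization problem so that the carved network remains able to carry flux, a step omitted here. The gene-to-reaction associations, scores, threshold and function name are all hypothetical.

```python
from typing import Dict, Set


def carve(universal: Dict[str, Set[str]], gene_scores: Dict[str, float],
          threshold: float = 0.0) -> Dict[str, float]:
    """Keep universal-model reactions supported by the target genome.

    universal:   reaction id -> gene families able to catalyse it
    gene_scores: gene family -> annotation score in the target genome
    A reaction's score is the best score among its genes (an OR rule);
    reactions scoring strictly above `threshold` are retained.
    """
    kept = {}
    for reaction, genes in universal.items():
        score = max((gene_scores.get(g, 0.0) for g in genes), default=0.0)
        if score > threshold:
            kept[reaction] = score
    return kept


# Hypothetical universal model and genome annotation scores.
universal = {"PGI": {"pgi"}, "PFK": {"pfkA", "pfkB"}, "XYZ": {"xyzA"}}
gene_scores = {"pgi": 0.9, "pfkB": 0.4}
print(carve(universal, gene_scores))                 # {'PGI': 0.9, 'PFK': 0.4}
print(carve(universal, gene_scores, threshold=0.5))  # {'PGI': 0.9}
```

Because the universal model is curated once, each new genome only requires scoring and pruning, which is what makes the top-down approach fast and scalable compared with bottom-up reconstruction.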

MetaNetX/MNXref: unified namespace for metabolites and biochemical reactions in the context of metabolic models

Nucleic Acids Research ◽

10.1093/nar/gkaa992 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D570-D574

Author(s):

Sébastien Moretti ◽

Van Du T Tran ◽

Florence Mehl ◽

Mark Ibberson ◽

Marco Pagni

Keyword(s):

Metabolic Network ◽

Intrinsic Properties ◽

Biochemical Reactions ◽

Online Services ◽

Major Improvement ◽

Sparql Endpoint ◽

Cross Links ◽

Metabolic Models ◽

Genome Scale

Abstract MetaNetX/MNXref is a reconciliation of metabolites and biochemical reactions providing cross-links between major public biochemistry and Genome-Scale Metabolic Network (GSMN) databases. The new release brings several improvements with respect to the quality of the reconciliation, with particular attention dedicated to preserving the intrinsic properties of GSMN models. The MetaNetX website (https://www.metanetx.org/) provides access to the full database and online services. A major improvement concerns the mapping of user-provided GSMNs to MNXref, which now provides diagnostic messages about model content. In addition to the website and flat files, the resource can now be accessed through a SPARQL endpoint (https://rdf.metanetx.org).
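
Mapping a user model onto a unified namespace, and reporting diagnostics along the way, can be sketched as a lookup in a cross-reference table. The table, the unified identifiers ("U1", "U2"), the function name and the message wording below are hypothetical and do not reproduce MetaNetX's actual files or messages. The sketch reports identifiers with no cross-link and distinct model identifiers that collapse onto the same unified entity, a hint that the model may contain duplicated metabolites.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (source database, identifier)


def map_model(model_ids: List[Key],
              xref: Dict[Key, str]) -> Tuple[Dict[Key, str], List[str]]:
    """Translate model identifiers to a unified namespace and collect
    human-readable diagnostics (unmapped IDs, merged IDs)."""
    mapped: Dict[Key, str] = {}
    messages: List[str] = []
    targets: Dict[str, List[str]] = defaultdict(list)
    for key in model_ids:
        unified = xref.get(key)
        if unified is None:
            messages.append(f"unmapped: {key[0]}:{key[1]}")
            continue
        mapped[key] = unified
        targets[unified].append(f"{key[0]}:{key[1]}")
    for unified, sources in targets.items():
        if len(sources) > 1:
            messages.append(f"merged into {unified}: {', '.join(sources)}")
    return mapped, messages


# Hypothetical cross-reference table and model content.
xref = {("bigg", "glc__D"): "U1", ("kegg", "C00031"): "U1", ("bigg", "pyr"): "U2"}
model = [("bigg", "glc__D"), ("kegg", "C00031"), ("bigg", "pyr"), ("bigg", "foo")]
mapped, messages = map_model(model, xref)
for line in messages:
    print(line)
```

Here the model uses two different identifiers for D-glucose, so both map to U1 and are reported as merged, while the made-up `foo` has no cross-link and is reported as unmapped.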

gapseq: informed prediction of bacterial metabolic pathways and reconstruction of accurate metabolic models

Genome Biology ◽

10.1186/s13059-021-02295-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Johannes Zimmermann ◽

Christoph Kaleta ◽

Silvio Waschina

Keyword(s):

Experimental Data ◽

Metabolic Pathways ◽

Scientific Literature ◽

State Of The Art ◽

Gap Filling ◽

Fermentation Products ◽

Metabolic Interactions ◽

Metabolic Models ◽

Genome Scale ◽

Carbon Source Utilisation

Abstract Genome-scale metabolic models of microorganisms are powerful frameworks to predict phenotypes from an organism’s genotype. While manual reconstructions are laborious, automated reconstructions often fail to recapitulate known metabolic processes. Here we present gapseq (https://github.com/jotech/gapseq), a new tool to predict metabolic pathways and automatically reconstruct microbial metabolic models using a curated reaction database and a novel gap-filling algorithm. On the basis of scientific literature and experimental data for 14,931 bacterial phenotypes, we demonstrate that gapseq outperforms state-of-the-art tools in predicting enzyme activity, carbon source utilisation, fermentation products, and metabolic interactions within microbial communities.
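
Gap-filling, in its simplest topological form, asks for the smallest set of database reactions whose addition lets a model produce a target metabolite from its seeds. The sketch below is a brute-force illustration under that assumption only; it is not gapseq's algorithm, which works on a stoichiometric model rather than on reachability. Reaction names, metabolites and function names are hypothetical, and the exhaustive search is only practical for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Optional, Set, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)


def scope(reactions: Iterable[Reaction], seeds: Set[str]) -> Set[str]:
    """Metabolites reachable from the seeds (network expansion)."""
    reactions = list(reactions)
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def gap_fill(model: Dict[str, Reaction], database: Dict[str, Reaction],
             seeds: Set[str], target: str, max_added: int = 3) -> Optional[Set[str]]:
    """Smallest set of database reactions that makes `target` reachable,
    an empty set if it already is, or None if no solution of size <= max_added."""
    base = list(model.values())
    if target in scope(base, seeds):
        return set()
    candidates = sorted(set(database) - set(model))
    for size in range(1, max_added + 1):
        for combo in combinations(candidates, size):
            if target in scope(base + [database[r] for r in combo], seeds):
                return set(combo)
    return None


def rxn(subs: Set[str], prods: Set[str]) -> Reaction:
    return frozenset(subs), frozenset(prods)


# Hypothetical draft model (A -> B) and reaction database.
model = {"R1": rxn({"A"}, {"B"})}
database = {"R1": rxn({"A"}, {"B"}), "D1": rxn({"B"}, {"C"}),
            "D2": rxn({"C"}, {"E"}), "D3": rxn({"B"}, {"Y"})}
print(sorted(gap_fill(model, database, {"A"}, "E")))  # ['D1', 'D2']
```

Searching by increasing set size guarantees the first solution found is minimal in the number of added reactions; real gap-filling tools instead weight candidate reactions (for example by sequence evidence) and solve an optimization problem.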

Genome-scale metabolic models highlight stage-specific differences in essential metabolic pathways in Trypanosoma cruzi

PLoS Neglected Tropical Diseases ◽

10.1371/journal.pntd.0008728 ◽

2020 ◽

Vol 14 (10) ◽

pp. e0008728

Author(s):

Isabel S. Shiratsubaki ◽

Xin Fang ◽

Rodolpho O. O. Souza ◽

Bernhard O. Palsson ◽

Ariel M. Silber ◽

...

Keyword(s):

Trypanosoma Cruzi ◽

Metabolic Pathways ◽

Metabolic Models ◽

Genome Scale

Consistency, Inconsistency, and Ambiguity of Metabolite Names in Biochemical Databases Used for Genome-Scale Metabolic Modelling

Metabolites ◽

10.3390/metabo9020028 ◽

2019 ◽

Vol 9 (2) ◽

pp. 28 ◽

Cited By ~ 8

Author(s):

Nhung Pham ◽

Ruben van Heck ◽

Jesse van Dam ◽

Peter Schaap ◽

Edoardo Saccenti ◽

...

Keyword(s):

Systems Medicine ◽

Biochemical Reactions ◽

Metabolic Modelling ◽

Research Areas ◽

Limit Model ◽

Metabolic Models ◽

Genome Scale ◽

Manual Verification

Genome-scale metabolic models (GEMs) are manually curated repositories describing the metabolic capabilities of an organism. GEMs have been successfully used in different research areas, ranging from systems medicine to biotechnology. However, the different naming conventions (namespaces) of databases used to build GEMs limit model reusability and prevent the integration of existing models. This problem is known in the GEM community, but its extent has not been analyzed in depth. In this study, we investigate the name ambiguity and the multiplicity of non-systematic identifiers, and we highlight the (in)consistency in their use in 11 databases of biochemical reactions and the problems that arise when mapping between different namespaces and databases. We found that such inconsistencies can be as high as 83.1%, thus emphasizing the need for strategies to deal with these issues. Currently, manual verification of the mappings appears to be the only solution to remove inconsistencies when combining models. Finally, we discuss several possible approaches to facilitate (future) unambiguous mapping.

Automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters

BMC Bioinformatics ◽

10.1186/s12859-021-03985-0 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 1

Author(s):

Snorre Sulheim ◽

Fredrik A. Fossheim ◽

Alexander Wentzel ◽

Eivind Almaas

Keyword(s):

Heterologous Expression ◽

Bioactive Compounds ◽

Metabolic Pathways ◽

Gene Clusters ◽

Strain Engineering ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Wide Range ◽

Metabolic Models ◽

Genome Scale

Abstract Background A wide range of bioactive compounds is produced by enzymes and enzymatic complexes encoded in biosynthetic gene clusters (BGCs). These BGCs can be identified and functionally annotated based on their DNA sequence. Candidates for further research and development may be prioritized based on properties such as their functional annotation, (dis)similarity to known BGCs, and bioactivity assays. Production of the target compound in the native strain is often not achievable, making heterologous expression in an optimized host strain a promising alternative. Genome-scale metabolic models are frequently used to guide strain development, but large-scale incorporation and testing of heterologous production of complex natural products in this framework is hampered by the amount of manual work required to translate annotated BGCs to metabolic pathways. To this end, we have developed a pipeline for an automated reconstruction of BGC-associated metabolic pathways responsible for the synthesis of non-ribosomal peptides and polyketides, two of the dominant classes of bioactive compounds. Results The developed pipeline correctly predicts 72.8% of the metabolic reactions in a detailed evaluation of 8 different BGCs comprising 228 functional domains. By introducing the reconstructed pathways into a genome-scale metabolic model, we demonstrate that this level of accuracy is sufficient to make reliable in silico predictions with respect to production rate and gene knockout targets. Furthermore, we apply the pipeline to a large BGC database and reconstruct 943 metabolic pathways. We identify 17 enzymatic reactions using high-throughput assessment of potential knockout targets for increasing the production of any of the associated compounds. However, the targets only provide a relative increase of up to 6% compared to wild-type production rates.
Conclusion With this pipeline we pave the way for an extended use of genome-scale metabolic models in strain design of heterologous expression hosts. In this context, we identified generic knockout targets for the increased production of heterologous compounds. However, as the predicted increase is minor for any of the single-reaction knockout targets, these results indicate that more sophisticated strain-engineering strategies are necessary for the development of efficient BGC expression hosts.

The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes

Nucleic Acids Research ◽

10.1093/nar/gkaa746 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D575-D588

Author(s):

Samuel M D Seaver ◽

Filipe Liu ◽

Qizhi Zhang ◽

James Jeffryes ◽

José P Faria ◽

...

Keyword(s):

Metabolic Pathways ◽

Draft Genome ◽

Biochemical Network ◽

Plant Genomes ◽

Flux Balance ◽

Rosetta Stone ◽

Balance Analysis ◽

Comparison And Analysis ◽

Metabolic Models ◽

Genome Scale

Abstract For over 10 years, ModelSEED has been a primary resource for the construction of draft genome-scale metabolic models based on annotated microbial or plant genomes. Now being released, the biochemistry database serves as the foundation of biochemical data underlying ModelSEED and KBase. The biochemistry database embodies several properties that, taken together, distinguish it from other published biochemistry resources: (i) it includes compartmentalization, transport reactions, charged molecules and proton balancing on reactions; (ii) it is extensible by the user community, with all data stored in GitHub; and (iii) it is designed as a biochemical ‘Rosetta Stone’ to facilitate comparison and integration of annotations from many different tools and databases. The database was constructed by combining chemical data from many resources, applying standard transformations, identifying redundancies and computing thermodynamic properties. The ModelSEED biochemistry is continually tested using flux balance analysis to ensure the biochemical network is modeling-ready and capable of simulating diverse phenotypes. Ontologies can be designed to aid in comparing and reconciling metabolic reconstructions that differ in how they represent various metabolic pathways. ModelSEED now includes 33,978 compounds and 36,645 reactions, available as a set of extensible files on GitHub, and available to search at https://modelseed.org/biochem and KBase.

Automatic reconstruction of metabolic pathways from identified biosynthetic gene clusters

10.1101/2020.11.24.395400 ◽

2020 ◽

Author(s):

Snorre Sulheim ◽

Fredrik A. Fossheim ◽

Alexander Wentzel ◽

Eivind Almaas

Keyword(s):

Heterologous Expression ◽

Bioactive Compounds ◽

Metabolic Pathways ◽

Gene Clusters ◽

Strain Engineering ◽

Biosynthetic Gene ◽

Biosynthetic Gene Clusters ◽

Wide Range ◽

Metabolic Models ◽

Genome Scale

Abstract Background A wide range of bioactive compounds is produced by enzymes and enzymatic complexes encoded in biosynthetic gene clusters (BGCs). These BGCs can be identified and functionally annotated based on their DNA sequence. Candidates for further research and development may be prioritized based on properties such as their functional annotation, (dis)similarity to known BGCs, and bioactivity assays. Production of the target compound in the native strain is often not achievable, making heterologous expression in an optimized host strain a promising alternative. Genome-scale metabolic models are frequently used to guide strain development, but large-scale incorporation and testing of heterologous production of complex natural products in this framework is hampered by the amount of manual work required to translate annotated BGCs to metabolic pathways. To this end, we have developed a pipeline for an automated reconstruction of BGC-associated metabolic pathways responsible for the synthesis of non-ribosomal peptides and polyketides, two of the dominant classes of bioactive compounds. Results The developed pipeline correctly predicts 72.8% of the metabolic reactions in a detailed evaluation of 8 different BGCs comprising 228 functional domains. By introducing the reconstructed pathways into a genome-scale metabolic model, we demonstrate that this level of accuracy is sufficient to make reliable in silico predictions with respect to production rate and gene knockout targets. Furthermore, we apply the pipeline to a large BGC database and reconstruct 943 metabolic pathways. We identify 17 enzymatic reactions using high-throughput assessment of potential knockout targets for increasing the production of any of the associated compounds. However, the targets only provide a relative increase of up to 6% compared to wild-type production rates.

Conclusions With this pipeline we pave the way for an extended use of genome-scale metabolic models in strain design of heterologous expression hosts. In this context, we identified generic knockout targets for the increased production of heterologous compounds. However, as the predicted increase is minor for any of the single-reaction knockout targets, these results indicate that more sophisticated strain-engineering strategies are necessary for the development of efficient BGC expression hosts.