scholarly journals Platform for efficient large-scale storage and analysis of multi-omics data in plant and microbial systems. Final Technical Report

2020 ◽  
Author(s):  
Michael Wrinn
2021 ◽  
Vol 20 (1) ◽  
Author(s):  
Jingru Zhou ◽  
Yingping Zhuang ◽  
Jianye Xia

Abstract Background Genome-scale metabolic model (GSMM) is a powerful tool for the study of cellular metabolic characteristics. With the development of multi-omics measurement techniques in recent years, new methods that integrating multi-omics data into the GSMM show promising effects on the predicted results. It does not only improve the accuracy of phenotype prediction but also enhances the reliability of the model for simulating complex biochemical phenomena, which can promote theoretical breakthroughs for specific gene target identification or better understanding the cell metabolism on the system level. Results Based on the basic GSMM model iHL1210 of Aspergillus niger, we integrated large-scale enzyme kinetics and proteomics data to establish a GSMM based on enzyme constraints, termed a GEM with Enzymatic Constraints using Kinetic and Omics data (GECKO). The results show that enzyme constraints effectively improve the model’s phenotype prediction ability, and extended the model’s potential to guide target gene identification through predicting metabolic phenotype changes of A. niger by simulating gene knockout. In addition, enzyme constraints significantly reduced the solution space of the model, i.e., flux variability over 40.10% metabolic reactions were significantly reduced. The new model showed also versatility in other aspects, like estimating large-scale $$k_{{cat}}$$ k cat values, predicting the differential expression of enzymes under different growth conditions. Conclusions This study shows that incorporating enzymes’ abundance information into GSMM is very effective for improving model performance with A. niger. Enzyme-constrained model can be used as a powerful tool for predicting the metabolic phenotype of A. niger by incorporating proteome data. In the foreseeable future, with the fast development of measurement techniques, and more precise and rich proteomics quantitative data being obtained for A. niger, the enzyme-constrained GSMM model will show greater application space on the system level.


2017 ◽  
Vol 26 (01) ◽  
pp. 188-192 ◽  
Author(s):  
H. Dauchel ◽  
T. Lecroq

Summary Objective: To summarize excellent current research and propose a selection of best papers published in 2016 in the field of Bioinformatics and Translational Informatics with applications in the health domain and clinical care. Methods: We provide a synopsis of the articles selected for the IMIA Yearbook 2017, from which we attempt to derive a synthetic overview of current and future activities in the field. As in 2016, a first step of selection was performed by querying MEDLINE with a list of MeSH descriptors completed by a list of terms adapted to the section coverage. Each section editor evaluated separately the set of 951 articles returned and evaluation results were merged for retaining 15 candidate best papers for peer-review. Results: The selection and evaluation process of papers published in the Bioinformatics and Translational Informatics field yielded four excellent articles focusing this year on the secondary use and massive integration of multi-omics data for cancer genomics and non-cancer complex diseases. Papers present methods to study the functional impact of genetic variations, either at the level of the transcription or at the levels of pathway and network. Conclusions: Current research activities in Bioinformatics and Translational Informatics with applications in the health domain continue to explore new algorithms and statistical models to manage, integrate, and interpret large-scale genomic datasets. As addressed by some of the selected papers, future trends would include the question of the international collaborative sharing of clinical and omics data, and the implementation of intelligent systems to enhance routine medical genomics.


Genes ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 238 ◽  
Author(s):  
Evangelina López de Maturana ◽  
Lola Alonso ◽  
Pablo Alarcón ◽  
Isabel Adoración Martín-Antoniano ◽  
Silvia Pineda ◽  
...  

Omics data integration is already a reality. However, few omics-based algorithms show enough predictive ability to be implemented into clinics or public health domains. Clinical/epidemiological data tend to explain most of the variation of health-related traits, and its joint modeling with omics data is crucial to increase the algorithm’s predictive ability. Only a small number of published studies performed a “real” integration of omics and non-omics (OnO) data, mainly to predict cancer outcomes. Challenges in OnO data integration regard the nature and heterogeneity of non-omics data, the possibility of integrating large-scale non-omics data with high-throughput omics data, the relationship between OnO data (i.e., ascertainment bias), the presence of interactions, the fairness of the models, and the presence of subphenotypes. These challenges demand the development and application of new analysis strategies to integrate OnO data. In this contribution we discuss different attempts of OnO data integration in clinical and epidemiological studies. Most of the reviewed papers considered only one type of omics data set, mainly RNA expression data. All selected papers incorporated non-omics data in a low-dimensionality fashion. The integrative strategies used in the identified papers adopted three modeling methods: Independent, conditional, and joint modeling. This review presents, discusses, and proposes integrative analytical strategies towards OnO data integration.


Genes ◽  
2019 ◽  
Vol 10 (3) ◽  
pp. 240 ◽  
Author(s):  
Gangcai Xie ◽  
Chengliang Dong ◽  
Yinfei Kong ◽  
Jiang Zhong ◽  
Mingyao Li ◽  
...  

Accurate prognosis of patients with cancer is important for the stratification of patients, the optimization of treatment strategies, and the design of clinical trials. Both clinical features and molecular data can be used for this purpose, for instance, to predict the survival of patients censored at specific time points. Multi-omics data, including genome-wide gene expression, methylation, protein expression, copy number alteration, and somatic mutation data, are becoming increasingly common in cancer studies. To harness the rich information in multi-omics data, we developed GDP (Group lass regularized Deep learning for cancer Prognosis), a computational tool for survival prediction using both clinical and multi-omics data. GDP integrated a deep learning framework and Cox proportional hazard model (CPH) together, and applied group lasso regularization to incorporate gene-level group prior knowledge into the model training process. We evaluated its performance in both simulated and real data from The Cancer Genome Atlas (TCGA) project. In simulated data, our results supported the importance of group prior information in the regularization of the model. Compared to the standard lasso regularization, we showed that group lasso achieved higher prediction accuracy when the group prior knowledge was provided. We also found that GDP performed better than CPH for complex survival data. Furthermore, analysis on real data demonstrated that GDP performed favorably against other methods in several cancers with large-scale omics data sets, such as glioblastoma multiforme, kidney renal clear cell carcinoma, and bladder urothelial carcinoma. In summary, we demonstrated that GDP is a powerful tool for prognosis of patients with cancer, especially when large-scale molecular features are available.


mSystems ◽  
2019 ◽  
Vol 4 (3) ◽  
Author(s):  
Ryan S. McClure

ABSTRACT Within the last decade, there has been an explosion of multi-omics data generated for several microbial systems. At the same time, new methods of analysis have emerged that are based on inferring networks that link features both within and between species based on correlation in abundance. These developments prompt two important questions. What can be done with network approaches to better understand microbial species interactions? What challenges remain in applying network approaches to query the more complex systems of natural settings? Here, I briefly describe what has been done and what questions still need to be answered. Over the next 5 to 10 years, we will be in a strong position to infer networks that contain multiple kinds of omic data and describe systems with multiple species. These applications will open the door for a better understanding and use of microbiomes across a variety of fields.


BMC Genomics ◽  
2019 ◽  
Vol 20 (S11) ◽  
Author(s):  
Tianle Ma ◽  
Aidong Zhang

Abstract Background Comprehensive molecular profiling of various cancers and other diseases has generated vast amounts of multi-omics data. Each type of -omics data corresponds to one feature space, such as gene expression, miRNA expression, DNA methylation, etc. Integrating multi-omics data can link different layers of molecular feature spaces and is crucial to elucidate molecular pathways underlying various diseases. Machine learning approaches to mining multi-omics data hold great promises in uncovering intricate relationships among molecular features. However, due to the “big p, small n” problem (i.e., small sample sizes with high-dimensional features), training a large-scale generalizable deep learning model with multi-omics data alone is very challenging. Results We developed a method called Multi-view Factorization AutoEncoder (MAE) with network constraints that can seamlessly integrate multi-omics data and domain knowledge such as molecular interaction networks. Our method learns feature and patient embeddings simultaneously with deep representation learning. Both feature representations and patient representations are subject to certain constraints specified as regularization terms in the training objective. By incorporating domain knowledge into the training objective, we implicitly introduced a good inductive bias into the machine learning model, which helps improve model generalizability. We performed extensive experiments on the TCGA datasets and demonstrated the power of integrating multi-omics data and biological interaction networks using our proposed method for predicting target clinical variables. Conclusions To alleviate the overfitting problem in deep learning on multi-omics data with the “big p, small n” problem, it is helpful to incorporate biological domain knowledge into the model as inductive biases. It is very promising to design machine learning models that facilitate the seamless integration of large-scale multi-omics data and biomedical domain knowledge for uncovering intricate relationships among molecular features and clinical features.


PLoS ONE ◽  
2015 ◽  
Vol 10 (5) ◽  
pp. e0126967 ◽  
Author(s):  
Takeru Uchiyama ◽  
Mitsuru Irie ◽  
Hiroshi Mori ◽  
Ken Kurokawa ◽  
Takuji Yamada

Sign in / Sign up

Export Citation Format

Share Document