Distributed Bayesian Networks Reconstruction on the Whole Genome Scale


10.1101/016683 ◽

2015 ◽

Author(s):

Alina Frolova ◽

Bartek Wilczyński

Keyword(s):

Experimental Data ◽

Bayesian Networks ◽

Graphical Models ◽

Polynomial Time ◽

Protein Interactions ◽

Regulatory Networks ◽

Large Scale ◽

External Information ◽

Whole Genome ◽

Wide Audience

Abstract Background Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including inference of gene regulatory networks and protein-protein interactions. Generally, learning Bayesian networks from experimental data is NP-hard, leading to widespread use of heuristic search methods that give suboptimal results. However, in cases when the acyclicity of the graph can be externally ensured, it is possible to find the optimal network in polynomial time. While our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to excessively long computations on a single CPU. Results In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems and its implementation in the improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets, showing that it parallelizes much more efficiently than the previous version. BNFinder gives results comparable in accuracy to current state-of-the-art inference methods, with a significant advantage in cases when external information such as a list of regulators or prior edge probabilities can be introduced. Conclusions We show that the new method can be used to reconstruct networks in the size range of thousands of genes, making it practically applicable to whole genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.


Distributed Bayesian networks reconstruction on the whole genome scale

PeerJ ◽

10.7717/peerj.5692 ◽

2018 ◽

Vol 6 ◽

pp. e5692 ◽

Cited By ~ 2

Author(s):

Alina Frolova ◽

Bartek Wilczyński

Keyword(s):

Experimental Data ◽

Bayesian Networks ◽

Graphical Models ◽

Polynomial Time ◽

Protein Interactions ◽

Regulatory Networks ◽

Large Scale ◽

External Information ◽

Whole Genome ◽

Wide Audience

Background Bayesian networks are directed acyclic graphical models widely used to represent the probabilistic relationships between random variables. They have been applied in various biological contexts, including inference of gene regulatory networks and protein–protein interactions. Generally, learning Bayesian networks from experimental data is NP-hard, leading to widespread use of heuristic search methods that give suboptimal results. However, in cases when the acyclicity of the graph can be externally ensured, it is possible to find the optimal network in polynomial time. While our previously developed tool BNFinder implements a polynomial-time algorithm, reconstructing networks from large amounts of experimental data still leads to excessively long computations on a single CPU. Results In the present paper we propose a parallelized algorithm designed for multi-core and distributed systems and its implementation in the improved version of BNFinder, a tool for learning optimal Bayesian networks. The new algorithm has been tested on different simulated and experimental datasets, showing that it parallelizes much more efficiently than the previous version. BNFinder gives results comparable in accuracy to current state-of-the-art inference methods, with a significant advantage in cases when external information such as a list of regulators or prior edge probabilities can be introduced, particularly for datasets with static gene expression observations. Conclusions We show that the new method can be used to reconstruct networks in the size range of thousands of genes, making it practically applicable to whole genome datasets of prokaryotic systems and large components of eukaryotic genomes. Our benchmarking results on realistic datasets indicate that the tool should be useful to a wide audience of researchers interested in discovering dependencies in their large-scale transcriptomic datasets.
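The abstract's central point, that externally ensured acyclicity lets each variable's parent set be chosen independently and therefore in parallel, can be illustrated with a minimal sketch. This is not BNFinder's code or API: the function names (`mdl_score`, `best_parents`, `learn_network`), the BIC-style score, the parent limit, and the toy data are all assumptions made for illustration. Acyclicity is assumed to come from a fixed regulators list that is disjoint from the targets, so edges only run from regulators to targets. A thread pool stands in for the multi-core or distributed back end.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def mdl_score(data, target, parents):
    """Negative log-likelihood plus a BIC-style penalty (lower is better).

    Illustrative score only; `data` is a list of dicts of discrete values.
    """
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[target]) for row in data)
    marginal = Counter(tuple(row[p] for p in parents) for row in data)
    nll = -sum(c * math.log(c / marginal[cfg]) for (cfg, _), c in joint.items())
    target_states = len({row[target] for row in data})
    parent_configs = 1
    for p in parents:
        parent_configs *= len({row[p] for row in data})
    penalty = 0.5 * math.log(n) * (target_states - 1) * parent_configs
    return nll + penalty


def best_parents(data, target, regulators, max_parents=2):
    """Exhaustively pick the best-scoring parent set of bounded size.

    With a fixed size bound the number of candidate sets is polynomial
    in the number of regulators.
    """
    subsets = (s for k in range(max_parents + 1)
               for s in combinations(regulators, k))
    return min(subsets, key=lambda s: mdl_score(data, target, s))


def learn_network(data, targets, regulators, max_parents=2, workers=4):
    """Choose parents for every target independently and concurrently."""
    # Assumption: targets and regulators are disjoint, so every edge goes
    # regulator -> target and no cycle can form.
    assert not set(targets) & set(regulators)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(
            lambda t: best_parents(data, t, regulators, max_parents), targets)
        return dict(zip(targets, results))


# Toy data (hypothetical): X copies A, Y is the XOR of A and B.
toy_rows = [{"A": a, "B": b, "X": a, "Y": a ^ b}
            for a in (0, 1) for b in (0, 1)] * 2
toy_network = learn_network(toy_rows, ["X", "Y"], ["A", "B"])
```

Because the per-target searches share no state, the thread pool could be swapped for a process pool or a cluster scheduler without changing the logic; which scheme the actual tool uses is not stated in the abstract.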


Combining Bayesian Approaches and Evolutionary Techniques for the Inference of Breast Cancer Networks

10.1101/115261 ◽

2017 ◽

Author(s):

Stefano Beretta ◽

Mauro Castelli ◽

Ivo Gonçalves ◽

Ivan Merelli ◽

Daniele Ramazzotti

Keyword(s):

Breast Cancer ◽

Graphical Models ◽

Protein Interactions ◽

Large Scale ◽

Small Sample Size ◽

Small Sample ◽

Cancer Data ◽

Correlation Networks ◽

Model Complex ◽

Cancer Networks

Abstract Gene and protein networks are very important for modeling complex large-scale systems in molecular biology. Inferring or reverse-engineering such networks can be defined as the process of identifying gene/protein interactions from experimental data through computational analysis. However, this task is typically complicated by the very large number of unknowns relative to a rather small sample size. Furthermore, when the goal is to study causal relationships within the network, tools capable of overcoming the limitations of correlation networks are required. In this work, we make use of Bayesian Graphical Models to attack this problem and, specifically, we perform a comparative study of different state-of-the-art heuristics, analyzing their performance in inferring the structure of the Bayesian Network from breast cancer data.


Modeling Large-Scale Gene Regulatory Networks using Gene Ontology-Based Clustering and Dynamic Bayesian Networks

2008 2nd International Conference on Bioinformatics and Biomedical Engineering ◽

10.1109/icbbe.2008.76 ◽

2008 ◽

Cited By ~ 6

Author(s):

F. Yavari ◽

F. Towhidkhah ◽

S. Gharibzadeh ◽

A. R. Khanteymoori ◽

M. M. Homayounpour

Keyword(s):

Gene Ontology ◽

Bayesian Networks ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Large Scale ◽

Dynamic Bayesian Networks ◽

Gene Regulatory


The impact of whole genome duplications on the human gene regulatory networks

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009638 ◽

2021 ◽

Vol 17 (12) ◽

pp. e1009638

Author(s):

Francesco Mottes ◽

Chiara Villa ◽

Matteo Osella ◽

Michele Caselle

Keyword(s):

Gene Regulatory Networks ◽

Protein Interactions ◽

Regulatory Networks ◽

Human Gene ◽

Small Scale ◽

Whole Genome ◽

Mirna Regulation ◽

Vertebrate Lineage ◽

Gene Regulatory ◽

The Impact

This work studies the effects of the two rounds of Whole Genome Duplication (WGD) at the origin of the vertebrate lineage on the architecture of the human gene regulatory networks. We integrate information on transcriptional regulation, miRNA regulation, and protein-protein interactions to comparatively analyse the role of WGD and Small Scale Duplications (SSD) in the structural properties of the resulting multilayer network. We show that complex network motifs, such as combinations of feed-forward loops and bifan arrays, deriving from WGD events are specifically enriched in the network. Pairs of WGD-derived proteins display a strong tendency to interact both with each other and with common partners and WGD-derived transcription factors play a prominent role in the retention of a strong regulatory redundancy. Combinatorial regulation and synergy between different regulatory layers are in general enhanced by duplication events, but the two types of duplications contribute in different ways. Overall, our findings suggest that the two WGD events played a substantial role in increasing the multi-layer complexity of the vertebrate regulatory network by enhancing its combinatorial organization, with potential consequences on its overall robustness and ability to perform high-level functions like signal integration and noise control. Lastly, we discuss in detail the RAR/RXR pathway as an illustrative example of the evolutionary impact of WGD duplications in human.


The impact of whole genome duplications on the human gene regulatory networks

10.1101/2021.07.16.452729 ◽

2021 ◽

Author(s):

Francesco Mottes ◽

Chiara Villa ◽

Matteo Osella ◽

Michele Caselle

Keyword(s):

Gene Regulatory Networks ◽

Protein Interactions ◽

Regulatory Networks ◽

Human Gene ◽

Small Scale ◽

Whole Genome ◽

Mirna Regulation ◽

Vertebrate Lineage ◽

Gene Regulatory ◽

The Impact


Unravelling Nature's Networks: From Microarray and Proteomic Analysis to Systems Biology: University of Sheffield, 21–22 July 2003

The Biochemist ◽

10.1042/bio02506040 ◽

2003 ◽

Vol 25 (6) ◽

pp. 40-41

Author(s):

Nick Monk ◽

Neil Lawrence

Keyword(s):

Gene Expression ◽

Experimental Data ◽

Mathematical Model ◽

Protein Interactions ◽

Large Scale ◽

Dynamic Behaviour ◽

Biochemical Networks ◽

Data Sets ◽

Variable Quality

The robust and adaptable behaviours of cells and tissues depend on the operation of complex regulatory biochemical networks. The elucidation of the structure and functioning of such networks poses many daunting challenges. Recently developed experimental techniques, such as large-scale profiling of gene expression and protein interactions, provide unprecedented amounts of information on the molecular composition of cells. The size (and often variable quality) of the resulting data sets necessitates the use of sophisticated computational schemes for the analysis, mining and integration of the data. In all but the simplest cases, the complexity of the networks is such that it is impossible to provide an intuitive picture of the principles governing their dynamic behaviour without synthesizing the experimental data into a coherent mathematical model of the underlying system.


Single Layers of Attention Suffice to Predict Protein Contacts

10.1101/2020.12.21.423882 ◽

2020 ◽

Author(s):

Nicholas Bhattacharya ◽

Neil Thomas ◽

Roshan Rao ◽

Justas Dauparas ◽

Peter K. Koo ◽

...

Keyword(s):

Graphical Models ◽

Protein Interactions ◽

Large Scale ◽

3D Structure ◽

Single Layer ◽

Representation Learning ◽

Protein Family ◽

The Other ◽

Multiple Sequence ◽

Contact Prediction

Abstract The established approach to unsupervised protein contact prediction estimates co-evolving positions using undirected graphical models. This approach trains a Potts model on a Multiple Sequence Alignment, then predicts that the edges with highest weight correspond to contacts in the 3D structure. On the other hand, increasingly large Transformers are being pretrained on protein sequence databases but have demonstrated mixed results for downstream tasks, including contact prediction. This has sparked discussion about the role of scale and attention-based models in unsupervised protein representation learning. We argue that attention is a principled model of protein interactions, grounded in real properties of protein family data. We introduce a simplified attention layer, factored attention, and show that it achieves comparable performance to Potts models, while sharing parameters both within and across families. Further, we extract contacts from the attention maps of a pretrained Transformer and show they perform competitively with the other two approaches. This provides evidence that large-scale pretraining can learn meaningful protein features when presented with unlabeled and unaligned data. We contrast factored attention with the Transformer to indicate that the Transformer leverages hierarchical signal in protein family databases not captured by our single-layer models. This raises the exciting possibility for the development of powerful structured models of protein family databases.


Large-scale data analysis for robotic yeast one-hybrid platforms and multi-disciplinary studies using GateMultiplex

BMC Biology ◽

10.1186/s12915-021-01140-y ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Ni-Chiao Tsai ◽

Tzu-Shu Hsu ◽

Shang-Che Kuo ◽

Chung-Ting Kao ◽

Tzu-Huan Hung ◽

...

Keyword(s):

Data Analysis ◽

High Throughput ◽

Protein Interactions ◽

Precision Agriculture ◽

Regulatory Networks ◽

Large Scale ◽

Cancer Drug ◽

Large Scale Data ◽

Programming Skills ◽

Scale Data

Abstract Background Yeast one-hybrid (Y1H) is a common technique for identifying DNA-protein interactions, and robotic platforms have been developed for high-throughput analyses to unravel the gene regulatory networks in many organisms. Use of these high-throughput techniques has led to the generation of increasingly large datasets, and several software packages have been developed to analyze such data. We previously established the currently most efficient Y1H system, meiosis-directed Y1H; however, the available software tools were not designed for processing the additional parameters suggested by meiosis-directed Y1H to avoid false positives, and they required programming skills to operate. Results We developed a new tool named GateMultiplex with high computing performance using C++. GateMultiplex incorporates a graphical user interface (GUI), which allows operation without any programming skills. Flexible parameter options were designed for multiple experimental purposes to enable the application of GateMultiplex even beyond Y1H platforms. We further demonstrated data analysis in three other fields using GateMultiplex: the identification of lead compounds in preclinical cancer drug discovery, crop line selection in precision agriculture, and ocean pollution detection from deep-sea fishery. Conclusions The user-friendly GUI, fast C++ computing speed, flexible parameter setting, and applicability of GateMultiplex make large-scale data analysis feasible in life science fields.


0306 Exploring the feasibility of using copy number variants as genetic markers through large-scale whole genome sequencing experiments

Journal of Animal Science ◽

10.2527/jam2016-0306 ◽

2016 ◽

Vol 94 (suppl_5) ◽

pp. 146-146

Author(s):

D. M. Bickhart ◽

L. Xu ◽

J. L. Hutchison ◽

J. B. Cole ◽

D. J. Null ◽

...

Keyword(s):

Whole Genome Sequencing ◽

Genetic Markers ◽

Genome Sequencing ◽

Copy Number ◽

Large Scale ◽

Copy Number Variants ◽

Whole Genome


Plasmids or no plasmids? A comparison between the agilent TapeStation and whole-genome sequencing data in a large-scale bacterial sequencing project

10.26226/morressier.56d5ba27d462b80296c95fe7 ◽

2016 ◽

Author(s):

Sarah Alexander

Keyword(s):

Whole Genome Sequencing ◽

Genome Sequencing ◽

Large Scale ◽

Whole Genome Sequencing Data ◽

Whole Genome ◽

Sequencing Data ◽

Sequencing Project
