Hierarchical Network Exploration using Gaussian Mixture Models

2019
Author(s): James Mathews, Saad Nadeem, Maryam Pouryahya, Zehor Belkhatir, Joseph O. Deasy, ...

Abstract We present a framework based on optimal mass transport to construct, for a given network, a reduction hierarchy which can be used for interactive data exploration and community detection. Given a network and a set of numerical data samples for each node, we calculate a new computationally efficient comparison metric between Gaussian Mixture Models, the Gaussian Mixture Transport distance, to determine a series of merge simplifications of the network. If only a network is given, numerical samples are synthesized from the network topology. The method has its basis in the local connection structure of the network, as well as the joint distribution of the data associated with neighboring nodes. The analysis is benchmarked on networks with known community structures. We also analyze gene regulatory networks, including the PANTHER curated database and networks inferred from the GTEx lung and breast tissue RNA profiles. Gene Ontology annotations from the EBI GOA database are ranked and superimposed to explain the salient gene modules. We find that several gene modules related to highly specific biological processes are well-coordinated in such tissues. We also find that 18 of the 50 genes of the PAM50 breast-tumor prognostic signature appear among the highly coordinated genes in a single gene module, in both the breast and lung samples. Moreover, these 18 are precisely the subset of the PAM50 recently identified as the basal-like markers.
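The abstract does not spell out how the Gaussian Mixture Transport distance is computed, so the following is only a sketch of a closely related, standard construction, not the authors' metric. It treats each mixture as a discrete distribution over its Gaussian components and solves an optimal transport problem between them, using the closed-form squared 2-Wasserstein distance between two one-dimensional normals, (m1 - m2)^2 + (s1 - s2)^2, as the ground cost. The function names, the one-dimensional setting, and the restriction to equal-weight mixtures with the same number of components are assumptions of this sketch; under that restriction an optimal transport plan can be taken to be a permutation, so brute force over permutations is exact for small mixtures.

```python
import itertools
import math


def gaussian_w2_sq(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between 1-D normals N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2


def mixture_transport_distance(gmm_a, gmm_b):
    """Transport distance between two equal-weight 1-D Gaussian mixtures (hypothetical helper).

    Each mixture is a list of (mean, std) pairs with uniform weights 1/k.
    With uniform weights and equal sizes, an optimal transport plan is a
    permutation matrix, so enumerating permutations gives the exact optimum.
    """
    if len(gmm_a) != len(gmm_b):
        raise ValueError("this sketch assumes mixtures with the same number of components")
    k = len(gmm_a)
    best = math.inf
    for perm in itertools.permutations(range(k)):
        # Average ground cost of matching component i of A to component perm[i] of B.
        cost = sum(gaussian_w2_sq(*gmm_a[i], *gmm_b[j]) for i, j in enumerate(perm)) / k
        best = min(best, cost)
    return math.sqrt(best)


a = [(0.0, 1.0), (5.0, 1.0)]
b = [(5.0, 1.0), (0.0, 1.0)]  # same mixture, components listed in another order
c = [(1.0, 1.0), (5.0, 2.0)]  # first mean shifted by 1, second std increased by 1
print(mixture_transport_distance(a, b))  # 0.0
print(mixture_transport_distance(a, c))  # 1.0
```

Reordering components leaves the distance at zero, as a distance between distributions should. Mixtures with unequal weights would require solving the general transport linear program instead of enumerating permutations.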

2020, Vol 117 (28), pp. 16339-16345
Author(s): James C. Mathews, Saad Nadeem, Maryam Pouryahya, Zehor Belkhatir, Joseph O. Deasy, ...

We present a technique to construct a simplification of a feature network which can be used for interactive data exploration, biological hypothesis generation, and the detection of communities or modules of cofunctional features. These are modules of features that are not necessarily correlated, but nevertheless exhibit common function in their network context as measured by similarity of relationships with neighboring features. In the case of genetic networks, traditional pathway analyses tend to assume that, ideally, all genes in a module exhibit very similar function, independent of relationships with other genes. The proposed technique explicitly relaxes this assumption by employing the comparison of relational profiles. For example, two genes which always activate a third gene are grouped together even if they never do so concurrently. They have common, but not identical, function. The comparison is driven by an average of a certain computationally efficient comparison metric between Gaussian mixture models. The method has its basis in the local connection structure of the network and the collection of joint distributions of the data associated with nodal neighborhoods. It is benchmarked on networks with known community structures. As the main application, we analyzed the gene regulatory network in lung adenocarcinoma, finding a cofunctional module of genes including the pregnancy-specific glycoproteins (PSGs). About 20% of patients with lung, breast, uterus, and colon cancer in The Cancer Genome Atlas (TCGA) have an elevated PSG+ signature, which is associated with poor prognosis at the group level. In conjunction with previous results relating PSGs to tolerance in the immune system, these findings implicate the PSGs in a potential immune tolerance mechanism of cancers.
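The example of two genes that both activate a third gene but never concurrently can be made concrete with a toy calculation. This sketch uses hypothetical binary expression samples and plain Pearson correlation as a stand-in for the paper's Gaussian-mixture comparison. It only illustrates that two features can be anti-correlated with each other while having identical relationships with a shared neighbor, which is the situation a relational-profile comparison is meant to capture.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical samples: A and B each switch C on, but never at the same time.
A = [1, 0, 0, 1, 0, 0]
B = [0, 1, 0, 0, 1, 0]
C = [1, 1, 0, 1, 1, 0]

direct = pearson(A, B)     # A and B are anti-correlated with each other...
profile_a = pearson(A, C)  # ...yet each relates to the shared neighbor C
profile_b = pearson(B, C)  # in exactly the same way.
print(f"corr(A,B)={direct:.2f}  corr(A,C)={profile_a:.2f}  corr(B,C)={profile_b:.2f}")
```

This prints corr(A,B)=-0.50 while corr(A,C) and corr(B,C) are both 0.50: a clustering based on direct correlation would separate A and B, whereas comparing their relationships with C groups them together.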


2019, Vol 490 (3), pp. 3966-3986
Author(s): Daniel M Jones, Alan F Heavens

ABSTRACT Future cosmological galaxy surveys such as the Large Synoptic Survey Telescope (LSST) will photometrically observe very large numbers of galaxies. Without spectroscopy, the redshifts required for the analysis of these data will need to be inferred using photometric redshift techniques that are scalable to large sample sizes. The high number density of sources will also mean that around half are blended. We present a Bayesian photometric redshift method for blended sources that uses Gaussian mixture models to learn the joint flux–redshift distribution from a set of unblended training galaxies, and Bayesian model comparison to infer the number of galaxies comprising a blended source. The use of Gaussian mixture models renders both of these applications computationally efficient and therefore suitable for upcoming galaxy surveys.
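One standard property behind the efficiency of this approach is that conditioning a joint Gaussian mixture on observed fluxes yields another Gaussian mixture in closed form, so a redshift posterior needs no sampling. The sketch below assumes a single flux band and an already-fitted two-dimensional mixture over (flux, redshift); the `Component` fields and function names are hypothetical, and it illustrates the general conditioning identity rather than the authors' pipeline (the blend model comparison is not shown).

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One component of a hypothetical joint (flux, redshift) Gaussian mixture."""
    weight: float
    mu_f: float  # mean flux (single band, an assumption of this sketch)
    mu_z: float  # mean redshift
    s_ff: float  # flux variance
    s_fz: float  # flux-redshift covariance
    s_zz: float  # redshift variance


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def conditional_z(components, flux):
    """Return p(z | flux) as normalized (weight, mean, variance) triples."""
    out = []
    for c in components:
        w = c.weight * normal_pdf(flux, c.mu_f, c.s_ff)    # reweight by flux likelihood
        mean = c.mu_z + c.s_fz / c.s_ff * (flux - c.mu_f)  # conditional Gaussian mean
        var = c.s_zz - c.s_fz ** 2 / c.s_ff                # conditional Gaussian variance
        out.append((w, mean, var))
    total = sum(w for w, _, _ in out)
    return [(w / total, m, v) for w, m, v in out]


def posterior_mean_z(components, flux):
    return sum(w * m for w, m, _ in conditional_z(components, flux))


comps = [Component(weight=1.0, mu_f=0.0, mu_z=1.0, s_ff=1.0, s_fz=0.5, s_zz=1.0)]
print(posterior_mean_z(comps, flux=2.0))  # 2.0
```

With one component, the result reduces to the familiar conditional of a bivariate normal: mean 1 + 0.5 * 2 = 2 and variance 1 - 0.25 = 0.75. With several components, the same loop also reweights each component by how well it explains the observed flux.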


2017, Vol 34 (10), pp. 1399-1414
Author(s): Wanxia Deng, Huanxin Zou, Fang Guo, Lin Lei, Shilin Zhou, ...

2013, Vol 141 (6), pp. 1737-1760
Author(s): Thomas Sondergaard, Pierre F. J. Lermusiaux

Abstract This work introduces and derives an efficient, data-driven assimilation scheme, focused on a time-dependent stochastic subspace that respects nonlinear dynamics and captures non-Gaussian statistics as it occurs. The motivation is to obtain a filter that is applicable to realistic geophysical applications, but that also rigorously utilizes the governing dynamical equations with information theory and learning theory for efficient Bayesian data assimilation. Building on the foundations of classical filters, the underlying theory and algorithmic implementation of the new filter are developed and derived. The stochastic Dynamically Orthogonal (DO) field equations and their adaptive stochastic subspace are employed to predict prior probabilities for the full dynamical state, effectively approximating the Fokker–Planck equation. At assimilation times, the DO realizations are fit to semiparametric Gaussian Mixture Models (GMMs) using the Expectation-Maximization algorithm and the Bayesian Information Criterion. Bayes’s law is then efficiently carried out analytically within the evolving stochastic subspace. The resulting GMM-DO filter is illustrated in a very simple example. Variations of the GMM-DO filter are also provided along with comparisons with related schemes.
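The analytic Bayes step has a well-known closed form when the prior is a Gaussian mixture and the observation is linear with Gaussian noise: each component receives a Kalman update, and the component weights are rescaled by each component's marginal likelihood of the observation (the classical Gaussian sum filter update). The sketch below assumes a scalar state observed directly with noise variance `r`. It omits the DO stochastic subspace, the EM fit, and the BIC model selection described in the abstract, and the function name is hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmm_bayes_update(prior, y, r):
    """Analytic Bayes update of a scalar Gaussian-mixture prior (hypothetical helper).

    prior: list of (weight, mean, variance) components.
    y: observation of the state, y = x + noise with noise ~ N(0, r).
    Returns the posterior mixture as normalized (weight, mean, variance) triples.
    """
    post = []
    for w, m, p in prior:
        s = p + r  # innovation variance of this component
        k = p / s  # Kalman gain of this component
        post.append((
            w * normal_pdf(y, m, s),  # reweight by marginal likelihood of y
            m + k * (y - m),          # updated mean
            (1 - k) * p,              # updated variance
        ))
    total = sum(w for w, _, _ in post)
    return [(w / total, m, v) for w, m, v in post]


prior = [(0.5, -2.0, 1.0), (0.5, 2.0, 1.0)]  # bimodal prior
for w, m, v in gmm_bayes_update(prior, y=2.0, r=1.0):
    print(f"weight={w:.3f} mean={m:.2f} var={v:.2f}")
```

An observation near the right-hand mode shifts about 98% of the posterior weight onto that component, while both component variances shrink from 1.0 to 0.5. The posterior stays a Gaussian mixture, which is what allows the update to be carried out analytically rather than by resampling.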

