scholarly journals Prediction of protein group function by iterative classification on functional relevance network

2018 ◽  
Vol 35 (8) ◽  
pp. 1388-1394 ◽  
Author(s):  
Ishita K Khan ◽  
Aashish Jain ◽  
Reda Rawi ◽  
Halima Bensmail ◽  
Daisuke Kihara

Abstract Motivation Biological experiments including proteomics and transcriptomics approaches often reveal sets of proteins that are most likely to be involved in a disease/disorder. To understand the functional nature of a set of proteins, it is important to capture the function of the proteins as a group, even in cases where function of individual proteins is not known. In this work, we propose a model that takes groups of proteins found to work together in a certain biological context, integrates them into functional relevance networks, and subsequently employs an iterative inference on graphical models to identify group functions of the proteins, which are then extended to predict function of individual proteins. Results The proposed algorithm, iterative group function prediction (iGFP), depicts proteins as a graph that represents functional relevance of proteins considering their known functional, proteomics and transcriptional features. Proteins in the graph will be clustered into groups by their mutual functional relevance, which is iteratively updated using a probabilistic graphical model, the conditional random field. iGFP showed robust accuracy even when substantial amount of GO annotations were missing. The perspective of ‘group’ function annotation opens up novel approaches for understanding functional nature of proteins in biological systems. Availability and implementation: http://kiharalab.org/iGFP/ Supplementary information Supplementary data are available at Bioinformatics online.

Biometrika ◽  
2020 ◽  
Author(s):  
S Na ◽  
M Kolar ◽  
O Koyejo

Abstract Differential graphical models are designed to represent the difference between the conditional dependence structures of two groups, thus are of particular interest for scientific investigation. Motivated by modern applications, this manuscript considers an extended setting where each group is generated by a latent variable Gaussian graphical model. Due to the existence of latent factors, the differential network is decomposed into sparse and low-rank components, both of which are symmetric indefinite matrices. We estimate these two components simultaneously using a two-stage procedure: (i) an initialization stage, which computes a simple, consistent estimator, and (ii) a convergence stage, implemented using a projected alternating gradient descent algorithm applied to a nonconvex objective, initialized using the output of the first stage. We prove that given the initialization, the estimator converges linearly with a nontrivial, minimax optimal statistical error. Experiments on synthetic and real data illustrate that the proposed nonconvex procedure outperforms existing methods.


Author(s):  
Zachary D. Kurtz ◽  
Richard Bonneau ◽  
Christian L. Müller

AbstractDetecting community-wide statistical relationships from targeted amplicon-based and metagenomic profiling of microbes in their natural environment is an important step toward understanding the organization and function of these communities. We present a robust and computationally tractable latent graphical model inference scheme that allows simultaneous identification of parsimonious statistical relationships among microbial species and unobserved factors that influence the prevalence and variability of the abundance measurements. Our method comes with theoretical performance guarantees and is available within the SParse InversE Covariance estimation for Ecological ASsociation Inference (SPIEC-EASI) framework (‘SpiecEasi’ R-package). Using simulations, as well as a comprehensive collection of amplicon-based gut microbiome datasets, we illustrate the method’s ability to jointly identify compositional biases, latent factors that correlate with observed technical covariates, and robust statistical microbial associations that replicate across different gut microbial data sets.


2018 ◽  
Vol 373 (1758) ◽  
pp. 20170377 ◽  
Author(s):  
Hexuan Liu ◽  
Jimin Kim ◽  
Eli Shlizerman

We propose an approach to represent neuronal network dynamics as a probabilistic graphical model (PGM). To construct the PGM, we collect time series of neuronal responses produced by the neuronal network and use singular value decomposition to obtain a low-dimensional projection of the time-series data. We then extract dominant patterns from the projections to get pairwise dependency information and create a graphical model for the full network. The outcome model is a functional connectome that captures how stimuli propagate through the network and thus represents causal dependencies between neurons and stimuli. We apply our methodology to a model of the Caenorhabditis elegans somatic nervous system to validate and show an example of our approach. The structure and dynamics of the C. elegans nervous system are well studied and a model that generates neuronal responses is available. The resulting PGM enables us to obtain and verify underlying neuronal pathways for known behavioural scenarios and detect possible pathways for novel scenarios. This article is part of a discussion meeting issue ‘Connectome to behaviour: modelling C. elegans at cellular resolution’.


2019 ◽  
Vol 35 (17) ◽  
pp. 3184-3186
Author(s):  
Xiao-Fei Zhang ◽  
Le Ou-Yang ◽  
Shuo Yang ◽  
Xiaohua Hu ◽  
Hong Yan

Abstract Summary To identify biological network rewiring under different conditions, we develop a user-friendly R package, named DiffNetFDR, to implement two methods developed for testing the difference in different Gaussian graphical models. Compared to existing tools, our methods have the following features: (i) they are based on Gaussian graphical models which can capture the changes of conditional dependencies; (ii) they determine the tuning parameters in a data-driven manner; (iii) they take a multiple testing procedure to control the overall false discovery rate; and (iv) our approach defines the differential network based on partial correlation coefficients so that the spurious differential edges caused by the variants of conditional variances can be excluded. We also develop a Shiny application to provide easier analysis and visualization. Simulation studies are conducted to evaluate the performance of our methods. We also apply our methods to two real gene expression datasets. The effectiveness of our methods is validated by the biological significance of the identified differential networks. Availability and implementation R package and Shiny app are available at https://github.com/Zhangxf-ccnu/DiffNetFDR. Supplementary information Supplementary data are available at Bioinformatics online.


2003 ◽  
Vol 6 (3) ◽  
pp. 201-211 ◽  
Author(s):  
INGRID K. CHRISTOFFELS ◽  
ANNETTE M. B. DE GROOT ◽  
LOURENS J. WALDORP

Simultaneous interpreting (SI) is a complex skill, where language comprehension and production take place at the same time in two different languages. In this study we identified some of the basic cognitive skills involved in SI, focusing on the roles of memory and lexical retrieval. We administered a reading span task in two languages and a verbal digit span task in the native language to assess memory capacity, and a picture naming and a word translation task to tap the retrieval time of lexical items in two languages, and we related performance on these four tasks to interpreting skill in untrained bilinguals. The results showed that word translation and picture naming latencies correlate with interpreting performance. Also digit span and reading span were associated with SI performance, only less strongly so. A graphical models analysis indicated that specifically word translation efficiency and working memory form independent subskills of SI performance in untrained bilinguals.


2009 ◽  
Vol 21 (11) ◽  
pp. 3010-3056 ◽  
Author(s):  
Shai Litvak ◽  
Shimon Ullman

In this letter, we develop and simulate a large-scale network of spiking neurons that approximates the inference computations performed by graphical models. Unlike previous related schemes, which used sum and product operations in either the log or linear domains, the current model uses an inference scheme based on the sum and maximization operations in the log domain. Simulations show that using these operations, a large-scale circuit, which combines populations of spiking neurons as basic building blocks, is capable of finding close approximations to the full mathematical computations performed by graphical models within a few hundred milliseconds. The circuit is general in the sense that it can be wired for any graph structure, it supports multistate variables, and it uses standard leaky integrate-and-fire neuronal units. Following previous work, which proposed relations between graphical models and the large-scale cortical anatomy, we focus on the cortical microcircuitry and propose how anatomical and physiological aspects of the local circuitry may map onto elements of the graphical model implementation. We discuss in particular the roles of three major types of inhibitory neurons (small fast-spiking basket cells, large layer 2/3 basket cells, and double-bouquet neurons), subpopulations of strongly interconnected neurons with their unique connectivity patterns in different cortical layers, and the possible role of minicolumns in the realization of the population-based maximum operation.


1997 ◽  
Vol 9 (2) ◽  
pp. 227-269 ◽  
Author(s):  
Padhraic Smyth ◽  
David Heckerman ◽  
Michael I. Jordan

Graphical techniques for modeling the dependencies of random variables have been explored in a variety of different areas, including statistics, statistical physics, artificial intelligence, speech recognition, image processing, and genetics. Formalisms for manipulating these models have been developed relatively independently in these research communities. In this paper we explore hidden Markov models (HMMs) and related structures within the general framework of probabilistic independence networks (PINs). The paper presents a self-contained review of the basic principles of PINs. It is shown that the well-known forward-backward (F-B) and Viterbi algorithms for HMMs are special cases of more general inference algorithms for arbitrary PINs. Furthermore, the existence of inference and estimation algorithms for more general graphical models provides a set of analysis tools for HMM practitioners who wish to explore a richer class of HMM structures. Examples of relatively complex models to handle sensor fusion and coarticulation in speech recognition are introduced and treated within the graphical model framework to illustrate the advantages of the general approach.


2008 ◽  
Vol 33 ◽  
pp. 465-519 ◽  
Author(s):  
R. Mateescu ◽  
R. Dechter ◽  
R. Marinescu

Inspired by the recently introduced framework of AND/OR search spaces for graphical models, we propose to augment Multi-Valued Decision Diagrams (MDD) with AND nodes, in order to capture function decomposition structure and to extend these compiled data structures to general weighted graphical models (e.g., probabilistic models). We present the AND/OR Multi-Valued Decision Diagram (AOMDD) which compiles a graphical model into a canonical form that supports polynomial (e.g., solution counting, belief updating) or constant time (e.g. equivalence of graphical models) queries. We provide two algorithms for compiling the AOMDD of a graphical model. The first is search-based, and works by applying reduction rules to the trace of the memory intensive AND/OR search algorithm. The second is inference-based and uses a Bucket Elimination schedule to combine the AOMDDs of the input functions via the the APPLY operator. For both algorithms, the compilation time and the size of the AOMDD are, in the worst case, exponential in the treewidth of the graphical model, rather than pathwidth as is known for ordered binary decision diagrams (OBDDs). We introduce the concept of semantic treewidth, which helps explain why the size of a decision diagram is often much smaller than the worst case bound. We provide an experimental evaluation that demonstrates the potential of AOMDDs.


Author(s):  
Jaewook Jung ◽  
Leihan Chen ◽  
Gunho Sohn ◽  
Chao Luo ◽  
Jong-Un Won

Railway has been used as one of the most crucial means of transportation in public mobility and economic development. For efficiently operating railways, the electrification system in railway infrastructure, which supplies electric power to trains, is essential facilities for stable train operation. Due to its important role, the electrification system needs to be rigorously and regularly inspected and managed. This paper presents a supervised learning method to classify Mobile Laser Scanning (MLS) data into ten target classes representing overhead wires, movable brackets and poles, which are recognized key objects in the electrification system. In general, the layout of railway electrification system shows a strong regularity of spatial relations among object classes. The proposed classifier is developed based on Conditional Random Field (CRF), which characterizes not only labeling homogeneity at short range, but also the layout compatibility between different object classes at long range in the probabilistic graphical model. This multi-range CRF model consists of a unary term and three pairwise contextual terms. In order to gain computational efficiency, MLS point clouds is converted into a set of line segments where the labeling process is applied. Support Vector Machine (SVM) is used as a local classifier considering only node features for producing the unary potentials of CRF model. As the short-range pairwise contextual term, Potts model is applied to enforce a local smoothness in short-range graph. While, long-range pairwise potentials are designed to enhance spatial regularities of both horizontal and vertical layouts among railway objects. We formulate two long-range pairwise potentials as the log posterior probability obtained by Naïve Bayes classifier. The directional layout compatibilities are characterized in probability look-up tables which represent co-occurrence rate of spatial relations in horizontal and vertical directions. The likelihood function is formulated by multivariate Gaussian distributions. In the proposed multi-range CRF model, the weight parameters to balance four sub-terms are estimated by applying the Stochastic Gradient Descent (SGD). The results show that the proposed multi-range CRF can effectively classify detailed railway elements, representing the average recall of 97.66% and the average precision of 97.07% for all classes.


2013 ◽  
Vol 20 (4) ◽  
pp. 591-600 ◽  
Author(s):  
Guofeng Wang ◽  
Xiaoliang Feng ◽  
Chang Liu

Condition monitoring of rolling element bearing is paramount for predicting the lifetime and performing effective maintenance of the mechanical equipment. To overcome the drawbacks of the hidden Markov model (HMM) and improve the diagnosis accuracy, conditional random field (CRF) model based classifier is proposed. In this model, the feature vectors sequences and the fault categories are linked by an undirected graphical model in which their relationship is represented by a global conditional probability distribution. In comparison with the HMM, the main advantage of the CRF model is that it can depict the temporal dynamic information between the observation sequences and state sequences without assuming the independence of the input feature vectors. Therefore, the interrelationship between the adjacent observation vectors can also be depicted and integrated into the model, which makes the classifier more robust and accurate than the HMM. To evaluate the effectiveness of the proposed method, four kinds of bearing vibration signals which correspond to normal, inner race pit, outer race pit and roller pit respectively are collected from the test rig. And the CRF and HMM models are built respectively to perform fault classification by taking the sub band energy features of wavelet packet decomposition (WPD) as the observation sequences. Moreover, K-fold cross validation method is adopted to improve the evaluation accuracy of the classifier. The analysis and comparison under different fold times show that the accuracy rate of classification using the CRF model is higher than the HMM. This method brings some new lights on the accurate classification of the bearing faults.


Sign in / Sign up

Export Citation Format

Share Document