Clustering gene expression time series data using an infinite Gaussian process mixture model

10.1101/131151 ◽

2017 ◽

Cited By ~ 1

Author(s):

Ian C. McDowell ◽

Dinesh Manandhar ◽

Christopher M. Vockley ◽

Amy K. Schmid ◽

Timothy E. Reddy ◽

...

Keyword(s):

Time Series ◽

Gaussian Process ◽

Mixture Model ◽

Dirichlet Process ◽

Cellular Response ◽

Time Series Data ◽

Series Data ◽

Nonparametric Model ◽

Cluster Number ◽

Temporal Dependencies

Abstract: Transcriptome-wide time series expression profiling is used to characterize the cellular response to environmental perturbations. The first step to analyzing transcriptional response data is often to cluster genes with similar responses. Here, we present a nonparametric model-based method, Dirichlet process Gaussian process mixture model (DPGP), which jointly models cluster number with a Dirichlet process and temporal dependencies with Gaussian processes. We demonstrate the accuracy of DPGP in comparison with state-of-the-art approaches using hundreds of simulated data sets. To further test our method, we apply DPGP to published microarray data from a microbial model organism exposed to stress and to novel RNA-seq data from a human cell line exposed to the glucocorticoid dexamethasone. We validate our clusters by examining local transcription factor binding and histone modifications. Our results demonstrate that jointly modeling cluster number and temporal dependencies can reveal novel regulatory mechanisms. DPGP software is freely available online at https://github.com/PrincetonUniversity/DP_GP_cluster.
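
As a rough illustration of the generative idea summarised in this abstract, the following Python sketch draws cluster labels from a Chinese restaurant process (one standard representation of a Dirichlet process prior) and draws each cluster's mean trajectory from a Gaussian process with a squared-exponential kernel. It is not the DPGP software's implementation or API: the function names, the kernel choice, and all parameter values are assumptions made for this example.

```python
import numpy as np


def crp_assignments(n_items, alpha, rng):
    """Sample cluster labels from a Chinese restaurant process with concentration alpha."""
    labels = [0]   # the first item always opens cluster 0
    counts = [1]   # current size of each cluster
    for _ in range(1, n_items):
        # Existing clusters are chosen in proportion to their size,
        # a new cluster in proportion to alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = int(rng.choice(len(weights), p=weights / weights.sum()))
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)


def rbf_kernel(t, length_scale, signal_var):
    """Squared-exponential covariance between all pairs of time points."""
    diff = t[:, None] - t[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def simulate(n_genes=30, n_times=12, alpha=1.0, seed=0):
    """Simulate expression profiles: CRP cluster labels plus GP cluster means."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 10.0, n_times)
    labels = crp_assignments(n_genes, alpha, rng)
    # Hypothetical kernel settings; a small jitter keeps the covariance well conditioned.
    cov = rbf_kernel(t, length_scale=2.0, signal_var=1.0) + 1e-6 * np.eye(n_times)
    n_clusters = int(labels.max()) + 1
    cluster_means = rng.multivariate_normal(np.zeros(n_times), cov, size=n_clusters)
    # Each gene is its cluster's mean trajectory plus independent observation noise.
    expr = cluster_means[labels] + 0.1 * rng.standard_normal((n_genes, n_times))
    return t, labels, expr


t, labels, expr = simulate()
print("clusters drawn:", int(labels.max()) + 1, "expression matrix shape:", expr.shape)
```

The number of clusters is not fixed in advance here: it emerges from the sampled labels, and larger values of `alpha` tend to produce more clusters. Inference, that is, recovering the labels from observed expression data, is a separate and harder problem that this sketch does not attempt.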

Discovering Key Transcriptomic Regulators in Pancreatic Ductal Adenocarcinoma using Dirichlet Process Gaussian Mixture Model

10.1101/2020.10.01.322768 ◽

2020 ◽

Author(s):

Sk Md Mosaddek Hossain ◽

Aanzil Akram Halsana ◽

Lutfunnesa Khatun ◽

Sumanta Ray ◽

Anirban Mukhopadhyay

Keyword(s):

Gene Expression ◽

Time Series ◽

Pancreatic Ductal Adenocarcinoma ◽

Mixture Model ◽

Dirichlet Process ◽

Time Series Data ◽

Gaussian Mixture ◽

Ductal Adenocarcinoma ◽

Series Data ◽

Main Text

Abstract: Pancreatic ductal adenocarcinoma (PDAC) is the most lethal type of pancreatic cancer (PC), and its late detection leads to therapeutic failure. This study aims to identify key regulatory genes and their impact on disease progression, helping to clarify the etiology of the disease, which is still largely unknown. We leverage time-series gene expression data for this disease, so that the identified key regulators capture the characteristics of gene activity patterns during cancer progression. We identified the key modules and predicted gene functions of top genes from the compiled gene association network (GAN). We used the natural cubic spline regression model (splineTimeR) to identify differentially expressed genes (DEG) from PDAC microarray time-series data downloaded from the Gene Expression Omnibus (GEO). First, we identified key transcriptomic regulators (TR) and DNA binding transcription factors (DbTF). Subsequently, the Dirichlet process and Gaussian process (DPGP) mixture model is used to identify the key gene modules. A variation of the partial correlation method is used to analyze the GAN, followed by gene function prediction from the network. Finally, a panel of key genes related to PDAC is highlighted from each of the analyses performed.

Clustering gene expression time series data using an infinite Gaussian process mixture model

PLoS Computational Biology ◽

10.1371/journal.pcbi.1005896 ◽

2018 ◽

Vol 14 (1) ◽

pp. e1005896 ◽

Cited By ~ 29

Author(s):

Ian C. McDowell ◽

Dinesh Manandhar ◽

Christopher M. Vockley ◽

Amy K. Schmid ◽

Timothy E. Reddy ◽

...

Keyword(s):

Gene Expression ◽

Time Series ◽

Gaussian Process ◽

Mixture Model ◽

Time Series Data ◽

Series Data ◽

Gene Expression Time Series ◽

Expression Time

Estimation of Most Probable Maximum From Short-Duration or Undersampled Time-Series Data

Volume 3: Structures, Safety and Reliability ◽

10.1115/omae2015-41701 ◽

2015 ◽

Author(s):

Puneet Agarwal ◽

William Walker ◽

Kenneth Bhalla

Keyword(s):

Time Series ◽

Gaussian Process ◽

Short Duration ◽

Time Series Data ◽

Extreme Value ◽

Series Data ◽

Sampled Data ◽

Zero Crossing ◽

Short Time ◽

Undersampled Data

The most probable maximum (MPM) is the extreme value statistic commonly used in the offshore industry. The extreme values of vessel motions, structural response, and environment are often expressed using the MPM. For a Gaussian process, the MPM is a function of the root-mean-square value and the zero-crossing rate of the process. Accurate estimates of the MPM may be obtained in the frequency domain from spectral moments of the known power spectral density. If the MPM is to be estimated from the time series of a random process, either from measurements or from simulations, the time series data should be of long enough duration, sampled at an adequate rate, and have an ensemble of multiple realizations. This is not the case when measured data is recorded for an insufficient duration, or when one wants to make decisions (requiring an estimate of the MPM) in real time based on observing the data only for a short duration. Sometimes, the instrumentation system may not be properly designed to measure the dynamic vessel motions with a fine sampling rate, or it may be a legacy instrumentation system. The question then becomes whether the short-duration and/or undersampled data is useful at all, or whether some useful information (i.e., an estimate of the MPM) can be extracted and, if so, what the accuracy and uncertainty of such estimates are. In this paper, a procedure for estimation of the MPM from the short-time maxima, i.e., the maximum value from a time series of short duration (say, 10 or 30 minutes), is presented. For this purpose, pitch data is simulated from the vessel RAOs (response amplitude operators). Factors to convert the short-time maxima to the MPM are computed for various non-exceedance levels. It is shown that the factors estimated from simulation can also be obtained from the theory of extremes of a Gaussian process.
Afterwards, estimation of the MPM from the short-time maxima is explored for an undersampled process, with the caveat that adequately sampled data should be used in preference to undersampled data whenever it is available. It is found that the undersampled data can be somewhat useful, and factors to convert the short-time maxima to the MPM can be derived for an associated non-exceedance level. However, compared to the adequately sampled data, the factors for the undersampled data are less useful since they depend on more variables and have more uncertainty. While the vessel pitch data was the focus of this paper, the results and conclusions are valid for any adequately sampled narrow-banded Gaussian process.
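
The dependence of the MPM on the root-mean-square value and the zero-crossing rate can be sketched with the standard textbook result for a narrow-banded Gaussian process, whose peaks are Rayleigh distributed: MPM = sigma * sqrt(2 ln N), where N is the expected number of zero up-crossings in the exposure duration. The Python snippet below is only an illustration of that formula, not the paper's procedure for short-time maxima; the function name and the example values are assumptions.

```python
import math


def most_probable_maximum(rms, zero_crossing_rate_hz, duration_s):
    """MPM of a narrow-banded Gaussian process: rms * sqrt(2 * ln(N)).

    N is the expected number of zero up-crossings (response cycles)
    in the exposure duration.
    """
    n_cycles = zero_crossing_rate_hz * duration_s
    if n_cycles <= 1.0:
        raise ValueError("the duration must span more than one response cycle")
    return rms * math.sqrt(2.0 * math.log(n_cycles))


# Hypothetical example: pitch with an RMS of 1.0 degree, a zero-crossing
# period of 10 s (0.1 Hz), and a 3-hour exposure (10800 s), so N = 1080.
mpm = most_probable_maximum(rms=1.0, zero_crossing_rate_hz=0.1, duration_s=3 * 3600)
print(round(mpm, 2))  # about 3.74 times the RMS
```

Because the MPM grows only with the square root of the logarithm of the number of cycles, it is far less sensitive to the exposure duration than to the RMS value.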

Sequence Pattern Extraction by Segmenting Time Series Data Using GP-HSMM with Hierarchical Dirichlet Process

2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros.2018.8594029 ◽

2018 ◽

Cited By ~ 1

Author(s):

Masatoshi Nagano ◽

Tomoaki Nakamura ◽

Takayuki Nagai ◽

Daichi Mochihashi ◽

Ichiro Kobayashi ◽

...

Keyword(s):

Time Series ◽

Dirichlet Process ◽

Time Series Data ◽

Series Data ◽

Pattern Extraction ◽

Hierarchical Dirichlet Process ◽

Sequence Pattern

A Gaussian process regression approach for testing Granger causality between time series data

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ◽

10.1109/icassp.2012.6288635 ◽

2012 ◽

Cited By ~ 5

Author(s):

P. O. Amblard ◽

O. J. J. Michel ◽

C. Richard ◽

P. Honeine

Keyword(s):

Time Series ◽

Gaussian Process ◽

Granger Causality ◽

Time Series Data ◽

Gaussian Process Regression ◽

Series Data ◽

Regression Approach

Visualizing State of Time-Series Data by Supervised Gaussian Process Dynamical Models

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2015.p0688 ◽

2015 ◽

Vol 19 (5) ◽

pp. 688-696

Author(s):

Nobuhiko Yamaguchi ◽

Keyword(s):

Time Series ◽

Gaussian Process ◽

Time Series Data ◽

Series Data ◽

Nonlinear Dimensionality Reduction ◽

Probabilistic Representation ◽

Dynamical Models ◽

Motion Data ◽

Dimensionality Reduction Technique ◽

Representation Of Time

Gaussian Process Dynamical Models (GPDMs) constitute a nonlinear dimensionality reduction technique that provides a probabilistic representation of time series data in terms of Gaussian process priors. In this paper, we report a method based on GPDMs to visualize the states of time-series data. Conventional GPDMs are unsupervised, and therefore, even when the labels of data are available, it is not possible to use this information. To overcome the problem, we propose a supervised GPDM (S-GPDM) that utilizes both the data and their corresponding labels. We demonstrate experimentally that the S-GPDM can locate related motion data closer together than conventional GPDMs.

Finite mixture model: a comparison method for nonlinear time series data

International Journal of Computing Science and Mathematics ◽

10.1504/ijcsm.2016.10000257 ◽

2016 ◽

Vol 7 (4) ◽

pp. 381

Author(s):

Rosmanjawati Binti Abdul Rahman ◽

Seuk Wai Phoong ◽

Mohd Tahir Ismail ◽

Seuk Yen Phoong

Keyword(s):

Time Series ◽

Mixture Model ◽

Time Series Data ◽

Finite Mixture Model ◽

Nonlinear Time Series ◽

Comparison Method ◽

Finite Mixture ◽

Series Data

Bayesian Non-Parametric Mixtures of GARCH(1,1) Models

Journal of Probability and Statistics ◽

10.1155/2012/167431 ◽

2012 ◽

Vol 2012 ◽

pp. 1-16

Author(s):

John W. Lau ◽

Ed Cripps

Keyword(s):

Time Series ◽

Dirichlet Process ◽

Time Series Data ◽

Monte Carlo Algorithm ◽

Series Data ◽

Garch Models ◽

Nonparametric Models ◽

Regime Changes ◽

Standard And Poor's ◽

Financial Index

Traditional GARCH models describe volatility levels that evolve smoothly over time, generated by a single GARCH regime. However, nonstationary time series data may exhibit abrupt changes in volatility, suggesting changes in the underlying GARCH regimes. Further, the number and times of regime changes are not always obvious. This article outlines a nonparametric mixture of GARCH models that is able to estimate the number and time of volatility regime changes by mixing over the Poisson-Kingman process. The process is a generalisation of the Dirichlet process typically used in nonparametric models for time-dependent data and provides a richer clustering structure; its application to time series data is novel. Inference is Bayesian, and a Markov chain Monte Carlo algorithm to explore the posterior distribution is described. The methodology is illustrated on the Standard and Poor's 500 financial index.
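
For readers unfamiliar with the single-regime baseline that this abstract contrasts against, the standard GARCH(1,1) recursion sets the conditional variance to omega + alpha * r[t-1]^2 + beta * var[t-1]. The Python sketch below simulates one such regime with Gaussian innovations. It does not implement the article's nonparametric mixture or its MCMC sampler; the function name and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate returns and conditional variances from a single GARCH(1,1) regime."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0 and alpha + beta < 1")
    rng = np.random.default_rng(seed)
    shocks = rng.standard_normal(n)
    variances = np.empty(n)
    returns = np.empty(n)
    # Start from the unconditional (long-run) variance of the process.
    variances[0] = omega / (1.0 - alpha - beta)
    returns[0] = np.sqrt(variances[0]) * shocks[0]
    for t in range(1, n):
        # Today's variance depends on yesterday's squared return and variance.
        variances[t] = omega + alpha * returns[t - 1] ** 2 + beta * variances[t - 1]
        returns[t] = np.sqrt(variances[t]) * shocks[t]
    return returns, variances


# Hypothetical parameter values with a long-run variance of 0.1 / (1 - 0.9) = 1.0.
returns, variances = simulate_garch11(n=500, omega=0.1, alpha=0.1, beta=0.8)
print(returns.shape, float(variances.min()) > 0.0)
```

Within one regime the variance drifts smoothly around its long-run level, which is why abrupt volatility shifts motivate mixing over several regimes.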

Finite mixture model: a comparison method for nonlinear time series data

International Journal of Computing Science and Mathematics ◽

10.1504/ijcsm.2016.078684 ◽

2016 ◽

Vol 7 (4) ◽

pp. 381

Author(s):

Seuk Yen Phoong ◽

Mohd Tahir Ismail ◽

Seuk Wai Phoong ◽

Rosmanjawati Binti Abdul Rahman

Keyword(s):

Time Series ◽

Mixture Model ◽

Time Series Data ◽

Finite Mixture Model ◽

Nonlinear Time Series ◽

Comparison Method ◽

Finite Mixture ◽

Series Data

Detecting variability in massive astronomical time series data – I. Application of an infinite Gaussian mixture model

Monthly Notices of the Royal Astronomical Society ◽

10.1111/j.1365-2966.2009.15576.x ◽

2009 ◽

Vol 400 (4) ◽

pp. 1897-1910 ◽

Cited By ~ 28

Author(s):

Min-Su Shin ◽

Michael Sekora ◽

Yong-Ik Byun

Keyword(s):

Time Series ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Time Series Data ◽

Gaussian Mixture ◽

Series Data ◽

Astronomical Time ◽

Astronomical Time Series
