Disentangling Multidimensional Spatio-Temporal Data into their Common and Aberrant Responses

Mapping Intimacies ◽

10.1101/004259 ◽

2014 ◽

Author(s):

Young Hwan Chang ◽

Jim Korkola ◽

Dhara N. Amin ◽

Mark M. Moasser ◽

Jose M. Carmena ◽

...

Keyword(s):

Gene Expression ◽

Time Series ◽

Cell Lines ◽

Biological Data ◽

Series Data ◽

State Transitions ◽

Data Sets ◽

Wide Range ◽

Spatio Temporal ◽

Experimental Trials

With the advent of high-throughput measurement techniques, scientists and engineers are starting to grapple with massive data sets and are encountering challenges in organizing, processing and extracting information into meaningful structures. Multidimensional spatio-temporal biological data sets, such as time series gene expression under various perturbations in different cell lines, or neural spike data across many experimental trials, have the potential to yield insight across multiple dimensions. For this potential to be realized, we need a suitable representation to turn data into insight. Since the wide range of experiments and the (unknown) complexity of the underlying system make biological data more heterogeneous than data in other fields, we propose a method based on Robust Principal Component Analysis (RPCA), which is well suited for extracting principal components from corrupted observations. The proposed method provides a new representation of these data sets that consists of their common and aberrant responses. This representation may help users gain new insight from data. For example, identifying common event-related neural features across many experimental trials can provide a signature for detecting discrete events or state transitions. The proposed method can also be useful to biologists in clustering and analyzing gene expression time series data from a new perspective: it can not only extract the canonical cell signaling response but also provide insight into the heterogeneity of responses across different cell lines.
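
One common way to make the common/aberrant split concrete is the convex program usually associated with RPCA, sketched below. The abstract does not state the exact formulation used, so the symbols are assumptions: M is the data matrix (for example, time points by cell lines), L is a low-rank part read here as the common response, S is a sparse part read here as the aberrant response, and \lambda is a trade-off weight.

```latex
\min_{L,\,S}\; \|L\|_{*} + \lambda \|S\|_{1}
\quad \text{subject to} \quad M = L + S
```

Here \|L\|_{*} is the nuclear norm (the sum of the singular values of L) and \|S\|_{1} is the entrywise \ell_1 norm. For an n_1 \times n_2 matrix, \lambda = 1/\sqrt{\max(n_1, n_2)} is a commonly used default.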


Cell cycle time series gene expression data encoded as cyclic attractors in Hopfield systems

10.1101/170027 ◽

2017 ◽

Author(s):

Anthony Szedlak ◽

Spencer Sims ◽

Nicholas Smith ◽

Giovanni Paternostro ◽

Carlo Piermarocchi

Keyword(s):

Neural Network ◽

Gene Expression ◽

Cell Cycle ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Sets ◽

Expression Data ◽

Time Series Gene Expression ◽

Human Cervical Cancer

Abstract Modern time series gene expression and other omics data sets have enabled unprecedented resolution of the dynamics of cellular processes such as cell cycle and response to pharmaceutical compounds. In anticipation of the proliferation of time series data sets in the near future, we use the Hopfield model, a recurrent neural network based on spin glasses, to model the dynamics of cell cycle in HeLa (human cervical cancer) and S. cerevisiae cells. We study some of the rich dynamical properties of these cyclic Hopfield systems, including the ability of populations of simulated cells to recreate experimental expression data and the effects of noise on the dynamics. Next, we use a genetic algorithm to identify sets of genes which, when selectively inhibited by local external fields representing gene silencing compounds such as kinase inhibitors, disrupt the encoded cell cycle. We find, for example, that inhibiting the set of four kinases BRD4, MAPK1, NEK7, and YES1 in HeLa cells causes simulated cells to accumulate in the M phase. Finally, we suggest possible improvements and extensions to our model. Author Summary Cell cycle, the process in which a parent cell replicates its DNA and divides into two daughter cells, is an upregulated process in many forms of cancer. Identifying gene inhibition targets to regulate cell cycle is important to the development of effective therapies. Although modern high throughput techniques offer unprecedented resolution of the molecular details of biological processes like cell cycle, analyzing the vast quantities of the resulting experimental data and extracting actionable information remains a formidable task. Here, we create a dynamical model of the process of cell cycle using the Hopfield model (a type of recurrent neural network) and gene expression data from human cervical cancer cells and yeast cells. We find that the model recreates the oscillations observed in experimental data. Tuning the level of noise (representing the inherent randomness in gene expression and regulation) to the “edge of chaos” is crucial for the proper behavior of the system. We then use this model to identify potential gene targets for disrupting the process of cell cycle. This method could be applied to other time series data sets and used to predict the effects of untested targeted perturbations.
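
The idea of storing a cycle in a Hopfield network can be illustrated with a toy sketch. It assumes a textbook asymmetric Hebbian rule in which each stored pattern points to the next one, noise-free synchronous updates, and three made-up, mutually orthogonal 8-unit patterns standing in for binarized expression states. The function names `cyclic_hopfield_weights` and `step` are hypothetical, and the abstract does not say that the authors use this exact encoding.

```python
def cyclic_hopfield_weights(patterns):
    """Asymmetric Hebbian weights mapping each pattern to its successor (assumed rule)."""
    n = len(patterns[0])
    p = len(patterns)
    w = [[0.0] * n for _ in range(n)]
    for mu in range(p):
        cur = patterns[mu]
        nxt = patterns[(mu + 1) % p]  # the last pattern wraps around to the first
        for i in range(n):
            for j in range(n):
                w[i][j] += nxt[i] * cur[j] / n
    return w


def step(w, state):
    """One synchronous, noise-free update: s_i <- sign(sum_j w_ij * s_j)."""
    return [1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
            for row in w]


# Hypothetical +/-1 patterns (mutually orthogonal), standing in for cycle phases.
patterns = [
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
    [1, 1, 1, 1, -1, -1, -1, -1],
]
w = cyclic_hopfield_weights(patterns)

# Follow the encoded cycle, starting from the first pattern.
state = patterns[0]
trajectory = [state]
for _ in range(3):
    state = step(w, state)
    trajectory.append(state)

# A state with one flipped unit is still pulled onto the next pattern of the cycle.
corrupted = [-patterns[0][0]] + patterns[0][1:]
recovered = step(w, corrupted)
```

In this toy setting the trajectory visits the three patterns in order and returns to the start. Noise and the local external fields mentioned in the abstract are not modeled here.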


Multi-view feature selection for identifying gene markers: a diversified biological data driven approach

BMC Bioinformatics ◽

10.1186/s12859-020-03810-0 ◽

2020 ◽

Vol 21 (S18) ◽

Author(s):

Sudipta Acharya ◽

Laizhong Cui ◽

Yi Pan

Keyword(s):

Gene Expression ◽

Feature Selection ◽

Gene Selection ◽

Marker Gene ◽

Biological Data ◽

Protein Interaction Data ◽

Marker Genes ◽

Data Sets ◽

Gene Markers ◽

Multi Objective

Abstract Background In recent years, to investigate challenging bioinformatics problems, the utilization of multiple genomic and proteomic sources has become immensely popular among researchers. One such issue is feature or gene selection and identifying relevant and non-redundant marker genes from high dimensional gene expression data sets. In that context, designing an efficient feature selection algorithm exploiting knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases with applications in specific epidemiology for a particular population. Results In the current article, we design the feature selection and marker gene detection as a multi-view multi-objective clustering problem. Regarding that, we propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important resources of biological data (gene ontology, protein interaction data, protein sequence) along with gene expression values are collectively utilized to design two different views. UMVMO-select aims to reduce gene space without/minimally compromising the sample classification efficiency and determines relevant and non-redundant gene markers from three cancer gene expression benchmark data sets. Conclusion A thorough comparative analysis has been performed with five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. Obtained results reveal the supremacy of the proposed method. Reported results are also validated through a proper biological significance test and heatmap plotting.


An Integrative DTW-based imputation method for gene expression time series data

2012 6th IEEE INTERNATIONAL CONFERENCE INTELLIGENT SYSTEMS ◽

10.1109/is.2012.6335145 ◽

2012 ◽

Cited By ~ 3

Author(s):

Elena Kostadinova ◽

Veselka Boeva ◽

Liliana Boneva ◽

Elena Tsiporkova

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Imputation Method ◽

Series Data ◽

Gene Expression Time Series ◽

Expression Time


GeneShelf: A Web-based Visual Interface for Large Gene Expression Time-Series Data Repositories

IEEE Transactions on Visualization and Computer Graphics ◽

10.1109/tvcg.2009.146 ◽

2009 ◽

Vol 15 (6) ◽

pp. 905-912 ◽

Cited By ~ 9

Author(s):

Bohyoung Kim ◽

Bongshin Lee ◽

S. Knoblach ◽

E. Hoffman ◽

Jinwook Seo

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Series Data ◽

Series Data ◽

Data Repositories ◽

Web Based ◽

Large Gene ◽

Gene Expression Time Series ◽

Visual Interface ◽

Expression Time


Jonckheere–Terpstra–Kendall-based non-parametric analysis of temporal differential gene expression

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqab021 ◽

2021 ◽

Vol 3 (1) ◽

Author(s):

Hitoshi Iuchi ◽

Michiaki Hamada

Keyword(s):

Gene Expression ◽

Time Series ◽

Time Course ◽

Time Series Data ◽

Expression Patterns ◽

Detection Methods ◽

Series Data ◽

Expression Levels ◽

Over Time ◽

Non Parametric

Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.
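
To illustrate why a rank-based statistic responds to the shape of a time course and not to its expression level, here is a minimal sketch of Kendall's tau (the tau-a variant, with tied pairs counted as neither concordant nor discordant). It is not the authors' JTK-based algorithm. The function name `kendall_tau` and all series values are made up for the example.

```python
from itertools import combinations


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x2 - x1) * (y2 - y1)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 is a tie in x or y and contributes nothing
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical time points and expression profiles.
time = [0, 1, 2, 3, 4]
rising = [1.0, 1.5, 2.2, 3.1, 4.0]         # monotonically increasing gene
rising_scaled = [10 * v for v in rising]    # same pattern, ten-fold higher level
wave = [0.0, 1.0, 0.0, -1.0, 0.0]           # cyclic gene
wave_template = [0.0, 2.0, 0.0, -2.0, 0.0]  # reference waveform of the same shape

tau_rising = kendall_tau(time, rising)
tau_scaled = kendall_tau(time, rising_scaled)
tau_wave_vs_time = kendall_tau(time, wave)
tau_wave_vs_template = kendall_tau(wave_template, wave)
```

Scaling the profile leaves tau unchanged, and the cyclic profile scores higher against a matching waveform than against the time axis. This matches the abstract's emphasis on expression patterns over expression levels.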


Spatio-temporal changes of underground coal fires during 2008–2016 in Khanh Hoa coal field (North-east of Viet Nam) using Landsat time-series data

Journal of Mountain Science ◽

10.1007/s11629-018-4997-z ◽

2018 ◽

Vol 15 (12) ◽

pp. 2703-2720 ◽

Cited By ~ 1

Author(s):

Tuyen Danh Vu ◽

Thanh Tien Nguyen

Keyword(s):

Time Series ◽

Time Series Data ◽

Temporal Changes ◽

Series Data ◽

Viet Nam ◽

Coal Field ◽

North East ◽

Coal Fires ◽

Spatio Temporal ◽

Underground Coal Fires


Denoising large-scale biological data using network filters

10.21203/rs.3.rs-66071/v2 ◽

2021 ◽

Author(s):

Andrew J Kavran ◽

Aaron Clauset

Keyword(s):

Large Scale ◽

Synthetic Data ◽

Interaction Network ◽

Learning Task ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Life History Variation ◽

Wide Range ◽

Underlying Processes

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
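
A rough sketch of the neighborhood-filtering idea follows. It assumes a simple mean over each node and its network neighbors, with neighbor values sign-flipped across edges marked as anti-correlated. The function name `network_mean_filter`, the edge-sign encoding and the toy values are all hypothetical, and the paper's actual filters may be defined differently.

```python
def network_mean_filter(values, edges):
    """Replace each node's value by the mean of itself and its signed neighbors.

    values: dict mapping node -> measurement
    edges:  dict mapping (u, v) -> +1 (correlated) or -1 (anti-correlated)
    """
    neighbors = {node: [] for node in values}
    for (u, v), sign in edges.items():
        neighbors[u].append((v, sign))
        neighbors[v].append((u, sign))

    filtered = {}
    for node, x in values.items():
        # Anti-correlated neighbors contribute with flipped sign (an assumption).
        total = x + sum(sign * values[nb] for nb, sign in neighbors[node])
        filtered[node] = total / (1 + len(neighbors[node]))
    return filtered


# Toy network: a and b are correlated, b and c are anti-correlated.
values = {"a": 1.0, "b": 3.0, "c": -2.0}
edges = {("a", "b"): 1, ("b", "c"): -1}
smoothed = network_mean_filter(values, edges)
```

Applying a different filter per module, as the abstract describes, would amount to calling such a function separately on each partition of the network.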


An Efficient Method for Forecasting Using Fuzzy Time Series

Emerging Research on Applied Fuzzy Sets and Intuitionistic Fuzzy Matrices - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-0914-1.ch013 ◽

2017 ◽

pp. 287-304 ◽

Cited By ~ 3

Author(s):

Pritpal Singh

Keyword(s):

Time Series ◽

Time Series Data ◽

Weather Forecasting ◽

Small Error ◽

Fuzzy Time Series ◽

Series Data ◽

Data Sets ◽

Proposed Model ◽

Temperature Forecasting ◽

The University

Forecasting using fuzzy time series has been applied in several areas, including university enrollments, sales, road accidents, financial forecasting and weather forecasting. Recently, many researchers have turned their attention to applying fuzzy time series to time series forecasting problems. In this paper, we present a new model, based on one-factor fuzzy time series, to forecast enrollments at the University of Alabama and the daily average temperature in Taipei. In this model, a new frequency-based clustering technique is employed to partition the time series data sets into different intervals. Two new principles are also incorporated for the defuzzification function. For both enrollment and daily temperature forecasting, the proposed model exhibits a very small error rate.


Co-eye: a multi-resolution ensemble classifier for symbolically approximated time series

Machine Learning ◽

10.1007/s10994-020-05887-3 ◽

2020 ◽

Vol 109 (11) ◽

pp. 2029-2061

Author(s):

Zahraa S. Abdallah ◽

Mohamed Medhat Gaber

Keyword(s):

Time Series ◽

Time Series Data ◽

Ensemble Classifier ◽

Series Data ◽

New Classification ◽

Symbolic Representations ◽

Classification Technique ◽

Field Of Vision ◽

Main Challenge ◽

Wide Range

Abstract Time series classification (TSC) is a challenging task that attracted many researchers in the last few years. One main challenge in TSC is the diversity of domains where time series data come from. Thus, there is no “one model that fits all” in TSC. Some algorithms are very accurate in classifying a specific type of time series when the whole series is considered, while some only target the existence/non-existence of specific patterns/shapelets. Yet other techniques focus on the frequency of occurrences of discriminating patterns/features. This paper presents a new classification technique that addresses the inherent diversity problem in TSC using a nature-inspired method. The technique is stimulated by how flies look at the world through “compound eyes” that are made up of thousands of lenses, called ommatidia. Each ommatidium is an eye with its own lens, and thousands of them together create a broad field of vision. The developed technique similarly uses different lenses and representations to look at the time series, and then combines them for broader visibility. These lenses have been created through hyper-parameterisation of symbolic representations (Piecewise Aggregate and Fourier approximations). The algorithm builds a random forest for each lens, then performs soft dynamic voting for classifying new instances using the most confident eyes, i.e., forests. We evaluate the new technique, coined Co-eye, using the recently released extended version of UCR archive, containing more than 100 datasets across a wide range of domains. The results show the benefits of bringing together different perspectives reflecting on the accuracy and robustness of Co-eye in comparison to other state-of-the-art techniques.
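
The Piecewise Aggregate Approximation mentioned above can be sketched in a few lines: a series is cut into equal-width segments and each segment is replaced by its mean. Viewing the same series at several segment counts gives a crude analogue of the different "lenses". The function name `paa`, the example series and the chosen resolutions are assumptions for illustration, not Co-eye's actual parameter grid.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each equal-width segment."""
    n = len(series)
    if n_segments <= 0 or n % n_segments != 0:
        # Keep the sketch simple: require the length to divide evenly.
        raise ValueError("series length must be a multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


# Hypothetical series viewed through three "lenses" (resolutions).
series = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
lenses = {k: paa(series, k) for k in (2, 3, 6)}
```

In a full pipeline each lens would feed its own classifier. Here the dictionary simply holds the coarse and fine views side by side.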


Inference of gene regulatory networks based on nonlinear ordinary differential equations

Bioinformatics ◽

10.1093/bioinformatics/btaa032 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4885-4893 ◽

Cited By ~ 2

Author(s):

Baoshan Ma ◽

Mingkun Fang ◽

Xiangtian Jiao

Keyword(s):

Gene Expression ◽

Time Series ◽

Steady State ◽

Differential Equations ◽

Gene Regulatory Networks ◽

Regulatory Networks ◽

Time Series Data ◽

Series Data ◽

State Data ◽

Gene Regulatory

Abstract Motivation Gene regulatory networks (GRNs) capture the regulatory interactions between genes, resulting from the fundamental biological process of transcription and translation. In some cases, the topology of GRNs is not known, and has to be inferred from gene expression data. Most of the existing GRNs reconstruction algorithms are either applied to time-series data or steady-state data. Although time-series data include more information about the system dynamics, steady-state data imply stability of the underlying regulatory networks. Results In this article, we propose a method for inferring GRNs from time-series and steady-state data jointly. We make use of a non-linear ordinary differential equations framework to model dynamic gene regulation and an importance measurement strategy to infer all putative regulatory links efficiently. The proposed method is evaluated extensively on the artificial DREAM4 dataset and two real gene expression datasets of yeast and Escherichia coli. Based on public benchmark datasets, the proposed method outperforms other popular inference algorithms in terms of overall score. By comparing the performance on the datasets with different scales, the results show that our method still keeps good robustness and accuracy at a low computational complexity. Availability and implementation The proposed method is written in the Python language, and is available at: https://github.com/lab319/GRNs_nonlinear_ODEs Supplementary information Supplementary data are available at Bioinformatics online.
