Wave-SOM

Author(s):  
Andrew Blanchard ◽  
Christopher Wolter ◽  
David S. McNabb ◽  
Eitan Gross

In this paper, the authors present a wavelet-based algorithm (Wave-SOM) to help visualize and cluster oscillatory time-series data in two-dimensional gene expression micro-arrays. Using various wavelet transformations, raw data are first de-noised by decomposing the time-series into low and high frequency wavelet coefficients. Following thresholding, the coefficients are fed as an input vector into a two-dimensional Self-Organizing-Map clustering algorithm. Transformed data are then clustered by minimizing the Euclidean (L2) distance between their corresponding fluctuation patterns. A multi-resolution analysis by Wave-SOM of expression data from the yeast Saccharomyces cerevisiae, exposed to oxidative stress and glucose-limited growth, identified 29 genes with correlated expression patterns that were mapped into 5 different nodes. The ordered clustering of yeast genes by Wave-SOM illustrates that the same set of genes (encoding ribosomal proteins) can be regulated by two different environmental stresses, oxidative stress and starvation. The algorithm provides heuristic information regarding the similarity of different genes. Using previously studied expression patterns of yeast cell-cycle and functional genes as test data sets, the authors’ algorithm outperformed five other competing programs.
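The pipeline described in this abstract (wavelet decomposition, thresholding, then SOM clustering by Euclidean distance) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the wavelet family, the thresholding rule, or the SOM hyperparameters, so the single-level Haar transform, soft thresholding, and all function names and parameters below are hypothetical choices.

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    x = np.asarray(signal, dtype=float)
    if x.size % 2:
        raise ValueError("signal length must be even")
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-frequency coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-frequency coefficients
    return approx, detail

def soft_threshold(coeffs, thresh):
    """Shrink coefficients toward zero; magnitudes below thresh become 0."""
    coeffs = np.asarray(coeffs, dtype=float)
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - thresh, 0.0)

def wavelet_features(signal, thresh):
    """Approximation coefficients followed by thresholded detail coefficients."""
    approx, detail = haar_dwt(signal)
    return np.concatenate([approx, soft_threshold(detail, thresh)])

def train_som(data, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rows x cols SOM; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Assumption: initialise node weights from randomly chosen input vectors.
    weights = data[rng.integers(0, len(data), rows * cols)].copy()
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        for i in rng.permutation(len(data)):
            x = data[i]
            # Best-matching unit: node with the smallest Euclidean (L2) distance.
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            grid_d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Index of the nearest SOM node (by L2 distance) for each input vector."""
    data = np.asarray(data, dtype=float)
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)
```

In use, each gene's time series would be passed through `wavelet_features`, the resulting vectors stacked into a matrix for `train_som`, and `assign_nodes` would then report which map node each gene falls into.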


2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Hitoshi Iuchi ◽  
Michiaki Hamada

Abstract Time-course experiments using parallel sequencers have the potential to uncover gradual changes in cells over time that cannot be observed in a two-point comparison. An essential step in time-series data analysis is the identification of temporal differentially expressed genes (TEGs) under two conditions (e.g. control versus case). Model-based approaches, which are typical TEG detection methods, often set one parameter (e.g. degree or degree of freedom) for one dataset. This approach risks modeling of linearly increasing genes with higher-order functions, or fitting of cyclic gene expression with linear functions, thereby leading to false positives/negatives. Here, we present a Jonckheere–Terpstra–Kendall (JTK)-based non-parametric algorithm for TEG detection. Benchmarks, using simulation data, show that the JTK-based approach outperforms existing methods, especially in long time-series experiments. Additionally, application of JTK in the analysis of time-series RNA-seq data from seven tissue types, across developmental stages in mouse and rat, suggested that the wave pattern contributes to the TEG identification of JTK, not the difference in expression levels. This result suggests that JTK is a suitable algorithm when focusing on expression patterns over time rather than expression levels, such as comparisons between different species. These results show that JTK is an excellent candidate for TEG detection.
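The claim that a rank-based method responds to expression patterns rather than expression levels can be illustrated with Kendall's tau, the rank correlation named in JTK. The sketch below is not the TEG detection algorithm from the paper, whose details the abstract does not give; it is a plain tau-a computation, and the function name `kendall_tau` is a hypothetical helper.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    n = len(x)
    if n < 2:
        raise ValueError("need at least two time points")
    score = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Sign of the product is +1 for a concordant pair, -1 for a
            # discordant pair, and 0 when either series is tied.
            prod = (x[j] - x[i]) * (y[j] - y[i])
            if prod > 0:
                score += 1
            elif prod < 0:
                score -= 1
    return score / (n * (n - 1) / 2)
```

Two series with the same ordering but very different magnitudes, such as `[1, 2, 3, 4]` and `[10, 20, 30, 40]`, score 1.0, which is why a rank-based statistic is insensitive to overall expression level.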


Author(s):  
Pēteris Grabusts ◽  
Arkady Borisov

Clustering Methodology for Time Series Mining

A time series is a sequence of real-valued data representing measurements of a variable at time intervals. Time series analysis is a well-established task; in recent years, however, research has explored using clustering for time series analysis. The main motivation for representing a time series as clusters is to better capture the main characteristics of the data. The central goal of this paper was to investigate clustering methodology for time series data mining, to explore the capabilities of time series similarity measures, and to use them in analyzing time series clustering results. More complex similarity measures include the Longest Common Subsequence (LCSS) method. Two tasks were completed. The first was to define time series similarity measures; the LCSS method was found to detect time series similarity better than the Euclidean distance. The second was to explore the capabilities of the classical k-means clustering algorithm in time series clustering. The experiment led to the conclusion that the results of time series clustering with the k-means algorithm correspond to those obtained with the LCSS method, and thus the clustering results for the specific time series are adequate.
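To make the comparison between LCSS and the Euclidean distance concrete, here is a minimal sketch of both measures for real-valued series. The abstract does not specify the LCSS variant used, so this assumes the common formulation in which two points match when they differ by at most a tolerance `eps`, with no time-window constraint; the function names and the normalisation by the shorter length are illustrative choices.

```python
import math

def lcss_length(a, b, eps):
    """Longest common subsequence length, where |a[i] - b[j]| <= eps is a match."""
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]

def lcss_similarity(a, b, eps):
    """LCSS length normalised to [0, 1] by the length of the shorter series."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps) / min(len(a), len(b))

def euclidean_distance(a, b):
    """Point-by-point L2 distance; requires series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

A single outlier shows the difference: for `[1, 2, 3, 4, 5]` and `[1, 2, 30, 4, 5]` with `eps=0.5`, the LCSS similarity is 0.8 because four of five points still match, while the Euclidean distance is 27, dominated entirely by the one outlying value.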


2017 ◽  
Author(s):  
María José Nueda ◽  
Jordi Martorell-Marugan ◽  
Cristina Martí ◽  
Sonia Tarazona ◽  
Ana Conesa

Abstract As sequencing technologies improve their capacity to detect distinct transcripts of the same gene and to address complex experimental designs such as longitudinal studies, there is a need to develop statistical methods for the analysis of isoform expression changes in time series data. Iso-maSigPro is a new functionality of the R package maSigPro for transcriptomics time series data analysis. Iso-maSigPro identifies genes with a differential isoform usage across time. The package also includes new clustering and visualization functions that allow grouping of genes with similar expression patterns at the isoform level, as well as those genes with a shift in major expressed isoform. The package is freely available under the LGPL license from the Bioconductor web site (http://bioconductor.org).


2021 ◽  
Vol 7 ◽  
pp. e534
Author(s):  
Kristoko Dwi Hartomo ◽  
Yessica Nataliani

This paper proposes a new model for time series forecasting that combines forecasting with a clustering algorithm. It introduces a new scheme to improve the forecasting results by grouping the time series data using the k-means clustering algorithm, and it uses the clustering result to obtain the forecast. Because user-defined parameters usually affect the forecasting results, a learning-based procedure is proposed to estimate the parameters used for forecasting. This parameter value is computed simultaneously within the algorithm. Experiments comparing the proposed model with other forecasting algorithms demonstrate good results: it has the smallest mean squared error, 13,007.91, and an average improvement rate of 19.83%.
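One simple way to combine k-means clustering with forecasting is sketched below. It is not the model proposed in the paper, whose forecasting scheme and parameter-learning procedure the abstract does not describe. The sketch assumes a sliding-window setup: past windows are clustered, and the forecast is the average next value observed after windows in the cluster nearest to the most recent window. All names and parameters are hypothetical.

```python
import numpy as np

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    points = np.asarray(points, dtype=float)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        updated = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members):                 # keep the old centroid if a cluster is empty
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Recompute labels so they are consistent with the returned centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, np.argmin(dists, axis=1)

def make_windows(series, width):
    """Sliding windows of length `width` and the value that follows each one."""
    series = np.asarray(series, dtype=float)
    windows = np.array([series[i:i + width] for i in range(len(series) - width)])
    return windows, series[width:]

def cluster_forecast(series, width, k, seed=0):
    """Forecast the next value from the cluster nearest to the latest window."""
    windows, next_values = make_windows(series, width)
    centroids, labels = kmeans(windows, k, seed=seed)
    # Mean follow-up value per cluster (global mean as fallback for empty clusters).
    cluster_next = np.array([
        next_values[labels == j].mean() if np.any(labels == j) else next_values.mean()
        for j in range(k)
    ])
    latest = np.asarray(series[-width:], dtype=float)
    nearest = np.argmin(np.linalg.norm(centroids - latest, axis=1))
    return float(cluster_next[nearest])
```

The window width and the number of clusters `k` are exactly the kind of user-defined parameters the abstract mentions; here they are passed in by hand rather than learned.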


In this paper, we analyze, model, predict and cluster Global Active Power, i.e., time series data obtained at one-minute intervals from the electricity sensors of a household. We analyze changes in seasonality and trends to model the data. We then compare forecasting methods such as SARIMA and LSTM to forecast sensor data for the household, and combine them into a hybrid model that captures nonlinear variations better than either SARIMA or LSTM used in isolation. Finally, we cluster slices of time series data using a novel clustering algorithm that combines density-based and centroid-based approaches to discover relevant subtle clusters in the sensor data. Our experiments have yielded meaningful insights from the data at both a micro, day-to-day granularity and a macro, weekly-to-monthly granularity.

