Time Series Experiment Design Under One-Shot Sampling: The Importance of Condition Diversity

2019 ◽  
Author(s):  
Xiaohan Kang ◽  
Bruce Hajek ◽  
Faqiang Wu ◽  
Yoshie Hanzawa

Abstract
Many biological data sets are prepared using one-shot sampling, in which each individual organism provides only one sample. Time series therefore do not follow trajectories of individuals over time. However, samples collected at different times from individuals grown/raised under the same conditions share the same perturbations of the biological processes, and hence behave as surrogates for multiple samples from a single individual at different times. This implies the importance of growing/raising individuals under multiple conditions if one-shot sampling is used. This paper models the condition effect explicitly by correlated perturbations in the variations driving the expression dynamics, quantifies the performance of the generalized likelihood-ratio test for network structure, and illustrates the difficulty in network reconstruction under one-shot sampling when the condition effect is absent.
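The generalized likelihood-ratio test mentioned in the abstract follows a standard recipe. As a reminder of that recipe only (the paper's statistic for network structure is more involved), a minimal GLRT for a Gaussian mean with unknown variance, using Wilks' chi-square approximation, might look like:

```python
import numpy as np
from scipy import stats

def glrt_zero_mean(x):
    """Generalized likelihood-ratio test (Gaussian model, unknown variance)
    for H0: mean = 0 against H1: mean unrestricted."""
    n = len(x)
    var1 = np.var(x)             # MLE of variance under H1 (mean = sample mean)
    var0 = np.mean(x ** 2)       # MLE of variance under H0 (mean fixed at 0)
    # Twice the log likelihood ratio; the Gaussian log-likelihoods collapse
    # to a function of the two variance estimates.
    lam = n * (np.log(var0) - np.log(var1))
    p = stats.chi2.sf(lam, df=1)  # Wilks' theorem: chi-square with 1 d.o.f.
    return lam, p

rng = np.random.default_rng(0)
lam_null, p_null = glrt_zero_mean(rng.normal(0.0, 1.0, 500))  # H0 true
lam_alt, p_alt = glrt_zero_mean(rng.normal(0.5, 1.0, 500))    # H0 false
```

The same template (maximized log-likelihood under the full model minus that under the restricted model, referred to a chi-square) underlies the network-structure test the paper analyzes.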

2014 ◽  
Author(s):  
Young Hwan Chang ◽  
Jim Korkola ◽  
Dhara N. Amin ◽  
Mark M. Moasser ◽  
Jose M. Carmena ◽  
...  

With the advent of high-throughput measurement techniques, scientists and engineers are grappling with massive data sets and encountering challenges in organizing and processing them and extracting information into meaningful structures. Multidimensional spatio-temporal biological data sets, such as time series of gene expression under various perturbations across different cell lines, or neural spike data across many experimental trials, can yield insight along multiple dimensions. For this potential to be realized, a suitable representation is needed to turn data into insight. Since the wide range of experiments and the (unknown) complexity of the underlying systems make biological data more heterogeneous than data in other fields, we propose a method based on Robust Principal Component Analysis (RPCA), which is well suited to extracting principal components from corrupted observations. The proposed method provides a new representation of these data sets, consisting of a common response and an aberrant response. This representation may help users acquire new insight from the data. For example, identifying common event-related neural features across many experimental trials can provide a signature for detecting discrete events or state transitions. The proposed method can also help biologists cluster and analyze gene expression time series from a new perspective: it not only extracts the canonical cell signaling response but also gives insight into the heterogeneity of responses across different cell lines.
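As a sketch of the RPCA idea the abstract builds on (decomposing corrupted observations into a low-rank "common" part and a sparse "aberrant" part), a minimal principal component pursuit solved by ADMM, run here on illustrative synthetic data, might look like:

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Robust PCA by principal component pursuit (ADMM): split M into a
    low-rank part L and a sparse part S. A minimal sketch, not tuned."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(M).sum())
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        # Low-rank update: singular-value thresholding.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt
        # Sparse update: elementwise soft-thresholding.
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S

# Demo: a rank-5 matrix with about 5% of entries grossly corrupted.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 60))
sparse = np.where(rng.random((60, 60)) < 0.05, 5.0, 0.0)
L, S = rpca(low_rank + sparse)
rel_err = np.linalg.norm(L - low_rank) / np.linalg.norm(low_rank)
```

In the abstract's terms, L would play the role of the common response and S the aberrant response across conditions or trials.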


1984 ◽  
Vol 30 (104) ◽  
pp. 66-76 ◽  
Author(s):  
Paul A. Mayewski ◽  
W. Berry Lyons ◽  
N. Ahmad ◽  
Gordon Smith ◽  
M. Pourchet

Abstract
Spectral analysis of time series from a c. 17 ± 0.3 year core, calibrated by total β activity, recovered from Sentik Glacier (4908 m), Ladakh, Himalaya, yields several recognizable periodicities, including subannual, annual, and multi-annual. The time series include both chemical data (chloride, sodium, reactive iron, reactive silicate, reactive phosphate, ammonium, δD, δ18O, and pH) and physical data (density, debris and ice-band locations, and microparticles in size grades 0.50 to 12.70 μm). Source areas for the chemical species investigated, and the general air-mass circulation defined from the chemical and physical time series, are discussed to demonstrate the potential of such studies for developing paleometeorological data sets from remote high-alpine glacierized sites such as the Himalaya.
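The kind of spectral analysis described, searching a layered time series for subannual, annual, and multi-annual periodicities, can be sketched with a plain FFT periodogram; the signal below is synthetic, not the Sentik Glacier data:

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Plain FFT periodogram: power at each positive frequency."""
    x = np.asarray(x, float)
    x = x - x.mean()                       # remove the mean (zero-frequency peak)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs[1:], power[1:]            # drop the DC bin

# Synthetic "annual + subannual" signal, sampled 12 times per year for 17 years.
t = np.arange(17 * 12) / 12.0
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 2 * t)
freqs, power = periodogram(x, dt=1 / 12.0)
peak = freqs[np.argmax(power)]             # strongest cycle, in cycles per year
```

Real core records are unevenly layered and noisy, so in practice the series is first interpolated onto a uniform depth-time grid and often tapered before transforming.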


Author(s):  
Cong Gao ◽  
Ping Yang ◽  
Yanping Chen ◽  
Zhongmin Wang ◽  
Yue Wang

Abstract
With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. Time series, a vital form of sensor data, exhibit three main types of anomaly: point anomaly, pattern anomaly, and sequence anomaly. In production environments, the analysis of pattern anomalies is the most rewarding. However, the traditional processing model, cloud computing, struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection in time series. A task migration algorithm is developed to alleviate the backlog of detection tasks at edge nodes. In addition, the detection tasks related to long-term and short-term correlation in the time series are allocated to the cloud and edge nodes, respectively. A multi-dimensional feature representation scheme is devised for efficient dimension reduction, and its two key components, trend identification and feature point extraction, are elaborated. Based on the resulting feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted on synthetic and real-world data sets.
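As a sketch of the kernel density estimation idea behind the detection step (not the paper's improved method), scoring test readings by their estimated density under normal training data might look like:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_anomaly_scores(train, test, bandwidth=None):
    """Score test points by their estimated density under the training data;
    low density suggests an anomaly."""
    kde = gaussian_kde(train, bw_method=bandwidth)
    return kde(test)

rng = np.random.default_rng(1)
normal_readings = rng.normal(20.0, 1.0, 1000)   # e.g. routine sensor values
scores = kde_anomaly_scores(normal_readings, np.array([20.0, 35.0]))
is_anomaly = scores < 0.01                      # threshold is application-specific
```

In the paper's architecture this kind of density scoring would run on the reduced feature representation rather than on raw samples, with the threshold tuned per deployment.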


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract
Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in the epidemiology of specific populations.
Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem, and propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important biological data resources (gene ontology, protein interaction data, and protein sequence), together with gene expression values, are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space without, or with minimal, loss of sample classification efficiency, and determines relevant and non-redundant gene markers from three benchmark cancer gene expression data sets.
Conclusion: A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The obtained results show the superiority of the proposed method and are further validated through a biological significance test and heatmap plotting.
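UMVMO-select itself combines multiple objectives and views; as a much simpler illustration of the multi-view idea only (a naive baseline, not the proposed method), jointly clustering two standardized synthetic views of the same samples might look like:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)
# Two synthetic "views" of the same 100 samples (e.g. expression values and
# an ontology-derived feature space), each with two well-separated groups.
labels_true = np.repeat([0, 1], 50)
view1 = rng.normal(labels_true[:, None] * 4.0, 1.0, (100, 5))
view2 = rng.normal(labels_true[:, None] * 4.0, 1.0, (100, 8))

# Naive multi-view baseline: z-score each view, concatenate, then cluster.
z = lambda X: (X - X.mean(0)) / X.std(0)
joint = np.hstack([z(view1), z(view2)])
_, labels = kmeans2(joint, 2, minit='++', seed=3)

# Agreement with the true grouping, up to label permutation.
acc = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
```

Multi-objective approaches such as the one proposed instead keep the views separate and optimize per-view cluster quality simultaneously, which concatenation cannot capture.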


2021 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Mark Levene

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1<α<2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
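As a sketch of the ESJS metric (the Jensen–Shannon divergence formula applied to the two empirical survival functions and integrated over a common grid; the paper's exact normalization may differ), one might compute:

```python
import numpy as np

def esf(sample, grid):
    """Empirical survival function of `sample`, evaluated on `grid`."""
    s = np.sort(sample)
    return 1.0 - np.searchsorted(s, grid, side='right') / len(s)

def esjs(x, y, n_grid=2000):
    """Empirical survival Jensen-Shannon divergence between two samples."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_grid)
    p, q = esf(x, grid), esf(y, grid)
    m = 0.5 * (p + q)

    def kl_term(a, b):
        out = np.zeros_like(a)
        mask = a > 0              # convention: 0 * log 0 = 0
        out[mask] = a[mask] * np.log2(a[mask] / b[mask])
        return out

    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * kl_term(p, m) + 0.5 * kl_term(q, m)) * dx)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
same = rng.normal(0.0, 1.0, 2000)   # same marginal distribution as x
shifted = x + 3.0                   # clearly different marginal
```

The bootstrap test in the paper resamples one series, recomputes such a divergence each time, and compares the observed value against the resulting confidence interval.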


Genetics ◽  
2000 ◽  
Vol 154 (1) ◽  
pp. 381-395
Author(s):  
Pavel Morozov ◽  
Tatyana Sitnikova ◽  
Gary Churchill ◽  
Francisco José Ayala ◽  
Andrey Rzhetsky

Abstract
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider two types of functions, one derived from the cosine Fourier series and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed, and in their most parameter-rich versions both the Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit the data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. They are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.
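As an illustration of the cosine Fourier parameterization of a site-rate profile (the exact basis and link function in the paper may differ; the exponential here is just one way to keep rates positive):

```python
import numpy as np

def cosine_rate_profile(coeffs, n_sites):
    """Relative replacement rate at each site as a truncated cosine Fourier
    series over the rescaled site position. Illustrative parameterization."""
    i = np.arange(1, n_sites + 1)
    pos = (i - 0.5) / n_sites               # site position rescaled to (0, 1)
    log_rates = coeffs[0] * np.ones(n_sites)
    for k, c in enumerate(coeffs[1:], start=1):
        log_rates += c * np.cos(np.pi * k * pos)
    rates = np.exp(log_rates)               # keep rates strictly positive
    return rates / rates.mean()             # normalize the mean rate to 1

# One constant plus one cosine term: rates decline smoothly along the sequence.
profile = cosine_rate_profile(np.array([0.0, 1.0]), 100)
```

Adding more cosine terms makes the profile more flexible, up to the unrestricted-rates limit the abstract mentions, at the cost of one parameter per term.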


2019 ◽  
Vol 93 (12) ◽  
pp. 2651-2660 ◽  
Author(s):  
Sergey Samsonov

Abstract
The previously presented Multidimensional Small Baseline Subset (MSBAS-2D) technique computes two-dimensional (2D), east and vertical, ground deformation time series from two or more ascending and descending Differential Interferometric Synthetic Aperture Radar (DInSAR) data sets by assuming that the contribution of the north deformation component is negligible. DInSAR data sets can be acquired with different temporal and spatial resolutions, viewing geometries, and wavelengths. The MSBAS-2D technique has previously been used for mapping deformation due to mining, urban development, carbon sequestration, permafrost aggradation, pingo growth, and volcanic activity. In the case of glacier ice flow, however, the north deformation component is often too large to be negligible. Historically, the surface-parallel flow (SPF) constraint was used to compute the static three-dimensional (3D) velocity field at various glaciers. A novel MSBAS-3D technique has been developed for computing 3D deformation time series by utilizing the SPF constraint. This technique is used for mapping 3D deformation at the Barnes Ice Cap, Baffin Island, Nunavut, Canada, during January–March 2015, and the MSBAS-2D and MSBAS-3D solutions are compared. The MSBAS-3D technique can be used for studying ice flow at other glaciers and other surface deformation processes with a large north deformation component, such as landslides. The software implementation of the MSBAS-3D technique can be downloaded from http://insar.ca/.
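The geometric core of the 2D assumption, recovering east and vertical motion from ascending and descending line-of-sight (LOS) measurements while neglecting the north component, can be sketched as a small least-squares problem. The look-vector components below are illustrative only, and MSBAS additionally regularizes the full time series:

```python
import numpy as np

# LOS unit-vector components (east, vertical) for ascending and descending
# viewing geometries; illustrative values, not from the paper.
los = np.array([[-0.62, 0.77],    # ascending
                [ 0.60, 0.78]])   # descending

true_def = np.array([0.010, -0.030])   # east, vertical deformation (m)
d_los = los @ true_def                 # the two observed LOS displacements

# Invert the system (least squares handles the overdetermined case when
# more than two geometries are available).
east, vertical = np.linalg.lstsq(los, d_los, rcond=None)[0]
```

MSBAS-3D replaces the "north = 0" assumption with the surface-parallel flow constraint, which ties the vertical component to the topographic gradient and leaves all three components in the inversion.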


Plant Disease ◽  
2006 ◽  
Vol 90 (11) ◽  
pp. 1433-1440 ◽  
Author(s):  
David H. Gent ◽  
Walter F. Mahaffee ◽  
William W. Turechek

The spatial heterogeneity of the incidence of hop cones with powdery mildew (Podosphaera macularis) was characterized from transect surveys of 41 commercial hop yards in Oregon and Washington from 2000 to 2005. The proportion of sampled cones with powdery mildew (p) was recorded for each of 221 transects, where N = 60 sampling units of n = 25 cones were assessed in each transect according to a cluster sampling strategy. Disease incidence ranged from 0 to 0.92 among all yards and dates. The binomial and beta-binomial frequency distributions were fit to the N sampling units in a transect using maximum likelihood. The estimation procedure converged for 74% of the data sets where p > 0, and a log-likelihood ratio test indicated that the beta-binomial distribution provided a better fit than the binomial distribution for 46% of the data sets, indicating an aggregated pattern of disease. Similarly, the C(α) test indicated that 54% could be described by the beta-binomial distribution. The heterogeneity parameter of the beta-binomial distribution, θ, a measure of variation among sampling units, ranged from 0.01 to 0.20, with a mean of 0.037 and a median of 0.015. Estimates of the index of dispersion ranged from 0.79 to 7.78, with a mean of 1.81 and a median of 1.37, and were significantly greater than 1 for 54% of the data sets. The binary power law provided an excellent fit to the data, with slope and intercept parameters significantly greater than 1, indicating that heterogeneity varied systematically with the incidence of infected cones. A covariance analysis indicated that the geographic location (region) of the yards and the type of hop cultivar had little effect on heterogeneity; however, the year of sampling significantly influenced the intercept and slope parameters of the binary power law.
Significant spatial autocorrelation was detected in only 11% of the data sets, with estimates of first-order autocorrelation, r1, ranging from -0.30 to 0.70, with a mean of 0.06 and a median of 0.04; furthermore, correlation was detected in only 20% and 16% of the data sets by median and ordinary runs analysis, respectively. Together, these analyses suggest that the incidence of powdery mildew on cones was slightly aggregated among plants, but that patterns of aggregation larger than the sampling unit were rare (20% or less of the data sets). Knowledge of the heterogeneity of diseased cones was used to construct fixed sampling curves for precisely estimating the incidence of powdery mildew on cones at varying disease intensities. Use of the sampling curves developed in this research should help improve sampling methods for disease assessment and management decisions.
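The binomial versus beta-binomial comparison via a log-likelihood ratio test can be sketched as follows, on synthetic overdispersed counts mimicking the N = 60, n = 25 sampling design (not the survey data):

```python
import numpy as np
from scipy import stats, optimize

def fit_betabinom(counts, n):
    """MLE fit of a beta-binomial to counts out of n, plus a log-likelihood
    ratio test against the simple binomial (one extra parameter)."""
    p_hat = counts.mean() / n
    ll_binom = stats.binom.logpmf(counts, n, p_hat).sum()
    # Beta-binomial negative log-likelihood in (a, b); numerical MLE.
    nll = lambda ab: -stats.betabinom.logpmf(counts, n, ab[0], ab[1]).sum()
    res = optimize.minimize(nll, x0=[p_hat * 10, (1 - p_hat) * 10],
                            bounds=[(1e-6, None)] * 2)
    ll_bb = -res.fun
    lrt = 2.0 * (ll_bb - ll_binom)
    p_value = stats.chi2.sf(lrt, df=1)
    return res.x, lrt, p_value

rng = np.random.default_rng(4)
# Aggregated ("overdispersed") incidence: 60 sampling units of 25 cones each.
counts = stats.betabinom.rvs(25, 2.0, 8.0, size=60, random_state=rng)
(a, b), lrt, p = fit_betabinom(counts, 25)
```

A significant test statistic, as here, corresponds to the aggregated transects in the survey; the heterogeneity parameter θ in the paper's notation is 1/(a + b).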


2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary
The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including combinations of measured experimental data and biological networks. With the growing size of networks, their handling and exploration become a challenging task for the user. In addition, scientists are often interested not just in a single kind of network, but in combinations of different types, such as metabolic, gene regulatory, and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets, including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof-of-concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.

