Time Series Experiment Design Under One-Shot Sampling: The Importance of Condition Diversity

2019 ◽  
Author(s):  
Xiaohan Kang ◽  
Bruce Hajek ◽  
Faqiang Wu ◽  
Yoshie Hanzawa

Abstract
Many biological data sets are prepared using one-shot sampling, in which each individual organism provides only one sample. Time series therefore do not follow trajectories of individuals over time. However, samples collected at different times from individuals grown/raised under the same conditions share the same perturbations of the biological processes, and hence behave as surrogates for multiple samples from a single individual at different times. This implies the importance of growing/raising individuals under multiple conditions if one-shot sampling is used. This paper models the condition effect explicitly by correlated perturbations in the variations driving the expression dynamics, quantifies the performance of the generalized likelihood-ratio test for network structure, and illustrates the difficulty in network reconstruction under one-shot sampling when the condition effect is absent.
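The generalized likelihood-ratio test mentioned in the abstract follows a standard recipe. As a reminder of that recipe only (the paper's statistic for network structure is more involved), a minimal GLRT for a Gaussian mean with unknown variance, using Wilks' chi-square approximation, might look like:

```python
import numpy as np
from scipy import stats

def glrt_zero_mean(x):
    """Generalized likelihood-ratio test (Gaussian model, unknown variance)
    for H0: mean = 0 against H1: mean unrestricted."""
    n = len(x)
    var1 = np.var(x)             # MLE of variance under H1 (mean = sample mean)
    var0 = np.mean(x ** 2)       # MLE of variance under H0 (mean fixed at 0)
    # Twice the log likelihood ratio; the Gaussian log-likelihoods collapse
    # to a function of the two variance estimates.
    lam = n * (np.log(var0) - np.log(var1))
    p = stats.chi2.sf(lam, df=1)  # Wilks' theorem: chi-square with 1 d.o.f.
    return lam, p

rng = np.random.default_rng(0)
lam_null, p_null = glrt_zero_mean(rng.normal(0.0, 1.0, 500))  # H0 true
lam_alt, p_alt = glrt_zero_mean(rng.normal(0.5, 1.0, 500))    # H0 false
```

The same template (maximized log-likelihood under the full model minus that under the restricted model, referred to a chi-square) underlies the network-structure test the paper analyzes.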

2014 ◽  
Author(s):  
Young Hwan Chang ◽  
Jim Korkola ◽  
Dhara N. Amin ◽  
Mark M. Moasser ◽  
Jose M. Carmena ◽  
...  

With the advent of high-throughput measurement techniques, scientists and engineers are grappling with massive data sets and encountering challenges in organizing and processing them and extracting information into meaningful structures. Multidimensional spatio-temporal biological data sets, such as time series of gene expression under various perturbations across different cell lines, or neural spike data across many experimental trials, can yield insight along multiple dimensions. For this potential to be realized, a suitable representation is needed to turn data into insight. Since the wide range of experiments and the (unknown) complexity of the underlying systems make biological data more heterogeneous than data in other fields, we propose a method based on Robust Principal Component Analysis (RPCA), which is well suited to extracting principal components from corrupted observations. The proposed method provides a new representation of these data sets, consisting of a common response and an aberrant response. This representation may help users acquire new insight from the data. For example, identifying common event-related neural features across many experimental trials can provide a signature for detecting discrete events or state transitions. The proposed method can also help biologists cluster and analyze gene expression time series from a new perspective: it not only extracts the canonical cell signaling response but also gives insight into the heterogeneity of responses across different cell lines.
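As a sketch of the RPCA idea the abstract builds on (decomposing corrupted observations into a low-rank "common" part and a sparse "aberrant" part), a minimal principal component pursuit solved by ADMM, run here on illustrative synthetic data, might look like:

```python
import numpy as np

def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Robust PCA by principal component pursuit (ADMM): split M into a
    low-rank part L and a sparse part S. A minimal sketch, not tuned."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    mu = mu if mu is not None else m * n / (4.0 * np.abs(M).sum())
    shrink = lambda X, t: np.sign(X) * np.maximum(np.abs(X) - t, 0.0)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        # Low-rank update: singular-value thresholding.
        U, s, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * shrink(s, 1.0 / mu)) @ Vt
        # Sparse update: elementwise soft-thresholding.
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y += mu * residual
        if np.linalg.norm(residual) / norm_M < tol:
            break
    return L, S

# Demo: a rank-5 matrix with about 5% of entries grossly corrupted.
rng = np.random.default_rng(0)
low_rank = rng.normal(size=(60, 5)) @ rng.normal(size=(5, 60))
sparse = np.where(rng.random((60, 60)) < 0.05, 5.0, 0.0)
L, S = rpca(low_rank + sparse)
rel_err = np.linalg.norm(L - low_rank) / np.linalg.norm(low_rank)
```

In the abstract's terms, L would play the role of the common response and S the aberrant response across conditions or trials.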


1984 ◽  
Vol 30 (104) ◽  
pp. 66-76 ◽  
Author(s):  
Paul A. Mayewski ◽  
W. Berry Lyons ◽  
N. Ahmad ◽  
Gordon Smith ◽  
M. Pourchet

Abstract
Spectral analysis of time series from a c. 17 ± 0.3 year core, calibrated by total β activity, recovered from Sentik Glacier (4908 m), Ladakh, Himalaya, yields several recognizable periodicities, including subannual, annual, and multi-annual. The time series include both chemical data (chloride, sodium, reactive iron, reactive silicate, reactive phosphate, ammonium, δD, δ18O, and pH) and physical data (density, debris and ice-band locations, and microparticles in size grades 0.50 to 12.70 μm). Source areas for the chemical species investigated, and the general air-mass circulation defined from the chemical and physical time series, are discussed to demonstrate the potential of such studies for developing paleometeorological data sets from remote high-alpine glacierized sites such as the Himalaya.
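The kind of spectral analysis described, searching a layered time series for subannual, annual, and multi-annual periodicities, can be sketched with a plain FFT periodogram; the signal below is synthetic, not the Sentik Glacier data:

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Plain FFT periodogram: power at each positive frequency."""
    x = np.asarray(x, float)
    x = x - x.mean()                       # remove the mean (zero-frequency peak)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=dt)
    return freqs[1:], power[1:]            # drop the DC bin

# Synthetic "annual + subannual" signal, sampled 12 times per year for 17 years.
t = np.arange(17 * 12) / 12.0
x = np.sin(2 * np.pi * t) + 0.5 * np.sin(2 * np.pi * 2 * t)
freqs, power = periodogram(x, dt=1 / 12.0)
peak = freqs[np.argmax(power)]             # strongest cycle, in cycles per year
```

Real core records are unevenly layered and noisy, so in practice the series is first interpolated onto a uniform depth-time grid and often tapered before transforming.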


Author(s):  
Cong Gao ◽  
Ping Yang ◽  
Yanping Chen ◽  
Zhongmin Wang ◽  
Yue Wang

Abstract
With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. Time series, a vital form of sensor data, exhibit three main types of anomaly: point anomaly, pattern anomaly, and sequence anomaly. In production environments, the analysis of pattern anomalies is the most rewarding. However, the traditional processing model, cloud computing, struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection in time series. A task migration algorithm is developed to alleviate the backlog of detection tasks at edge nodes. In addition, the detection tasks related to long-term and short-term correlation in the time series are allocated to the cloud and edge nodes, respectively. A multi-dimensional feature representation scheme is devised for efficient dimension reduction, and its two key components, trend identification and feature point extraction, are elaborated. Based on the resulting feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted on synthetic and real-world data sets.
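As a sketch of the kernel density estimation idea behind the detection step (not the paper's improved method), scoring test readings by their estimated density under normal training data might look like:

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_anomaly_scores(train, test, bandwidth=None):
    """Score test points by their estimated density under the training data;
    low density suggests an anomaly."""
    kde = gaussian_kde(train, bw_method=bandwidth)
    return kde(test)

rng = np.random.default_rng(1)
normal_readings = rng.normal(20.0, 1.0, 1000)   # e.g. routine sensor values
scores = kde_anomaly_scores(normal_readings, np.array([20.0, 35.0]))
is_anomaly = scores < 0.01                      # threshold is application-specific
```

In the paper's architecture this kind of density scoring would run on the reduced feature representation rather than on raw samples, with the threshold tuned per deployment.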


2020 ◽  
Vol 21 (S18) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Abstract
Background: In recent years, the use of multiple genomic and proteomic sources to investigate challenging bioinformatics problems has become immensely popular among researchers. One such problem is feature (gene) selection: identifying relevant and non-redundant marker genes from high-dimensional gene expression data sets. In that context, an efficient feature selection algorithm that exploits knowledge from multiple potential biological resources may be an effective way to understand the spectrum of cancer or other diseases, with applications in the epidemiology of specific populations.
Results: In the current article, we formulate feature selection and marker gene detection as a multi-view multi-objective clustering problem, and propose an Unsupervised Multi-View Multi-Objective clustering-based gene selection approach called UMVMO-select. Three important biological data resources (gene ontology, protein interaction data, and protein sequence), together with gene expression values, are collectively utilized to design two different views. UMVMO-select aims to reduce the gene space without, or with minimal, loss of sample classification efficiency, and determines relevant and non-redundant gene markers from three benchmark cancer gene expression data sets.
Conclusion: A thorough comparative analysis has been performed against five clustering and nine existing feature selection methods with respect to several internal and external validity metrics. The obtained results show the superiority of the proposed method and are further validated through a biological significance test and heatmap plotting.
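UMVMO-select itself combines multiple objectives and views; as a much simpler illustration of the multi-view idea only (a naive baseline, not the proposed method), jointly clustering two standardized synthetic views of the same samples might look like:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(2)
# Two synthetic "views" of the same 100 samples (e.g. expression values and
# an ontology-derived feature space), each with two well-separated groups.
labels_true = np.repeat([0, 1], 50)
view1 = rng.normal(labels_true[:, None] * 4.0, 1.0, (100, 5))
view2 = rng.normal(labels_true[:, None] * 4.0, 1.0, (100, 8))

# Naive multi-view baseline: z-score each view, concatenate, then cluster.
z = lambda X: (X - X.mean(0)) / X.std(0)
joint = np.hstack([z(view1), z(view2)])
_, labels = kmeans2(joint, 2, minit='++', seed=3)

# Agreement with the true grouping, up to label permutation.
acc = max(np.mean(labels == labels_true), np.mean(labels != labels_true))
```

Multi-objective approaches such as the one proposed instead keep the views separate and optimize per-view cluster quality simultaneously, which concatenation cannot capture.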


2021 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Mark Levene

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1<α<2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
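As a sketch of the ESJS metric (the Jensen–Shannon divergence formula applied to the two empirical survival functions and integrated over a common grid; the paper's exact normalization may differ), one might compute:

```python
import numpy as np

def esf(sample, grid):
    """Empirical survival function of `sample`, evaluated on `grid`."""
    s = np.sort(sample)
    return 1.0 - np.searchsorted(s, grid, side='right') / len(s)

def esjs(x, y, n_grid=2000):
    """Empirical survival Jensen-Shannon divergence between two samples."""
    grid = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), n_grid)
    p, q = esf(x, grid), esf(y, grid)
    m = 0.5 * (p + q)

    def kl_term(a, b):
        out = np.zeros_like(a)
        mask = a > 0              # convention: 0 * log 0 = 0
        out[mask] = a[mask] * np.log2(a[mask] / b[mask])
        return out

    dx = grid[1] - grid[0]
    return float(np.sum(0.5 * kl_term(p, m) + 0.5 * kl_term(q, m)) * dx)

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 2000)
same = rng.normal(0.0, 1.0, 2000)   # same marginal distribution as x
shifted = x + 3.0                   # clearly different marginal
```

The bootstrap test in the paper resamples one series, recomputes such a divergence each time, and compares the observed value against the resulting confidence interval.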


Genetics ◽  
2000 ◽  
Vol 154 (1) ◽  
pp. 381-395
Author(s):  
Pavel Morozov ◽  
Tatyana Sitnikova ◽  
Gary Churchill ◽  
Francisco José Ayala ◽  
Andrey Rzhetsky

Abstract
We propose models for describing replacement rate variation in genes and proteins, in which the profile of relative replacement rates along the length of a given sequence is defined as a function of the site number. We consider two types of functions, one derived from the cosine Fourier series and the other from discrete wavelet transforms. The number of parameters used for characterizing the substitution rates along the sequences can be flexibly changed, and in their most parameter-rich versions both the Fourier and wavelet models become equivalent to the unrestricted-rates model, in which each site of a sequence alignment evolves at a unique rate. When applied to a few real data sets, the new models appeared to fit the data better than the discrete gamma model when compared with the Akaike information criterion and the likelihood-ratio test, although the parametric bootstrap version of the Cox test performed for one of the data sets indicated that the difference in likelihoods between the two models is not significant. The new models are applicable to testing biological hypotheses such as the statistical identity of rate variation profiles among homologous protein families. They are also useful for determining regions in genes and proteins that evolve significantly faster or slower than the sequence average. We illustrate the application of the new method by analyzing human immunoglobulin and Drosophilid alcohol dehydrogenase sequences.
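As an illustration of the cosine Fourier parameterization of a site-rate profile (the exact basis and link function in the paper may differ; the exponential here is just one way to keep rates positive):

```python
import numpy as np

def cosine_rate_profile(coeffs, n_sites):
    """Relative replacement rate at each site as a truncated cosine Fourier
    series over the rescaled site position. Illustrative parameterization."""
    i = np.arange(1, n_sites + 1)
    pos = (i - 0.5) / n_sites               # site position rescaled to (0, 1)
    log_rates = coeffs[0] * np.ones(n_sites)
    for k, c in enumerate(coeffs[1:], start=1):
        log_rates += c * np.cos(np.pi * k * pos)
    rates = np.exp(log_rates)               # keep rates strictly positive
    return rates / rates.mean()             # normalize the mean rate to 1

# One constant plus one cosine term: rates decline smoothly along the sequence.
profile = cosine_rate_profile(np.array([0.0, 1.0]), 100)
```

Adding more cosine terms makes the profile more flexible, up to the unrestricted-rates limit the abstract mentions, at the cost of one parameter per term.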


2019 ◽  
Vol 93 (12) ◽  
pp. 2651-2660 ◽  
Author(s):  
Sergey Samsonov

Abstract
The previously presented Multidimensional Small Baseline Subset (MSBAS-2D) technique computes two-dimensional (2D), east and vertical, ground deformation time series from two or more ascending and descending Differential Interferometric Synthetic Aperture Radar (DInSAR) data sets by assuming that the contribution of the north deformation component is negligible. DInSAR data sets can be acquired with different temporal and spatial resolutions, viewing geometries, and wavelengths. The MSBAS-2D technique has previously been used for mapping deformation due to mining, urban development, carbon sequestration, permafrost aggradation, pingo growth, and volcanic activity. In the case of glacier ice flow, however, the north deformation component is often too large to be negligible. Historically, the surface-parallel flow (SPF) constraint was used to compute the static three-dimensional (3D) velocity field at various glaciers. A novel MSBAS-3D technique has been developed for computing 3D deformation time series by utilizing the SPF constraint. This technique is used for mapping 3D deformation at the Barnes Ice Cap, Baffin Island, Nunavut, Canada, during January–March 2015, and the MSBAS-2D and MSBAS-3D solutions are compared. The MSBAS-3D technique can be used for studying ice flow at other glaciers and other surface deformation processes with a large north deformation component, such as landslides. The software implementation of the MSBAS-3D technique can be downloaded from http://insar.ca/.
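The geometric core of the 2D assumption, recovering east and vertical motion from ascending and descending line-of-sight (LOS) measurements while neglecting the north component, can be sketched as a small least-squares problem. The look-vector components below are illustrative only, and MSBAS additionally regularizes the full time series:

```python
import numpy as np

# LOS unit-vector components (east, vertical) for ascending and descending
# viewing geometries; illustrative values, not from the paper.
los = np.array([[-0.62, 0.77],    # ascending
                [ 0.60, 0.78]])   # descending

true_def = np.array([0.010, -0.030])   # east, vertical deformation (m)
d_los = los @ true_def                 # the two observed LOS displacements

# Invert the system (least squares handles the overdetermined case when
# more than two geometries are available).
east, vertical = np.linalg.lstsq(los, d_los, rcond=None)[0]
```

MSBAS-3D replaces the "north = 0" assumption with the surface-parallel flow constraint, which ties the vertical component to the topographic gradient and leaves all three components in the inversion.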


Plant Disease ◽  
2006 ◽  
Vol 90 (11) ◽  
pp. 1433-1440 ◽  
Author(s):  
David H. Gent ◽  
Walter F. Mahaffee ◽  
William W. Turechek

The spatial heterogeneity of the incidence of hop cones with powdery mildew (Podosphaera macularis) was characterized from transect surveys of 41 commercial hop yards in Oregon and Washington from 2000 to 2005. The proportion of sampled cones with powdery mildew (p) was recorded for each of 221 transects, where N = 60 sampling units of n = 25 cones were assessed in each transect according to a cluster sampling strategy. Disease incidence ranged from 0 to 0.92 among all yards and dates. The binomial and beta-binomial frequency distributions were fit to the N sampling units in a transect using maximum likelihood. The estimation procedure converged for 74% of the data sets where p > 0, and a log-likelihood ratio test indicated that the beta-binomial distribution provided a better fit than the binomial distribution for 46% of the data sets, indicating an aggregated pattern of disease. Similarly, the C(α) test indicated that 54% could be described by the beta-binomial distribution. The heterogeneity parameter of the beta-binomial distribution, θ, a measure of variation among sampling units, ranged from 0.01 to 0.20, with a mean of 0.037 and a median of 0.015. Estimates of the index of dispersion ranged from 0.79 to 7.78, with a mean of 1.81 and a median of 1.37, and were significantly greater than 1 for 54% of the data sets. The binary power law provided an excellent fit to the data, with slope and intercept parameters significantly greater than 1, indicating that heterogeneity varied systematically with the incidence of infected cones. A covariance analysis indicated that the geographic location (region) of the yards and the type of hop cultivar had little effect on heterogeneity; however, the year of sampling significantly influenced the intercept and slope parameters of the binary power law.
Significant spatial autocorrelation was detected in only 11% of the data sets, with estimates of first-order autocorrelation, r1, ranging from -0.30 to 0.70, with a mean of 0.06 and a median of 0.04; furthermore, correlation was detected in only 20% and 16% of the data sets by median and ordinary runs analysis, respectively. Together, these analyses suggest that the incidence of powdery mildew on cones was slightly aggregated among plants, but that patterns of aggregation larger than the sampling unit were rare (20% or less of the data sets). Knowledge of the heterogeneity of diseased cones was used to construct fixed sampling curves for precisely estimating the incidence of powdery mildew on cones at varying disease intensities. Use of the sampling curves developed in this research should help improve sampling methods for disease assessment and management decisions.
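The binomial versus beta-binomial comparison via a log-likelihood ratio test can be sketched as follows, on synthetic overdispersed counts mimicking the N = 60, n = 25 sampling design (not the survey data):

```python
import numpy as np
from scipy import stats, optimize

def fit_betabinom(counts, n):
    """MLE fit of a beta-binomial to counts out of n, plus a log-likelihood
    ratio test against the simple binomial (one extra parameter)."""
    p_hat = counts.mean() / n
    ll_binom = stats.binom.logpmf(counts, n, p_hat).sum()
    # Beta-binomial negative log-likelihood in (a, b); numerical MLE.
    nll = lambda ab: -stats.betabinom.logpmf(counts, n, ab[0], ab[1]).sum()
    res = optimize.minimize(nll, x0=[p_hat * 10, (1 - p_hat) * 10],
                            bounds=[(1e-6, None)] * 2)
    ll_bb = -res.fun
    lrt = 2.0 * (ll_bb - ll_binom)
    p_value = stats.chi2.sf(lrt, df=1)
    return res.x, lrt, p_value

rng = np.random.default_rng(4)
# Aggregated ("overdispersed") incidence: 60 sampling units of 25 cones each.
counts = stats.betabinom.rvs(25, 2.0, 8.0, size=60, random_state=rng)
(a, b), lrt, p = fit_betabinom(counts, 25)
```

A significant test statistic, as here, corresponds to the aggregated transects in the survey; the heterogeneity parameter θ in the paper's notation is 1/(a + b).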


2014 ◽  
Vol 11 (2) ◽  
pp. 68-79
Author(s):  
Matthias Klapperstück ◽  
Falk Schreiber

Summary
The visualization of biological data has gained increasing importance in recent years. A large number of methods and software tools are available that visualize biological data, including combinations of measured experimental data and biological networks. With the growing size of networks, their handling and exploration become a challenging task for the user. In addition, scientists are often interested not just in a single kind of network, but in combinations of different types, such as metabolic, gene regulatory, and protein interaction networks. Therefore, fast access, abstract and dynamic views, and intuitive exploratory methods should be provided to search and extract information from the networks. This paper introduces a conceptual framework for handling and combining multiple network sources that enables abstract viewing and exploration of large data sets, including additional experimental data. It introduces a three-tier structure that links network data to multiple network views, discusses a proof-of-concept implementation, and shows a specific visualization method for combining metabolic and gene regulatory networks in an example.

