Learning and Imputation for Mass-spec Bias Reduction (LIMBR)

Mapping Intimacies ◽

10.1101/301242 ◽

2018 ◽

Author(s):

Alexander M Crowell ◽

Casey S Greene ◽

Jennifer J. Loros ◽

Jay C Dunlap

Keyword(s):

Time Series ◽

Missing Data ◽

Large Scale ◽

Bias Reduction ◽

Batch Effects ◽

Genome Wide ◽

Mass Spec ◽

Data Points ◽

Block Based ◽

Analysis Of Time Series

AbstractMotivationDecreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome and genome-wide patterns of expression are more readily discernible. These larger experiments require more batches exacerbating batch effects and increasing the number of bias trends. In the case of proteomics, where methods frequently result in missing data this increasing scale is also decreasing the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood necessitating novel techniques.ResultsHere we show that by exploiting the structure of time series experiments, it is possible to accurately and reproducibly model and remove batch effects. We implement Learning and Imputation for Mass-spec Bias Reduction (LIMBR) software, which builds on previous block based models of batch effects and includes features specific to time series and circadian studies. To aid in the analysis of time series proteomics experiments, which are often plagued with missing data points, we also integrate an imputation system. By building LIMBR for imputation and time series tailored bias modeling into one straightforward software package, we expect that the quality and ease of large-scale proteomics and genomics time series experiments will be significantly [email protected], [email protected]

Download Full-text

A large-scale framework for storage, access and analysis of time series data in the manufacturing domain

Procedia CIRP ◽

10.1016/j.procir.2017.12.267 ◽

2018 ◽

Vol 67 ◽

pp. 595-600 ◽

Cited By ~ 2

Author(s):

Benjamin Mörzinger ◽

Thomas Weiler ◽

Thomas Trautner ◽

Iman Ayatollahi ◽

Bernhard Angerer ◽

...

Keyword(s):

Time Series ◽

Large Scale ◽

Time Series Data ◽

Series Data ◽

Analysis Of Time Series

Download Full-text

Scale-space analysis of time series in circulatory research

AJP Heart and Circulatory Physiology ◽

10.1152/ajpheart.00168.2006 ◽

2006 ◽

Vol 291 (6) ◽

pp. H3012-H3022 ◽

Cited By ~ 3

Author(s):

Kim Erlend Mortensen ◽

Fred Godtliebsen ◽

Arthur Revhaug

Keyword(s):

Time Series ◽

Statistical Analysis ◽

Real Time ◽

Repeated Measures ◽

Multiple Scales ◽

Visual Presentation ◽

Scale Space ◽

Statistical Tool ◽

Data Points ◽

Analysis Of Time Series

Statistical analysis of time series is still inadequate within circulation research. With the advent of increasing computational power and real-time recordings from hemodynamic studies, one is increasingly dealing with vast amounts of data in time series. This paper aims to illustrate how statistical analysis using the significant nonstationarities (SiNoS) method may complement traditional repeated-measures ANOVA and linear mixed models. We applied these methods on a dataset of local hepatic and systemic circulatory changes induced by aortoportal shunting and graded liver resection. We found SiNoS analysis more comprehensive when compared with traditional statistical analysis in the following four ways: 1) the method allows better signal-to-noise detection; 2) including all data points from real time recordings in a statistical analysis permits better detection of significant features in the data; 3) analysis with multiple scales of resolution facilitates a more differentiated observation of the material; and 4) the method affords excellent visual presentation by combining group differences, time trends, and multiscale statistical analysis allowing the observer to quickly view and evaluate the material. It is our opinion that SiNoS analysis of time series is a very powerful statistical tool that may be used to complement conventional statistical methods.

Download Full-text

BIAS REDUCTION FOR TIME SERIES MODELS BASED ON SUPPORT VECTOR REGRESSION

International Journal of Bifurcation and Chaos ◽

10.1142/s0218127404010369 ◽

2004 ◽

Vol 14 (06) ◽

pp. 1947-1956 ◽

Cited By ~ 3

Author(s):

ALEXANDER HORNSTEIN ◽

ULRICH PARLITZ

Keyword(s):

Time Series ◽

Bias Reduction ◽

Support Vector ◽

Model Parameters ◽

Vector Machines ◽

New Learning ◽

Data Points ◽

Regression Techniques ◽

Regression Problems ◽

True Values

In the past few years a new learning method called Support Vector Machines (SVMs) has enjoyed increasing popularity. Based on statistical learning theory it shows very good generalization abilities. Though SVMs are mainly used for classification tasks, they are also applicable to regression problems and thus to modeling the dynamics of a time series. However when regression techniques are used to build dynamical models caution is advisable if the data are noisy. Due to correlations between data points, estimates of model parameters deviate systematically from the true values. An approach is presented to reduce such bias in SVM parameters.

Download Full-text

The frequency of pattern occurrence in random walks

Discrete Mathematics & Theoretical Computer Science ◽

10.46298/dmtcs.2476 ◽

2015 ◽

Vol DMTCS Proceedings, 27th... (Proceedings) ◽

Author(s):

Sergi Elizalde ◽

Megan Martinez

Keyword(s):

Time Series ◽

Random Walks ◽

Permutation Patterns ◽

Equal Frequency ◽

Time Intervals ◽

The Past ◽

Data Points ◽

International Audience ◽

Ordinal Patterns ◽

Analysis Of Time Series

International audience In the past decade, the use of ordinal patterns in the analysis of time series and dynamical systems has become an important tool. Ordinal patterns (otherwise known as a permutation patterns) are found in time series by taking $n$ data points at evenly-spaced time intervals and mapping them to a length-$n$ permutation determined by relative ordering. The frequency with which certain patterns occur is a useful statistic for such series. However, the behavior of the frequency of pattern occurrence is unstudied for most models. We look at the frequency of pattern occurrence in random walks in discrete time, and we define a natural equivalence relation on permutations under which equivalent patterns appear with equal frequency, regardless of probability distribution. We characterize these equivalence classes applying combinatorial methods. Au cours de la dernière décennie, l’utilisation des motifs ordinaux dans l’analyse des séries chronologiques et systèmes dynamiques est devenu un outil important. Des motifs ordinaux (autrement appelés motifs de permutations) se trouvent dans les séries chronologiques en prenant $n$ points de données au intervalles de temps uniformément espacées et les faisant correspondre à une permutation de longueur $n$ déterminée par leur ordre relatif. La fréquence avec laquelle certains motifs apparaissent est une statistique utile pour ces séries. Toutefois, le comportement de la fréquence d’apparition de ces motifs n’a pas été étudié pour la plupart des modèles. Nous regardons la fréquence d’occurrence des motifs dans les marches aléatoires en temps discret, et nous définissons une relation d’équivalence naturelle sur des permutations dans laquelle les motifs équivalents apparaissent avec la même fréquence, quelle que soit la distribution de probabilité. Nous caractérisons ces classes d’équivalence utilisant des méthodes combinatoires

Download Full-text

Nonlinear Analysis of Time Series in Genome-Wide Linkage Disequilibrium Data

10.1063/1.2891412 ◽

2008 ◽

Cited By ~ 1

Author(s):

Enrique Hernández-Lemus ◽

Jesús K. Estrada-Gil ◽

Irma Silva-Zolezzi ◽

J. Carlos Fernández-López ◽

Alfredo Hidalgo-Miranda ◽

...

Keyword(s):

Time Series ◽

Linkage Disequilibrium ◽

Nonlinear Analysis ◽

Genome Wide ◽

Genome Wide Linkage Disequilibrium ◽

Analysis Of Time Series

Download Full-text

Bayesian copy number detection and association in large-scale studies

BMC Cancer ◽

10.1186/s12885-020-07304-3 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Stephen Cristiano ◽

David McKean ◽

Jacob Carey ◽

Paige Bracci ◽

Paul Brennan ◽

...

Keyword(s):

Pancreatic Cancer ◽

Copy Number ◽

Large Scale ◽

Association Studies ◽

Regulatory Elements ◽

Bayesian Regression ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Batch Effects ◽

Genome Wide

Abstract Background Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. Methods We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. Results Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). Conclusions Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.

Download Full-text

Comparison of Harmonic Analysis of Time Series (HANTS) and Multi-Singular Spectrum Analysis (M-SSA) in Reconstruction of Long-Gap Missing Data in NDVI Time Series

Remote Sensing ◽

10.3390/rs12172747 ◽

2020 ◽

Vol 12 (17) ◽

pp. 2747

Author(s):

Hamid Reza Ghafarian Malamiri ◽

Hadi Zare ◽

Iman Rousta ◽

Haraldur Olafsson ◽

Emma Izquierdo Verdiguier ◽

...

Keyword(s):

Time Series ◽

Missing Data ◽

Harmonic Analysis ◽

Spectrum Analysis ◽

Singular Spectrum Analysis ◽

Vegetation Changes ◽

Singular Spectrum ◽

Ndvi Time Series ◽

Reconstructed Data ◽

Analysis Of Time Series

Monitoring vegetation changes over time is very important in dry areas such as Iran, given its pronounced drought-prone agricultural system. Vegetation indices derived from remotely sensed satellite imageries are successfully used to monitor vegetation changes at various scales. Atmospheric dust as well as airborne particles, particularly gases and clouds, significantly affect the reflection of energy from the surface, especially in visible, short and infrared wavelengths. This results in imageries with missing data (gaps) and outliers while vegetation change analysis requires integrated and complete time series data. This study investigated the performance of HANTS (Harmonic ANalysis of Time Series) algorithm and (M)-SSA ((Multi-channel) Singular Spectrum Analysis) algorithm in reconstruction of wide-gap of missing data. The time series of Normalized Difference Vegetation Index (NDVI) retrieved from Landsat TM in combination with 250m MODIS NDVI time image products are used to simulate and find periodic components of the NDVI time series from 1986 to 2000 and from 2000 to 2015, respectively. This paper presents the evaluation of the performance of gap filling capability of HANTS and M-SSA by filling artificially created gaps in data using Landsat and MODIS data. The results showed that the RMSEs (Root Mean Square Errors) between the original and reconstructed data in HANTS and M-SSA algorithms were 0.027 and 0.023 NDVI value, respectively. Further, RMSEs among 15 NDVI images extracted from the time series artificially and reconstructed by HANTS and M-SSA algorithms were 0.030 and 0.025 NDVI value, respectively. RMSEs of the original and reconstructed data in HANTS and M-SSA algorithms were 0.10 and 0.04 for time series 6, respectively. The findings of this study present a favorable option for solving the missing data challenge in NDVI time series.

Download Full-text

Graphical-statistical method to explore variability of hydrological time series

Hydrology Research ◽

10.2166/nh.2020.111 ◽

2020 ◽

Author(s):

Charles Onyutha

Keyword(s):

Time Series ◽

Large Scale ◽

Climate Change Impacts ◽

Climatic Data ◽

Adaptation To Climate Change ◽

Lack Of Information ◽

Data Points ◽

White Nile ◽

Using Data ◽

Random Fluctuations

Abstract Due to increasing concern on developing measures for predictive adaptation to climate change impacts on hydrology, several studies have tended to be conducted on trends in climatic data. Conventionally, trend analysis comprises testing the null hypothesis H0 (no trend) by applying the Mann–Kendall or Spearman's rho test to the entire time series. This leads to lack of information about hidden short-durational increasing or decreasing trends (hereinafter called sub-trends) in the data. Furthermore, common trend tests are purely statistical in nature and their results can be meaningless sometimes, especially when not supported by graphical exploration of changes in the data. This paper presents a graphical-statistical methodology to identify and separately analyze sub-trends for supporting attribution of hydrological changes. The method is based on cumulative sum of differences between exceedance and non-exceedance counts of data points. Through the method, it is possible to appreciate that climate variability comprises large-scale random fluctuations in terms of rising and falling hydro-climatic sub-trends which can be associated with certain attributes. Illustration on how to apply the introduced methodology was made using data over the White Nile region in Africa. Links for downloading a tool called CSD-VAT to implement the presented methodology were provided.

Download Full-text

Temporal Dynamic Matrix Factorization for Missing Data Prediction in Large Scale Coevolving Time Series

IEEE Access ◽

10.1109/access.2016.2606242 ◽

2016 ◽

Vol 4 ◽

pp. 6719-6732 ◽

Cited By ~ 5

Author(s):

Weiwei Shi ◽

Yongxin Zhu ◽

Philip S. Yu ◽

Tian Huang ◽

Chang Wang ◽

...

Keyword(s):

Time Series ◽

Missing Data ◽

Matrix Factorization ◽

Large Scale ◽

Temporal Dynamic ◽

Dynamic Matrix ◽

Data Prediction ◽

Missing Data Prediction

Download Full-text

THE ROBUST FRACTAL ANALYSIS OF TIME SERIES: CONCERNING SIGNAL CLASS AND DATA LENGTH

Fractals ◽

10.1142/s0218348x11005099 ◽

2011 ◽

Vol 19 (01) ◽

pp. 29-49 ◽

Cited By ~ 4

Author(s):

M. H. FATTAHI ◽

N. TALEBBEYDOKHTI ◽

G. R. RAKHSHANDEHROO ◽

A. SHAMSAI ◽

E. NIKOOEE

Keyword(s):

Time Series ◽

Fractal Analysis ◽

Variation Method ◽

Fluctuation Analysis ◽

Analysis Method ◽

Range Analysis ◽

Rescaled Range Analysis ◽

Data Points ◽

Data Length ◽

Analysis Of Time Series

In the present paper, the influence of the signal class (fBm/fGn) and the data length of time series on choosing the robust fractal analysis method have been studied. More than 1000 fBm/fGn generated time series in short, intermediate and long ranges have been analyzed using common fractal analysis methods. The chosen techniques were power spectral density, detrended fluctuation analysis, rescaled range analysis, box counting, average wavelet coefficients, and the variation method. Numerous graphs indicating the suitability of each method in terms of biases in calculating the fundamental fractal feature of time series, Hurst coefficient, were employed. The results strongly emphasized the crucial influence of the signal class as well as the data length when choosing the appropriate fractal analysis method. Furthermore, as a step forward, a study on the number of data points present in a classified class/length was performed. The effect of the number of data points could not be neglected either. Based on the results, a strategy flowchart for fractal analysis of time series has been proposed. Finally, as an empirical example, the monthly, weekly and daily scaled flow time series of Ghar-e-Aghaj River have been analyzed within the framework of the strategy flowchart.

Download Full-text