Weighted Consensus Segmentations

Halima Saker; Rainer Machné; Jörg Fallmann; Douglas B. Murray; Ahmad M. Shahin; Peter F. Stadler

doi:10.3390/computation9020017

Computation ◽

10.3390/computation9020017 ◽

2021 ◽

Vol 9 (2) ◽

pp. 17

Author(s):

Halima Saker ◽

Rainer Machné ◽

Jörg Fallmann ◽

Douglas B. Murray ◽

Ahmad M. Shahin ◽

...

Keyword(s):

Time Series ◽

Language Processing ◽

Growth Curves ◽

Distance Functions ◽

Data Sets ◽

Aggregation Problem ◽

Ordered Data ◽

Polycistronic Transcripts ◽

Sum Of Distances ◽

Segmentation Problem

The problem of segmenting linearly ordered data is frequently encountered in time-series analysis, computational biology, and natural language processing. Segmentations obtained independently from replicate data sets, or from the same data with different methods or parameter settings, pose the problem of computing an aggregate or consensus segmentation. This Segmentation Aggregation problem amounts to finding a segmentation that minimizes the sum of distances to the input segmentations. It is again a segmentation problem and can be solved by dynamic programming. The aim of this contribution is (1) to gain a better mathematical understanding of the Segmentation Aggregation problem and its solutions and (2) to demonstrate that consensus segmentations have useful applications. Extending previously known results, we show that, for a large class of distance functions, only breakpoints present in at least one input segmentation appear in the consensus segmentation. Furthermore, we derive a bound on the size of consensus segments. As showcase applications, we investigate a yeast transcriptome and show that consensus segments provide a robust means of identifying transcriptomic units. This approach is particularly suited for dense transcriptomes with polycistronic transcripts, operons, or a lack of separation between transcripts. As a second application, we demonstrate that consensus segmentations can be used to robustly identify growth regimes from sets of replicate growth curves.
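To make the dynamic-programming formulation concrete, here is a minimal Python sketch of segmentation aggregation. It assumes the pairwise disagreement distance (the number of position pairs placed in the same segment by one segmentation but in different segments by the other), not the weighted distances studied in the paper, and the function names `segment_labels` and `consensus_segmentation` are hypothetical. Under this distance, the summed distance to the inputs equals a constant plus a sum of per-segment costs, so the optimum follows from a standard prefix recursion.

```python
def segment_labels(breakpoints, n):
    """Label positions 0..n-1 by segment index; each breakpoint starts a new segment."""
    starts = set(breakpoints)
    labels, seg = [], 0
    for i in range(n):
        if i in starts and i > 0:
            seg += 1
        labels.append(seg)
    return labels


def consensus_segmentation(inputs, n):
    """Breakpoints of a segmentation minimizing the summed disagreement distance.

    Assumption: distance = number of position pairs co-segmented in exactly one
    of the two segmentations. Summed over m inputs, this equals a constant plus,
    for every pair (i, j) inside one consensus segment, m - 2 * n_same(i, j).
    """
    labels = [segment_labels(bp, n) for bp in inputs]
    m = len(labels)

    def n_same(i, j):
        # Number of inputs that put positions i and j in the same segment.
        return sum(1 for lab in labels if lab[i] == lab[j])

    best = [0] + [float("inf")] * n  # best[b]: optimal cost of prefix 0..b-1
    back = [0] * (n + 1)             # back[b]: start of the last segment there
    for a in range(n):
        w = 0  # variable cost of a consensus segment covering positions a..b-1
        for b in range(a + 1, n + 1):
            last = b - 1
            w += sum(m - 2 * n_same(i, last) for i in range(a, last))
            if best[a] + w < best[b]:
                best[b] = best[a] + w
                back[b] = a

    # Trace the optimal segment starts back from the end of the sequence.
    breakpoints, b = [], n
    while b > 0:
        b = back[b]
        if b > 0:
            breakpoints.append(b)
    return sorted(breakpoints)
```

For example, `consensus_segmentation([[3], [3], [5]], 8)` returns `[3]`: two of three inputs break at position 3, and the consensus uses only a breakpoint that occurs in the inputs. The naive cost update makes the sketch cubic in the sequence length, so it is meant for illustration rather than transcriptome-scale data.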

Interpretation of the Chemical and Physical Time-Series Retrieved from Sentik Glacier, Ladakh Himalaya, India

Journal of Glaciology ◽

10.3189/s0022143000008509 ◽

1984 ◽

Vol 30 (104) ◽

pp. 66-76 ◽

Cited By ~ 2

Author(s):

Paul A. Mayewski ◽

W. Berry Lyons ◽

N. Ahmad ◽

Gordon Smith ◽

M. Pourchet

Keyword(s):

Time Series ◽

Chemical Species ◽

Data Sets ◽

Reactive Iron ◽

Physical Time ◽

Ladakh Himalaya ◽

Data Density ◽

Mass Circulation ◽

The Himalaya ◽

Analysis Of Time Series

Abstract. Spectral analysis of the time series from a c. 17 ± 0.3 year core recovered from Sentik Glacier (4908 m), Ladakh Himalaya, and calibrated for total β activity, yields several recognizable periodicities, including subannual, annual, and multi-annual ones. The time series include both chemical data (chloride, sodium, reactive iron, reactive silicate, reactive phosphate, ammonium, δD, δ¹⁸O and pH) and physical data (density, debris and ice-band locations, and microparticles in size grades 0.50 to 12.70 μm). Source areas for the chemical species investigated, and the general air-mass circulation inferred from the chemical and physical time series, are discussed to demonstrate the potential of such studies for developing paleometeorological data sets from remote, high-alpine glacierized sites such as the Himalaya.
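The abstract does not say how the spectra were computed. As a generic illustration only, the raw periodogram of an evenly sampled, mean-removed series can be sketched in Python as follows; real ice-core series are sampled by depth and first need a depth-age model, which this sketch does not attempt. The names `periodogram` and `dominant_period` are hypothetical.

```python
import cmath
import math


def periodogram(x):
    """Raw periodogram |DFT|^2 / n at Fourier frequencies k/n, k = 1..n//2 (mean removed)."""
    n = len(x)
    mean = sum(x) / n
    y = [v - mean for v in x]
    power = []
    for k in range(1, n // 2 + 1):
        s = sum(y[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append((k / n, abs(s) ** 2 / n))
    return power


def dominant_period(x, dt=1.0):
    """Period (in units of the sampling interval dt) of the highest periodogram peak."""
    freq, _ = max(periodogram(x), key=lambda fp: fp[1])
    return dt / freq
```

With monthly samples (`dt = 1/12` year), a dominant period of about 1.0 would point to an annual cycle; subannual and multi-annual periodicities would show up as additional, smaller peaks.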

An edge-cloud collaboration architecture for pattern anomaly detection of time series in wireless sensor networks

Complex & Intelligent Systems ◽

10.1007/s40747-021-00442-6 ◽

2021 ◽

Author(s):

Cong Gao ◽

Ping Yang ◽

Yanping Chen ◽

Zhongmin Wang ◽

Yue Wang

Keyword(s):

Time Series ◽

Wireless Sensor Networks ◽

Sensor Networks ◽

Anomaly Detection ◽

Estimation Method ◽

Feature Representation ◽

Sensor Data ◽

Wireless Sensor ◽

Data Sets ◽

Edge Node

Abstract. With the large-scale deployment of wireless sensor networks, anomaly detection for sensor data is becoming increasingly important in various fields. Time series, a vital form of sensor data, exhibit three main types of anomaly: point anomalies, pattern anomalies, and sequence anomalies. In production environments, analyzing pattern anomalies is the most rewarding. However, the traditional cloud-computing processing model struggles with large amounts of widely distributed data. This paper presents an edge-cloud collaboration architecture for pattern anomaly detection in time series. A task migration algorithm is developed to alleviate the backlog of detection tasks at edge nodes. In addition, detection tasks related to long-term and short-term correlations in time series are allocated to the cloud and to edge nodes, respectively. A multi-dimensional feature representation scheme is devised for efficient dimension reduction, and its two key components, trend identification and feature point extraction, are described in detail. Based on this feature representation, pattern anomaly detection is performed with an improved kernel density estimation method. Finally, extensive experiments are conducted on synthetic and real-world data sets.
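The improved kernel density estimator is not described in the abstract. The sketch below shows only the standard Gaussian kernel density estimate with a normal-reference bandwidth (1.06 σ n^(-1/5)), used to flag values with low estimated density. Applying it to scalar pattern features (for example, hypothetical per-window slopes), the threshold, and all function names are assumptions.

```python
import math


def gaussian_kde(sample, bandwidth):
    """Return a Gaussian kernel density estimate built from `sample`."""
    n = len(sample)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def normal_reference_bandwidth(sample):
    """Rule-of-thumb bandwidth 1.06 * sd * n^(-1/5); assumes roughly Gaussian data."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((s - mean) ** 2 for s in sample) / (n - 1))
    return 1.06 * sd * n ** (-0.2)


def flag_anomalies(reference, candidates, threshold):
    """Return the candidates whose estimated density under `reference` is below `threshold`."""
    density = gaussian_kde(reference, normal_reference_bandwidth(reference))
    return [c for c in candidates if density(c) < threshold]
```

In an edge-cloud split, the reference sample could be built centrally and the cheap density lookups run at the edge; whether the paper divides the work this way is not stated in the abstract.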

Learning emotional word embeddings for sentiment analysis

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201993 ◽

2021 ◽

pp. 1-13

Author(s):

Qingtian Zeng ◽

Xishi Zhao ◽

Xiaohui Hu ◽

Hua Duan ◽

Zhongying Zhao ◽

...

Keyword(s):

Sentiment Analysis ◽

Language Processing ◽

State Of The Art ◽

Research Problem ◽

Emotional Word ◽

Classification Model ◽

Data Sets ◽

Word Embeddings ◽

Real World Data ◽

Text Documents

Word embeddings have been applied successfully in many natural language processing tasks owing to their effectiveness. However, state-of-the-art algorithms for learning word representations from large amounts of text documents ignore emotional information, a significant research problem that must be addressed. To solve this problem, we propose an emotional word embedding (EWE) model for sentiment analysis. The method first uses pre-trained word vectors to represent document features with two different linear weighting methods. The resulting document vectors are then fed to a neural-network-based classification model to train a text sentiment classifier. In this way, the emotional polarity of the text is propagated into the word vectors. Experimental results on three kinds of real-world data sets demonstrate that the proposed EWE model outperforms other state-of-the-art models on text sentiment prediction, text similarity calculation, and word emotional expression tasks.
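The abstract does not name the two linear weighting methods. As an assumption, the sketch below shows the general shape of a linearly weighted document vector: a weighted average of pre-trained word vectors, with uniform weights by default or caller-supplied weights (for example, hypothetical TF-IDF scores). The function name and the toy embeddings are illustrative only.

```python
def document_vector(tokens, embeddings, weights=None):
    """Weighted average of the embeddings of in-vocabulary tokens.

    `weights` maps token -> nonnegative weight; None means uniform weighting.
    """
    dim = len(next(iter(embeddings.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok in tokens:
        if tok not in embeddings:
            continue  # skip out-of-vocabulary tokens
        w = 1.0 if weights is None else weights.get(tok, 0.0)
        vec = embeddings[tok]
        for d in range(dim):
            total[d] += w * vec[d]
        weight_sum += w
    if weight_sum == 0:
        return total  # no usable tokens: return the zero vector
    return [v / weight_sum for v in total]
```

Because each document vector is linear in the word vectors, a classifier loss computed on it can pass gradients back to the word vectors, which is one way polarity information could flow into the embeddings.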

A Hypothesis Test for the Goodness-of-Fit of the Marginal Distribution of a Time Series with Application to Stablecoin Data

Engineering Proceedings ◽

10.3390/engproc2021005010 ◽

2021 ◽

Vol 5 (1) ◽

pp. 10

Author(s):

Mark Levene

Keyword(s):

Time Series ◽

Goodness Of Fit ◽

Marginal Distribution ◽

Hypothesis Test ◽

Data Sets ◽

Test Statistic ◽

Sample Test ◽

Kolmogorov Smirnov ◽

Heavy Tailed ◽

Jensen Shannon Divergence

A bootstrap-based hypothesis test of the goodness-of-fit of the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets: three stablecoin time series and a Bitcoin time series. We demonstrate that, after first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1 < α < 2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are proportionately much larger than those for ESJS.
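The abstract does not give the test in detail. The sketch below covers only generic ingredients it names: first-order differencing, the two-sample Kolmogorov–Smirnov statistic (the largest vertical gap between two empirical CDFs), and a percentile bootstrap interval for a statistic. ESJS and the α-stable fit are not reproduced, and the function names and bootstrap settings are assumptions.

```python
import random


def first_difference(series):
    """First-order differences x[t] - x[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def ks2_statistic(x, y):
    """Two-sample KS statistic: sup over v of |F_x(v) - F_y(v)|."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        v = min(xs[i], ys[j])
        # Advance past all copies of v in both samples before comparing CDFs.
        while i < nx and xs[i] == v:
            i += 1
        while j < ny and ys[j] == v:
            j += 1
        d = max(d, abs(i / nx - j / ny))
    return d


def bootstrap_ci(x, y, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(x, y), resampling each sample with replacement."""
    rng = random.Random(seed)
    values = sorted(
        stat(rng.choices(x, k=len(x)), rng.choices(y, k=len(y))) for _ in range(n_boot)
    )
    lo = values[int((alpha / 2) * n_boot)]
    hi = values[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Comparing the interval widths that two statistics produce on the same data is the kind of comparison the abstract uses to argue that ESJS is the more powerful metric.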

Three-dimensional deformation time series of glacier motion from multiple-aperture DInSAR observation

Journal of Geodesy ◽

10.1007/s00190-019-01325-y ◽

2019 ◽

Vol 93 (12) ◽

pp. 2651-2660 ◽

Cited By ~ 2

Author(s):

Sergey Samsonov

Keyword(s):

Time Series ◽

Surface Deformation ◽

Ground Deformation ◽

Three Dimensional ◽

Data Sets ◽

Ice Flow ◽

The North ◽

Glacier Ice ◽

Deformation Component ◽

3D Deformation

Abstract. The previously presented Multidimensional Small Baseline Subset (MSBAS-2D) technique computes two-dimensional (2D), east and vertical, ground deformation time series from two or more ascending and descending Differential Interferometric Synthetic Aperture Radar (DInSAR) data sets by assuming that the contribution of the north deformation component is negligible. DInSAR data sets can be acquired with different temporal and spatial resolutions, viewing geometries, and wavelengths. The MSBAS-2D technique has previously been used to map deformation due to mining, urban development, carbon sequestration, permafrost aggradation and pingo growth, and volcanic activity. For glacier ice flow, however, the north deformation component is often too large to be neglected. Historically, the surface-parallel flow (SPF) constraint has been used to compute static three-dimensional (3D) velocity fields at various glaciers. A novel MSBAS-3D technique has been developed that uses the SPF constraint to compute 3D deformation time series. This technique is used to map 3D deformation at the Barnes Ice Cap, Baffin Island, Nunavut, Canada, during January–March 2015, and the MSBAS-2D and MSBAS-3D solutions are compared. The MSBAS-3D technique can be used to study ice flow at other glaciers and other surface deformation processes with a large north deformation component, such as landslides. The software implementation of the MSBAS-3D technique can be downloaded from http://insar.ca/.
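The abstract names the surface-parallel flow (SPF) constraint without stating it. In its common textbook form, which is an assumption here and not necessarily the exact MSBAS-3D formulation, the constraint requires the displacement rate to be tangent to the ice surface z = h(x, y), with x pointing east, y pointing north, and h a surface elevation model:

```latex
v_U = v_E \,\frac{\partial h}{\partial x} + v_N \,\frac{\partial h}{\partial y}
\qquad\Longrightarrow\qquad
v_N = \frac{v_U - v_E \,\partial h / \partial x}{\partial h / \partial y},
\qquad \frac{\partial h}{\partial y} \neq 0 .
```

The constraint adds one linear equation linking the three components, so the north component no longer has to be assumed negligible; the rearranged form shows that it is only well determined where the northward surface slope is not close to zero.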

Mapping site index and age by linking a time series of canopy height models with growth curves

Forest Ecology and Management ◽

10.1016/j.foreco.2008.10.029 ◽

2009 ◽

Vol 257 (3) ◽

pp. 951-959 ◽

Cited By ~ 35

Author(s):

Cédric Véga ◽

Benoît St-Onge

Keyword(s):

Time Series ◽

Growth Curves ◽

Site Index ◽

Canopy Height

Searching for g modes

Astronomy and Astrophysics ◽

10.1051/0004-6361/201833535 ◽

2018 ◽

Vol 617 ◽

pp. A108 ◽

Cited By ~ 5

Author(s):

T. Appourchaux ◽

P. Boumier ◽

J. W. Leibacher ◽

T. Corbard

Keyword(s):

Time Series ◽

Radial Velocity ◽

Time Shift ◽

Data Sets ◽

Mode Amplitude ◽

The Past ◽

Mode Detection ◽

Velocity Calibration ◽

Remaining Time ◽

Soho Spacecraft

Context. The recent claims of g-mode detection have restarted the search for these potentially extremely important modes. These claims can be reassessed in view of the different data sets available from the SoHO instruments and ground-based instruments. Aims. We produce a new calibration of the GOLF data with a more consistent p-mode amplitude and a more consistent time shift correction compared to the time series used in the past. Methods. The calibration of 22 yr of GOLF data is done with a simpler approach that uses only the predictive radial velocity of the SoHO spacecraft as a reference. Using p modes, we measure and correct the time shift between ground- and space-based instruments and the GOLF instrument. Results. The p-mode velocity calibration is now consistent to within a few percent with other instruments. The remaining time shifts are within ±5 s for 99.8% of the time series.
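The abstract does not describe how the time shifts are measured beyond the use of p modes. As a generic illustration only, a shift between two evenly sampled series can be estimated by maximizing their correlation over integer lags; the function names and the lag search window are assumptions, and the result in seconds is the lag times the sampling interval.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def estimate_lag(reference, other, max_lag):
    """Integer lag maximizing corr(reference[t], other[t + lag]) for |lag| <= max_lag.

    A positive result means `other` is delayed relative to `reference`.
    """
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = reference, other[lag:]
        else:
            a, b = reference[-lag:], other
        n = min(len(a), len(b))
        r = pearson(a[:n], b[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag
```

Integer lags only resolve shifts to one sampling interval; achieving the ±5 s level quoted above would need finer resolution, for example by interpolating around the correlation peak.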

Fractional differencing in stock market price and online presence of global tourist corporations

Journal of Economics Finance and Administrative Science ◽

10.1108/jefas-01-2018-0013 ◽

2019 ◽

Vol 24 (48) ◽

pp. 194-204 ◽

Cited By ~ 1

Author(s):

Francisco Flores-Muñoz ◽

Alberto Javier Báez-García ◽

Josué Gutiérrez-Barroso

Keyword(s):

Time Series ◽

Stock Market ◽

Market Price ◽

Google Trends ◽

Data Sets ◽

Market Prices ◽

Content Type ◽

Search Results ◽

Online Presence ◽

Fractional Differencing

Purpose. This work aims to explore the behavior of stock market prices according to the autoregressive fractionally integrated moving average (ARFIMA) model. This behavior is compared with a measure of online presence: search engine results as measured by Google Trends. Design/methodology/approach. The study sample comprises the companies listed in the STOXX® Global 3000 Travel and Leisure index. Google Finance and Yahoo Finance, along with Google Trends, were used to obtain stock prices and search results, respectively, for a period of five years (October 2012 to October 2017). To ensure comparability between the two data sets, weekly observations were collected for a total of 118 firms, with two time series each (price and search results), amounting to around 61,000 observations. Findings. Relationships between the two data sets are explored, with theoretical implications for the fields of economics, finance and management. Tourist corporations were analyzed owing to their growing economic impact. The estimates are initially consistent with long memory, suggesting that both stock market prices and online search trends deserve further exploration for modeling and forecasting. Significant differences owing to country and sector effects are also shown. Originality/value. This research contributes in two ways: it demonstrates the potential of a new tool for analyzing relevant time series to monitor the behavior of firms and markets, and it suggests several theoretical pathways for further research on information asymmetry and corporate transparency, proposing pertinent bridges between the two fields.
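In an ARFIMA model the series is filtered with (1 − B)^d, where B is the backshift operator and d may be fractional; in the stationary case, 0 < d < 0.5 corresponds to long memory. Expanding the binomial gives weights w_0 = 1 and w_k = w_{k−1}(k − 1 − d)/k. The sketch below applies a truncated version of this filter; estimating d and fitting the AR and MA parts, which the study would require, are not shown, and the truncation length and function names are assumptions.

```python
def frac_diff_weights(d, n_weights):
    """Coefficients w_k of (1 - B)^d = sum_k w_k B^k, truncated to n_weights terms."""
    w = [1.0]
    for k in range(1, n_weights):
        w.append(w[-1] * (k - 1 - d) / k)
    return w


def fractional_difference(x, d, n_weights):
    """Apply the truncated filter; the first n_weights - 1 points lack history and are dropped."""
    w = frac_diff_weights(d, n_weights)
    return [
        sum(w[k] * x[t - k] for k in range(n_weights))
        for t in range(n_weights - 1, len(x))
    ]
```

With d = 1 the weights reduce to [1, −1, 0, ...], so the filter reproduces ordinary first differencing; for 0 < d < 1 the weights decay slowly, which is how the filter captures long memory.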

Evaluation of multiple forcing data sets for precipitation and shortwave radiation over major land areas of China

Hydrology and Earth System Sciences ◽

10.5194/hess-21-5805-2017 ◽

2017 ◽

Vol 21 (11) ◽

pp. 5805-5821 ◽

Cited By ~ 27

Author(s):

Fan Yang ◽

Hui Lu ◽

Kun Yang ◽

Jie He ◽

Wei Wang ◽

...

Keyword(s):

Time Series ◽

Temporal Variations ◽

Spatiotemporal Variability ◽

Shortwave Radiation ◽

Data Sets ◽

Precipitation Data ◽

Land Data Assimilation ◽

Land Data Assimilation System ◽

Data Assimilation System ◽

Assimilation System

Abstract. Precipitation and shortwave radiation play important roles in climatic, hydrological and biogeochemical cycles. Several global and regional forcing data sets currently provide historical estimates of these two variables over China, including the Global Land Data Assimilation System (GLDAS), the China Meteorological Administration (CMA) Land Data Assimilation System (CLDAS) and the China Meteorological Forcing Dataset (CMFD). The CN05.1 precipitation data set, a gridded analysis based on CMA gauge observations, also provides high-resolution historical precipitation data for China. In this study, we present an intercomparison of precipitation and shortwave radiation data from CN05.1, CMFD, CLDAS and GLDAS during 2008–2014. We also validate all four data sets against independent ground station observations. All four forcing data sets capture the spatial distribution of precipitation over major land areas of China, although CLDAS indicates smaller annual-mean precipitation amounts than CN05.1, CMFD or GLDAS. Time series of precipitation anomalies are largely consistent among the data sets, except for a sudden decrease in CMFD after August 2014. All forcing data indicate greater temporal variations relative to the mean in dry regions than in wet regions. Validation against independent precipitation observations provided by the Ministry of Water Resources (MWR) in the middle and lower reaches of the Yangtze River indicates that CLDAS provides the most realistic estimates of spatiotemporal variability in precipitation in this region. CMFD also performs well with respect to annual mean precipitation, while GLDAS fails to accurately capture much of the spatiotemporal variability and CN05.1 contains significant high biases relative to the MWR observations. Estimates of shortwave radiation from CMFD are largely consistent with station observations, while CLDAS and GLDAS greatly overestimate shortwave radiation. 
All three forcing data sets capture the key features of the spatial distribution, but estimates from CLDAS and GLDAS are systematically higher than those from CMFD over most of mainland China. Based on our evaluation metrics, CLDAS slightly outperforms GLDAS. CLDAS is also closer than GLDAS to CMFD with respect to temporal variations in shortwave radiation anomalies, with substantial differences among the time series. Differences in temporal variations are especially pronounced south of 34° N. Our findings provide valuable guidance for a variety of stakeholders, including land-surface modelers and data providers.
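The abstract does not list its evaluation metrics. As an assumption, the sketch below computes common choices for comparing a gridded estimate against station observations: anomalies relative to the series mean, mean bias, root-mean-square error, the Pearson correlation of anomalies, and the coefficient of variation as one reading of "temporal variations relative to the mean". Function names are illustrative.

```python
import math


def anomalies(series):
    """Deviations from the series mean."""
    mean = sum(series) / len(series)
    return [v - mean for v in series]


def coefficient_of_variation(series):
    """Population standard deviation divided by the mean."""
    n = len(series)
    mean = sum(series) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    return sd / mean


def bias_rmse_corr(estimate, observed):
    """Mean bias, RMSE, and Pearson correlation of anomalies (estimate vs. observed)."""
    n = len(observed)
    diffs = [e - o for e, o in zip(estimate, observed)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    ea, oa = anomalies(estimate), anomalies(observed)
    corr = sum(a * b for a, b in zip(ea, oa)) / math.sqrt(
        sum(a * a for a in ea) * sum(b * b for b in oa)
    )
    return bias, rmse, corr
```

A data set with a constant offset, like the systematically higher shortwave estimates described above, shows a nonzero bias while its anomaly correlation can still be high.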

Construction of merged satellite total O3 and NO2 time series in the tropics for trend studies and evaluation by comparison to NDACC SAOZ measurements

Atmospheric Measurement Techniques ◽

10.5194/amt-7-3337-2014 ◽

2014 ◽

Vol 7 (10) ◽

pp. 3337-3354 ◽

Cited By ~ 7

Author(s):

M. Pastel ◽

J.-P. Pommereau ◽

F. Goutail ◽

A. Richter ◽

A. Pazmiño ◽

...

Keyword(s):

Time Series ◽

Lower Stratosphere ◽

Atmospheric Composition ◽

Data Sets ◽

Composition Change ◽

Long Time ◽

Uv Visible ◽

The Tropics ◽

Lower Noise ◽

Total Column

Abstract. Long time series of ozone and NO2 total column measurements in the southern tropics are available from two ground-based SAOZ (Système d'Analyse par Observation Zénithale) UV-visible spectrometers operated within the Network for the Detection of Atmospheric Composition Change (NDACC): in Bauru (22° S, 49° W), south-eastern Brazil, since 1995, and on Reunion Island (21° S, 55° E), south-western Indian Ocean, since 1993. Although the stations are located at the same latitude, significant differences are observed in the columns of both species, attributed to differences in tropospheric content and in equivalent latitude in the lower stratosphere. These data are used to identify which satellites operating during the same period capture the same features and are thus best suited for building reliable merged time series for trend studies. For ozone, the satellite series best matching the SAOZ observations are EP-TOMS (1995–2004) and OMI-TOMS (2005–2011), whereas for NO2 the best results are obtained by combining GOME version GDP5 (1996–2003) and SCIAMACHY-IUP (2003–2011), which display lower noise and seasonality relative to SAOZ. Both merged data sets are fully consistent with the larger columns of the two species above South America and with the seasonality of the differences between the two stations reported by SAOZ, providing reliable time series for further trend analyses and for identifying sources of interannual variability in future analyses.
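The abstract does not state how "best matching" is scored. As an assumption, one simple criterion is to compare each candidate satellite series with the ground-based reference on common dates and prefer the candidate whose differences scatter least, treating the standard deviation of the differences as a proxy for noise. Seasonality of the differences is not modeled, and the candidate names and values in the usage example are hypothetical.

```python
def compare_to_reference(candidate, reference):
    """Mean and standard deviation of candidate - reference on shared dates.

    Both arguments map date -> column value.
    """
    common = sorted(set(candidate) & set(reference))
    if not common:
        raise ValueError("no overlapping dates")
    diffs = [candidate[d] - reference[d] for d in common]
    n = len(diffs)
    mean = sum(diffs) / n
    sd = (sum((x - mean) ** 2 for x in diffs) / n) ** 0.5
    return mean, sd


def best_match(candidates, reference):
    """Name of the candidate whose differences to the reference have the smallest scatter."""
    return min(candidates, key=lambda name: compare_to_reference(candidates[name], reference)[1])
```

For example, with `reference = {1: 300, 2: 310, 3: 305}` and candidates `"A": {1: 302, 2: 312, 3: 307}` and `"B": {1: 295, 2: 320, 3: 300}`, `best_match` returns `"A"`: its constant offset of 2 could be removed before merging, whereas B's scatter could not.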
