On Similarity Measures for Stochastic and Statistical Modeling

Konstantinos Makris; Ilia Vonta; Alex Karagrigoriou

doi:10.3390/math9080840

Mathematics ◽

10.3390/math9080840 ◽

2021 ◽

Vol 9 (8) ◽

pp. 840

Author(s):

Konstantinos Makris ◽

Ilia Vonta ◽

Alex Karagrigoriou

Keyword(s):

Time Series ◽

Statistical Modeling ◽

Similarity Measures ◽

Epidemiological Data ◽

Time Dependent ◽

Dependent Data ◽

Data Sets ◽

Data Set ◽

Time Points ◽

Statistical Functions

In this work, our goal is to present and discuss similarity techniques for ordered observations of time series and non-time-dependent data. The purpose of the study is to measure whether ordered observations of data sets appear at, or close to, the same time points in the case of time series, and with the same or similar frequencies in the case of non-time-dependent data sets. A simultaneous time pairing and comparison can be achieved effectively via indices, advanced indices and the associated index matrices based on statistical functions of ordered observations. Hence, in this work we review some previously defined standard indices and propose new advanced dimensionless indices and the associated index matrices, which are both easily interpreted and provide an efficient comparison of the series involved. Furthermore, because the indices presented are dimensionless, the proposed methodology allows the analysis of data with different units of measurement. The applicability of the proposed methodology is explored through an epidemiological data set on influenza-like illness (ILI). Finally, we provide a thorough discussion of all parameters involved in the proposed indices for practical purposes, along with examples.

Presentation of Coupling Analysis Techniques of Maximum and Minimum Values Between N Sets of Data Using Matrix [µ][MKN]

International Journal of Mathematical Engineering and Management Sciences ◽

10.33889/ijmems.2021.6.4.067 ◽

2021 ◽

Vol 6 (4) ◽

pp. 1127-1136

Author(s):

K. N. Makris ◽

I. Vonta

Keyword(s):

Time Series ◽

Indirect Method ◽

Direct Method ◽

Time Dependent ◽

Data Sets ◽

Data Display ◽

Coupling Analysis ◽

Time Points ◽

Analysis Techniques ◽

Coupling Techniques

This paper presents and studies alternative coupling techniques for maximum and minimum values between data sets; that is, it examines whether maximum or minimum values of different data sets appear at the same or neighboring time points. The data can be time-dependent (time series) or non-time-dependent. In this work, the analysis focuses on time series, and novel indices are defined to measure whether the N data sets attain their maximum or minimum values at the same or at very close time instances. For this purpose, two methods are compared, one direct and one indirect. The indirect method is based on matrices of dimensionless indicators, denoted by [μ][MKN], and the direct method is based on a variance-type measure, denoted by [V][MKN].
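The abstract does not define [V][MKN], so the following is only a rough illustration of what a variance-type coupling measure could look like: it locates the time index of the maximum in each series and takes the variance of those indices. The function name `peak_time_variance` and the toy series are hypothetical and are not taken from the paper.

```python
import numpy as np

def peak_time_variance(series_list):
    """Return the time index of each series' maximum and the variance of those indices.

    This is an illustrative stand-in, NOT the paper's [V][MKN] definition.
    A value of 0 means all series peak at the same time point.
    """
    peak_times = np.array([int(np.argmax(s)) for s in series_list])
    return peak_times, float(np.var(peak_times))

# Three toy series whose maxima fall at neighboring time points.
a = np.array([1, 2, 5, 9, 4, 3, 2, 1, 0, 0])  # maximum at index 3
b = np.array([0, 1, 3, 6, 8, 5, 2, 1, 1, 0])  # maximum at index 4
c = np.array([2, 2, 4, 7, 6, 3, 1, 0, 0, 0])  # maximum at index 3

peaks, v = peak_time_variance([a, b, c])
print(peaks, v)
```

Replacing `np.argmax` with `np.argmin` would give the analogous quantity for minimum values.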

On the Characterization and Forecasting of Ground Displacements of Ocean-Reclaimed Lands

Remote Sensing ◽

10.3390/rs12182971 ◽

2020 ◽

Vol 12 (18) ◽

pp. 2971

Author(s):

Jingzhao Ding ◽

Qing Zhao ◽

Maochuan Tang ◽

Fabiana Calò ◽

Virginia Zamparelli ◽

...

Keyword(s):

Time Series ◽

Ground Deformation ◽

Proper Time ◽

Global Climate ◽

Time Dependent ◽

Ground Displacement ◽

Data Set ◽

Future Evolution ◽

Adopted Model ◽

Sar Data

In this work, we study ground deformation of ocean-reclaimed platforms as retrieved from interferometric synthetic aperture radar (InSAR) analyses. We investigate, in particular, the suitability and accuracy of some time-dependent models used to characterize and foresee the present and future evolution of ground deformation of the coastal lands. Previous investigations, carried out by the authors of this paper and other scholars, related to the zone of the ocean-reclaimed lands of Shanghai, have already shown that ocean-reclaimed lands are subject to subsidence (i.e., the ground settles due to soil consolidation and compression), and the temporal evolution of that deformation follows a certain predictable model. Specifically, two time-gapped SAR datasets composed of the images collected by the ENVISAT ASAR (ENV) from 2007 to 2010 and the COSMO-SkyMed (CSK) sensors, available from 2013 to 2016, were used to generate long-term ground displacement time-series using a proper time-dependent geotechnical model. In this work, we use a third SAR data set consisting of Radarsat-2 (RST-2) acquisitions collected from 2012 to 2016 to further corroborate the validity of that model. As a result, we verified with the new RST-2 data, partially covering the gap between the ENV and CSK acquisitions, that the adopted model fits the data and that the model is suitable to perform future projections. Furthermore, we extended these analyses to the area of Pearl River Delta (PRD) and the city of Shenzhen, China. Our study aims to investigate the suitability of different time-dependent ground deformation models relying on the different geophysical conditions in the two areas of Shanghai and Shenzhen, China.
To this aim, three sets of SAR data, collected by the ENV platform (from both ascending and descending orbits) and the Sentinel-1A (S1A) sensor (on ascending orbits), were used to obtain the ground displacement time-series of the city of Shenzhen and its surrounding region. Multi-orbit InSAR data products were also combined to discriminate the up–down (subsidence) ground deformation time-series of the coherent points, which are then used to estimate the parameters of the models adopted to foresee the future evolution of the land-reclaimed ground consolidation procedure. The exploitation of the obtained geospatial data and products is helpful for the continuous monitoring of coastal environments and the evaluation of the socio-economic impacts of human activities and global climate change.

LOI and GONG Low-Degree Rotational Splittings

Symposium - International Astronomical Union ◽

10.1017/s0074180900238515 ◽

1998 ◽

Vol 185 ◽

pp. 167-168

Author(s):

T. Appourchaux ◽

M.C. Rabello-Soares ◽

L. Gizon

Keyword(s):

Time Series ◽

The Other ◽

Data Sets ◽

Data Set ◽

Low Degree ◽

Fourier Spectra

Two different data sets have been used to derive low-degree rotational splittings. One data set comes from the Luminosity Oscillations Imager of VIRGO on board SOHO; the observations start on 27 March 1996 and end on 26 March 1997, and consist of intensity time series of 12 pixels (Appourchaux et al., 1997, Sol. Phys., 170, 27). The other data set was kindly made available by the GONG project; the observations start on 26 August 1995 and end on 21 August 1996, and consist of complex Fourier spectra of velocity time series for l = 0–9. For the GONG data, the contamination of l = 1 from the spatial aliases of l = 6 and l = 9 required some cleaning. To achieve this, we applied the inverse of the leakage matrix of l = 1, 6 and 9 to the original Fourier spectra of the same degrees; cleaning of all 3 degrees was achieved simultaneously (Appourchaux and Gizon, 1997, these proceedings).
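The cleaning step described above, applying the inverse of a leakage matrix to the Fourier spectra of three degrees at once, can be sketched as a small linear solve. The leakage values and spectra below are made up for illustration; the abstract does not give the actual matrix, and solving the linear system instead of forming the inverse explicitly is a choice made here.

```python
import numpy as np

# Hypothetical 3x3 leakage matrix for degrees l = 1, 6, 9 (illustrative values only).
# Entry [i, j] is the assumed fraction of degree j that leaks into the observed degree i.
leakage = np.array([
    [1.0, 0.2, 0.1],
    [0.2, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])

# Synthetic "true" complex Fourier spectra: one row per degree, one column per frequency bin.
rng = np.random.default_rng(0)
n_freq = 5
true_spectra = rng.normal(size=(3, n_freq)) + 1j * rng.normal(size=(3, n_freq))

# The observed spectra are contaminated mixtures of the true ones.
observed = leakage @ true_spectra

# Cleaning: equivalent to multiplying by the inverse leakage matrix,
# but solved as a linear system, which is numerically preferable.
cleaned = np.linalg.solve(leakage, observed)

print(np.allclose(cleaned, true_spectra))
```

Because all three rows are solved together, every degree is cleaned in the same step, which matches the simultaneous cleaning mentioned in the abstract.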

An assessment of Bayesian bias estimator for numerical weather prediction

Nonlinear Processes in Geophysics ◽

10.5194/npg-15-1013-2008 ◽

2008 ◽

Vol 15 (6) ◽

pp. 1013-1022 ◽

Cited By ~ 2

Author(s):

J. Son ◽

D. Hou ◽

Z. Toth

Keyword(s):

Time Series ◽

Numerical Weather Prediction ◽

Sampling Error ◽

Weather Prediction ◽

Training Data ◽

Statistical Characteristics ◽

Forecast Errors ◽

Data Sets ◽

Data Set ◽

Numerical Weather

Abstract. Various statistical methods are used to process operational Numerical Weather Prediction (NWP) products with the aim of reducing forecast errors, and they often require sufficiently large training data sets. Generating such a hindcast data set for this purpose can be costly, and a well-designed algorithm should be able to reduce the required size of these data sets. This issue is investigated with the relatively simple case of bias correction, by comparing a Bayesian algorithm of bias estimation with the conventionally used empirical method. As available forecast data sets are not large enough for a comprehensive test, synthetically generated time series representing the analysis (truth) and forecast are used to increase the sample size. Since these synthetic time series retain the statistical characteristics of the observations and operational NWP model output, the results of this study can be extended to real observations and forecasts, and this is confirmed by a preliminary test with real data. By using the climatological mean and standard deviation of the meteorological variable under consideration and the statistical relationship between the forecast and the analysis, the Bayesian bias estimator outperforms the empirical approach in terms of the accuracy of the estimated bias, and it can reduce the required size of the training sample by a factor of 3. This advantage of the Bayesian approach is due to the fact that it is less susceptible to sampling error in consecutive sampling. These results suggest that a carefully designed statistical procedure may reduce the need for the costly generation of large hindcast data sets.
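To make the contrast between an empirical and a Bayesian bias estimate concrete, here is a minimal sketch using the textbook normal-normal conjugate update. This is an assumption chosen for illustration; the abstract does not specify the form of the paper's estimator, and the names `empirical_bias`, `bayesian_bias`, `prior_mean`, `prior_var` and `error_var` are hypothetical.

```python
import numpy as np

def empirical_bias(forecast, analysis):
    """Conventional estimate: mean of the forecast-minus-analysis errors."""
    return float(np.mean(np.asarray(forecast, float) - np.asarray(analysis, float)))

def bayesian_bias(forecast, analysis, prior_mean, prior_var, error_var):
    """Posterior mean of the bias under a normal prior and normal errors.

    Textbook conjugate update, NOT necessarily the estimator used in the paper.
    prior_mean / prior_var: assumed prior belief about the bias.
    error_var: assumed variance of individual forecast errors around the bias.
    """
    err = np.asarray(forecast, float) - np.asarray(analysis, float)
    n = err.size
    precision = 1.0 / prior_var + n / error_var
    return float((prior_mean / prior_var + err.sum() / error_var) / precision)

# Toy sample: errors are 1, 2 and 3, so the empirical bias is 2.
forecast = np.array([11.0, 12.0, 13.0])
analysis = np.array([10.0, 10.0, 10.0])

emp = empirical_bias(forecast, analysis)
bay = bayesian_bias(forecast, analysis, prior_mean=0.0, prior_var=1.0, error_var=1.0)
print(emp, bay)
```

With only three samples the prior pulls the Bayesian estimate toward zero; as the sample grows, the two estimates converge.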

Two decades of satellite observations of AOD over mainland China using ATSR-2, AATSR and MODIS/Terra: data set evaluation and large-scale patterns

Atmospheric Chemistry and Physics ◽

10.5194/acp-18-1573-2018 ◽

2018 ◽

Vol 18 (3) ◽

pp. 1573-1592 ◽

Cited By ~ 48

Author(s):

Gerrit de Leeuw ◽

Larisa Sogacheva ◽

Edith Rodriguez ◽

Konstantinos Kourtidis ◽

Aristeidis K. Georgoulias ◽

...

Keyword(s):

Time Series ◽

European Space Agency ◽

Mainland China ◽

Satellite Observations ◽

Data Sets ◽

Data Set ◽

Modis Aod ◽

The Difference ◽

Along Track ◽

Aerosol Properties

Abstract. The retrieval of aerosol properties from satellite observations provides their spatial distribution over a wide area in cloud-free conditions. As such, they complement ground-based measurements by providing information over sparsely instrumented areas, albeit that significant differences may exist in both the type of information obtained and the temporal information from satellite and ground-based observations. In this paper, information from different types of satellite-based instruments is used to provide a 3-D climatology of aerosol properties over mainland China, i.e., vertical profiles of extinction coefficients from the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), a lidar flying aboard the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO) satellite and the column-integrated extinction (aerosol optical depth – AOD) available from three radiometers: the European Space Agency (ESA)'s Along-Track Scanning Radiometer version 2 (ATSR-2), Advanced Along-Track Scanning Radiometer (AATSR) (together referred to as ATSR) and NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Terra satellite, together spanning the period 1995–2015. AOD data are retrieved from ATSR using the ATSR dual view (ADV) v2.31 algorithm, while for MODIS Collection 6 (C6) the AOD data set is used that was obtained from merging the AODs obtained from the dark target (DT) and deep blue (DB) algorithms, further referred to as the DTDB merged AOD product. These data sets are validated and differences are compared using Aerosol Robotic Network (AERONET) version 2 L2.0 AOD data as reference. The results show that, over China, ATSR slightly underestimates the AOD and MODIS slightly overestimates the AOD. Consequently, ATSR AOD is overall lower than that from MODIS, and the difference increases with increasing AOD. The comparison also shows that neither of the ATSR and MODIS AOD data sets is better than the other one everywhere. 
However, ATSR ADV has limitations over bright surfaces which the MODIS DB was designed for. To allow for comparison of MODIS C6 results with previous analyses where MODIS Collection 5.1 (C5.1) data were used, the difference between the C6 and C5.1 merged DTDB data sets from MODIS/Terra over China is also briefly discussed. The AOD data sets show strong seasonal differences and the seasonal features vary with latitude and longitude across China. Two-decadal AOD time series, averaged over all of mainland China, are presented and briefly discussed. Using the 17 years of ATSR data as the basis and MODIS/Terra to follow the temporal evolution in recent years when the environmental satellite Envisat was lost requires a comparison of the data sets for the overlapping period to show their complementarity. ATSR precedes the MODIS time series between 1995 and 2000 and shows a distinct increase in the AOD over this period. The two data series show similar variations during the overlapping period between 2000 and 2011, with minima and maxima in the same years. MODIS extends this time series beyond the end of the Envisat period in 2012, showing decreasing AOD.

Uncertainty of Climatol adjustment algorithm for daily time series of additive climate variables

10.5194/egusphere-egu2020-5365 ◽

2020 ◽

Author(s):

Oleg Skrynyk ◽

Enric Aguilar ◽

José A. Guijarro ◽

Sergiy Bubin

Keyword(s):

Time Series ◽

Climate Model ◽

Data Sets ◽

Climate Variables ◽

The European Union ◽

Data Set ◽

Raw Data ◽

Climate Signal ◽

Daily Time Series ◽

Daily Time

Before using climatological time series in research studies, it is necessary to perform their quality control and homogenization in order to remove possible artefacts (inhomogeneities) usually present in the raw data sets. In the vast majority of cases, the homogenization procedure improves the consistency of the data, which can then be verified by means of a statistical comparison of the raw and homogenized time series. However, a new question then arises: how far are the homogenized data from the true climate signal or, in other words, what errors could still be present in homogenized data?
The main objective of our work is to estimate the uncertainty produced by the adjustment algorithm of the widely used Climatol homogenization software when homogenizing daily time series of additive climate variables. We focused our efforts on the minimum and maximum air temperature. In order to achieve our goal we used a benchmark data set created by the INDECIS* project. The benchmark contains clean data, extracted from an output of the Royal Netherlands Meteorological Institute Regional Atmospheric Climate Model (version 2) driven by Hadley Global Environment Model 2 - Earth System, and inhomogeneous data, created by introducing realistic breaks and errors.
The statistical evaluation of discrepancies between the homogenized (by means of Climatol with predefined break points) and clean data sets was performed using both a set of standard parameters and a metric introduced in our work. All metrics used clearly identify the main features of errors (systematic and random) present in the homogenized time series. We calculated the metrics for every time series (only over adjusted segments) as well as their averaged values as measures of uncertainties in the whole data set.
In order to determine how the two key parameters of the raw data collection, namely the length of time series and station density, influence the calculated measures of the adjustment error, we gradually decreased the length of the period and the number of stations in the area under study. The total number of cases considered was 56, including 7 time periods (1950-2005, 1954-2005, …, 1974-2005) and 8 different quantities of stations (100, 90, …, 30). Additionally, in order to find out how stable the calculated metrics are for each of the 56 cases and to determine their confidence intervals, we performed 100 random permutations in the introduced inhomogeneity time series and repeated our calculations. With that, the total number of homogenization exercises performed was 5600 for each of the two climate variables.
Lastly, the calculated metrics were compared with the corresponding values obtained for the raw time series. The comparison showed some substantial improvement of the metric values after homogenization in each of the 56 cases considered (for both variables).
*INDECIS is a part of ERA4CS, an ERA-NET initiated by JPI Climate, and funded by FORMAS (SE), DLR (DE), BMWFW (AT), IFD (DK), MINECO (ES), ANR (FR) with co-funding by the European Union (Grant 690462). The work has been partially supported by the Ministry of Education and Science of Kazakhstan (Grant BR05236454) and Nazarbayev University (Grant 090118FD5345).
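As a rough illustration of the evaluation design, the sketch below computes common summaries of systematic and random error between a homogenized and a clean series, and enumerates the experimental grid described in the abstract. The specific metrics (mean error, RMSE, standard deviation of the error) are generic choices assumed here; the abstract names neither its standard parameters nor its new metric.

```python
import numpy as np
from itertools import product

def error_summary(homogenized, clean):
    """Generic error summaries (assumed for illustration, not the paper's metrics).

    bias:   mean error, a measure of systematic error
    rmse:   root-mean-square error, overall error
    spread: standard deviation of the error, a measure of random error
    """
    err = np.asarray(homogenized, float) - np.asarray(clean, float)
    bias = float(err.mean())
    rmse = float(np.sqrt((err ** 2).mean()))
    spread = float(err.std())
    return bias, rmse, spread

bias, rmse, spread = error_summary([1.0, 2.0, 3.0], [0.0, 2.0, 4.0])

# Experimental grid from the abstract: 7 periods ending in 2005 and 8 station counts.
start_years = range(1950, 1975, 4)     # 1950, 1954, ..., 1974
station_counts = range(100, 20, -10)   # 100, 90, ..., 30
cases = list(product(start_years, station_counts))
n_permutations = 100

print(len(cases), len(cases) * n_permutations)
```

The three summaries are linked by rmse² = bias² + spread², which makes it easy to see how much of the total error is systematic.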

A Growth Model for Multilevel Ordinal Data

Journal of Educational and Behavioral Statistics ◽

10.3102/10769986030004369 ◽

2005 ◽

Vol 30 (4) ◽

pp. 369-396 ◽

Cited By ~ 8

Author(s):

Eisuke Segawa

Keyword(s):

Latent Variable ◽

Ordinal Data ◽

Linear Models ◽

Growth Models ◽

Simulated Data ◽

Real Data ◽

Analytic Structure ◽

Data Sets ◽

Data Set ◽

Time Points

Multi-indicator growth models were formulated as special three-level hierarchical generalized linear models to analyze growth of a trait latent variable measured by ordinal items. Items are nested within a time point, and time points are nested within subject. These models are special because they include factor analytic structure. They can analyze not only data with item- and time-level missing observations, but also data with time points freely specified over subjects. Furthermore, features useful for longitudinal analyses, an “autoregressive error degree one” structure for the trait residuals and estimated time-scores, were included. The approach is Bayesian with Markov chain Monte Carlo, and the models are implemented in WinBUGS. They are illustrated with two simulated data sets and one real data set with planned missing items within a scale.

First intercalibration of column-averaged methane from the Total Carbon Column Observing Network and the Network for the Detection of Atmospheric Composition Change

Atmospheric Measurement Techniques Discussions ◽

10.5194/amtd-5-1355-2012 ◽

2012 ◽

Vol 5 (1) ◽

pp. 1355-1379

Author(s):

F. Forster ◽

R. Sussmann ◽

M. Rettinger ◽

N. M. Deutscher ◽

D. W. T. Griffith ◽

...

Keyword(s):

Time Series ◽

Near Infrared ◽

Calibration Factor ◽

Time Dependent ◽

Total Carbon ◽

Atmospheric Composition ◽

Composition Change ◽

Data Set ◽

The Difference ◽

Difference Time

Abstract. We present the intercalibration of dry-air column-averaged mole fractions of methane (XCH4) retrieved from solar FTIR measurements of the Network for the Detection of Atmospheric Composition Change (NDACC) in the mid-infrared (MIR) versus near-infrared (NIR) soundings from the Total Carbon Column Observing Network (TCCON). The study uses multi-annual quasi-coincident MIR and NIR measurements from the stations Garmisch, Germany (47.48° N, 11.06° E, 743 m a.s.l.) and Wollongong, Australia (34.41° S, 150.88° E, 30 m a.s.l.). Direct comparison of the retrieved MIR and NIR time series shows a phase shift in XCH4 seasonality, i.e. a significant time-dependent bias leading to a standard deviation (stdv) of the difference time series (NIR-MIR) of 8.4 ppb. After eliminating differences in a prioris by using ACTM-simulated profiles as a common prior, the seasonalities of the (corrected) MIR and NIR time series agree within the noise (stdv = 5.2 ppb for the difference time series). The difference time series (NIR-MIR) do not show a significant trend. Therefore, it is possible to use a simple scaling factor for the intercalibration without a time-dependent linear or seasonal component. Using the Garmisch and Wollongong data together, we obtain an overall calibration factor MIR/NIR = 0.9926(18). The individual calibration factors per station are 0.9940(14) for Garmisch and 0.9893(40) for Wollongong. They agree within their error bars with the overall calibration factor, which can therefore be used for both stations. Our results suggest that after applying the proposed intercalibration concept to all stations performing both NIR and MIR measurements, it should be possible to obtain one refined overall intercalibration factor for the two networks. This would make it possible to set up a harmonized NDACC and TCCON XCH4 data set which can be exploited for joint trend studies, satellite validation, or the inverse modeling of sources and sinks.
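A single scaling factor between two coincident series, together with the standard deviation of their difference series, can be sketched as follows. The least-squares fit through the origin is an assumption made here for illustration, since the abstract does not say how the factor was fitted, and the XCH4 values are synthetic. In the reported values, the digits in parentheses are conventionally read as the uncertainty in the last digits, so 0.9926(18) would mean 0.9926 ± 0.0018.

```python
import numpy as np

def scaling_factor(mir, nir):
    """Least-squares factor k minimizing sum((mir - k * nir)**2).

    Assumed fitting method for illustration; not confirmed by the source.
    """
    mir = np.asarray(mir, float)
    nir = np.asarray(nir, float)
    return float(np.dot(mir, nir) / np.dot(nir, nir))

# Synthetic coincident XCH4 values in ppb (illustrative, not measured data).
nir = np.array([1800.0, 1805.0, 1795.0, 1810.0])
mir = 0.9926 * nir  # constructed so that the true factor is 0.9926

k = scaling_factor(mir, nir)

# Sample standard deviation of the difference time series (NIR - MIR).
diff_std = float(np.std(nir - mir, ddof=1))

print(k, diff_std)
```

A constant factor like this is only appropriate when the difference series shows no trend or seasonal component, which is the condition the abstract establishes before applying it.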

An end-to-end Novel Forecasting Model for Crime Prediction based on Big Data

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9153.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 3704-3708

Keyword(s):

Time Series ◽

Big Data ◽

Data Analytics ◽

Linear Time ◽

Big Data Analytics ◽

Series Data ◽

Data Sets ◽

Data Set ◽

Main Category ◽

Crime Prediction

Big data analytics is a field in which information from large or convoluted data sets is analysed and processed using data-processing methods. It is used to analyse data and helps predict the best outcome from the data sets. Big data analytics can be very useful in predicting crime and can also suggest the best possible approach to solving that crime. In this system, we use a past crime data set to find patterns, and through those patterns we predict the range of an incident. The range of the incident is determined by the decision model, and the prediction is made according to that range. The data sets are nonlinear and in the form of time series, so the system uses the Prophet model algorithm, which is used to analyse nonlinear time series data. The Prophet model has three main components: trend, seasonality, and holidays. This system will help the crime cell predict possible incidents according to the pattern developed by the algorithm, and it also helps deploy the right number of resources to highly marked areas where incidents are likely to occur. The system will enhance crime prediction and help the crime department use its resources more efficiently.

The SPARC water vapour assessment II: Comparison of stratospheric and lower mesospheric water vapour time series observed from satellites

10.5194/amt-2018-33 ◽

2018 ◽

Author(s):

Farahnaz Khosrawi ◽

Stefan Lossow ◽

Gabriele P. Stiller ◽

Karen H. Rosenlof ◽

Joachim Urban ◽

...

Keyword(s):

Time Series ◽

Water Vapour ◽

Data Sets ◽

Data Set ◽

The Future ◽

Modelling Studies ◽

The Difference ◽

The Tropics ◽

The Antarctic ◽

Satellite Instruments

Abstract. Time series of stratospheric and lower mesospheric water vapour using 33 data sets from 15 different satellite instruments were compared in the framework of the second SPARC (Stratosphere-troposphere Processes And their Role in Climate) water vapour assessment (WAVAS-II). This comparison aimed to provide a comprehensive overview of the typical uncertainties in the observational database that can be considered in the future in observational and modelling studies addressing e.g. stratospheric water vapour trends. The time series comparisons are presented for the three latitude bands, the Antarctic (80°–70° S), the tropics (15° S–15° N) and the northern hemisphere mid-latitudes (50° N–60° N) at four different altitudes (0.1, 3, 10 and 80 hPa) covering the stratosphere and lower mesosphere. The combined temporal coverage of observations from the 15 satellite instruments allowed considering the time period 1986–2014. In addition to the qualitative comparison of the time series, the agreement of the data sets is assessed quantitatively in the form of the spread (i.e. the difference between the maximum and minimum volume mixing ratio among the data sets), the (Pearson) correlation coefficient and the drift (i.e. linear changes of the difference between time series over time). Generally, good agreement between the time series was found in the middle stratosphere while larger differences were found in the lower mesosphere and near the tropopause. Concerning the latitude bands, the largest differences were found in the Antarctic while the best agreement was found for the tropics. From our assessment we find that all data sets can be considered in the future in observational and modelling studies addressing e.g. stratospheric and lower mesospheric water vapour variability and trends when data set specific characteristics (e.g. a drift) and restrictions (e.g. temporal and spatial coverage) are taken into account.
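The three quantitative agreement measures named in the abstract (spread, Pearson correlation, and drift) can be sketched directly from their stated definitions. The function names, the use of an ordinary least-squares line for the drift, and the toy mixing ratios are assumptions made for illustration; the assessment's actual processing is not described in the abstract.

```python
import numpy as np

def spread(series):
    """Max minus min across data sets at each time step.

    series: array of shape (n_datasets, n_times).
    """
    series = np.asarray(series, float)
    return series.max(axis=0) - series.min(axis=0)

def pearson(x, y):
    """Pearson correlation coefficient between two time series."""
    return float(np.corrcoef(x, y)[0, 1])

def drift(t, x, y):
    """Slope of a straight-line fit to the difference x - y over time t."""
    diff = np.asarray(x, float) - np.asarray(y, float)
    slope, _intercept = np.polyfit(t, diff, 1)
    return float(slope)

# Toy volume mixing ratios (ppmv) on a common time axis in years.
t = np.arange(5, dtype=float)
x = 4.0 + 0.10 * t   # data set A
y = 4.0 + 0.05 * t   # data set B, increasing more slowly

print(spread([x, y]))   # grows from 0.0 to 0.2
print(pearson(x, y))    # perfectly correlated despite the drift
print(drift(t, x, y))   # 0.05 ppmv per year
```

The toy case shows why the measures are reported together: two series can be perfectly correlated and still drift apart.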
