Cluster Analysis and Model Comparison Using Smart Meter Data

Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3157
Author(s):  
Muhammad Arslan Shaukat ◽  
Haafizah Rameeza Shaukat ◽  
Zakria Qadir ◽  
Hafiz Suliman Munawar ◽  
Abbas Z. Kouzani ◽  
...  

Load forecasting plays a crucial role in smart grids. It governs many aspects of the smart grid and smart meter, such as demand response, asset management, investment, and future direction. This paper proposes time-series forecasting for short-term load prediction to unveil the benefits of load forecasting through different statistical and mathematical models, such as artificial neural networks, auto-regression, and ARIMA. It targets the problem of excessive computational load when dealing with time-series data. It also presents a business case that is used to analyze different clusters to find underlying factors of load consumption and to predict customer behavior based on different parameters. Evaluation of the prediction models showed that the ARIMA model with order (p, d, q) = (1, 1, 1) was the most accurate among the orders compared.
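The order (1, 1, 1) means one autoregressive term, one round of differencing, and one moving-average term. The sketch below shows how a one-step load forecast follows from that order. It is a minimal illustration under stated assumptions: the coefficients `phi` and `theta` are hypothetical, already-estimated values, there is no constant term, and the first residual is fixed at zero (conditional-sum-of-squares style). It does not reproduce the paper's fitting procedure.

```python
def arima111_residuals(y, phi, theta):
    """Return the differenced series w and one-step residuals e for ARIMA(1,1,1).

    Assumed model (no constant): w_t = phi * w_{t-1} + theta * e_{t-1} + e_t,
    where w_t = y_t - y_{t-1}. The first residual is fixed at 0.
    """
    if len(y) < 3:
        raise ValueError("need at least three observations")
    w = [y[i] - y[i - 1] for i in range(1, len(y))]
    e = [0.0] * len(w)
    for t in range(1, len(w)):
        e[t] = w[t] - (phi * w[t - 1] + theta * e[t - 1])
    return w, e


def arima111_forecast(y, phi, theta):
    """One-step-ahead forecast of y for ARIMA(1,1,1) with known (hypothetical) coefficients."""
    w, e = arima111_residuals(y, phi, theta)
    w_next = phi * w[-1] + theta * e[-1]  # forecast of the next difference
    return y[-1] + w_next                 # undo the differencing
```

For a steadily rising series with `phi=1.0` and `theta=0.0`, the forecast simply extends the last increment; in practice `phi` and `theta` would come from a fitting routine rather than being set by hand.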

Agromet ◽  
2007 ◽  
Vol 21 (2) ◽  
pp. 46 ◽  
Author(s):  
W. Estiningtyas ◽  
F. Ramadhani ◽  
E. Aldrian

A significant decrease in rainfall caused by extreme climate has had a significant impact on the agricultural sector, especially food crop production. This is one of the reasons driving the development of rainfall prediction models to anticipate extreme climate events. Rainfall prediction models, initially based only on time-series data, have since been extended to account for climate anomalies, as in rainfall prediction models based on the Kalman filtering method. One global parameter that can be used as a climate anomaly indicator is sea surface temperature (SST), and several studies indicate a relationship between SST and rainfall. The relationship between Indonesian rainfall and global SST is well known, but its relationship with Indonesian SST has received little attention, especially for rainfall over smaller areas such as districts. Research on the relationship between district rainfall and Indonesian SST, and on its application to rainfall prediction, is therefore needed. Average Indonesian SST time-series data from January 1982 to May 2006 show zones of Indonesian SST above 27.6 °C that are dominant from January to May and move with a distinct pattern. The highest positive spatial correlation between the rainfall of Cilacap district and Indonesian SST, averaged for each month from January to December, ranges from 0.30 to 0.50 in varying SST zones, with the highest positive correlation in March and July. Negative correlations range from -0.30 to -0.70, with the strongest negative correlation in May and June. Validation of the rainfall prediction model yielded a correlation coefficient of 85.73%, a model fit of 20.74%, r² of 73.49%, an RMSE of 20.5%, and a standard deviation of 37.96. The monthly rainfall prediction for January to December 2007 indicates a rainfall pattern close to the 19-year average (1988-2006), with rainfall depth of less than 100 mm/month.
These results indicate that Indonesian SST can be used as an indicator of rainfall conditions at the district level, meaning that district rainfall can be predicted from SST changes in the zones with the highest correlation in each month.
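The spatial correlation analysis above amounts to correlating one district rainfall series with the SST series at every grid cell for a given month, then keeping the zone with the strongest correlation. The following is a minimal sketch under stated assumptions: `sst_grid` is a hypothetical dictionary mapping each grid cell to an SST series aligned year-by-year with the rainfall series, and ranking by absolute correlation (so strong negative zones also count) is an assumption. The paper's Kalman filter prediction model is not reproduced. An RMSE helper of the kind used in model validation is included.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series (non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_map(rain, sst_grid):
    """Correlate the rainfall series with the SST series of every (hypothetical) grid cell."""
    return {cell: pearson(rain, series) for cell, series in sst_grid.items()}


def best_zone(corr_map):
    """Grid cell with the largest absolute correlation (assumed selection rule)."""
    return max(corr_map, key=lambda cell: abs(corr_map[cell]))


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))
```

Repeating `correlation_map` for each calendar month yields one map per month, from which a predictor zone can be chosen month by month, as the abstract describes.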


2019 ◽  
Author(s):  
Joseph R. Mihaljevic ◽  
Amy L. Greer ◽  
Jesse L. Brunner

Mechanistic models are critical for our understanding of both within-host dynamics (i.e., pathogen population growth and immune system processes) and among-host dynamics (i.e., transmission). Rarely, however, have within-host models been synthesized with data to infer processes, validate hypotheses, or generate new theories. In this study we use mechanistic models and empirical time-series data of viral titer to better understand the growth of ranaviruses within their amphibian hosts and the immune dynamics that limit viral replication. Specifically, we fit a suite of potential models to our data, where each model represents a hypothesis about the interactions between viral growth and immune defense. Through formal model comparison, we find a parsimonious model that captures key features of our time-series data: the viral titer rises and falls through time, likely due to an immune system response, and the initial viral dosage affects both the peak viral titer and the timing of the peak. Importantly, our model makes several predictions, including the existence of long-term viral infections, that can be validated in future studies.
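The abstract does not give the fitted equations or the comparison criterion. To illustrate the kind of within-host model being compared, the sketch below integrates a hypothetical two-variable model with forward Euler: virus `V` replicates at rate `r` and is cleared by immune effectors `I` at rate `k`, while `I` grows in proportion to `V` at rate `a`. It also includes an AIC helper, assuming AIC as a common formal comparison criterion. Neither the equations, the parameter values, nor the criterion is taken from the paper.

```python
def simulate(v0, r=1.0, k=1.0, a=0.1, i0=0.0, dt=0.01, t_end=20.0):
    """Forward-Euler integration of a hypothetical virus-immune model.

    dV/dt = r*V - k*V*I   (replication minus immune clearance)
    dI/dt = a*V           (immune response stimulated by virus)
    Returns the time grid and the viral titer at each time.
    """
    v, i = v0, i0
    times, titers = [0.0], [v0]
    for step in range(1, int(round(t_end / dt)) + 1):
        dv = (r * v - k * v * i) * dt
        di = a * v * dt
        v, i = v + dv, i + di
        times.append(step * dt)
        titers.append(v)
    return times, titers


def peak(times, titers):
    """Time and value of the maximum titer."""
    idx = max(range(len(titers)), key=titers.__getitem__)
    return times[idx], titers[idx]


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood
```

With these toy parameters, a larger initial dose (for example `simulate(10.0)` versus `simulate(1.0)`) gives a higher and earlier peak titer, qualitatively mirroring the dose effect described above.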


2020 ◽  
Author(s):  
Sina Faizollahzadeh Ardabili ◽  
Amir Mosavi ◽  
Shahab Band ◽  
Annamaria R. Varkonyi-Koczy

An accurate outbreak prediction of COVID-19 can provide insight into the spread and consequences of infectious diseases. Recently, machine learning (ML) based prediction models have been successfully employed to predict disease outbreaks. The present study applies an artificial neural network integrated with a grey wolf optimizer (ANN-GWO) to COVID-19 outbreak prediction using the global dataset. Training and testing were performed on time-series data from January 22 to September 15, 2020, and validation on time-series data from September 16 to October 15, 2020. Results were evaluated using the mean absolute percentage error (MAPE) and the correlation coefficient (r). ANN-GWO achieved MAPEs of 6.23%, 13.15%, and 11.4% for the training, testing, and validation phases, respectively. According to the results, the developed model could successfully cope with the prediction task.
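The abstract does not describe how the grey wolf optimizer is coupled to the network; a common pattern is to let GWO search the network's weights so as to minimize training error, but that coupling is an assumption here. The sketch below shows only the optimizer itself on a toy objective: the three best wolves found so far (alpha, beta, delta) guide every other wolf, and the coefficient `a` decays linearly from 2 to 0 to shift from exploration to exploitation. A MAPE helper matching the reported metric is included. All names and defaults are illustrative.

```python
import random


def gwo_minimize(f, dim, lower, upper, n_wolves=20, n_iter=100, seed=0):
    """Grey wolf optimizer minimizing f over the box [lower, upper]^dim (n_wolves >= 3)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = sorted(wolves, key=f)[:3]  # alpha, beta, delta: best three so far
    for it in range(n_iter):
        a = 2.0 - 2.0 * it / n_iter  # decreases linearly from 2 toward 0
        new_wolves = []
        for x in wolves:
            pos = []
            for j in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - x[j])  # distance to this leader
                    guided += leader[j] - A * D
                # Average of the three leader-guided moves, clipped to the bounds.
                pos.append(min(max(guided / 3.0, lower), upper))
            new_wolves.append(pos)
        wolves = new_wolves
        leaders = sorted(wolves + leaders, key=f)[:3]  # keep the best-ever three
    return leaders[0]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

In an ANN-GWO setup, `f` would be the network's training error as a function of its flattened weight vector; here a simple sphere function such as `lambda v: sum(t * t for t in v)` stands in for it.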


2021 ◽  
Vol 14 (1) ◽  
pp. 140
Author(s):  
Johann Desloires ◽  
Dino Ienco ◽  
Antoine Botrel ◽  
Nicolas Ranc

Applications in which researchers aim to extract a single land type from remotely sensed data are quite common in practical scenarios: extracting the urban footprint to make connections with socio-economic factors, mapping the forest extent to subsequently retrieve biophysical variables, and detecting a particular crop type to subsequently calibrate and deploy yield prediction models. In this scenario, the (positive) targeted class is well defined, while the negative class is difficult to describe. This one-class classification setting is also referred to as positive unlabelled learning (PUL) in the general field of machine learning. To deal with this challenging setting when satellite image time series data are available, we propose a new framework named positive and unlabelled learning of satellite image time series (PUL-SITS). PUL-SITS involves two stages. In the first, a recurrent neural network autoencoder is trained to reconstruct only positive samples, with the aim of highlighting reliable negative ones. In the second, both labelled and unlabelled samples are exploited in a semi-supervised manner to build the final binary classification model. To assess the quality of our approach, experiments were carried out on a real-world benchmark, namely Haute-Garonne, located in southwestern France. For this study site, we considered two scenarios: a first in which the objective is to map Cereals/Oilseeds cover versus the rest of the land cover classes, and a second in which the class of interest is Forest land cover. The proposed approach was evaluated by comparing it with recent competing methods for the considered positive and unlabelled learning scenarios.
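The first PUL-SITS stage rests on one idea: an autoencoder trained only on positives reconstructs positive-like samples well, so unlabelled samples with the largest reconstruction error are the most reliable negatives. The sketch below illustrates that idea with a plainly swapped-in technique. Instead of the paper's recurrent autoencoder, it uses a rank-1 linear autoencoder (projection onto the first principal component of the positives, found by power iteration). Each sample is assumed to be a flattened time-series feature vector, and the function names are hypothetical.

```python
import math


def first_component(rows, n_iter=100):
    """Mean and leading principal direction of rows (power iteration on X^T X)."""
    d = len(rows[0])
    mu = [sum(r[j] for r in rows) / len(rows) for j in range(d)]
    centered = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(x[p] * x[q] for x in centered) for q in range(d)] for p in range(d)]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[p][q] * v[q] for q in range(d)) for p in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    return mu, v


def reconstruction_error(x, mu, v):
    """Squared error after encoding x onto v (one coefficient) and decoding back."""
    c = [xi - mi for xi, mi in zip(x, mu)]
    code = sum(ci * vi for ci, vi in zip(c, v))  # encoder: project onto the component
    recon = [code * vi for vi in v]               # decoder: map the code back
    return sum((ci - ri) ** 2 for ci, ri in zip(c, recon))


def reliable_negatives(positives, unlabelled, n):
    """Indices of the n unlabelled samples worst reconstructed by the positive-only model."""
    mu, v = first_component(positives)
    errors = [reconstruction_error(x, mu, v) for x in unlabelled]
    return sorted(range(len(unlabelled)), key=lambda i: -errors[i])[:n]
```

The selected indices would then serve as pseudo-negative labels for the second, semi-supervised stage, which is not sketched here.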


2019 ◽  
Author(s):  
Aaron Jason Fisher ◽  
Peter D. Soyster

The present study sought to apply statistical classification methods to idiographic time series data in order to make accurate future predictions of behavior. We recruited 70 individuals who presented as regular smokers; 52 completed experience sampling method (ESM) data collection and provided sufficient time series data. Time stamps from ESM surveys were used to calculate the time of day, day of the week, and continuous time; continuous time was, in turn, used to calculate 12-hr and 24-hr cycles. Each individual's time series was split into sequential training and testing sections so that trained models could be tested on future observations: prediction models were trained on the first 75% of each individual's data and tested on the last 25%. Predictions of future behavior were made on a person-by-person basis. Two prediction algorithms were employed: elastic net regularization and naïve Bayes classification. Sample-wide area under the curve was nearly 80%, with some models demonstrating perfect prediction accuracy. Sensitivity and specificity were between 0.78 and 0.81 across the two approaches. Importantly, prediction models were based on a lagged data structure. Thus, in addition to supporting the prediction accuracy of our models with out-of-sample tests on time-forward data, the models themselves were time-lagged, such that each prediction was for the subsequent measurement. Such a system could be the basis for mobile, just-in-time interventions for substance use, as models that accurately predict future behavior could ostensibly be used to deliver personalized interventions at empirically indicated moments of need.
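The data structure described above (cyclic time features, a one-step lag, and a chronological 75/25 split) can be sketched as follows. This is a minimal sketch under stated assumptions: continuous time is given in hours since the first survey, the 12-hr and 24-hr cycles are encoded as sine/cosine pairs, and including the current outcome as a predictor is a hypothetical choice. Time-of-day and day-of-week features, as well as the elastic net and naïve Bayes models themselves, are omitted.

```python
import math


def cycle_features(hours):
    """Sine/cosine encodings of 12-hr and 24-hr cycles from continuous time in hours."""
    feats = []
    for period in (12.0, 24.0):
        angle = 2 * math.pi * hours / period
        feats.extend([math.sin(angle), math.cos(angle)])
    return feats


def lagged_rows(hours, outcomes):
    """Pair predictors observed at survey t with the outcome at survey t+1."""
    rows = []
    for t in range(len(outcomes) - 1):
        x = cycle_features(hours[t]) + [float(outcomes[t])]  # current outcome as a predictor
        rows.append((x, outcomes[t + 1]))
    return rows


def chronological_split(rows, train_frac=0.75):
    """First train_frac of rows for training, the rest (later in time) for testing."""
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]
```

Because the split is chronological rather than random, every test row lies later in time than every training row, which is what makes the evaluation time-forward.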


Author(s):  
Takeru Aoki ◽  
Keiki Takadama ◽  
Hiroyuki Sato

The cortical learning algorithm (CLA) is a time-series data prediction method designed based on the human neocortex. The CLA has multiple columns that are connected to the input data bits by synapses, and the input data are converted into an internal column representation based on these synapse connections. Because the synapse connections between the columns and input data bits are fixed during the entire prediction process in the conventional CLA, it cannot adapt to biases in the input data. Consequently, some columns are never used in internal representations, which results in low prediction accuracy for the conventional CLA. To improve the prediction accuracy of the CLA, we propose a CLA that self-adaptively arranges the column synapses according to the input data tendencies, and we verify its effectiveness on several artificial time-series datasets and on real-world electricity load prediction data from New York City. Experimental results show that, by arranging column synapses according to the input data tendency, the proposed CLA achieves higher prediction accuracy than the conventional CLA and than LSTMs with different network optimization algorithms.
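The problem described above, columns whose fixed synapses never match the input and therefore never take part in representations, can be illustrated with a simplified model. In this sketch each column is a set of connected input-bit indices (real CLA synapses carry permanence values, which are omitted), the top-`k` columns by overlap become active, and `rearrange_unused` applies a hypothetical rule that reconnects never-active columns to the most frequently active input bits. The rule is an assumption for illustration, not the authors' self-adaptive algorithm.

```python
def overlaps(columns, bits):
    """Number of active input bits (0/1 list) each column is connected to."""
    return [sum(bits[b] for b in syn) for syn in columns]


def active_columns(columns, bits, k):
    """Top-k columns by overlap (ties broken by index); zero-overlap columns never win."""
    ov = overlaps(columns, bits)
    ranked = sorted(range(len(columns)), key=lambda c: (-ov[c], c))
    return [c for c in ranked[:k] if ov[c] > 0]


def rearrange_unused(columns, history, k):
    """Hypothetical rule: reconnect never-active columns to the most frequently active bits."""
    usage = [0] * len(columns)
    freq = [0] * len(history[0])
    for bits in history:
        for c in active_columns(columns, bits, k):
            usage[c] += 1
        for i, b in enumerate(bits):
            freq[i] += b
    popular = sorted(range(len(freq)), key=lambda i: (-freq[i], i))
    return [set(syn) if usage[c] else set(popular[:len(syn)])
            for c, syn in enumerate(columns)]
```

For example, with inputs that only ever activate bits 0 to 2, a column wired to bits 4 and 5 is never used, and the rule rewires it toward the bits that actually carry information.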


Author(s):  
Indranil Bose

Movement of stocks in the financial market is a typical example of financial time series data. It is generally believed that the past performance of a stock can indicate its future trend, so stock trend analysis is a popular activity in the financial community. In this chapter, we will explore the unique characteristics of financial time series data mining. Financial time series analysis is a relatively recent field. Though the world's first stock exchange was established in the 18th century, stock trend analysis began only in the late 20th century. According to Tay et al. (2003), analysis of financial time series has been formally addressed only since the 1980s. It is believed that financial time series data can speak for themselves: by analyzing the data, one can understand volatility, seasonal effects, liquidity, and price response, and hence predict the movement of a stock. For example, a continuous downward movement of the S&P index over a short period of time allows investors to anticipate that the majority of stocks will go down in the immediate future. Similarly, a sharp increase in interest rates leads investors to expect a decrease in overall bond prices. Such conclusions can only be drawn after a detailed analysis of historical stock data. Charts and figures of stock index movements, exchange rate changes, and bond price variations are encountered every day. An example of such financial time series data is shown in Figure 1. It is generally believed that through data analysis, analysts can exploit the temporal dependencies in both the deterministic (regression) and the stochastic (error) components of a model and come up with better prediction models for future stock prices (Congdon, 2003).
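Volatility, one of the properties the passage says can be read from the data, is commonly estimated as the standard deviation of log returns over a rolling window. A minimal sketch, assuming closing prices in a plain list and the sample (n - 1) standard deviation; this is a standard textbook estimate, not a method described in the chapter.

```python
import math


def log_returns(prices):
    """Log returns r_t = ln(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def rolling_volatility(returns, window):
    """Sample standard deviation of returns over each full trailing window."""
    if window < 2:
        raise ValueError("window must be at least 2")
    out = []
    for end in range(window, len(returns) + 1):
        chunk = returns[end - window:end]
        mean = sum(chunk) / window
        var = sum((r - mean) ** 2 for r in chunk) / (window - 1)
        out.append(math.sqrt(var))
    return out
```

Log returns are used rather than raw price differences because they are additive over time and comparable across stocks with different price levels.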


Entropy ◽  
2020 ◽  
Vol 22 (12) ◽  
pp. 1414
Author(s):  
Krzysztof Gajowniczek ◽  
Marcin Bator ◽  
Tomasz Ząbkowski

Data from smart grids are challenging to analyze due to their very large size, high dimensionality, skewness, sparsity, and multiple seasonal fluctuations, including daily and weekly effects. Because the data arrive sequentially, the underlying distribution is subject to change over time. Time series data streams have their own specifics in terms of data processing and analysis: large data volumes are generated quickly, so it is usually not possible to hold all the data in memory, and processing and analysis must be done incrementally using sliding windows. Despite the proposal of many clustering techniques for grouping the observations of a single data stream, only a few focus on grouping whole data streams into clusters. In this article we aim to explore individual characteristics of electricity usage and recommend the most suitable tariff to each customer so that they can benefit from lower prices. This work investigates various algorithms (and their improvements) that allow us to form clusters in real time based on smart meter data.
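The two ingredients named above, incremental processing over sliding windows and clustering of whole streams, can be sketched together. Under stated assumptions, each customer's stream is summarized by a per-hour mean load over the last `window` days, maintained with running sums so old days are subtracted rather than recomputed. The resulting profiles are then grouped with MacQueen-style sequential k-means, swapped in here as a simple online clustering method. Class and function names are hypothetical, and the paper's algorithms and tariff-recommendation step are not reproduced.

```python
import math
from collections import deque


class SlidingProfile:
    """Per-hour mean load over the last `window` days, updated incrementally."""

    def __init__(self, window, hours=24):
        self.days = deque(maxlen=window)
        self.sums = [0.0] * hours

    def add_day(self, readings):
        if len(readings) != len(self.sums):
            raise ValueError("expected one reading per hour")
        if len(self.days) == self.days.maxlen:
            oldest = self.days[0]  # will be evicted by append() below
            self.sums = [s - o for s, o in zip(self.sums, oldest)]
        self.days.append(list(readings))
        self.sums = [s + r for s, r in zip(self.sums, readings)]

    def profile(self):
        """Mean daily profile; requires at least one day to have been added."""
        n = len(self.days)
        return [s / n for s in self.sums]


def sequential_kmeans(points, k):
    """Online k-means: seed with the first k points, then update running means."""
    centroids = [list(p) for p in points[:k]]
    counts = [1] * k

    def nearest(p):
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    for p in points[k:]:
        c = nearest(p)
        counts[c] += 1
        centroids[c] = [m + (x - m) / counts[c] for m, x in zip(centroids[c], p)]
    return [nearest(p) for p in points], centroids
```

Customers whose profiles fall in the same cluster share a consumption pattern (for example, a morning versus an evening peak), which is the kind of grouping a tariff recommendation could then build on.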

