Simulation Study on Clustering Approaches for Short-Term Electricity Forecasting

Complexity ◽

10.1155/2018/3683969 ◽

2018 ◽

Vol 2018 ◽

pp. 1-21 ◽

Cited By ~ 13

Author(s):

Krzysztof Gajowniczek ◽

Tomasz Ząbkowski

Keyword(s):

Time Series ◽

Similarity Measures ◽

Optimal Number ◽

Clustering Methods ◽

Smart Meters ◽

Practical Applications ◽

Electricity Use ◽

Residential Electricity ◽

Using Data ◽

Advanced Metering

Advanced metering infrastructures such as smart metering have begun to attract increasing attention; a considerable body of research is currently focusing on load profiling and forecasting at different scales on the grid. Electricity time series clustering is an effective tool for identifying useful information in various practical applications, including the forecasting of electricity usage, which is important for providing more data to smart meters. This paper presents a comprehensive study of clustering methods for residential electricity demand profiles and further applications focused on the creation of more accurate electricity forecasts for residential customers. The contributions of this paper are threefold: (1) using data from 46 homes in Austin, Texas, the similarity measures from different time series are analyzed; (2) the optimal number of clusters for representing residential electricity use profiles is determined; and (3) an extensive load forecasting study using different segmentation-enhanced forecasting algorithms is undertaken. Finally, from the operator’s perspective, the implications of the results are discussed in terms of the use of clustering methods for grouping electrical load patterns.
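
The choice of similarity measure in contribution (1) changes which load profiles count as alike. The following is a minimal sketch, not the authors' method: it compares Euclidean distance with a correlation-based distance on made-up profiles. The function names and the sample values are assumptions introduced for illustration only.

```python
import math

def euclidean_distance(a, b):
    """Point-by-point distance; sensitive to the overall consumption level."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def correlation_distance(a, b):
    """1 - Pearson correlation; compares shape only. Assumes non-constant profiles."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 1.0 - cov / (norm_a * norm_b)

# Hypothetical 6-point load profiles in kW (not data from the paper).
evening_small = [0.2, 0.2, 0.3, 0.5, 1.0, 0.6]
evening_large = [0.6, 0.6, 0.9, 1.5, 3.0, 1.8]  # same shape, three times the level
morning_small = [0.2, 1.0, 0.6, 0.3, 0.2, 0.2]

for name, other in [("evening_large", evening_large), ("morning_small", morning_small)]:
    print(name,
          round(euclidean_distance(evening_small, other), 3),
          round(correlation_distance(evening_small, other), 3))
```

With Euclidean distance the small evening-peak home sits closer to the morning-peak home than to the larger evening-peak home; with the correlation-based distance the two evening-peak homes are nearly identical. Which behaviour is preferable depends on whether the segmentation should reflect consumption level or daily shape.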

Comparison of similarity measures and clustering methods for time-series medical data mining

10.1117/12.487508 ◽

2003 ◽

Cited By ~ 1

Author(s):

Shoji Hirano ◽

Shisaku Tsumoto

Keyword(s):

Data Mining ◽

Time Series ◽

Similarity Measures ◽

Medical Data ◽

Medical Data Mining ◽

Clustering Methods

Clustering-Based Behavioural Analysis of Biological Objects

Environment Technology Resources Proceedings of the International Scientific and Practical Conference ◽

10.17770/etr2011vol2.982 ◽

2015 ◽

Vol 2 ◽

pp. 24

Author(s):

Arnis Kirshners

Keyword(s):

Data Mining ◽

Time Series ◽

Clustering Algorithms ◽

Optimal Number ◽

Laboratory Animals ◽

Short Time Series ◽

Mining Methods ◽

Heart Contraction ◽

Using Data ◽

Short Time

The article examines the problem of processing short time series for bioinformatics tasks using data mining methods in the field of pharmacology. The experiments were conducted using heart contraction (contraction and relaxation) power data that were obtained in experiments with laboratory animals with the goal of registering the power changes of heart contractions in different stages of the experiment in a given period of time. The selected data were treated using data preprocessing technologies. The short time series were compared using various time-point similarity search methods using agglomerative hierarchical clustering, k-means clustering, modified k-means clustering and expectation-maximization clustering algorithms. Based on the evaluation of the clustering results, the most suitable algorithm was chosen and the optimal number of clusters was determined for the least clustering error. The acquired clusters were used to create cluster prototypes that aggregate the groups of similar heart contraction power objects. The article offers an examination of the errors produced by algorithms and methods as well as a discussion of the obtained clustering results using different evaluation methodologies. It also gives conclusions about the application of data mining methods in solving bioinformatics tasks and outlines further research directions.

Load Profile-Based Residential Customer Segmentation for Analyzing Customer Preferred Time-of-Use (TOU) Tariffs

Energies ◽

10.3390/en14196130 ◽

2021 ◽

Vol 14 (19) ◽

pp. 6130

Author(s):

Minseok Jang ◽

Hyun-Cheol Jeong ◽

Taegon Kim ◽

Sung-Kwan Joo

Keyword(s):

Dynamic Pricing ◽

Mixed Logit ◽

Electricity Consumption ◽

Information Criterion ◽

Gaussian Mixture ◽

Optimal Number ◽

Customer Segmentation ◽

Smart Meters ◽

Load Profile ◽

Residential Electricity

Smart meters and dynamic pricing are key factors in implementing a smart grid. Dynamic pricing is one of the demand-side management methods that can shift demand from on-peak to off-peak. Furthermore, dynamic pricing can help utilities reduce the investment cost of a power system by charging different prices at different times according to system load profile. On the other hand, a dynamic pricing strategy that can satisfy residential customers is required from the customer’s perspective. Residential load profiles can be used to comprehend residential customers’ preferences for electricity tariffs. In this study, in order to analyze the preference for time-of-use (TOU) rates of Korean residential customers through residential electricity consumption data, a representative load profile for each customer is found using the median hourly consumption. In the feature extraction stage, six features that can explain the customer’s daily usage patterns are extracted from the representative load profile. In the clustering stage, Korean residential load profiles are clustered into four groups using a Gaussian mixture model (GMM) with the Bayesian information criterion (BIC), which helps find the optimal number of groups. Furthermore, a choice experiment (CE) is performed to identify Korean residential customers’ preferences for TOU with selected attributes. A mixed logit model with a Bayesian approach is used to estimate each group’s customer preference for attributes of a TOU tariff. Finally, a TOU tariff for each group’s load profile is recommended using the estimated part-worth.
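
The representative load profile built from median consumption can be sketched as follows. This is an illustration under assumptions, not the authors' code: the function name `representative_profile`, the four-slot "day", and the sample readings are all hypothetical.

```python
from statistics import median

def representative_profile(daily_profiles):
    """Return the per-slot median across days for one customer.

    daily_profiles: list of days, each a list of readings of equal length.
    """
    slots = len(daily_profiles[0])
    return [median(day[s] for day in daily_profiles) for s in range(slots)]

# Hypothetical readings in kWh for three days, four time slots per day.
days = [
    [0.3, 0.4, 1.2, 0.8],
    [0.2, 0.5, 1.0, 0.9],
    [0.4, 3.0, 1.1, 0.7],  # the 3.0 reading is an unusual spike
]

profile = representative_profile(days)
print(profile)  # [0.3, 0.5, 1.1, 0.8]
```

Taking the median rather than the mean keeps a single unusual day (the 3.0 reading above) from distorting the profile that is later used for feature extraction.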

EXTRACTING WEB USER PROFILES USING RELATIONAL COMPETITIVE FUZZY CLUSTERING

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300000032x ◽

2000 ◽

Vol 09 (04) ◽

pp. 509-526 ◽

Cited By ~ 80

Author(s):

OLFA NASRAOUI ◽

HICHEM FRIGUI ◽

RAGHU KRISHNAPURAM ◽

ANUPAM JOSHI

Keyword(s):

Clustering Algorithm ◽

Distance Measure ◽

Similarity Measures ◽

Unsupervised Classification ◽

Optimal Number ◽

Relational Data ◽

Data Card ◽

User Profiles ◽

Clustering Methods ◽

Access Logs

The proliferation of information on the World Wide Web has made the personalization of this information space a necessity. An important component of Web personalization is to mine typical user profiles from the vast amount of historical data stored in access logs. In the absence of any a priori knowledge, unsupervised classification or clustering methods seem to be ideally suited to analyze the semi-structured log data of user accesses. In this paper, we define the notion of a "user session" as being a temporally compact sequence of Web accesses by a user. We also define a new distance measure between two Web sessions that captures the organization of a Web site. The Competitive Agglomeration clustering algorithm which can automatically cluster data into the optimal number of components is extended so that it can work on relational data. The resulting Competitive Agglomeration for Relational Data (CARD) algorithm can deal with complex, non-Euclidean, distance/similarity measures. This algorithm was used to analyze Web server access logs successfully and obtain typical session profiles of users.

Diagnostics of metal samples using the results of experimental and theoretical study of positively charged microparticles formed upon sample destruction

Industrial laboratory Diagnostics of materials ◽

10.26896/1028-6861-2018-84-10-23-28 ◽

2018 ◽

Vol 84 (10) ◽

pp. 23-28

Author(s):

D. A. Golentsov ◽

A. G. Gulin ◽

Vladimir A. Likhter ◽

K. E. Ulybyshev

Keyword(s):

Experimental Data ◽

Early Stage ◽

Experimental Studies ◽

Material Model ◽

Total Charge ◽

Cross Sectional ◽

Practical Applications ◽

Mechanical And Electrical Properties ◽

Order Of Magnitude ◽

Using Data

Destruction of bodies is accompanied by formation of both large and microscopic fragments. Numerous experiments on the rupture of different samples show that those fragments carry a positive electric charge. This phenomenon is of interest from the viewpoint of its potential application to contactless diagnostics of the early stage of destruction of the elements in various technical devices. However, the lack of understanding of the nature of this phenomenon restricts the possibility of its practical applications. Experimental studies were carried out using an apparatus that allowed direct measurements of the total charge of the microparticles formed upon sample rupture and determination of their size and quantity. The results of rupture tests of duralumin and electrical steel showed that the size of microparticles is several tens of microns, the charge per particle is on the order of 10⁻¹⁴ C, and their amount can be estimated as the ratio of the cross-sectional area of the sample at the point of discontinuity to the square of the microparticle size. A model of charge formation on the microparticles is developed proceeding from the experimental data and current concept of the electron gas in metals. The model makes it possible to determine the charge of the microparticle using data on the particle size and mechanical and electrical properties of the material. Model estimates of the total charge of particles show order-of-magnitude agreement with the experimental data.
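
The particle-count estimate described in the abstract can be written compactly. The symbols below are notation introduced here, not taken from the paper: S is the cross-sectional area at the point of rupture, d the microparticle size, q the charge per particle, N the number of particles and Q the total charge. The second relation additionally assumes that all particles carry roughly the same charge.

```latex
N \approx \frac{S}{d^{2}}, \qquad Q \approx N\,q \approx \frac{S}{d^{2}}\,q
```

As a purely illustrative calculation with hypothetical inputs (not values reported in the paper), S = 10 mm² = 10⁻⁵ m² and d = 30 µm give d² = 9 × 10⁻¹⁰ m² and N ≈ 1.1 × 10⁴ particles; with q on the order of 10⁻¹⁴ C this corresponds to Q on the order of 10⁻¹⁰ C.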

Improving Runoff Simulations using Satellite-observed Time-series of Snow Covered Area

Hydrology Research ◽

10.2166/nh.2003.0008 ◽

2003 ◽

Vol 34 (4) ◽

pp. 281-294 ◽

Cited By ~ 11

Author(s):

R.V. Engeset ◽

H-C. Udnæs ◽

T. Guneriussen ◽

H. Koren ◽

E. Malnes ◽

...

Keyword(s):

Time Series ◽

Radar Sensors ◽

Noaa Avhrr ◽

Runoff Forecasting ◽

Snow Covered Area ◽

Covered Area ◽

Satellite Sensors ◽

Major Floods ◽

Using Data ◽

Southern Norway

Snowmelt can be a significant contributor to major floods, and hence updated snow information is very important to flood forecasting services. This study assesses whether operational runoff simulations could be improved by applying satellite-derived snow covered area (SCA) from both optical and radar sensors. Currently the HBV model is used for runoff forecasting in Norway, and satellite-observed SCA is used qualitatively but not directly in the model. Three catchments in southern Norway are studied using data from 1995 to 2002. The results show that satellite-observed SCA can be used to detect when the models do not simulate the snow reservoir correctly. Detecting errors early in the snowmelt season will help the forecasting services to update and correct the models before possible damaging floods. The method requires model calibration against SCA as well as runoff. Time-series from the satellite sensors NOAA AVHRR and ERS SAR are used. Of these, AVHRR shows good correlation with the simulated SCA, and SAR less so. Comparison of simultaneous data from AVHRR, SAR and Landsat ETM+ for May 2000 shows good inter-correlation. Of a total satellite-observed area of 1,088 km², AVHRR observed an SCA of 823 km² and SAR 720 km², as compared to 889 km² using ETM+.
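
The sensor comparison at the end of the abstract is easier to read as snow-covered fractions of the observed area. The short calculation below uses only the four areas quoted above; the variable names are assumptions for illustration.

```python
# Areas in km^2 as quoted in the abstract for May 2000.
total_area_km2 = 1088
sca_km2 = {"AVHRR": 823, "SAR": 720, "ETM+": 889}

# Snow-covered fraction of the observed area, in percent.
sca_percent = {
    sensor: round(100 * area / total_area_km2, 1)
    for sensor, area in sca_km2.items()
}

for sensor, pct in sca_percent.items():
    print(f"{sensor}: {pct}%")
# AVHRR: 75.6%
# SAR: 66.2%
# ETM+: 81.7%
```

Expressed this way, AVHRR falls about 6 percentage points below the ETM+ figure and SAR about 15.5 points below it.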

Price-Based Demand Side Response Programs and Their Effectiveness on the Example of TOU Electricity Tariff for Residential Consumers

Energies ◽

10.3390/en14020287 ◽

2021 ◽

Vol 14 (2) ◽

pp. 287

Author(s):

Jerzy Andruszkiewicz ◽

Józef Lorenc ◽

Agnieszka Weychan

Keyword(s):

Power Systems ◽

Distribution System ◽

Peak Load ◽

Demand Side ◽

Case Study Analysis ◽

Efficiency Assessment ◽

Electricity Use ◽

Residential Electricity ◽

Demand Side Response ◽

Residential Consumers

Demand side response is becoming an increasingly significant issue for reliable power systems’ operation. Therefore, it is desirable to ensure high effectiveness of such programs, including electricity tariffs. The purpose of the study is to develop a method for analysing an electricity tariff’s effectiveness for demand side response purposes, based on statistical data concerning tariffs’ use by the consumers and the price elasticity of their electricity demand. A case-study analysis is presented for residential electricity consumers, shifting the settlement and consequently the profile of electricity use from a flat to a time-of-use tariff, based on the comparison of the considered tariff groups. Additionally, a correlation analysis is suggested to verify tariffs’ influence on the power system’s peak load based on residential electricity tariffs in Poland. The presented analysis shows that large residential consumers aggregated by tariff incentives may have a significant impact on the power system’s load and that this impact changes substantially for particular hours of a day or season. Such efficiency assessment may be used both by energy suppliers to optimize their market purchases and by distribution system operators to ensure adequate generation during peak load periods.

Performance Comparison of Deep Learning Autoencoders for Cancer Subtype Detection Using Multi-Omics Data

Cancers ◽

10.3390/cancers13092013 ◽

2021 ◽

Vol 13 (9) ◽

pp. 2013

Author(s):

Edian F. Franco ◽

Pratip Rana ◽

Aline Cruz ◽

Víctor V. Calderón ◽

Vasco Azevedo ◽

...

Keyword(s):

Deep Learning ◽

Data Fusion ◽

Similarity Measures ◽

Research Problem ◽

Optimal Number ◽

Performance Comparison ◽

The Cancer Genome Atlas ◽

Cancer Type ◽

Omics Data ◽

Cancer Subtype

A heterogeneous disease such as cancer is activated through multiple pathways and different perturbations. Depending upon the activated pathway(s), patient survival varies significantly, as does the efficacy of various drugs. Therefore, cancer subtype detection using genomics level data is a significant research problem. Subtype detection is often a complex problem, and in most cases, needs multi-omics data fusion to achieve accurate subtyping. Different data fusion and subtyping approaches have been proposed over the years, such as kernel-based fusion, matrix factorization, and deep learning autoencoders. In this paper, we compared the performance of different deep learning autoencoders for cancer subtype detection. We performed cancer subtype detection on four different cancer types from The Cancer Genome Atlas (TCGA) datasets using four autoencoder implementations. We also predicted the optimal number of subtypes in a cancer type using the silhouette score and found that the detected subtypes exhibit significant differences in survival profiles. Furthermore, we compared the effect of feature selection and similarity measures for subtype detection. For further evaluation, we used the Glioblastoma multiforme (GBM) dataset and identified the differentially expressed genes in each of the subtypes. The results obtained are consistent with other genomic studies and can be corroborated with the involved pathways and biological functions. Thus, the results from the autoencoders, obtained through the interaction of different datatypes of cancer, can be used for the prediction and characterization of patient subgroups and survival profiles.
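
Choosing the number of subtypes with the silhouette score amounts to comparing candidate partitions and keeping the one with the highest mean score. The sketch below is a generic illustration, not the authors' pipeline (which works on autoencoder representations of multi-omics data): the function `silhouette_score`, the toy 2-D points and the two candidate labelings are all assumptions.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance).

    Requires at least two distinct labels. Singleton clusters score 0.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance of the point to itself contributes 0 to the sum).
        a = sum(math.dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D points forming two well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

labels_k2 = [0, 0, 0, 1, 1, 1]  # candidate partition with two clusters
labels_k3 = [0, 0, 1, 2, 2, 2]  # candidate that splits the first group

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
best_k = 2 if score_k2 > score_k3 else 3
print(best_k)
```

In practice the candidate labelings would come from running a clustering algorithm for each value of k rather than from hand-written lists.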

Mapping the Land Development Processes Using Data Transformation and Clustering Methods

IGARSS 2020 - 2020 IEEE International Geoscience and Remote Sensing Symposium ◽

10.1109/igarss39084.2020.9323510 ◽

2020 ◽

Author(s):

Pariya Pourmohammadi ◽

Donald A. Adjeroh ◽

Michael P. Strager

Keyword(s):

Land Development ◽

Data Transformation ◽

Clustering Methods ◽

Development Processes ◽

Using Data

SEMIPARAMETRIC CLUSTERING METHOD FOR MICROARRAY DATA ANALYSIS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972000800345x ◽

2008 ◽

Vol 06 (02) ◽

pp. 261-282 ◽

Cited By ~ 2

Author(s):

AO YUAN ◽

WENQING HE

Keyword(s):

Data Analysis ◽

Microarray Data ◽

Mixture Distribution ◽

Information Criterion ◽

Optimal Number ◽

Microarray Data Analysis ◽

Parametric Methods ◽

Clustering Methods ◽

Microarray Gene Expression ◽

Data Set

Clustering is a major tool for microarray gene expression data analysis. The existing clustering methods fall mainly into two categories: parametric and nonparametric. The parametric methods generally assume a mixture of parametric subdistributions. When the mixture distribution approximately fits the true data generating mechanism, the parametric methods perform well, but not so when there is nonnegligible deviation between them. On the other hand, the nonparametric methods, which usually do not make distributional assumptions, are robust but pay the price of efficiency loss. In an attempt to utilize the known mixture form to increase efficiency, and to free assumptions about the unknown subdistributions to enhance robustness, we propose a semiparametric method for clustering. The proposed approach possesses the form of a parametric mixture, with no assumptions about the subdistributions. The subdistributions are estimated nonparametrically, with constraints imposed only on the modes. An expectation-maximization (EM) algorithm along with a classification step is invoked to cluster the data, and a modified Bayesian information criterion (BIC) is employed to guide the determination of the optimal number of clusters. Simulation studies are conducted to assess the performance and the robustness of the proposed method. The results show that the proposed method yields a reasonable partition of the data. As an illustration, the proposed method is applied to a real microarray data set to cluster genes.
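
For orientation, the standard (unmodified) BIC for a fitted model is shown below; the abstract does not state how the authors' modified criterion differs, so this is background rather than their formula. Here \hat{L} is the maximized likelihood, p the number of free parameters and n the number of observations, all notation introduced for this note.

```latex
\mathrm{BIC} = -2 \ln \hat{L} + p \ln n
```

Under this convention the candidate number of clusters with the lowest BIC is preferred: adding clusters raises the likelihood but also raises p, and the p ln n term penalizes that extra complexity.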
