Control of Shared Energy Storage Assets Within Building Clusters Using Reinforcement Learning

Author(s):  
Philip Odonkor ◽  
Kemper Lewis

This work leverages the current state of the art in reinforcement learning for continuous control, the deep deterministic policy gradient (DDPG) algorithm, toward the optimal 24-hour dispatch of shared energy assets within building clusters. The modeled DDPG agent interacts with a battery environment designed to emulate a shared battery system. The aim is not only to learn an efficient charge/discharge policy, but also to address the continuous-domain question of how much energy should be charged or discharged. Experimentally, we examine the impact of the learned dispatch strategy on minimizing demand peaks within the building cluster. Our results show that, across the variety of building cluster combinations studied, the algorithm is able to learn and exploit energy arbitrage, tailoring it into battery dispatch strategies for peak demand shifting.

2018 ◽  
Vol 141 (2) ◽  
Author(s):  
Philip Odonkor ◽  
Kemper Lewis

The control of shared energy assets within building clusters has traditionally been confined to a discrete action space, owing in part to a computationally intractable decision space. In this work, we leverage the current state of the art in reinforcement learning (RL) for continuous control tasks, the deep deterministic policy gradient (DDPG) algorithm, toward addressing this limitation. The goals of this paper are twofold: (i) to design an efficient charge/discharge dispatch policy for a shared battery system within a building cluster and (ii) to address the continuous-domain task of determining how much energy should be charged/discharged at each decision cycle. Experimentally, our results demonstrate an ability to exploit factors such as energy arbitrage, along with the continuous action space, toward demand peak minimization. This approach is shown to be computationally tractable, achieving efficient results after only 5 h of simulation. Additionally, the agent showed an ability to adapt to different building clusters, designing unique control strategies to address the energy demands of the clusters studied.
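The dispatch problem can be made concrete with a toy environment. The sketch below is a minimal, hypothetical shared-battery environment (all class names, capacities, and the quadratic peak penalty are invented for illustration, not the authors' implementation) exposing the continuous charge/discharge action that a DDPG actor would output:

```python
import numpy as np

class SharedBatteryEnv:
    """Toy shared-battery environment (hypothetical parameters).

    The continuous action a in [-1, 1] charges (a > 0) or discharges
    (a < 0) the battery at |a| * max_power_kw over one hour.
    """

    def __init__(self, demand_profile, capacity_kwh=100.0, max_power_kw=25.0):
        self.demand = np.asarray(demand_profile, dtype=float)  # 24-h cluster load (kW)
        self.capacity = capacity_kwh
        self.max_power = max_power_kw
        self.reset()

    def reset(self):
        self.t = 0
        self.soc = 0.5 * self.capacity  # start half full
        return self._obs()

    def _obs(self):
        return np.array([self.soc / self.capacity, self.demand[self.t]])

    def step(self, action):
        a = float(np.clip(action, -1.0, 1.0))
        energy = a * self.max_power  # kWh moved this hour
        # Respect state-of-charge limits.
        energy = float(np.clip(energy, -self.soc, self.capacity - self.soc))
        self.soc += energy
        # Net grid draw: building load plus charging (or minus discharging).
        net = self.demand[self.t] + energy
        reward = -net ** 2  # quadratic penalty discourages demand peaks
        self.t += 1
        done = self.t >= len(self.demand)
        return (None if done else self._obs()), reward, done
```

A DDPG actor would map the observation to the continuous action; even a fixed rule (charge off-peak, discharge on-peak) already reduces the squared-demand cost under this reward, which illustrates the arbitrage the agent learns to exploit.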


2020 ◽  
Vol 34 (04) ◽  
pp. 3316-3323
Author(s):  
Qingpeng Cai ◽  
Ling Pan ◽  
Pingzhong Tang

Reinforcement learning algorithms such as the deep deterministic policy gradient (DDPG) algorithm have been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper, we consider deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with a finite horizon, which is too myopic compared with the infinite-horizon setting. We first give a theoretical guarantee of the existence of the value gradients in this infinite-horizon setting. Based on this guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which different rollout steps of the analytical gradients through the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. The results demonstrate that DVPG substantially outperforms the other baselines.
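The idea of a k-step deterministic value gradient with a critic bootstrap can be illustrated in a scalar toy setting. The sketch below (linear model, quadratic reward, and quadratic critic are all hypothetical choices, not the paper's setup) propagates the analytic gradient of a k-step model rollout and bootstraps with the critic at the horizon; the rollout length k is the knob that trades off gradient variance against model bias:

```python
# Scalar toy: model s' = A*s + B*a, linear policy a = theta*s,
# reward r = -(s^2 + a^2), hypothetical critic V_hat(s) = -C*s^2.
GAMMA = 0.9
A, B, C = 0.9, 0.5, 2.0

def value_and_grad(s, theta, k, ds_dtheta=0.0):
    """k-step deterministic value gradient with critic bootstrap.

    Returns (k-step value, analytic d(value)/d(theta)); ds_dtheta tracks
    how the current state depends on theta along the rollout.
    """
    if k == 0:  # bootstrap with the critic at the rollout horizon
        return -C * s * s, -2.0 * C * s * ds_dtheta
    a = theta * s
    da = s + theta * ds_dtheta           # total derivative of the action w.r.t. theta
    r = -(s * s + a * a)
    dr = -(2.0 * s * ds_dtheta + 2.0 * a * da)
    s_next = A * s + B * a
    ds_next = A * ds_dtheta + B * da     # gradient flows through the learned model
    v_next, dv_next = value_and_grad(s_next, theta, k - 1, ds_next)
    return r + GAMMA * v_next, dr + GAMMA * dv_next
```

A finite-difference check confirms the analytic gradient; with k = 0 this degenerates to a pure critic-based (DDPG-style) gradient, while larger k uses more of the model's analytic gradient.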


Author(s):  
Florian Kuisat ◽  
Fernando Lasagni ◽  
Andrés Fabián Lasagni

Abstract. It is well known that the surface topography of a part can affect its mechanical performance, a concern that is particularly relevant in additive manufacturing. In this context, we report on the surface modification of additively manufactured components made of Titanium 64 (Ti64) and Scalmalloy®, using a pulsed laser, with the aim of reducing their surface roughness. In our experiments, a nanosecond-pulsed infrared laser source with variable pulse durations between 8 and 200 ns was applied. The impact of varying a large number of parameters on the surface quality of the smoothed areas was investigated. The results demonstrated a reduction of the surface roughness Sa by more than 80% for Titanium 64 and by 65% for Scalmalloy® samples. This extends the applicability of additively manufactured components beyond the current state of the art and breaks new ground for use in various industrial sectors such as aerospace.


2020 ◽  
Author(s):  
Ali Fallah ◽  
Sungmin O ◽  
Rene Orth

Abstract. Precipitation is a crucial variable for hydro-meteorological applications. Unfortunately, rain gauge measurements are sparse and unevenly distributed, which substantially hampers the use of in-situ precipitation data in many regions of the world. The increasing availability of high-resolution gridded precipitation products presents a valuable alternative, especially over gauge-sparse regions. Nevertheless, uncertainties and corresponding differences across products can limit the applicability of these data. This study examines the usefulness of current state-of-the-art precipitation datasets in hydrological modelling. For this purpose, we force a conceptual hydrological model with multiple precipitation datasets in > 200 European catchments. We consider a wide range of precipitation products, which are generated via (1) interpolation of gauge measurements (E-OBS and GPCC V.2018), (2) combination of multiple sources (MSWEP V2) and (3) data assimilation into reanalysis models (ERA-Interim, ERA5, and CFSR). For each catchment, runoff and evapotranspiration simulations are obtained by forcing the model with the various precipitation products. Evaluation is done at the monthly time scale during the period of 1984–2007. We find that simulated runoff values are highly dependent on the accuracy of precipitation inputs, and thus show significant differences between the simulations. By contrast, simulated evapotranspiration is generally much less influenced. The results are further analysed with respect to different hydro-climatic regimes. We find that the impact of precipitation uncertainty on simulated runoff increases towards wetter regions, while the opposite is observed in the case of evapotranspiration. Finally, we perform an indirect performance evaluation of the precipitation datasets by comparing the runoff simulations with streamflow observations. 
Here, E-OBS yields the best agreement, while ERA5, GPCC V.2018 and MSWEP V2 also show good performance. In summary, our findings highlight a climate-dependent propagation of precipitation uncertainty through the water cycle; while runoff is strongly impacted in comparatively wet regions such as Central Europe, the implications for evapotranspiration increase towards drier regions.
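The core of the experiment, forcing one conceptual model with differing precipitation inputs and comparing the simulated fluxes, can be sketched with a toy bucket model (all parameters and the two synthetic "datasets" are hypothetical, not the study's actual model or data):

```python
def bucket_model(precip, pet, capacity=100.0):
    """Toy bucket model: overflow runoff, moisture-limited evapotranspiration."""
    storage = 0.5 * capacity
    runoff, aet = [], []
    for p, e in zip(precip, pet):
        storage += p
        q = max(0.0, storage - capacity)            # bucket overflow becomes runoff
        storage -= q
        et = min(storage, e * storage / capacity)   # ET scaled by soil moisture
        storage -= et
        runoff.append(q)
        aet.append(et)
    return runoff, aet

# Two hypothetical precipitation "datasets" that disagree by 20%.
pet = [3.0] * 60
precip_a = [6.0] * 60
precip_b = [7.2] * 60
qa, eta = bucket_model(precip_a, pet)
qb, etb = bucket_model(precip_b, pet)
```

In this wet-regime toy, most of the 20% input disagreement propagates into runoff while evapotranspiration stays moisture-buffered, mirroring the study's finding that precipitation uncertainty impacts runoff more than evapotranspiration in wet regions.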


2020 ◽  
Vol 34 (10) ◽  
pp. 13905-13906
Author(s):  
Rohan Saphal ◽  
Balaraman Ravindran ◽  
Dheevatsa Mudigere ◽  
Sasikanth Avancha ◽  
Bharat Kaul

Reinforcement learning algorithms are sensitive to hyper-parameters and require tuning and tweaking for specific environments to improve performance. Ensembles of reinforcement learning models, on the other hand, are known to be much more robust and stable. However, training multiple models independently on an environment suffers from high sample complexity. We present a methodology to create multiple models from a single training instance, to be used in an ensemble, through directed perturbation of the model parameters at regular intervals. This allows training a single model that converges to several local minima during the optimization process as a result of the perturbation. By saving the model parameters at each such instance, we obtain multiple policies during training that are ensembled during evaluation. We evaluate our approach on challenging discrete and continuous control tasks and also discuss various ensembling strategies. Our framework is substantially sample-efficient, computationally inexpensive, and is seen to outperform state-of-the-art (SOTA) approaches.
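The snapshot-and-perturb scheme can be sketched as follows. Everything here is a stand-in: the "update" is random noise rather than a real RL gradient, and the linear policy and majority vote are hypothetical choices used only to show the mechanics of collecting an ensemble from one training run:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_with_perturbation(n_steps=300, perturb_every=100, sigma=0.1,
                            n_actions=3, obs_dim=4):
    """Single training run that yields an ensemble of policy snapshots."""
    W = rng.normal(scale=0.1, size=(n_actions, obs_dim))  # toy linear policy
    snapshots = []
    for step in range(1, n_steps + 1):
        W += 0.01 * rng.normal(size=W.shape)   # placeholder for an RL update
        if step % perturb_every == 0:
            snapshots.append(W.copy())          # save current policy for the ensemble
            W += rng.normal(scale=sigma, size=W.shape)  # perturb toward a new minimum
    return snapshots

def ensemble_action(snapshots, obs):
    """Majority vote across snapshot policies at evaluation time."""
    votes = {}
    for W in snapshots:
        a = int(np.argmax(W @ obs))
        votes[a] = votes.get(a, 0) + 1
    return max(votes, key=votes.get)
```

The key point is cost: the ensemble members come from one optimization trajectory, so the sample complexity is that of a single training run rather than one run per member.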


2006 ◽  
Vol 3 (5) ◽  
pp. 317 ◽  
Author(s):  
Ole Hertel ◽  
Carsten Ambelas Skjøth ◽  
Per Løfstrøm ◽  
Camilla Geels ◽  
Lise Marie Frohn ◽  
...  

Abstract. Local ammonia emissions from agricultural activities are often associated with high nitrogen deposition in the close vicinity of the sources. High nitrogen (N) inputs may significantly affect the local ecosystems. Over a longer term, high loads may change the composition of the ecosystems, leading to a general decrease in local biodiversity. In Europe there is currently a significant focus on the impact of atmospheric N load on local ecosystems among environmental managers and policy makers. Model tools designed for application in N deposition assessment and aimed for use in the regulation of anthropogenic nitrogen emissions are, therefore, under development in many European countries. The aim of this paper is to present a review of the current understanding and modelling parameterizations of atmospheric N deposition. A special focus is on the development of operational tools for use in environmental assessment and regulation related to agricultural ammonia emissions. For the often large number of environmental impact assessments needed to be carried out by local environmental managers there is, furthermore, a need for simple and fast model systems. These systems must capture the most important aspects of dispersion and deposition of N in the nearby environment of farms with animal production. The paper includes a discussion on the demands on the models applied in environmental assessment and regulation and how these demands are fulfilled in current state-of-the-art models.


Electronics ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 37
Author(s):  
Hai-Tao Yu ◽  
Degen Huang ◽  
Fuji Ren ◽  
Lishuang Li

Learning-to-rank has been intensively studied and has shown significant value in a wide range of domains, such as web search, recommender systems, dialogue systems, machine translation, and even computational biology, to name a few. In light of recent advances in neural networks, there has been a strong and continuing interest in exploring how to deploy popular techniques, such as reinforcement learning and adversarial learning, to solve ranking problems. However, armed with the aforesaid popular techniques, most studies tend to show how effective a new method is. A comprehensive comparison between techniques and an in-depth analysis of their deficiencies are somehow overlooked. This paper is motivated by the observation that recent ranking methods based on either reinforcement learning or adversarial learning boil down to policy-gradient-based optimization. Based on widely used benchmark collections with complete information (where relevance labels are known for all items), such as MSLR-WEB30K and Yahoo-Set1, we thoroughly investigate the extent to which policy-gradient-based ranking methods are effective. On one hand, we analytically identify the pitfalls of policy-gradient-based ranking. On the other hand, we experimentally compare a wide range of representative methods. The experimental results echo our analysis and show that policy-gradient-based ranking methods are, by a large margin, inferior to many conventional ranking methods. Regardless of whether we use reinforcement learning or adversarial learning, the failures are largely attributable to gradient estimation based on sampled rankings, which diverge significantly from ideal rankings. In particular, the larger the number of documents per query and the more fine-grained the ground-truth labels, the greater the performance degradation that policy-gradient-based ranking suffers. Careful examination of this weakness is highly recommended for developing enhanced methods based on policy gradient.
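The policy-gradient ranking setup the paper analyzes can be sketched generically: rankings sampled from a Plackett-Luce model over document scores, with a REINFORCE-style gradient weighted by DCG. This is a minimal illustration of that family of estimators, not any specific method from the paper; note how the gradient depends entirely on the sampled rankings, which is exactly the weakness identified above:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_ranking(scores):
    """Sample a ranking from a Plackett-Luce model via the Gumbel trick."""
    g = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + g))

def log_prob_grad(scores, ranking):
    """Gradient of log P(ranking) under Plackett-Luce w.r.t. the scores."""
    grad = np.zeros_like(scores)
    remaining = list(ranking)
    for doc in ranking:
        s = scores[remaining]
        p = np.exp(s - s.max())
        p /= p.sum()                 # softmax over the not-yet-placed documents
        grad[remaining] -= p
        grad[doc] += 1.0
        remaining.remove(doc)
    return grad

def dcg(ranking, labels):
    """Discounted cumulative gain of a ranking given graded relevance labels."""
    gains = 2.0 ** labels[ranking] - 1.0
    discounts = 1.0 / np.log2(np.arange(2, len(ranking) + 2))
    return float(gains @ discounts)

def reinforce_grad(scores, labels, n_samples=32):
    """Monte-Carlo policy-gradient estimate: E[DCG * grad log P]."""
    g = np.zeros_like(scores)
    for _ in range(n_samples):
        pi = sample_ranking(scores)
        g += dcg(pi, labels) * log_prob_grad(scores, pi)
    return g / n_samples
```

With many documents per query, almost all sampled rankings are far from the ideal one, so the Monte-Carlo gradient becomes a high-variance average over poor rankings, which matches the degradation the paper reports.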


Author(s):  
Yu Wang ◽  
Hongxia Jin

In this paper, we present a multi-step coarse-to-fine question answering (MSCQA) system that can efficiently process documents of different lengths by choosing appropriate actions. The system is designed using an actor-critic-based deep reinforcement learning model to achieve multi-step question answering. Compared to previous QA models targeting datasets that mainly contain either short or long documents, our multi-step coarse-to-fine model takes the merits of multiple system modules and can handle both short and long documents. The system hence obtains much better accuracy and faster training speed than the current state-of-the-art models. We test our model on four QA datasets, WIKIREADING, WIKIREADING LONG, CNN and SQuAD, and demonstrate 1.3%-1.7% accuracy improvements with 1.5x-3.4x training speed-ups in comparison to the baselines using state-of-the-art models.


Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1352 ◽  
Author(s):  
Kim ◽  
Park

In terms of deep reinforcement learning (RL), exploration is highly significant in achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in an environment of sparse and binary rewards, such as the real-time online detection of network security anomalies, where the task is to verify whether the network is “normal or anomalous.” Prior studies have illustrated that a prioritized replay memory based on a complex temporal-difference error provides superior theoretical results. However, another implementation illustrated that in certain environments, the prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key challenge of hindsight experience replay inspires our objective of using additional buffers corresponding to each different goal. Therefore, we attempt to exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning from our method through an experimental comparison of DQN and a deep deterministic policy gradient in terms of discrete action, as well as continuous control, for complete symmetric environments.
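The multiple-buffer idea can be sketched as follows. This is a minimal, hypothetical structure (class name, capacities, and the uniform cross-buffer sampling rule are all invented for illustration, not the authors' implementation): one replay buffer per ε value, with evaluation-time minibatches drawn across buffers.

```python
import random
from collections import deque

random.seed(0)

class MultiEpsilonReplay:
    """One replay buffer per epsilon value; experience is kept separate."""

    def __init__(self, epsilons, capacity=1000):
        self.buffers = {eps: deque(maxlen=capacity) for eps in epsilons}

    def act(self, eps, greedy_action, n_actions):
        """Epsilon-greedy: random action with probability eps, else greedy."""
        if random.random() < eps:
            return random.randrange(n_actions)
        return greedy_action

    def store(self, eps, transition):
        self.buffers[eps].append(transition)

    def sample(self, batch_size):
        """Draw each element from a uniformly chosen non-empty buffer."""
        pools = [buf for buf in self.buffers.values() if buf]
        return [random.choice(random.choice(pools)) for _ in range(batch_size)]
```

Because each buffer is filled under a different exploration rate, minibatches mix experience from several behavior policies, which is the kind of cross-buffer off-policy learning the abstract describes.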


Author(s):  
Sergey Mikhalovsky ◽  
Oleksandr Voytko ◽  
Violetta Demchenko ◽  
Pavlo Demchenko

Enterosorption is a cost-effective and efficient approach to reducing the impact of chronic exposure to heavy metals and radionuclides. As an auxiliary method to medical treatment, it can protect populations chronically exposed to the intake of heavy metals or radioactivity due to industrial activities or in the aftermath of technogenic or natural accidents. This paper assesses the current state of the art in the treatment of acute and chronic heavy metal poisoning.

