Optimal Energy Operation Strategy for We-Energy of Energy Internet Based on Hybrid Reinforcement Learning With Human-in-the-Loop

Author(s): Lingxiao Yang, Qiuye Sun, Ning Zhang, Zhenwei Liu

2019 · Vol 9 (3) · pp. 520
Author(s): Dan-Lu Wang, Qiu-Ye Sun, Yu-Yang Li, Xin-Rui Liu

To cope with the energy crisis, the concept of the energy internet (EI) has been proposed as a highly efficient energy structure that fully exploits the advantages of multi-energy coupling. To adapt to this multi-energy coupled structure and achieve flexible conversion and interaction among multiple energy carriers, the concept of energy routing centers (ERCs) is proposed, and a two-layered ERC structure is established. Multi-energy conversion devices and connection ports with monitoring functions are integrated into the physical layer, which allows highly flexible multi-energy flow. For an EI with several interconnected ERCs, energy flows among them are managed by an energy routing controller located in the information layer. To improve efficiency and reduce the operating and environmental costs of the proposed EI, an optimal multi-energy-management-based energy routing design problem is studied. Specifically, the voltages of the ERC ports are managed to regulate the power flow on the connection lines and are constrained to ensure secure operation. An artificial neural network (ANN)-based reinforcement learning algorithm is proposed to determine the optimal energy routing path, and simulations verify the effectiveness of the proposed method.
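The port-voltage mechanism described in the abstract can be illustrated with a toy sketch: under a simple resistive (DC-approximation) line model, the power a port pushes onto a connection line grows with the voltage difference across the line, while the port voltages must stay inside security limits. The resistance, limits, and function name below are illustrative assumptions, not values from the paper.

```python
# Toy sketch of the physical-layer mechanism: power exchanged on a connection line
# between two ERC ports under a resistive DC approximation, with port voltages
# constrained to a secure per-unit range. All values are illustrative assumptions.
R_LINE = 0.5                 # line resistance (ohm), assumed
V_MIN, V_MAX = 0.95, 1.05    # per-unit security limits on port voltages, assumed

def line_power(v_i, v_j, r=R_LINE):
    """Power injected by port i into the line: P = V_i * (V_i - V_j) / R."""
    for v in (v_i, v_j):
        if not V_MIN <= v <= V_MAX:
            raise ValueError("port voltage outside secure operating range")
    return v_i * (v_i - v_j) / r

# Raising the sending-end voltage increases the power pushed onto the line.
p_low = line_power(1.00, 0.98)   # 1.00 * 0.02 / 0.5 = 0.04 p.u.
p_high = line_power(1.03, 0.98)  # 1.03 * 0.05 / 0.5 = 0.103 p.u.
```

Managing the routing path then amounts to choosing which port voltages to raise or lower, subject to these security bounds.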


Energies · 2021 · Vol 14 (9) · pp. 2700
Author(s): Grace Muriithi, Sunetra Chowdhury

In the near future, microgrids will become more prevalent as they play a critical role in integrating distributed renewable energy resources into the main grid. Nevertheless, renewable energy sources such as solar and wind can be extremely volatile, as they are weather dependent. These resources, coupled with fluctuating demand, can lead to random variations on both the generation and load sides, thus complicating optimal energy management. In this article, a reinforcement learning approach is proposed to deal with this non-stationary scenario, in which the energy management system (EMS) is modelled as a Markov decision process (MDP). A novel modification of the control problem is presented that improves the use of energy stored in the battery so that the dynamic demand is not exposed to future high grid tariffs. A comprehensive reward function is also developed that reduces infeasible action explorations, thus improving the performance of the data-driven technique. A Q-learning algorithm is then proposed to minimize the operational cost of the microgrid under unknown future information. To assess the performance of the proposed EMS, a comparison between a trading EMS model and a non-trading case is performed using a typical commercial load curve and PV profile over a 24-h horizon. Numerical simulation results indicate that the agent learns to select an optimized energy schedule that minimizes energy cost (the cost of power purchased from the utility plus battery wear cost) in all the studied cases. Comparing the operational costs of the non-trading and trading EMS models, the latter was found to reduce costs by 4.033% in the summer season and 2.199% in the winter season.
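The MDP formulation sketched in this abstract — battery level and hour of day as state, charge/idle/discharge as actions, and a reward equal to the negative of purchase cost plus battery wear cost with a penalty discouraging infeasible actions — can be illustrated with a minimal tabular Q-learning sketch. The discretization, tariff schedule, load curve, and wear cost below are all illustrative assumptions, not the paper's data.

```python
import numpy as np

# Minimal tabular Q-learning sketch of an EMS modelled as an MDP. State = (hour,
# battery level); actions = discharge / idle / charge; reward = negative of
# grid-purchase cost plus battery wear cost, with a penalty that discourages
# infeasible (over/under-charging) actions. All numbers are assumptions.
rng = np.random.default_rng(0)
HOURS, LEVELS = 24, 5
ACTIONS = (-1, 0, 1)                       # discharge, idle, charge (one level/step)
hours = np.arange(HOURS)
tariff = np.where((hours >= 17) & (hours < 21), 0.30, 0.10)   # assumed peak pricing
demand = 2.0 + np.sin(hours / HOURS * 2 * np.pi)              # toy load curve (kW)
WEAR = 0.02                                # assumed wear cost per level moved
Q = np.zeros((HOURS, LEVELS, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(hour, level, a_idx):
    """Apply an action; infeasible moves are clipped and penalized."""
    new_level = min(max(level + ACTIONS[a_idx], 0), LEVELS - 1)
    penalty = 1.0 if new_level == level and ACTIONS[a_idx] != 0 else 0.0
    grid_power = demand[hour] + (new_level - level)   # charging buys extra power
    cost = tariff[hour] * max(grid_power, 0.0) + WEAR * abs(new_level - level)
    return new_level, -(cost + penalty)               # reward = negative cost

for _ in range(2000):                                 # training episodes (24 h each)
    level = LEVELS // 2
    for hour in range(HOURS):
        a_idx = int(rng.integers(len(ACTIONS))) if rng.random() < eps \
            else int(np.argmax(Q[hour, level]))
        new_level, r = step(hour, level, a_idx)
        nxt = 0.0 if hour == HOURS - 1 else float(np.max(Q[hour + 1, new_level]))
        Q[hour, level, a_idx] += alpha * (r + gamma * nxt - Q[hour, level, a_idx])
        level = new_level
```

The infeasibility penalty plays the role the abstract attributes to the comprehensive reward function: it steers exploration away from actions the battery cannot physically execute.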


Author(s):  
Maximilian Moll ◽  
Leonhard Kunczik

In recent history, reinforcement learning (RL) has proved its capability on complex decision problems by mastering several games. Increased computational power and advances in approximation with neural networks (NNs) paved the way for RL's successful applications. Even though RL can tackle increasingly complex problems, it remains constrained by computational power and runtime. Quantum computing promises to address these issues through its capability to encode information compactly and its potential quadratic speedup in runtime. We compare tabular Q-learning and Q-learning using either a quantum or a classical approximation architecture on the frozen lake problem. The three algorithms are analyzed in terms of iterations until convergence to the optimal behavior, memory usage, and runtime. NNs are utilized for approximation in the classical domain, while variational quantum circuits, a hybrid quantum approximation method, are used in the quantum domain. Our simulations show that the quantum approximator is beneficial in terms of memory usage and provides better sample complexity than NNs; however, it still lacks the computational speed to be competitive.


Sensors · 2020 · Vol 20 (16) · pp. 4468
Author(s): Ao Xi, Chao Chen

In this work, we introduced a novel hybrid reinforcement learning scheme to balance a biped robot (NAO) on an oscillating platform, where the rotation of the platform is treated as an external disturbance to the robot. The platform had two rotational degrees of freedom, pitch and roll. The state space comprised the position of the center of pressure and the joint angles and joint velocities of the two legs; the action space consisted of the joint angles of the ankles, knees, and hips. By incorporating inverse kinematics, the dimension of the action space was significantly reduced. A model-based system estimator was then employed during the offline training procedure to estimate the dynamics model of the system using novel hierarchical Gaussian processes and to provide initial control inputs, after which the reduced action space of each joint was obtained by minimizing the cost of reaching the desired stable state. Finally, a model-free optimizer based on DQN(λ) was introduced to fine-tune the initial control inputs, yielding the optimal control input for each joint at any state. The proposed scheme not only avoided the distribution-mismatch problem but also improved sample efficiency. Simulation results showed that the proposed hybrid reinforcement learning mechanism enabled the NAO robot to balance on platforms oscillating at different frequencies and magnitudes, and both control performance and robustness were maintained throughout the experiments.
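The (λ) mechanism behind DQN(λ) can be illustrated in miniature with Watkins's Q(λ) on a toy chain task: eligibility traces spread each temporal-difference error back over recently visited state-action pairs and are cut after exploratory actions. Everything below (the chain environment, the step cost, the hyperparameters) is an illustrative tabular analogue, not the paper's deep-network variant.

```python
import numpy as np

# Watkins's Q(lambda) on a toy 1-D chain: reaching the right end yields +1, every
# other step costs -0.01. The trace matrix E spreads each TD error back along the
# trajectory; traces are cut after exploratory actions. All numbers are assumed.
N_STATES, N_ACTIONS = 8, 2                 # actions: 0 = left, 1 = right
rng = np.random.default_rng(2)
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, lam, eps = 0.1, 0.95, 0.8, 0.1

for _ in range(1000):
    E = np.zeros_like(Q)                   # eligibility traces, reset each episode
    s = 0
    for _t in range(100):                  # step cap keeps early episodes bounded
        greedy_a = int(np.argmax(Q[s]))
        a = greedy_a if rng.random() >= eps else int(rng.integers(N_ACTIONS))
        s2 = min(max(s + (1 if a == 1 else -1), 0), N_STATES - 1)
        done = s2 == N_STATES - 1
        r = 1.0 if done else -0.01         # small step cost encourages progress
        delta = r + (0.0 if done else gamma * float(np.max(Q[s2]))) - Q[s, a]
        E[s, a] += 1.0                     # accumulating trace
        Q += alpha * delta * E             # update every traced state-action pair
        E *= gamma * lam if a == greedy_a else 0.0   # cut traces after exploration
        s = s2
        if done:
            break
```

Compared with one-step Q-learning, the trace lets a single reward update the whole recent trajectory at once, which is the sample-efficiency benefit the (λ) variant targets.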


2021 · Vol 5 (2) · pp. 505-510
Author(s): Jaehyun Yoo, Dohyun Jang, H. Jin Kim, Karl H. Johansson
