DDPG Agent to Swing Up and Balance Cart-Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents to interact intelligently with their environment. Unlike supervised learning, where a system learns with the help of labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environments. A field where reinforcement learning has been prominently successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been successfully used to extract meaning from this kind of data. Building on these advances, deep reinforcement learning was used to solve complex problems like Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters able to learn to play 49 different Atari games only from raw pixel inputs. However, in order to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces would scale poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, having a parametrized policy can be advantageous because it can generalize in the action space. Therefore, in this thesis we study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison to other popular methods, evaluate its performance, identify its limitations and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing our attention on deep learning and reinforcement learning. We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and our test environment, followed by a description of benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
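
The scaling argument in this abstract can be made concrete with a small sketch. It assumes every action dimension is discretized into the same number of bins; the function name and the bin counts are illustrative and do not come from the thesis.

```python
def count_discrete_actions(bins_per_dim: int, action_dim: int) -> int:
    """Return the number of joint actions on a uniform grid.

    Assumption: each of the `action_dim` continuous dimensions is split
    into `bins_per_dim` values, so the joint action set is their product.
    """
    if bins_per_dim < 1 or action_dim < 0:
        raise ValueError("bins_per_dim must be >= 1 and action_dim >= 0")
    return bins_per_dim ** action_dim


if __name__ == "__main__":
    # Illustrative only: 10 bins per dimension for growing action dimensionality.
    for dim in (1, 2, 4, 7):
        print(f"action_dim={dim}: {count_discrete_actions(10, dim)} discrete actions")
```

Even a coarse grid becomes impractical for a value-based method that must score every discrete action, which is the motivation the abstract gives for a parametrized continuous policy.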

MULTIPOLAR: Multi-Source Policy Aggregation for Transfer Reinforcement Learning between Diverse Environmental Dynamics

Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2020/430 ◽

2020 ◽

Author(s):

Mohammadamin Barekatain ◽

Ryo Yonetani ◽

Masashi Hamaya

Keyword(s):

Reinforcement Learning ◽

Task Performance ◽

Experimental Evaluation ◽

Control Problems ◽

Target Task ◽

Learning Efficiency ◽

Simulated Environments ◽

Discrete Action ◽

Key Techniques ◽

Action Spaces

Transfer reinforcement learning (RL) aims at improving the learning efficiency of an agent by exploiting knowledge from other source agents trained on relevant tasks. However, it remains challenging to transfer knowledge between different environmental dynamics without having access to the source environments. In this work, we explore a new challenge in transfer RL, where only a set of source policies collected under diverse unknown dynamics is available for learning a target task efficiently. To address this problem, the proposed approach, MULTI-source POLicy AggRegation (MULTIPOLAR), comprises two key techniques. We learn to aggregate the actions provided by the source policies adaptively to maximize the target task performance. Meanwhile, we learn an auxiliary network that predicts residuals around the aggregated actions, which ensures the target policy's expressiveness even when some of the source policies perform poorly. We demonstrated the effectiveness of MULTIPOLAR through an extensive experimental evaluation across six simulated environments ranging from classic control problems to challenging robotics simulations, under both continuous and discrete action spaces. The demo videos and code are available on the project webpage: https://omron-sinicx.github.io/multipolar/.
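
The two techniques named in this abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' implementation: the weighted-mean form, the function name `aggregate_actions`, and the fixed numeric weights and residual are all assumptions. In the actual method the weights and the residual would be learned, and the residual would be predicted from the state by the auxiliary network.

```python
from typing import List, Sequence


def aggregate_actions(
    source_actions: Sequence[Sequence[float]],
    weights: Sequence[Sequence[float]],
    residual: Sequence[float],
) -> List[float]:
    """Combine K source actions into one target action.

    Assumed form: per dimension d,
        target[d] = (1 / K) * sum_k weights[k][d] * source_actions[k][d] + residual[d]
    """
    num_sources = len(source_actions)
    if num_sources == 0 or len(weights) != num_sources:
        raise ValueError("need one weight row per source policy")
    action_dim = len(residual)
    target = []
    for d in range(action_dim):
        weighted_sum = sum(weights[k][d] * source_actions[k][d] for k in range(num_sources))
        target.append(weighted_sum / num_sources + residual[d])
    return target


if __name__ == "__main__":
    # Two hypothetical source policies proposing 2-D actions for the same state.
    actions = [[1.0, 2.0], [3.0, 4.0]]
    uniform_weights = [[1.0, 1.0], [1.0, 1.0]]
    print(aggregate_actions(actions, uniform_weights, residual=[0.5, -0.5]))
```

Setting a source's weights near zero lets the residual term carry the action, which mirrors the abstract's point that the target policy stays expressive even when some source policies perform poorly.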

Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces

Journal of Marine Science and Technology ◽

10.1007/s00773-020-00755-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ryohei Sawada ◽

Keiji Sato ◽

Takahiro Majima

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Continuous Action ◽

Ship Collision ◽

Action Spaces

MetaLight: Value-Based Meta-Reinforcement Learning for Traffic Signal Control

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i01.5467 ◽

2020 ◽

Vol 34 (01) ◽

pp. 1153-1160 ◽

Cited By ~ 1

Author(s):

Xinshi Zang ◽

Huaxiu Yao ◽

Guanjie Zheng ◽

Nan Xu ◽

Kai Xu ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Learning Algorithm ◽

Traffic Signal ◽

Training Data ◽

Signal Control ◽

Traffic Signal Control ◽

Individual Level ◽

Real World Datasets ◽

Reinforcement Learning Models

Using reinforcement learning for traffic signal control has attracted increasing interest recently. Various value-based reinforcement learning methods have been proposed to deal with this classical transportation problem and have achieved better performance than traditional transportation methods. However, current reinforcement learning models rely on large amounts of training data and computational resources, which may have bad consequences (e.g., traffic jams or accidents) in the real world. In traffic signal control, some algorithms have been proposed to enable quick learning from scratch, but little attention has been paid to learning by transferring and reusing learned experience. In this paper, we propose a novel framework, named MetaLight, to speed up the learning process in new scenarios by leveraging the knowledge learned from existing scenarios. MetaLight is a value-based meta-reinforcement learning workflow based on the representative gradient-based meta-learning algorithm (MAML), which periodically alternates between individual-level adaptation and global-level adaptation. Moreover, MetaLight improves the state-of-the-art reinforcement learning model FRAP in traffic signal control by optimizing its model structure and updating paradigm. The experiments on four real-world datasets show that our proposed MetaLight not only adapts more quickly and stably in new traffic scenarios, but also achieves better performance.

A phased reinforcement learning algorithm for complex control problems

Artificial Life and Robotics ◽

10.1007/s10015-007-0427-y ◽

2007 ◽

Vol 11 (2) ◽

pp. 190-196 ◽

Cited By ~ 5

Author(s):

Takakuni Goto ◽

Noriyasu Homma ◽

Makoto Yoshizawa ◽

Kenichi Abe

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Control Problems ◽

Complex Control ◽

Reinforcement Learning Algorithm

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between different agents are used to facilitate the training process, while each agent executes its policy independently based on local observations during execution. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
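
To show what a discrete-continuous hybrid (parameterized) action looks like as data, here is a generic sketch. It is not the Deep MAPQN or Deep MAHHQN architecture; the class `HybridAction`, the function `select_hybrid_action`, and the example action meanings are hypothetical. It assumes some model has already produced one Q-value and one continuous parameter vector per discrete action.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HybridAction:
    """A discrete choice together with its continuous parameters."""
    discrete_id: int
    parameters: Tuple[float, ...]


def select_hybrid_action(
    q_values: Sequence[float],
    candidate_params: Sequence[Sequence[float]],
) -> HybridAction:
    """Pick the discrete action with the highest Q-value and attach its parameters.

    Assumption: q_values[i] scores discrete action i when executed with
    candidate_params[i]; parameter vectors may differ in length per action.
    """
    if not q_values or len(q_values) != len(candidate_params):
        raise ValueError("need one parameter vector per discrete action")
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return HybridAction(best, tuple(candidate_params[best]))


if __name__ == "__main__":
    # Hypothetical soccer-like example: action 0 = dash(power),
    # action 1 = kick(power, direction).
    print(select_hybrid_action([0.1, 0.7], [[30.0], [80.0, -15.0]]))
```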

A reinforcement learning algorithm developed to model GenCo strategic bidding behavior in multidimensional and continuous state and action spaces

2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) ◽

10.1109/adprl.2013.6614997 ◽

2013 ◽

Cited By ~ 1

Author(s):

Alfred Yong Fu Lau ◽

Dipti Srinivasan ◽

Thomas Reindl

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Bidding Behavior ◽

Strategic Bidding ◽

Continuous State ◽

Action Spaces ◽

Reinforcement Learning Algorithm

TD based reinforcement learning using neural networks in control problems with continuous action space

1998 IEEE International Joint Conference on Neural Networks Proceedings. IEEE World Congress on Computational Intelligence (Cat. No.98CH36227) ◽

10.1109/ijcnn.1998.687171 ◽

2002 ◽

Cited By ~ 1

Author(s):

Jeong-Hoon Lee ◽

Se-Young Oh ◽

Doo-Hyun Choi

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Action Space ◽

Control Problems ◽

Continuous Action

Reinforcement learning in multidimensional continuous action spaces

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) ◽

10.1109/adprl.2011.5967381 ◽

2011 ◽

Cited By ~ 10

Author(s):

Jason Pazis ◽

Michail G. Lagoudakis

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Action Spaces

Pre-Training Acquisition Functions by Deep Reinforcement Learning for Fixed Budget Active Learning

Neural Processing Letters ◽

10.1007/s11063-021-10476-z ◽

2021 ◽

Author(s):

Yusuke Taguchi ◽

Hideitsu Hino ◽

Keisuke Kameyama

Keyword(s):

Neural Networks ◽

Reinforcement Learning ◽

Active Learning ◽

Supervised Learning ◽

Deep Neural Networks ◽

Learning Algorithm ◽

Learning Problem ◽

Q Learning ◽

Fixed Budget ◽

Active Learner

There are many situations in supervised learning where the acquisition of data is very expensive and sometimes determined by a user’s budget. One way to address this limitation is active learning. In this study, we focus on a fixed budget regime and propose a novel active learning algorithm for the pool-based active learning problem. The proposed method performs active learning with a pre-trained acquisition function so that the maximum performance can be achieved when the number of data points that can be acquired is fixed. To implement this active learning algorithm, the proposed method uses reinforcement learning based on deep neural networks as a pre-trained acquisition function tailored for the fixed budget situation. By using the pre-trained deep Q-learning-based acquisition function, we can realize an active learner that selects a sample for annotation from the pool of unlabeled samples, taking the fixed-budget situation into account. The proposed method is experimentally shown to be comparable with or superior to existing active learning methods, suggesting the effectiveness of the proposed approach for fixed-budget active learning.
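
The pool-based, fixed-budget selection loop described in this abstract can be sketched as follows. This is a generic outline under stated assumptions, not the authors' method: `run_fixed_budget_selection` is a hypothetical name, and `acquisition` stands in for any scoring function that sees a candidate and the remaining budget, where the paper uses a pre-trained deep Q-learning-based acquisition function.

```python
from typing import Callable, List, Sequence


def run_fixed_budget_selection(
    pool: Sequence[float],
    acquisition: Callable[[float, int], float],
    budget: int,
) -> List[float]:
    """Greedily pick up to `budget` samples from the unlabeled pool.

    At each step the candidate with the highest acquisition score is
    removed from the pool and queued for annotation. The remaining
    budget is passed to the scorer so it can be budget-aware.
    """
    remaining = list(pool)
    selected: List[float] = []
    for step in range(min(budget, len(remaining))):
        remaining_budget = budget - step
        best = max(remaining, key=lambda x: acquisition(x, remaining_budget))
        remaining.remove(best)
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Toy scorer (an assumption): prefer samples close to 0.5, ignoring the budget.
    def toy_score(x: float, remaining_budget: int) -> float:
        return -abs(x - 0.5)

    print(run_fixed_budget_selection([0.1, 0.45, 0.9, 0.6], toy_score, budget=2))
```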

Shifting Deep Reinforcement Learning Algorithm towards Training Directly in Transient Real-World Environment: A Case Study in Powertrain Control

IEEE Transactions on Industrial Informatics ◽

10.1109/tii.2021.3063489 ◽

2021 ◽

pp. 1-1

Author(s):

Bo Hu ◽

Jiaxi Li

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Learning Algorithm ◽

Powertrain Control ◽

World Environment ◽

Reinforcement Learning Algorithm
