The Multi-Dimensional Actions Control Approach for Obstacle Avoidance Based on Reinforcement Learning

Menghao Wu; Yanbin Gao; Pengfei Wang; Fan Zhang; Zhejun Liu

doi:10.3390/sym13081335

Symmetry ◽

2021 ◽

Vol 13 (8) ◽

pp. 1335

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Control Policy ◽

Continuous Action ◽

Control Approach ◽

Low Level ◽

Learning Technique ◽

Distance Sensor ◽

High Level ◽

Action Spaces

In robotics, obstacle avoidance is an essential ability for robots that rely on distance sensors. Such robots have axisymmetrically distributed distance sensors to measure obstacle distances, so the state is symmetric. Training the control policy with reinforcement learning is an increasingly common approach. Given the complexity of real environments, such as narrow paths and right-angle turns, robots are more capable if the control policy can control the steering direction and the speed simultaneously. This paper proposes the multi-dimensional action control (MDAC) approach, based on reinforcement learning, which can be used in tasks with multiple continuous action spaces. It adopts a hierarchical structure with high-level and low-level modules: low-level policies output concrete actions, and the high-level policy determines when to invoke the low-level modules according to the features of the environment. We design robot navigation experiments with continuous action spaces to test the method's performance. The approach is end-to-end and can solve complex obstacle avoidance tasks in navigation.
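The division of labor between the levels can be illustrated with a small sketch. It is not the authors' MDAC implementation: the sensor layout, the module names (`steer_policy`, `speed_policy`), the threshold rule in `high_level`, and all gains are hypothetical, and hand-written functions stand in for the trained high- and low-level policies.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sensor layout: readings[i] is the distance (m) reported by the
# i-th of N evenly spaced sensors; index 0 points forward and indices increase
# counter-clockwise, so sensors i and N - i are mirror images (axisymmetric).
DEFAULT_SPEED = 0.5  # hypothetical cruising speed (m/s) when the speed module is idle


@dataclass
class Action:
    angular: float  # steering command (rad/s), positive = turn left
    linear: float   # forward speed command (m/s)


def steer_policy(readings: List[float]) -> float:
    """Low-level module 1 (stand-in for a trained network): steer toward the freer side."""
    n = len(readings)
    left = sum(readings[1 : n // 2])     # sensors on the left half
    right = sum(readings[n // 2 + 1 :])  # their mirror images on the right
    return max(-1.0, min(1.0, 0.5 * (left - right)))


def speed_policy(readings: List[float]) -> float:
    """Low-level module 2 (stand-in for a trained network): slow down near frontal obstacles."""
    front = min(readings[0], readings[1], readings[-1])
    return max(0.05, min(DEFAULT_SPEED, 0.5 * front))


def high_level(readings: List[float], narrow: float = 0.6) -> List[str]:
    """High-level policy (hypothetical rule standing in for a learned one): always
    invoke steering, and also invoke the speed module when the surroundings look
    tight (narrow corridor or obstacle close ahead)."""
    n = len(readings)
    modules = ["steer"]
    side_gap = readings[n // 4] + readings[3 * n // 4]  # left + right clearance
    if side_gap < 2 * narrow or readings[0] < narrow:
        modules.append("speed")
    return modules


def act(readings: List[float]) -> Action:
    """Combine the outputs of whichever low-level modules the high level invoked."""
    modules = high_level(readings)
    angular = steer_policy(readings) if "steer" in modules else 0.0
    linear = speed_policy(readings) if "speed" in modules else DEFAULT_SPEED
    return Action(angular=angular, linear=linear)


if __name__ == "__main__":
    print(act([3.0] * 8))                                 # open space: steering only
    print(act([0.5, 0.8, 0.4, 1.0, 2.0, 1.0, 0.4, 0.8]))  # tight corridor: steering + speed
```

Because the state is symmetric, mirrored sensor pairs enter the steering rule with opposite signs, so a mirrored scene produces a mirrored steering command.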

DDPG Agent to Swing Up and Balance Cart-Pole System

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-943 ◽

2021 ◽

pp. 102-116

Author(s):

Buvanesh Pandian V

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Real World ◽

Learning Algorithm ◽

Current Approach ◽

Control Problems ◽

Mathematical Framework ◽

Test Environment ◽

Continuous Action ◽

Action Spaces

Reinforcement learning is a mathematical framework for agents that interact intelligently with their environment. Unlike supervised learning, where a system learns from labeled data, reinforcement learning agents learn how to act by trial and error, receiving only a reward signal from their environment. A field where reinforcement learning has been especially successful is robotics [3]. However, real-world control problems are also particularly challenging because of the noise and high dimensionality of the input data (e.g., visual input). In recent years, in the field of supervised learning, deep neural networks have been used successfully to extract meaning from this kind of data. Building on these advances, deep reinforcement learning has been used to solve complex problems such as Atari games and Go. Mnih et al. [1] built a system with fixed hyperparameters that learned to play 49 different Atari games from raw pixel inputs alone. However, to apply the same methods to real-world control problems, deep reinforcement learning has to be able to deal with continuous action spaces. Discretizing continuous action spaces scales poorly, since the number of discrete actions grows exponentially with the dimensionality of the action. Furthermore, a parametrized policy can be advantageous because it can generalize across the action space. Therefore, in this thesis we study a state-of-the-art deep reinforcement learning algorithm, Deep Deterministic Policy Gradients (DDPG). We provide a theoretical comparison with other popular methods and an evaluation of its performance, identify its limitations, and investigate future directions of research. The remainder of the thesis is organized as follows. We start by introducing the field of interest, machine learning, focusing on deep learning and reinforcement learning.
We continue by describing in detail the two main algorithms at the core of this study, namely Deep Q-Network (DQN) and Deep Deterministic Policy Gradients (DDPG). We then provide implementation details of DDPG and of our test environment, followed by a description of the benchmark test cases. Finally, we discuss the results of our evaluation, identifying limitations of the current approach and proposing future avenues of research.
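Two claims in this abstract can be made concrete: the exponential cost of discretization, and the core DDPG updates (critic target, deterministic policy gradient, soft target update). The sketch below is a minimal NumPy illustration, not the thesis' implementation; the linear actor and critic are hypothetical simplifications of the deep networks DDPG normally uses, and `gamma` and `tau` are common example values.

```python
import numpy as np


def num_discrete_actions(bins_per_dim: int, dims: int) -> int:
    """Joint actions after discretizing each action dimension into bins_per_dim values."""
    return bins_per_dim ** dims


# Hypothetical linear actor mu(s) = tanh(W s) and critic Q(s, a) = w . [s, a];
# DDPG normally uses deep networks, these keep the sketch self-contained.
def actor(W_mu: np.ndarray, s: np.ndarray) -> np.ndarray:
    return np.tanh(W_mu @ s)  # deterministic, bounded continuous action


def critic(w_q: np.ndarray, s: np.ndarray, a: np.ndarray) -> float:
    return float(w_q @ np.concatenate([s, a]))


def critic_target(r, s_next, done, W_mu_targ, w_q_targ, gamma=0.99) -> float:
    """Critic regression target y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    a_next = actor(W_mu_targ, s_next)  # the target actor picks the next action
    return r + gamma * (1.0 - done) * critic(w_q_targ, s_next, a_next)


def actor_gradient(W_mu, w_q, s) -> np.ndarray:
    """Deterministic policy gradient grad_W Q(s, mu(s)) = dQ/da * dmu/dW (chain rule)."""
    pre = W_mu @ s
    dq_da = w_q[len(s):]                  # the critic is linear in a
    dmu_dpre = 1.0 - np.tanh(pre) ** 2    # derivative of tanh
    return np.outer(dq_da * dmu_dpre, s)  # same shape as W_mu


def soft_update(target: np.ndarray, source: np.ndarray, tau: float = 0.005) -> np.ndarray:
    """Polyak averaging of target weights: theta' <- tau * theta + (1 - tau) * theta'."""
    return tau * source + (1.0 - tau) * target
```

For example, `num_discrete_actions(10, 6)` is 10**6 = 1,000,000 joint actions, whereas a deterministic actor outputs a single 6-dimensional continuous action vector directly.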

Automatic ship collision avoidance using deep reinforcement learning with LSTM in continuous action spaces

Journal of Marine Science and Technology ◽

10.1007/s00773-020-00755-0 ◽

2020 ◽

Cited By ~ 1

Author(s):

Ryohei Sawada ◽

Keiji Sato ◽

Takahiro Majima

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Continuous Action ◽

Ship Collision ◽

Action Spaces

Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/323 ◽

2019 ◽

Cited By ~ 5

Author(s):

Haotian Fu ◽

Hongyao Tang ◽

Jianye Hao ◽

Zihan Lei ◽

Yingfeng Chen ◽

...

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Q Learning ◽

Challenging Tasks ◽

Discrete Action ◽

Multi Agent ◽

Decentralized Execution ◽

Novel Algorithms ◽

Action Spaces ◽

Different Levels

Deep Reinforcement Learning (DRL) has been applied to address a variety of cooperative multi-agent problems with either discrete action spaces or continuous action spaces. However, to the best of our knowledge, no previous work has succeeded in applying DRL to multi-agent problems with discrete-continuous hybrid (or parameterized) action spaces, which are very common in practice. Our work fills this gap by proposing two novel algorithms: Deep Multi-Agent Parameterized Q-Networks (Deep MAPQN) and Deep Multi-Agent Hierarchical Hybrid Q-Networks (Deep MAHHQN). We follow the centralized training but decentralized execution paradigm: different levels of communication between agents are used to facilitate the training process, while during execution each agent executes its policy independently based on local observations. Our empirical results on several challenging tasks (simulated RoboCup Soccer and the game Ghost Story) show that both Deep MAPQN and Deep MAHHQN are effective and significantly outperform the existing independent deep parameterized Q-learning method.
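To show what a parameterized action space means for action selection, here is a hedged sketch in the style of single-agent parameterized deep Q-networks (P-DQN), with decentralized execution at the end. It is not Deep MAPQN or Deep MAHHQN: the action names, parameter sizes, and random linear "networks" are hypothetical, and training, including the inter-agent communication used during centralized training, is omitted.

```python
from typing import Dict, Tuple

import numpy as np

# Hypothetical parameterized action space: each discrete action has its own
# continuous parameters (loosely RoboCup-like; names and sizes are illustrative).
PARAM_DIMS: Dict[str, int] = {"kick": 2, "dash": 2, "turn": 1}


class ParamQAgent:
    """P-DQN-style selection for one agent: for each discrete action k a parameter
    network proposes x_k = mu_k(o), and Q(o, k, x_k) scores the pair. Random linear
    maps stand in for trained deep networks."""

    def __init__(self, obs_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.mu = {k: rng.normal(size=(d, obs_dim)) for k, d in PARAM_DIMS.items()}
        self.q = {k: rng.normal(size=obs_dim + d) for k, d in PARAM_DIMS.items()}

    def q_values(self, obs: np.ndarray) -> Tuple[Dict[str, float], Dict[str, np.ndarray]]:
        params = {k: np.tanh(W @ obs) for k, W in self.mu.items()}  # x_k in [-1, 1]
        qs = {k: float(self.q[k] @ np.concatenate([obs, params[k]])) for k in PARAM_DIMS}
        return qs, params

    def act(self, obs: np.ndarray) -> Tuple[str, np.ndarray]:
        """Pick the discrete action with the highest Q and return it with its parameters."""
        qs, params = self.q_values(obs)
        best = max(qs, key=qs.get)
        return best, params[best]


# Decentralized execution: each agent acts only on its own local observation.
agents = [ParamQAgent(obs_dim=4, seed=i) for i in range(2)]
local_obs = [np.array([0.1, -0.3, 0.5, 0.0]), np.array([0.7, 0.2, -0.1, 0.4])]
joint_action = [agent.act(o) for agent, o in zip(agents, local_obs)]
```

The hybrid action is always a pair: a discrete choice plus the continuous parameters belonging to that choice only.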

Robust ASV Navigation Through Ground to Water Cross-Domain Deep Reinforcement Learning

Frontiers in Robotics and AI ◽

10.3389/frobt.2021.739023 ◽

2021 ◽

Vol 8 ◽

Author(s):

Reeve Lambert ◽

Jianwen Li ◽

Li-Fan Wu ◽

Nina Mahmoudian

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Control Level ◽

Training Data ◽

Level Control ◽

Autonomous Surface Vehicle ◽

High Control ◽

Marine Applications ◽

High Level ◽

The Cost

This paper presents a framework that alleviates the Deep Reinforcement Learning (DRL) training data sparsity problem present in challenging domains, by creating a methodology for DRL agent training and vehicle integration. The methodology leverages accessible domains to train an agent to solve navigational problems such as obstacle avoidance, and allows the agent to generalize, with minimal further training, to challenging and inaccessible domains such as marine environments. This is done by integrating a DRL agent at a high level of vehicle control and leveraging existing path planning and proven low-level control methodologies that are used across multiple domains. An autonomy package with a tertiary multilevel controller is developed to let the DRL agent interface at the prescribed high control level, thus separating it from vehicle dynamics and environmental constraints. An example Deep Q-Network (DQN) employing this methodology for obstacle avoidance is trained in a simulated ground environment, and its ability to generalize across domains is then validated experimentally. Experimental validation used a simulated water surface environment and real-world deployment of ground and water robotic platforms. The results show that it is possible to leverage accessible, data-rich domains, such as ground, to effectively develop marine DRL agents for Autonomous Surface Vehicle (ASV) navigation. This will allow rapid, iterative agent development without the risk of ASV loss or the cost and logistical overhead of marine deployment, and will allow landlocked institutions to develop agents for marine applications.
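The central idea, that the DRL agent issues only high-level commands while existing planning and low-level control translate them per vehicle, can be sketched as follows. This is an illustration under assumptions, not the paper's autonomy package: the heading-change action set, the waypoint distance, both controller classes, and their gains are hypothetical, and Q-values are passed in directly instead of coming from a trained DQN.

```python
import math
from typing import Dict, Sequence, Tuple

Pose = Tuple[float, float, float]  # x (m), y (m), yaw (rad)

# Hypothetical high-level action set: heading changes (rad) the agent may request.
HEADING_CHANGES = (-math.pi / 4, 0.0, math.pi / 4)
LOOKAHEAD = 2.0  # hypothetical waypoint distance (m)


def high_level_action(q_values: Sequence[float]) -> int:
    """Greedy DQN decision: index of the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


def to_waypoint(pose: Pose, action: int) -> Tuple[float, float]:
    """Turn the discrete decision into a vehicle-independent waypoint."""
    x, y, yaw = pose
    heading = yaw + HEADING_CHANGES[action]
    return x + LOOKAHEAD * math.cos(heading), y + LOOKAHEAD * math.sin(heading)


def heading_error(pose: Pose, waypoint: Tuple[float, float]) -> float:
    x, y, yaw = pose
    desired = math.atan2(waypoint[1] - y, waypoint[0] - x)
    return math.atan2(math.sin(desired - yaw), math.cos(desired - yaw))  # wrap to [-pi, pi]


class GroundRobotController:
    """Low-level controller for a differential-drive robot (hypothetical gains)."""

    def command(self, pose: Pose, waypoint: Tuple[float, float]) -> Dict[str, float]:
        return {"v": 0.3, "omega": 1.5 * heading_error(pose, waypoint)}


class SurfaceVesselController:
    """Low-level controller for a surface vessel with a saturated rudder (hypothetical gains)."""

    def command(self, pose: Pose, waypoint: Tuple[float, float]) -> Dict[str, float]:
        rudder = max(-0.5, min(0.5, 0.8 * heading_error(pose, waypoint)))
        return {"thrust": 0.6, "rudder": rudder}


if __name__ == "__main__":
    pose = (0.0, 0.0, 0.0)
    waypoint = to_waypoint(pose, high_level_action([0.1, 0.2, 0.9]))  # turn left 45 deg
    # The same high-level decision drives two very different vehicles.
    for controller in (GroundRobotController(), SurfaceVesselController()):
        print(type(controller).__name__, controller.command(pose, waypoint))
```

Only the controller classes depend on vehicle dynamics, which is why, in this design, an agent trained against one controller could in principle be reused with another.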

Hierarchically Structured Reinforcement Learning for Topically Coherent Visual Story Generation

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018465 ◽

2019 ◽

Vol 33 ◽

pp. 8465-8472 ◽

Cited By ~ 8

Author(s):

Qiuyuan Huang ◽

Zhe Gan ◽

Asli Celikyilmaz ◽

Dapeng Wu ◽

Jianfeng Wang ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Approach ◽

Semantic Concept ◽

Sentence Generation ◽

Visual Storytelling ◽

Empirical Results ◽

Low Level ◽

Story Generation ◽

End To End ◽

High Level

We propose a hierarchically structured reinforcement learning approach to address the challenges of planning for generating coherent multi-sentence stories for the visual storytelling task. Within our framework, the task of generating a story given a sequence of images is divided across a two-level hierarchical decoder. The high-level decoder constructs a plan by generating a semantic concept (i.e., topic) for each image in sequence. The low-level decoder generates a sentence for each image using a semantic compositional network, which effectively grounds the sentence generation conditioned on the topic. The two decoders are jointly trained end-to-end using reinforcement learning. We evaluate our model on the visual storytelling (VIST) dataset. Empirical results from both automatic and human evaluations demonstrate that the proposed hierarchically structured reinforced training achieves significantly better performance compared to a strong flat deep reinforcement learning baseline.
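Joint end-to-end training of two decoders with reinforcement learning can be illustrated with a REINFORCE update in which one story-level reward credits both the high-level topic choices and the low-level word choices. This is a toy sketch, not the paper's model: tabular softmax policies replace the neural decoders, each "sentence" is a short word sequence, and the sizes, reward, and baseline are hypothetical.

```python
import numpy as np

N_IMAGES, N_TOPICS, N_WORDS, SENT_LEN = 3, 4, 6, 3  # hypothetical toy sizes


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()


def grad_log_softmax(logits: np.ndarray, a: int) -> np.ndarray:
    """d log softmax(logits)[a] / d logits = onehot(a) - softmax(logits)."""
    g = -softmax(logits)
    g[a] += 1.0
    return g


def generate_story(high, low, images, rng):
    """High level: sample a topic per image; low level: sample words given that topic."""
    story = []
    for img in images:
        topic = int(rng.choice(N_TOPICS, p=softmax(high[img])))
        words = [int(rng.choice(N_WORDS, p=softmax(low[topic]))) for _ in range(SENT_LEN)]
        story.append((img, topic, words))
    return story


def reinforce_update(high, low, story, reward, baseline, lr=0.1):
    """One joint REINFORCE step: the same advantage (reward - baseline) scales the
    log-probability gradients of both the topic and the word choices."""
    adv = reward - baseline
    g_high, g_low = np.zeros_like(high), np.zeros_like(low)
    for img, topic, words in story:
        g_high[img] += grad_log_softmax(high[img], topic)
        for w in words:
            g_low[topic] += grad_log_softmax(low[topic], w)
    high += lr * adv * g_high  # in-place gradient ascent on expected reward
    low += lr * adv * g_low


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    high = np.zeros((N_IMAGES, N_TOPICS))  # tabular high-level "decoder": image -> topic logits
    low = np.zeros((N_TOPICS, N_WORDS))    # tabular low-level "decoder": topic -> word logits
    story = generate_story(high, low, images=[0, 1, 2], rng=rng)
    # Hypothetical story-level reward: fraction of generated words equal to word 0.
    reward = float(np.mean([w == 0 for _, _, ws in story for w in ws]))
    reinforce_update(high, low, story, reward, baseline=0.5)
```

Because both levels share one advantage, a good story reinforces the topic plan and the sentences realized from it together, which is what "jointly trained end-to-end" amounts to in this toy setting.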

Restraining Bolts for Reinforcement Learning Agents

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i09.7114 ◽

2020 ◽

Vol 34 (09) ◽

pp. 13659-13662

Author(s):

Giuseppe De Giacomo ◽

Luca Iocchi ◽

Marco Favorito ◽

Fabio Patrizi

Keyword(s):

Reinforcement Learning ◽

Science Fiction ◽

Linear Time ◽

Learning Agents ◽

Low Level ◽

Learning Agent ◽

Time Logic ◽

The World ◽

High Level

In this work we have investigated the concept of a “restraining bolt”, inspired by science fiction. We have two distinct sets of features extracted from the world, one by the agent and one by the authority imposing some restraining specifications on the behaviour of the agent (the “restraining bolt”). The two sets of features, and hence the models of the world attainable from them, are apparently unrelated, since they are of interest to independent parties. However, they both account for (aspects of) the same world. We have considered the case in which the agent is a reinforcement learning agent on a set of low-level (subsymbolic) features, while the restraining bolt is specified logically using linear time logic on finite traces (LTLf/LDLf) over a set of high-level symbolic features. We show formally, and illustrate with examples, that, under general circumstances, the agent can learn while shaping its goals to suitably conform (as much as possible) to the restraining bolt specifications.
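A common way to operationalize such a specification, sketched here as an assumption rather than as the paper's construction, is to compile the formula into a finite automaton, run it on the authority's high-level features, pay extra reward when it accepts, and let the agent learn on its own features paired with the automaton state. The formula, the hand-written transition function (a tool would normally build it from the formula), and the bonus value below are hypothetical.

```python
from typing import FrozenSet, Hashable, Iterable, Tuple

# Hand-written DFA for the hypothetical LTLf formula F(A & F B):
# "at some point A holds, and at that point or later B holds".
# States: 0 = waiting for A, 1 = A seen and waiting for B, 2 = accepting (absorbing).
ACCEPTING = {2}


def dfa_step(q: int, fluents: FrozenSet[str]) -> int:
    if q == 0:
        if "A" in fluents:
            return 2 if "B" in fluents else 1
        return 0
    if q == 1:
        return 2 if "B" in fluents else 1
    return 2


class RestrainingBolt:
    """Runs the DFA on high-level fluents and returns a (hypothetical) bonus reward
    the first time the specification is satisfied."""

    def __init__(self, bonus: float = 10.0):
        self.bonus = bonus
        self.q = 0

    def reset(self) -> None:
        self.q = 0

    def step(self, fluents: Iterable[str]) -> float:
        prev, self.q = self.q, dfa_step(self.q, frozenset(fluents))
        return self.bonus if self.q in ACCEPTING and prev not in ACCEPTING else 0.0


def augmented_state(agent_features: Hashable, bolt: RestrainingBolt) -> Tuple[Hashable, int]:
    """The learner's state: its own low-level features paired with the DFA state."""
    return (agent_features, bolt.q)
```

In an episode, the environment reward and `bolt.step(fluents)` would be summed, and any standard learner (tabular Q-learning, for instance) trained on `augmented_state(...)`, so the agent can tell how far along the specification it is without ever observing the high-level features itself.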

Reinforcement learning in multidimensional continuous action spaces

2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL) ◽

10.1109/adprl.2011.5967381 ◽

2011 ◽

Cited By ~ 10

Author(s):

Jason Pazis ◽

Michail G. Lagoudakis

Keyword(s):

Reinforcement Learning ◽

Continuous Action ◽

Action Spaces

Goal-Oriented Obstacle Avoidance with Deep Reinforcement Learning in Continuous Action Space

Electronics ◽

10.3390/electronics9030411 ◽

2020 ◽

Vol 9 (3) ◽

pp. 411

Author(s):

Reinis Cimurs ◽

Jin Han Lee ◽

Il Hong Suh

Keyword(s):

Reinforcement Learning ◽

Obstacle Avoidance ◽

Action Space ◽

Polar Coordinates ◽

Depth Image ◽

Depth Information ◽

Continuous Action ◽

Learning Network ◽

Complex Shapes ◽

Policy Gradient

In this paper, we propose a goal-oriented obstacle avoidance navigation system based on deep reinforcement learning that uses depth information in scenes, as well as goal position in polar coordinates as state inputs. The control signals for robot motion are output in a continuous action space. We devise a deep deterministic policy gradient network with the inclusion of depth-wise separable convolution layers to process the large amounts of sequential depth image information. The goal-oriented obstacle avoidance navigation is performed without prior knowledge of the environment or a map. We show that through the proposed deep reinforcement learning network, a goal-oriented collision avoidance model can be trained end-to-end without manual tuning or supervision by a human operator. We train our model in a simulation, and the resulting network is directly transferred to other environments. Experiments show the capability of the trained network to navigate safely around obstacles and arrive at the designated goal positions in the simulation, as well as in the real world. The proposed method exhibits higher reliability than the compared approaches when navigating around obstacles with complex shapes. The experiments show that the approach is capable of avoiding not only static, but also dynamic obstacles.
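Two ingredients named in this abstract lend themselves to a short sketch: expressing the goal in robot-centric polar coordinates, and the parameter savings of depth-wise separable convolutions. The helper names are hypothetical and the exact state encoding used by the authors is not given here; the layer sizes in the example are illustrative.

```python
import math
from typing import Tuple


def goal_polar(robot_x: float, robot_y: float, robot_yaw: float,
               goal_x: float, goal_y: float) -> Tuple[float, float]:
    """Hypothetical helper: goal as (distance, bearing) relative to the robot's heading."""
    dx, dy = goal_x - robot_x, goal_y - robot_y
    distance = math.hypot(dx, dy)
    bearing = math.atan2(dy, dx) - robot_yaw
    bearing = math.atan2(math.sin(bearing), math.cos(bearing))  # wrap to [-pi, pi]
    return distance, bearing


def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depth-wise k x k convolution (one filter per input channel) + point-wise 1 x 1 convolution."""
    return k * k * c_in + c_in * c_out
```

With 3 x 3 kernels mapping 32 to 64 channels, a standard layer needs 3·3·32·64 = 18,432 weights and a separable one 3·3·32 + 32·64 = 2,336, roughly eight times fewer, which plausibly helps when processing stacks of sequential depth images.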

General Purpose Low-Level Reinforcement Learning Control for Multi-Axis Rotor Aerial Vehicles

Sensors ◽

10.3390/s21134560 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4560

Author(s):

Chen-Huan Pi ◽

Yi-Wei Dai ◽

Kai-Chun Hu ◽

Stone Cheng

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Motion Capture ◽

Flight Control ◽

Control Policy ◽

General Purpose ◽

Position Information ◽

Low Level ◽

Model Free ◽

Aerial Vehicles

This paper proposes a multipurpose, reinforcement-learning-based low-level control structure for multirotor unmanned aerial vehicles (UAVs), constructed using neural networks with model-free training. Low-level reinforcement learning controllers developed in other studies have only been applicable to a specific multirotor model with specific physical parameters, and time-consuming retraining is required when switching to a different vehicle. To overcome these problems, we use a 6-degree-of-freedom dynamic model combined with acceleration-based control from the policy neural network. The UAV automatically learns the maneuver through an end-to-end neural network that maps fused states to acceleration commands. State estimation is performed using data from onboard sensors and motion capture. The motion capture system provides spatial position information, and a multisensory fusion framework fuses it with measurements from the onboard inertial measurement units (IMUs) to compensate for the time delay and low update frequency of the capture system. Without requiring expert demonstrations, the trained control policy, implemented using an improved algorithm, can be applied to various multirotors, with its output mapped directly to the actuators. The algorithm's ability to control multirotors in hovering and tracking tasks is evaluated. Through simulations and real experiments, we demonstrate flight control of a quadrotor and a hexrotor using the trained policy. With the same policy, we verify that both the quadrotor and the hexrotor can be stabilized in the air from random initial states.
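The abstract does not specify the fusion algorithm. As a hedged illustration of the general idea (propagate the state at the IMU rate, and correct it whenever a slower, delayed motion-capture sample arrives by comparing it with the estimate from when it was taken), here is a one-dimensional constant-gain predictor-corrector. The class name, gains, and delay handling are hypothetical; a real system would more likely use a Kalman filter over the full 6-DoF state.

```python
from collections import deque


class DelayedMocapFusion1D:
    """Hypothetical 1-D predictor-corrector: integrate IMU acceleration every step,
    and correct with motion-capture positions that arrive late and infrequently."""

    def __init__(self, dt: float, delay_steps: int, k_pos: float = 0.5, k_vel: float = 0.2):
        self.dt = dt
        self.delay_steps = delay_steps  # mocap latency measured in IMU steps
        self.k_pos = k_pos              # position correction gain (dimensionless)
        self.k_vel = k_vel              # velocity correction gain (1/s)
        self.pos, self.vel = 0.0, 0.0
        # Recent position estimates, so a delayed measurement is compared with the
        # estimate from the moment it was actually taken.
        self.history = deque(maxlen=delay_steps + 1)

    def predict(self, accel: float) -> None:
        """High-rate IMU propagation (constant-acceleration model)."""
        self.pos += self.vel * self.dt + 0.5 * accel * self.dt ** 2
        self.vel += accel * self.dt
        self.history.append(self.pos)

    def correct(self, mocap_pos: float) -> None:
        """Low-rate correction with a position measured delay_steps IMU steps ago."""
        if len(self.history) <= self.delay_steps:
            return  # not enough history yet to line the measurement up
        innovation = mocap_pos - self.history[0]
        shift = self.k_pos * innovation
        self.pos += shift
        self.vel += self.k_vel * innovation
        # Keep the stored estimates consistent with the corrected trajectory.
        self.history = deque((p + shift for p in self.history), maxlen=self.delay_steps + 1)
```

For instance, `predict` would be called at the IMU rate (say every `dt = 0.01` s) and `correct` only when a motion-capture sample arrives; the sizes here are illustrative, not taken from the paper.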

Vision Based Drone Obstacle Avoidance by Deep Reinforcement Learning

AI ◽

10.3390/ai2030023 ◽

2021 ◽

Vol 2 (3) ◽

pp. 366-382

Author(s):

Zhihan Xue ◽

Tad Gonsalves

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Obstacle Avoidance ◽

Image Data ◽

Depth Map ◽

Training Environment ◽

Time To Build ◽

Discrete Action ◽

Single Dataset ◽

Action Spaces

Research on autonomous obstacle avoidance for drones has recently received widespread attention. An increasing number of researchers are using machine learning to train drones, typically adopting supervised learning or reinforcement learning to train the networks. Supervised learning has the disadvantage that building datasets takes a significant amount of time, because it is difficult to cover the complex and changeable drone flight environment in a single dataset. Reinforcement learning can overcome this problem by letting drones learn from data collected while interacting with the environment. However, current research results based on reinforcement learning focus mainly on discrete action spaces. As a result, the movement of the drones lacks precision and the flying behavior is somewhat unnatural. This study uses the soft actor-critic (SAC) algorithm to train a drone to perform autonomous obstacle avoidance in a continuous action space using only image data. The algorithm is trained and tested in a simulation environment built with AirSim. The results show that our algorithm enables the UAV to avoid obstacles in the training environment using only the depth map as input. Moreover, it also achieves a higher obstacle avoidance rate in the reconfigured environment without retraining.
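Two pieces distinguish SAC from deterministic continuous-control methods: a stochastic, tanh-squashed Gaussian policy, and an entropy-regularized (soft) Bellman target. The NumPy sketch below shows both; it is not the study's code, the networks are omitted, and the temperature `alpha`, the three-dimensional velocity action, and the `v_max` scaling are hypothetical assumptions.

```python
import numpy as np


def squashed_gaussian_sample(mean: np.ndarray, log_std: np.ndarray, eps: np.ndarray):
    """Reparameterized SAC action a = tanh(mean + std * eps), eps ~ N(0, I).
    Returns the action and log pi(a|s), including the tanh change-of-variables
    correction -sum(log(1 - a^2))."""
    std = np.exp(log_std)
    u = mean + std * eps
    a = np.tanh(u)
    log_prob_u = np.sum(-0.5 * eps ** 2 - log_std - 0.5 * np.log(2.0 * np.pi))
    log_prob = log_prob_u - np.sum(np.log(1.0 - a ** 2 + 1e-6))  # 1e-6 avoids log(0)
    return a, float(log_prob)


def soft_q_target(r, done, q1_next, q2_next, log_prob_next, alpha=0.2, gamma=0.99):
    """Soft Bellman target with clipped double Q:
    y = r + gamma * (1 - done) * (min(Q1', Q2') - alpha * log pi(a'|s'))."""
    return r + gamma * (1.0 - done) * (min(q1_next, q2_next) - alpha * log_prob_next)


def to_velocity_command(a: np.ndarray, v_max: float = 2.0) -> np.ndarray:
    """Scale the (-1, 1) action to a hypothetical drone velocity command (m/s)."""
    return v_max * a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical policy-head outputs for a 3-D velocity action (vx, vy, vz).
    mean, log_std = np.array([0.2, -0.1, 0.0]), np.array([-1.0, -1.0, -1.0])
    a, logp = squashed_gaussian_sample(mean, log_std, rng.standard_normal(3))
    print(to_velocity_command(a), logp)
```

The tanh squashing keeps every command inside fixed bounds, and the `-alpha * log_prob` term rewards keeping the policy stochastic, which encourages exploration in the continuous action space.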
