Reinforcement Learning-Based Complete Area Coverage Path Planning for a Modified hTrihex Robot

Koppaka Ganesh Sai Apuroop; Anh Vu Le; Mohan Rajesh Elara; Bing J. Sheu

doi:10.3390/s21041067

Sensors ◽

10.3390/s21041067 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1067

Author(s):

Koppaka Ganesh Sai Apuroop ◽

Anh Vu Le ◽

Mohan Rajesh Elara ◽

Bing J. Sheu

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

Learning Algorithm ◽

Area Coverage ◽

Current Paper ◽

Coverage Problem ◽

Entire Area ◽

Maximum Area ◽

Coverage Path Planning ◽

Cleaning Robot

One of the essential attributes of a cleaning robot is the ability to achieve complete area coverage. Current commercial indoor cleaning robots have a fixed morphology and are restricted to cleaning only specific areas in a house, so the maximum area coverage they achieve is sub-optimal. Tiling robots are an innovative solution to this coverage problem. These new kinds of robots can be deployed for cleaning, painting, maintenance, and inspection tasks, which require complete area coverage. A tiling robot’s objective is to cover the entire area by reconfiguring into different shapes as the area requires. In this context, it is vital to have a framework that enables the robot to maximize area coverage while minimizing energy consumption; that is, the robot must cover the maximum area with the fewest possible shape reconfigurations. The current paper proposes a complete area coverage planning module for the modified hTrihex, a honeycomb-shaped tiling robot, based on deep reinforcement learning. This framework simultaneously generates the tiling shapes and the trajectory with minimum overall cost. To this end, a convolutional neural network (CNN) with a long short-term memory (LSTM) layer was trained using the actor-critic with experience replay (ACER) reinforcement learning algorithm. The simulation results were compared against results generated by traditional tiling theory models, including zigzag, spiral, and greedy search schemes. The model was also compared against methods that treat the problem as a traveling salesman problem (TSP) solved through genetic algorithm (GA) and ant colony optimization (ACO) approaches. The proposed scheme generates a path with a lower cost in less time.
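For orientation, the zigzag baseline named in this abstract can be illustrated with a minimal sketch. It assumes an obstacle-free rectangular grid and a single-cell robot, which is far simpler than the hTrihex setting; the function names `zigzag_path` and `path_length` are hypothetical and are not taken from the paper.

```python
def zigzag_path(rows, cols):
    """Visit every cell of a rows x cols grid in zigzag (boustrophedon) order.

    Assumption: the grid has no obstacles and the robot occupies one cell.
    """
    path = []
    for r in range(rows):
        # Sweep left-to-right on even rows, right-to-left on odd rows.
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path


def path_length(path):
    """Total Manhattan distance travelled along the path."""
    return sum(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(path, path[1:])
    )


if __name__ == "__main__":
    demo = zigzag_path(3, 4)
    print(demo)
    print("cells covered:", len(set(demo)), "distance:", path_length(demo))
```

A learned planner such as the one described above would be judged against this kind of fixed sweep on cost terms that this sketch does not model, such as shape reconfigurations and energy.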

Coverage Path Planning Using Reinforcement Learning-Based TSP for hTetran—A Polyabolo-Inspired Self-Reconfigurable Tiling Robot

Sensors ◽

10.3390/s21082577 ◽

2021 ◽

Vol 21 (8) ◽

pp. 2577

Author(s):

Anh Vu Le ◽

Prabakaran Veerajagadheswar ◽

Phone Thiha Kyaw ◽

Mohan Rajesh Elara ◽

Nguyen Huu Khanh Nhan

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Coverage Problem ◽

Pareto Optima ◽

Entire Area ◽

Coverage Path Planning ◽

Different Shapes ◽

Cleaning Robots ◽

Policy Optimization ◽

Least Energy

One of the critical challenges in deploying cleaning robots is covering the entire area. Current tiling robots for area coverage have fixed forms and are limited to cleaning only certain areas. A reconfigurable system is a creative answer to this optimal coverage problem. The tiling robot’s goal is to achieve complete coverage of the entire area by reconfiguring to different shapes according to the area’s needs. When sequencing navigation, it is essential to have a structure that allows the robot to extend its coverage range while saving energy during navigation. This implies that the robot should be able to cover larger areas entirely with the fewest required actions. This paper presents a complete path planning (CPP) method for hTetran, a polyabolo tiled robot, based on a TSP-based reinforcement learning optimization. This structure simultaneously produces robot shapes and sequential trajectories whilst maximizing the reward of the trained reinforcement learning (RL) model within the predefined polyabolo-based tileset. To this end, a reinforcement learning-based traveling salesman problem (TSP) model with the proximal policy optimization (PPO) algorithm was trained using the complementary learning computation of the TSP sequencing. The reconstructive results of the proposed RL-TSP-based CPP for hTetran were compared, in terms of energy and time spent, with the conventional tiled hypothetical models that incorporate TSP solved through an evolutionary-based ant colony optimization (ACO) approach. The CPP demonstrates an ability to generate an ideal Pareto-optimal trajectory that enhances the robot’s navigation inside the real environment with the least energy and time spent compared with conventional techniques.

Reinforcement Learning-Based Collision Avoidance Guidance Algorithm for Fixed-Wing UAVs

Complexity ◽

10.1155/2021/8818013 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Yu Zhao ◽

Jifeng Guo ◽

Chengchao Bai ◽

Hongxing Zheng

Keyword(s):

Reinforcement Learning ◽

Collision Avoidance ◽

Short Term Memory ◽

Learning Algorithm ◽

Variable Number ◽

Collision Probability ◽

Learning Framework ◽

Markov Game ◽

The Neural Network ◽

Guidance Algorithm

A deep reinforcement learning-based computational guidance method is presented, which is used to identify and resolve the problem of collision avoidance for a variable number of fixed-wing UAVs in limited airspace. The cooperative guidance process is first analyzed for multiple aircraft by formulating flight scenarios using multiagent Markov game theory and solving them with a machine learning algorithm. Furthermore, a self-learning framework is established using the actor-critic model, which is used to train collision avoidance decision-making neural networks. To achieve higher scalability, the neural network is customized to incorporate long short-term memory networks, and a coordination strategy is given. Additionally, a simulator suitable for multiagent high-density route scenes is designed for validation, in which all UAVs run the proposed algorithm onboard. Simulated experiment results from several case studies show that the real-time guidance algorithm can effectively reduce the collision probability of multiple UAVs in flight, even with a large number of aircraft.

A Markov Decision Model for Area Coverage in Autonomous Demining Robot

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v6i2.pp105-116 ◽

2017 ◽

Vol 6 (2) ◽

pp. 105

Author(s):

Abdelhadi Larach ◽

Cherki Daoui ◽

Mohamed Baslam

Keyword(s):

Area Coverage ◽

Entire Area ◽

New Approach ◽

Coverage Path Planning ◽

Reward Value ◽

Markov Decision ◽

On Line ◽

Markov Decision Model ◽

General Goal ◽

Robotic Applications

A review of the literature shows a variety of works studying coverage path planning in several autonomous robotic applications. In this work, we propose a new approach using a Markov Decision Process to plan an optimum path to reach the general goal of exploring an unknown environment containing buried mines. This approach, called the Goals to Goals Area Coverage on-line Algorithm, is based on a decomposition of the state space into smaller regions whose states are considered as goals with the same reward value; the reward value is decremented from one region to another according to the desired search mode. The numerical simulations show that our approach is promising for minimizing the cost-energy necessary to cover the entire area.
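The effect of region-wise decremented rewards can be illustrated with a small sketch. This is not the Goals to Goals algorithm itself: it is plain value iteration on an assumed deterministic one-dimensional corridor, with no tracking of visited cells, and the corridor layout, reward values, and the name `value_iteration` are all illustrative assumptions.

```python
def value_iteration(reward, gamma=0.9, tol=1e-9):
    """Value iteration on a deterministic 1-D corridor (an assumed toy MDP).

    Each state has two actions, move left or move right; walls clamp the
    move. Entering state t yields reward[t].
    """
    n = len(reward)
    values = [0.0] * n
    while True:
        new_values = []
        for s in range(n):
            successors = {max(s - 1, 0), min(s + 1, n - 1)}
            new_values.append(max(reward[t] + gamma * values[t] for t in successors))
        delta = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if delta < tol:
            return values


# Three regions of two cells each; every cell in a region shares one reward,
# and the reward is decremented by 1 from one region to the next (assumed values).
region_of_cell = [0, 0, 1, 1, 2, 2]
reward = [3.0 - region for region in region_of_cell]

if __name__ == "__main__":
    print([round(v, 2) for v in value_iteration(reward)])
```

In this toy setting the state values never increase as the region index grows, so a greedy policy is drawn toward the highest-reward region first, which is one way to read "desired search mode" in the abstract.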

Toward complete area coverage of a reconfigurable tiling robot by following obstacle shape

Complex & Intelligent Systems ◽

10.1007/s40747-020-00243-3 ◽

2021 ◽

Author(s):

S. M. Bhagya P. Samarakoon ◽

M. A. Viraj J. Muthugala ◽

Anh Vu Le ◽

Mohan Rajesh Elara

Keyword(s):

Genetic Algorithm ◽

Area Coverage ◽

Experimental Results ◽

Coverage Problem ◽

Cluttered Environments ◽

Cleaning Robot ◽

Novel Method

Complete area coverage is a crucial factor for a floor cleaning robot. Self-reconfigurable tiling robots have been introduced over robots with a fixed shape for floor cleaning since they improve the area coverage by the flexibility of shape-shifting in cluttered environments. The existing coverage methods of reconfigurable tiling robots follow the tiling theory to cope with the area coverage problem. However, these methods merely consider a limited set of predefined shapes for the reconfiguration of a robot. The consideration of a limited set of predefined shapes for the reconfiguration impedes the ability of coverage to a certain extent in typical floor environments. Therefore, this paper proposes a novel method to improve area coverage of a tiling robot by reconfiguring according to the shape of obstacles. To this end, the required hinge angles for reconfiguring per the shape of an obstacle are determined by a genetic algorithm. The proposed method considers an optimized shape for reconfiguration in lieu of a limited set of predefined shapes. The coverage improvement of the proposed concept has been compared against the existing coverage methods of tiling robots to validate the performance. According to the experimental results, the proposed method surpasses the existing coverage methods of tiling robots from the perspective of area coverage, and the improvement is significant and noteworthy.

Model dependent reinforcement learning algorithm for reservoir operation stochastic optimization

International Journal of Hydrology ◽

10.15406/ijh.2018.02.00129 ◽

2018 ◽

Vol 2 (5) ◽

Author(s):

Li Wenwu

Keyword(s):

Reinforcement Learning ◽

Stochastic Optimization ◽

Reservoir Operation ◽

Learning Algorithm ◽

Reinforcement Learning Algorithm

Reinforcement learning algorithm for one-warehouse multi-retailer inventory problem

Automation, Mechanical and Electrical Engineering ◽

10.2495/amee140161 ◽

2014 ◽

Author(s):

C.Y. Li ◽

X.T. Wang ◽

T.W. Zhang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Inventory Problem ◽

Reinforcement Learning Algorithm

Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design

10.26434/chemrxiv.7990910.v2 ◽

2019 ◽

Author(s):

Niclas Ståhl ◽

Göran Falkman ◽

Alexander Karlsson ◽

Gunnar Mathiason ◽

Jonas Boström

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

De Novo ◽

De Novo Drug Design ◽

Generative Process ◽

New Methods ◽

Multiparameter Optimization ◽

Long Short Term Memory ◽

New Compounds

In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex and difficult multi-parameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modelled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output towards structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid, and a third satisfy the targeted objectives, while there were none in the initial set.

Computational Design of Modular Robots Based on Genetic Algorithm and Reinforcement Learning

Symmetry ◽

10.3390/sym13030471 ◽

2021 ◽

Vol 13 (3) ◽

pp. 471

Author(s):

Jai Hoon Park ◽

Kang Hoon Lee

Keyword(s):

Genetic Algorithm ◽

Reinforcement Learning ◽

Design Space ◽

Learning Algorithm ◽

Computational Design ◽

Computational Method ◽

Learning Ability ◽

Modular Robots ◽

Control Mechanisms ◽

Candidate Structure

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space that involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Compared with evolving both the structure and the behavior simultaneously, the size of the design space is reduced significantly by evolving only the robotic structure and performing behavioral optimization with a separate training algorithm. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative rewards of a candidate structure in the reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in the process of experimenting with an actual modular robotics kit.
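The nested structure described in this abstract, a genetic algorithm outside and a learning-based fitness inside, might be sketched as follows. Everything here is an illustrative assumption: structures are plain bit lists, and `train_and_evaluate` is a placeholder that merely counts ones instead of running reinforcement learning, so only the outer loop and the fitness hand-off reflect the abstract.

```python
import random


def train_and_evaluate(structure):
    """Placeholder for the inner optimization.

    In the described method this would train a controller with reinforcement
    learning and return its mean cumulative reward; here it is a toy score.
    """
    return float(sum(structure))


def evolve(pop_size=8, genome_len=10, generations=20, seed=0):
    """Outer genetic algorithm: the inner score is used as fitness."""
    rng = random.Random(seed)
    population = [
        [rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)
    ]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=train_and_evaluate, reverse=True)
        best_per_generation.append(train_and_evaluate(ranked[0]))
        elite = ranked[: pop_size // 2]  # elitism: the top half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            cut = rng.randrange(1, genome_len)      # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
            flip = rng.randrange(genome_len)        # single-bit mutation
            child[flip] = 1 - child[flip]
            children.append(child)
        population = elite + children
    best = max(population, key=train_and_evaluate)
    return best, best_per_generation


if __name__ == "__main__":
    best, history = evolve()
    print(best, history)
```

Because the elite half is carried over unchanged, the best fitness per generation cannot decrease in this sketch; with a real, noisy reinforcement learning evaluation that guarantee would no longer hold.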

Intelligent Energy Management Strategy Based on an Improved Reinforcement Learning Algorithm With Exploration Factor for a Plug-in PHEV

IEEE Transactions on Intelligent Transportation Systems ◽

10.1109/tits.2021.3085710 ◽

2021 ◽

pp. 1-11

Author(s):

Xinyou Lin ◽

Kuncheng Zhou ◽

Liping Mo ◽

Hailin Li

Keyword(s):

Reinforcement Learning ◽

Energy Management ◽

Management Strategy ◽

Learning Algorithm ◽

Energy Management Strategy ◽

Reinforcement Learning Algorithm

UAV Coverage Path Planning under Varying Power Constraints using Deep Reinforcement Learning

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9340934 ◽

2020 ◽

Author(s):

Mirco Theile ◽

Harald Bayerlein ◽

Richard Nai ◽

David Gesbert ◽

Marco Caccamo

Keyword(s):

Reinforcement Learning ◽

Path Planning ◽

Coverage Path Planning ◽

Power Constraints
