Reinforcement Learning-Based Complete Area Coverage Path Planning for a Modified hTrihex Robot

Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1067 ◽  
Author(s):  
Koppaka Ganesh Sai Apuroop ◽  
Anh Vu Le ◽  
Mohan Rajesh Elara ◽  
Bing J. Sheu

One of the essential attributes of a cleaning robot is the ability to achieve complete area coverage. Current commercial indoor cleaning robots have a fixed morphology and can clean only specific areas of a house, so their maximum area coverage is sub-optimal. Tiling robots are an innovative solution to this coverage problem. These robots can be deployed for cleaning, painting, maintenance, and inspection tasks, all of which require complete area coverage. A tiling robot’s objective is to cover the entire area by reconfiguring into different shapes as the area requires. In this context, it is vital to have a framework that enables the robot to maximize area coverage while minimizing energy consumption; that is, the robot should cover the maximum area with the fewest possible shape reconfigurations. This paper proposes a complete area coverage planning module for the modified hTrihex, a honeycomb-shaped tiling robot, based on deep reinforcement learning. The framework simultaneously generates the tiling shapes and the trajectory with minimum overall cost. To this end, a convolutional neural network (CNN) with a long short-term memory (LSTM) layer was trained using the actor-critic with experience replay (ACER) reinforcement learning algorithm. The simulation results were compared against those of traditional tiling-theory models, including zigzag, spiral, and greedy search schemes. The model was also compared against methods that treat the problem as a traveling salesman problem (TSP) solved with genetic algorithm (GA) and ant colony optimization (ACO) approaches. The proposed scheme generates a lower-cost path in less time.
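To make the trade-off between travel and shape reconfiguration concrete, the following is a minimal sketch of a zigzag baseline and a weighted trajectory cost. It assumes a square grid rather than the robot's honeycomb tiling, and the names `zigzag_path` and `trajectory_cost`, the shape labels, and both weights are hypothetical illustrations, not values or functions from the paper.

```python
import math

def zigzag_path(rows, cols):
    """Visit every cell of a rows x cols grid row by row, reversing direction each row."""
    path = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        path.extend((r, c) for c in columns)
    return path

def trajectory_cost(steps, move_weight=1.0, reconfig_weight=2.0):
    """Cost of a sequence of (cell, shape) steps.

    Assumed cost model: travelled distance times move_weight, plus
    reconfig_weight for every change of shape between consecutive steps.
    """
    cost = 0.0
    for (cell_a, shape_a), (cell_b, shape_b) in zip(steps, steps[1:]):
        cost += move_weight * math.dist(cell_a, cell_b)
        if shape_a != shape_b:
            cost += reconfig_weight
    return cost

# A single-shape zigzag sweep over a 2 x 3 grid.
sweep = [(cell, "A") for cell in zigzag_path(2, 3)]
print(trajectory_cost(sweep))
```

Under this assumed model, a planner that changes shape often pays the reconfiguration weight each time, which is the quantity the abstract says should be kept small.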

Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2577 ◽  
Author(s):  
Anh Vu Le ◽  
Prabakaran Veerajagadheswar ◽  
Phone Thiha Kyaw ◽  
Mohan Rajesh Elara ◽  
Nguyen Huu Khanh Nhan

One of the critical challenges in deploying cleaning robots is covering the entire area. Current tiling robots for area coverage have fixed forms and are limited to cleaning only certain areas. A reconfigurable system is a creative answer to this optimal coverage problem. The tiling robot’s goal is to cover the entire area by reconfiguring into different shapes according to the area’s needs. When sequencing navigation, it is essential to have a framework that allows the robot to extend its coverage range while saving energy. This implies that the robot should cover larger areas completely with the fewest required actions. This paper presents a complete coverage path planning (CPP) method for hTetran, a polyabolo-based tiling robot, built on a TSP-based reinforcement learning optimization. The framework simultaneously produces robot shapes and sequential trajectories while maximizing the reward of the trained reinforcement learning (RL) model within the predefined polyabolo-based tileset. To this end, a reinforcement learning-based traveling salesman problem (TSP) model was trained with the proximal policy optimization (PPO) algorithm, using the complementary learning computation of the TSP sequencing. The results of the proposed RL-TSP-based CPP for hTetran were compared, in terms of energy and time spent, with conventional tiling-theory models that incorporate a TSP solved through an evolutionary ant colony optimization (ACO) approach. The CPP generates a Pareto-optimal trajectory that enhances the robot’s navigation in a real environment with the least energy and time spent compared with conventional techniques.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Yu Zhao ◽  
Jifeng Guo ◽  
Chengchao Bai ◽  
Hongxing Zheng

A deep reinforcement learning-based computational guidance method is presented for identifying and resolving collision avoidance problems among a variable number of fixed-wing UAVs in limited airspace. The cooperative guidance process for multiple aircraft is first analyzed by formulating flight scenarios using multiagent Markov game theory and solving them with a machine learning algorithm. Furthermore, a self-learning framework based on the actor-critic model is established to train the neural networks that make collision avoidance decisions. To achieve higher scalability, the neural network is customized to incorporate long short-term memory networks, and a coordination strategy is given. Additionally, a simulator suited to multiagent high-density route scenarios is designed for validation, in which all UAVs run the proposed algorithm onboard. Simulation results from several case studies show that the real-time guidance algorithm effectively reduces the collision probability of multiple UAVs in flight, even with a large number of aircraft.


Author(s):  
Abdelhadi Larach ◽  
Cherki Daoui ◽  
Mohamed Baslam

A review of the literature shows a variety of works studying coverage path planning in several autonomous robotic applications. In this work, we propose a new approach that uses a Markov decision process to plan an optimum path toward the general goal of exploring an unknown environment containing buried mines. This approach, called the Goals to Goals Area Coverage on-line Algorithm, is based on a decomposition of the state space into smaller regions whose states are considered as goals with the same reward value; the reward value is decremented from one region to the next according to the desired search mode. The numerical simulations show that our approach is promising for minimizing the cost-energy needed to cover the entire area.
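The region-wise reward layout described above can be illustrated with a small sketch. This is not the authors' algorithm: it assumes an obstacle-free grid split into vertical bands (one hypothetical "search mode"), gives every cell in a band the same reward, decrements the reward by one per band, and replaces the MDP solver with a plain breadth-first search toward the nearest unvisited cell of highest remaining reward. All function names and parameters are illustrative.

```python
from collections import deque

def region_rewards(rows, cols, band_width, top_reward):
    """All cells in a vertical band share one reward; each band to the right gets one less."""
    return [[top_reward - c // band_width for c in range(cols)] for _ in range(rows)]

def path_to_nearest_goal(start, rewards, visited):
    """Shortest path (BFS) to the closest unvisited cell with the highest remaining reward."""
    rows, cols = len(rewards), len(rewards[0])
    unvisited = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in visited]
    if not unvisited:
        return None
    best = max(rewards[r][c] for r, c in unvisited)
    goals = {(r, c) for r, c in unvisited if rewards[r][c] == best}
    queue, parent = deque([start]), {start: None}
    while queue:
        cell = queue.popleft()
        if cell in goals:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and nxt not in parent:
                parent[nxt] = cell
                queue.append(nxt)
    return None

def cover(rows, cols, band_width, top_reward, start=(0, 0)):
    """Repeatedly walk to the next goal; return cells in first-visit order plus the reward map."""
    rewards = region_rewards(rows, cols, band_width, top_reward)
    visited, order, pos = {start}, [start], start
    while True:
        path = path_to_nearest_goal(pos, rewards, visited)
        if path is None:
            return order, rewards
        for cell in path[1:]:
            if cell not in visited:
                visited.add(cell)
                order.append(cell)
        pos = path[-1]

order, rewards = cover(rows=3, cols=6, band_width=2, top_reward=3)
print(len(order), "cells covered")
```

In this toy setting the agent finishes each region before entering the next, because only the cells of the highest-reward unfinished region count as goals at any time.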


Author(s):  
S. M. Bhagya P. Samarakoon ◽  
M. A. Viraj J. Muthugala ◽  
Anh Vu Le ◽  
Mohan Rajesh Elara

Complete area coverage is a crucial factor for a floor cleaning robot. Self-reconfigurable tiling robots have been introduced over robots with a fixed shape for floor cleaning since they improve the area coverage by the flexibility of shape-shifting in cluttered environments. The existing coverage methods of reconfigurable tiling robots follow the tiling theory to cope with the area coverage problem. However, these methods merely consider a limited set of predefined shapes for the reconfiguration of a robot. The consideration of a limited set of predefined shapes for the reconfiguration impedes the ability of coverage to a certain extent in typical floor environments. Therefore, this paper proposes a novel method to improve area coverage of a tiling robot by reconfiguring according to the shape of obstacles. To this end, the required hinge angles for reconfiguring per the shape of an obstacle are determined by a genetic algorithm. The proposed method considers an optimized shape for reconfiguration in lieu of a limited set of predefined shapes. The coverage improvement of the proposed concept has been compared against the existing coverage methods of tiling robots to validate the performance. According to the experimental results, the proposed method surpasses the existing coverage methods of tiling robots from the perspective of area coverage, and the improvement is significant and noteworthy.


2019 ◽  
Author(s):  
Niclas Ståhl ◽  
Göran Falkman ◽  
Alexander Karlsson ◽  
Gunnar Mathiason ◽  
Jonas Boström

In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex and difficult multi-parameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modelled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output towards structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid, and a third satisfy the targeted objectives, while there were none in the initial set.


Symmetry ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 471
Author(s):  
Jai Hoon Park ◽  
Kang Hoon Lee

Designing novel robots that can cope with a specific task is a challenging problem because of the enormous design space that involves both morphological structures and control mechanisms. To this end, we present a computational method for automating the design of modular robots. Our method employs a genetic algorithm to evolve robotic structures as an outer optimization, and it applies a reinforcement learning algorithm to each candidate structure to train its behavior and evaluate its potential learning ability as an inner optimization. Evolving only the robotic structure and optimizing behavior with a separate training algorithm reduces the size of the design space significantly compared with evolving both the structure and behavior simultaneously. Mutual dependence between evolution and learning is achieved by regarding the mean cumulative rewards of a candidate structure in the reinforcement learning as its fitness in the genetic algorithm. Therefore, our method searches for prospective robotic structures that can potentially lead to near-optimal behaviors if trained sufficiently. We demonstrate the usefulness of our method through several effective design results that were automatically generated in the process of experimenting with an actual modular robotics kit.
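The nesting of the two optimizations can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a "structure" is a hypothetical tuple of four integer module parameters, the inner reinforcement learning step is replaced by a deterministic hill climb on a single behavior parameter `theta`, and the reward function is invented for the example. Only the coupling follows the abstract: the mean reward collected during inner training serves as the fitness in the outer genetic algorithm.

```python
import random

GENES, LOW, HIGH = 4, 1, 5  # hypothetical structure encoding: 4 integers in [1, 5]

def episode_reward(structure, theta):
    """Invented reward: depends on both the structure and the behavior parameter."""
    return sum(structure) * theta - len(structure) * theta ** 2

def train_and_evaluate(structure, episodes=30, step=0.1):
    """Inner loop (stand-in for RL): improve theta each episode, return mean reward."""
    theta, rewards = 0.0, []
    for _ in range(episodes):
        candidates = (theta - step, theta, theta + step)
        theta = max(candidates, key=lambda t: episode_reward(structure, t))
        rewards.append(episode_reward(structure, theta))
    return sum(rewards) / len(rewards)

def evolve(generations=15, pop_size=8, elite=2, mutation_rate=0.2, seed=0):
    """Outer loop: a genetic algorithm over structures, with elitism."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(LOW, HIGH) for _ in range(GENES))
                  for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        ranked = sorted(population, key=train_and_evaluate, reverse=True)
        history.append(train_and_evaluate(ranked[0]))
        children = list(ranked[:elite])  # elites survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(ranked[:pop_size // 2], 2)  # parents from the top half
            cut = rng.randrange(1, GENES)                 # one-point crossover
            child = [g if rng.random() > mutation_rate else rng.randint(LOW, HIGH)
                     for g in a[:cut] + b[cut:]]
            children.append(tuple(child))
        population = children
    best = max(population, key=train_and_evaluate)
    return best, history

best, history = evolve()
print(best, round(history[-1], 3))
```

Because the structure search never touches `theta` directly, the outer loop explores only the structural parameters, which mirrors the reduction in design-space size that the abstract attributes to separating evolution from behavioral training.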

