Cooperative Object Transportation Using Curriculum-Based Deep Reinforcement Learning

Sensors ◽ 2021 ◽ Vol 21 (14) ◽ pp. 4780
Author(s): Gyuho Eoh ◽ Tae-Hyoung Park

This paper presents a cooperative object transportation technique using deep reinforcement learning (DRL) based on curricula. Previous studies on object transportation depended heavily on complex and intractable controls, such as grasping, pushing, and caging. Recently, DRL-based object transportation techniques have been proposed that show improved performance without precise controller design. However, DRL-based techniques not only take a long time to learn their policies but also sometimes fail to learn; it is difficult to learn a DRL policy through random actions alone. We therefore propose two curricula for the efficient learning of object transportation: region-growing and single- to multi-robot. During learning, the region-growing curriculum gradually extends the region in which the object is initialized. This step-by-step learning raises the success probability of object transportation by restricting the working area. Multiple robots can then easily learn a new policy by exploiting the pre-trained policy of a single robot. This single- to multi-robot curriculum helps robots learn a transportation method through trial and error. Simulation results are presented to verify the proposed techniques.
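To make the region-growing curriculum concrete, the following is a minimal Python sketch of the idea, assuming a disk-shaped spawn region around the goal that widens once a success-rate threshold is met; the class name, geometry, and thresholds are illustrative assumptions, not the authors' implementation.

import math
import random

class RegionGrowingCurriculum:
    """Gradually widen the region in which the object is initialized."""

    def __init__(self, start_radius=0.5, max_radius=5.0, growth=0.5,
                 promote_at=0.8, window=100):
        self.radius = start_radius    # current spawn radius around the goal
        self.max_radius = max_radius
        self.growth = growth          # region growth per curriculum stage
        self.promote_at = promote_at  # success rate required to advance
        self.window = window          # episodes per evaluation window
        self.outcomes = []            # rolling record of episode successes

    def sample_object_position(self, goal_xy):
        """Spawn the object uniformly inside the current restricted region."""
        r = self.radius * math.sqrt(random.random())  # uniform over the disk
        theta = random.uniform(0.0, 2.0 * math.pi)
        return (goal_xy[0] + r * math.cos(theta),
                goal_xy[1] + r * math.sin(theta))

    def record(self, success):
        """Log an episode outcome; enlarge the region once the policy is reliable."""
        self.outcomes.append(float(success))
        if len(self.outcomes) < self.window:
            return
        if sum(self.outcomes) / len(self.outcomes) >= self.promote_at:
            self.radius = min(self.radius + self.growth, self.max_radius)
        self.outcomes.clear()  # start a fresh window for the (possibly new) stage

The single- to multi-robot curriculum would then reuse the weights of the policy trained under this schedule to initialize each robot in the multi-robot setting.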

2021 ◽ Vol 11 (2) ◽ pp. 546
Author(s): Jiajia Xie ◽ Rui Zhou ◽ Yuan Liu ◽ Jun Luo ◽ Shaorong Xie ◽ ...

The high performance and efficiency of multiple unmanned surface vehicles (multi-USV) promote further civilian and military applications of coordinated USVs. As the basis of multi-USV cooperative work, the decentralized formation control of USV swarms has received considerable attention. Formation control of multiple USVs is a geometric problem of multi-robot systems; the main challenge is how to generate and maintain the formation. The rapid development of reinforcement learning provides a new way to address these problems. In this paper, we introduce a decentralized structure for the multi-USV system and employ reinforcement learning for formation control in a leader–follower topology, proposing an asynchronous decentralized formation control scheme based on reinforcement learning for multiple USVs. First, a simplified USV model is established, and a formation shape model is built to provide formation parameters and to describe the physical relationships between USVs. Second, the advantage deep deterministic policy gradient (ADDPG) algorithm is proposed. Third, formation generation and formation maintenance policies based on ADDPG are proposed to form and maintain the given geometric structure of the USV team during movement. Moreover, three new reward functions are designed and utilized to promote policy learning. Finally, various experiments are conducted to validate the performance of the proposed formation control scheme. Simulation results and comparison experiments demonstrate the efficiency and stability of the formation control scheme.
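As an illustration of the kind of shaping signal such a scheme can use, below is a minimal sketch of a leader–follower formation-maintenance reward over a planar USV position; the weights and error terms are assumptions for illustration and are not the three reward functions proposed in the paper.

import math

def formation_reward(follower_xy, leader_xy, desired_range, desired_bearing,
                     w_range=1.0, w_bearing=0.5):
    """Penalize deviation from the desired range and bearing to the leader."""
    dx = follower_xy[0] - leader_xy[0]
    dy = follower_xy[1] - leader_xy[1]
    range_error = abs(math.hypot(dx, dy) - desired_range)
    # Wrap the bearing difference into (-pi, pi] before taking its magnitude.
    bearing_diff = math.atan2(dy, dx) - desired_bearing
    bearing_error = abs(math.atan2(math.sin(bearing_diff), math.cos(bearing_diff)))
    # Dense negative reward: zero only when the follower sits exactly on its slot.
    return -(w_range * range_error + w_bearing * bearing_error)

A dense reward of this shape gives the follower a gradient toward its slot in the formation at every step, rather than only on formation completion.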


2021 ◽ Vol 6 (1)
Author(s): Peter Morales ◽ Rajmonda Sulo Caceres ◽ Tina Eliassi-Rad

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource-collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state-space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
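The sequential decision-making formulation can be pictured with the schematic loop below, assuming a dictionary adjacency structure and an attribute oracle; the pluggable scorer stands in for the policy that NAC would learn over a task-specific embedding.

def discover(adjacency, has_attribute, seed, budget, score):
    """Iteratively query boundary vertices; reward counts attributed vertices found."""
    observed = {seed}
    boundary = set(adjacency[seed])
    reward = 0
    for _ in range(budget):
        if not boundary:
            break
        v = max(boundary, key=score)              # policy step: query the best-ranked vertex
        boundary.discard(v)
        observed.add(v)
        reward += int(has_attribute(v))           # selective-harvesting payoff
        boundary |= set(adjacency[v]) - observed  # newly revealed frontier
    return observed, reward

For example, passing score=lambda v: len(adjacency.get(v, ())) degenerates this loop into a high-degree heuristic, one of the online baselines such a learned policy would be compared against.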


Author(s): Sadek Belamfedel Alaoui ◽ El Houssaine Tissir ◽ Noreddine Chaibi ◽ Fatima El Haoussi

Designing robust active queue management (AQM) subject to network imperfections is a challenging problem. Motivated by this topic, we address the problem of controller design for linear systems with variable delay and unsymmetrical constraints using the scaled small-gain theorem. We design two mechanisms: a robust enhanced proportional-derivative controller, and a robust enhanced proportional-derivative controller subject to input saturation. Discussion of their practical implementation, together with extensive comparisons in MATLAB and NS3, illustrates the improved performance and the enlarged domain of attraction relative to results from the literature.
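For orientation only, a plain proportional-derivative AQM update with input saturation might look like the sketch below; the gains, sampling period, and clamp bounds are placeholder assumptions and do not reproduce the robust controllers synthesized in the paper.

def pd_aqm_step(queue_len, queue_ref, prev_error, kp=5e-4, kd=1e-3,
                p_min=0.0, p_max=1.0, dt=0.01):
    """Compute a saturated packet-drop probability from the queue-length error."""
    error = queue_len - queue_ref
    derivative = (error - prev_error) / dt
    p = kp * error + kd * derivative
    # Input saturation: the drop probability is physically confined to [0, 1].
    p = min(max(p, p_min), p_max)
    return p, error

The saturation clamp is what makes the constraint unsymmetrical in practice: the controller can be limited from below at 0 and from above at 1 by different margins depending on the operating point.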


2021
Author(s): Tuffa Said ◽ Jeffery Wolbert ◽ Siavash Khodadadeh ◽ Ayan Dutta ◽ O. Patrick Kreidl ◽ ...
