MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2751
Author(s):  
Dimitrios I. Koutras ◽  
Athanasios C. Kapoutsis ◽  
Angelos A. Amanatiadis ◽  
Elias B. Kosmatopoulos

This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI-Gym-compatible environment tailored to the exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied straightforwardly to a robotic platform without requiring an elaborate simulation model of the robot's dynamics or a separate learning/adaptation phase. One of its core features is controllable, multi-dimensional procedural generation of terrains, which is the key to producing policies with strong generalization capabilities. Four state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and their results are evaluated against average human-level performance. In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the emergence of an exploration policy that follows the Hilbert curve without this information being provided to the environment and without Hilbert-curve-like trajectories being rewarded, directly or indirectly. The experimental analysis concludes with a side-by-side evaluation of the learned PPO policy against frontier-based exploration strategies. A study of the performance curves revealed that the PPO-based policy was capable of adaptive-to-the-unknown-terrain sweeping without leaving expensive-to-revisit areas uncovered, underlining the capability of RL-based methodologies to tackle exploration tasks efficiently.
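Because the environment follows the OpenAI Gym interface, training an off-the-shelf agent on it would typically look like the following minimal sketch. The environment ID "MarsExplorer-v0", the stable-baselines3 dependency, and all hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: training PPO on a gym-compatible exploration environment.
# Assumes the classic gym reset/step API and that the environment has been
# registered under the (hypothetical) ID "MarsExplorer-v0".
import gym
from stable_baselines3 import PPO

env = gym.make("MarsExplorer-v0")          # hypothetical registered ID
model = PPO("MlpPolicy", env, verbose=1)   # off-the-shelf PPO, default hyperparameters
model.learn(total_timesteps=1_000_000)     # train against procedurally generated terrains

# Roll out the learned exploration policy on a freshly generated terrain.
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```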

Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 226
Author(s):  
Wenzel Pilar von Pilchau ◽  
Anthony Stein ◽  
Jörg Hähner

State-of-the-art Deep Reinforcement Learning algorithms such as DQN and DDPG rely on a replay buffer, known as Experience Replay. By default, the buffer contains only the experiences gathered at runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones that assist the learner. In this first approach to the field, we limit ourselves to discrete and non-deterministic environments and use a simple, equally weighted average of the rewards in combination with the observed follow-up states. We demonstrate a significantly improved overall mean reward in comparison to a DQN agent with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
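A minimal sketch of the interpolation idea described above, assuming a tabular, discrete setting: for a given (state, action) pair, the rewards of all stored real transitions are averaged with equal weights and paired with each follow-up state observed so far. Class and method names are illustrative and not taken from the paper's implementation.

```python
# Sketch of an interpolated replay buffer for discrete, non-deterministic
# environments (e.g. FrozenLake): synthetic transitions reuse observed
# follow-up states together with the equally weighted average reward.
from collections import defaultdict
import numpy as np

class InterpolatedBuffer:
    def __init__(self):
        # (state, action) -> list of (reward, next_state, done) real transitions
        self.real = defaultdict(list)

    def store(self, state, action, reward, next_state, done):
        self.real[(state, action)].append((reward, next_state, done))

    def synthesize(self, state, action):
        """Create synthetic transitions for one (state, action) pair."""
        transitions = self.real.get((state, action), [])
        if not transitions:
            return []
        avg_reward = np.mean([r for r, _, _ in transitions])   # equally weighted average
        follow_ups = {(s, d) for _, s, d in transitions}        # observed follow-up states
        return [(state, action, avg_reward, s, d) for s, d in follow_ups]
```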


1984 ◽  
Vol 58 (2) ◽  
pp. 419-425 ◽  
Author(s):  
Bruce A. Thyer ◽  
Sadi Irvine ◽  
Cathleen A. Santa

Research on the control and maintenance of exercise by chronic schizophrenics has been relatively neglected, despite widespread knowledge of the adverse consequences of sedentary living. The present study evaluated the effects of simple contingency-management procedures designed to encourage exercising by two psychotic residents living in a sheltered group home. For both subjects, the ABAB experimental analysis demonstrated the effectiveness of the intervention. Improved levels of exercise were maintained at follow-up and the contingency-management system was implemented as a regular part of the group home's rehabilitation program for all residents.


Author(s):  
David Massimo ◽  
Francesco Ricci

Recommender Systems (RSs) are often assessed in off-line settings by measuring the system's precision in predicting the observed users' ratings or choices. But when a precise RS goes on-line, the generated recommendations can be perceived as only marginally useful because they lack novelty. The underlying problem is that it is hard to build an RS that can correctly generalise from the analysis of the users' observed behaviour and identify the essential characteristics of novel and yet relevant recommendations. In this paper we address the above-mentioned issue by considering four RSs that try to excel on different target criteria: precision, relevance and novelty. Two state-of-the-art RSs follow a classical Nearest Neighbour approach, while the other two are based on Inverse Reinforcement Learning. The Nearest Neighbour RSs optimise precision; one IRL-based RS tries to identify the characteristics of POIs that make them relevant, and the other, a novel RS introduced here, is similar but also tries to recommend popular POIs. In an off-line experiment we discover that the precision-oriented RSs achieve their precision essentially by recommending quite popular POIs, while the novel RS can be tuned to achieve a desired level of precision at the cost of losing part of the base IRL system's capability to generate novel and yet relevant recommendations. In the on-line study we discover that the recommendations of two of the systems are liked more than those produced by the base IRL system; the rationale is the large percentage of novel recommendations the latter produces, which are difficult to appreciate. However, the novel RS excels in recommending items that are both novel and liked by the users.
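The off-line assessment discussed above could, for instance, combine a precision@k score against held-out choices with a popularity-based novelty score. The following sketch only illustrates that precision/novelty trade-off; the exact metrics, cut-offs, and popularity definitions used in the paper may differ.

```python
# Illustrative off-line evaluation metrics: precision@k and a simple
# popularity-based novelty@k (1 minus relative interaction frequency).
from collections import Counter

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations found among the user's held-out items."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def novelty_at_k(recommended, interaction_log, k=10):
    """Mean 'unpopularity' of the top-k items over a global interaction log."""
    counts = Counter(interaction_log)
    total = len(interaction_log)
    return sum(1.0 - counts[item] / total for item in recommended[:k]) / k
```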


Science ◽  
2019 ◽  
Vol 364 (6443) ◽  
pp. 859-865 ◽  
Author(s):  
Max Jaderberg ◽  
Wojciech M. Czarnecki ◽  
Iain Dunning ◽  
Luke Marris ◽  
Guy Lever ◽  
...  

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.
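A highly simplified, self-contained sketch of a two-tier scheme in the spirit described above: an inner loop in which each agent improves against its own internal reward, and an outer, population-level loop that propagates mutated reward weights from stronger to weaker agents. All class and method names, the selection rule, and the toy scoring stub are illustrative assumptions, not the paper's actual training procedure.

```python
# Toy two-tier (population-based) training loop. The RL inner loop is
# replaced by a noisy scoring stub so the outer loop is runnable on its own;
# in the real setting each agent would run full RL updates from thousands of
# parallel matches against its own internal reward signal.
import random

class Agent:
    def __init__(self, n_weights=4):
        self.reward_weights = [random.random() for _ in range(n_weights)]
        self.mean_game_points = 0.0

    def play_and_update(self):
        # Stand-in for concurrent RL on parallel matches: record a noisy score.
        self.mean_game_points = sum(self.reward_weights) + random.gauss(0.0, 0.1)

def evolve(population, generations=10, mutation_std=0.05):
    for _ in range(generations):
        for agent in population:                     # inner tier: per-agent learning
            agent.play_and_update()
        # Outer tier: rank by game points; the weaker half inherits mutated
        # internal reward weights from the stronger half.
        population.sort(key=lambda a: a.mean_game_points, reverse=True)
        half = len(population) // 2
        for weak, strong in zip(population[half:], population[:half]):
            weak.reward_weights = [w + random.gauss(0.0, mutation_std)
                                   for w in strong.reward_weights]
    return population

agents = evolve([Agent() for _ in range(8)])
```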


Author(s):  
Fabio Burlon ◽  
Diego Micheli ◽  
Michele Simonato ◽  
Riccardo Furlanetto

Pumps used in professional appliances process a solution of water, soil residues and detergents. These affect the vapor pressure, viscosity and rheology of the solution, mainly due to the presence of surfactants and polymers. Only a few studies have been found on how these substances influence pump performance. Therefore, an experimental analysis has been carried out with aqueous solutions of a detergent component, Polyox WSR 301, in the concentration range of 100–7000 ppm, to evaluate their influence on pump performance and cavitation. Some properties of the solutions have been preliminarily characterized with a rheometer. Then, each solution has been tested in a dedicated test rig to compare the performance curves of a centrifugal pump used in professional warewashing machines with those obtained with pure water. A non-intrusive method, based on the investigation of high-frequency vibration and noise signals, has been developed to detect cavitation at its early stage of inception. It was observed that the polymer mitigates the vibrations of the cavitating pump, reducing the acceleration to less than one g. The analysis has provided the data necessary for the subsequent development of a control strategy for pump operation in professional appliances.
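A non-intrusive, vibration-based cavitation indicator of the kind described above might, in its simplest form, compare the band-limited RMS of an accelerometer signal against a baseline measured in non-cavitating conditions. The band edges, threshold factor, and function names below are illustrative assumptions, not values from the paper.

```python
# Sketch of a high-frequency vibration indicator for incipient cavitation.
import numpy as np
from scipy.signal import butter, sosfilt

def band_rms(signal, fs, f_lo=10e3, f_hi=20e3):
    """RMS of the accelerometer signal within a high-frequency band [f_lo, f_hi] Hz."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfilt(sos, signal)
    return np.sqrt(np.mean(filtered ** 2))

def cavitation_flag(signal, baseline_rms, fs, factor=2.0):
    """Flag incipient cavitation when the band-limited RMS exceeds the baseline by `factor`."""
    return band_rms(signal, fs) > factor * baseline_rms
```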

