MarsExplorer: Exploration of Unknown Terrains via Deep Reinforcement Learning and Procedurally Generated Environments

Electronics ◽  
2021 ◽  
Vol 10 (22) ◽  
pp. 2751
Author(s):  
Dimitrios I. Koutras ◽  
Athanasios C. Kapoutsis ◽  
Angelos A. Amanatiadis ◽  
Elias B. Kosmatopoulos

This paper is an initial endeavor to bridge the gap between powerful Deep Reinforcement Learning methodologies and the problem of exploration/coverage of unknown terrains. Within this scope, MarsExplorer, an OpenAI-Gym-compatible environment tailored to the exploration/coverage of unknown areas, is presented. MarsExplorer translates the original robotics problem into a Reinforcement Learning setup that various off-the-shelf algorithms can tackle. Any learned policy can be applied straightforwardly to a robotic platform without requiring an elaborate simulation model of the robot's dynamics or a separate learning/adaptation phase. One of its core features is controllable, multi-dimensional procedural generation of terrains, which is the key to producing policies with strong generalization capabilities. Four state-of-the-art RL algorithms (A3C, PPO, Rainbow, and SAC) are trained on the MarsExplorer environment, and their results are evaluated against average human-level performance. In the follow-up experimental analysis, the effect of the multi-dimensional difficulty setting on the learning capabilities of the best-performing algorithm (PPO) is analyzed. A milestone result is the emergence of an exploration policy that follows the Hilbert curve without this information being provided to the environment and without Hilbert-curve-like trajectories being rewarded, directly or indirectly. The experimental analysis concludes with a side-by-side evaluation of the learned PPO policy against frontier-based exploration strategies. A study of the performance curves revealed that the PPO-based policy was capable of adaptive-to-the-unknown-terrain sweeping without leaving expensive-to-revisit areas uncovered, underlining the capability of RL-based methodologies to tackle exploration tasks efficiently.
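Because the environment follows the OpenAI Gym interface, training an off-the-shelf agent on it would typically look like the following minimal sketch. The environment ID "MarsExplorer-v0", the stable-baselines3 dependency, and all hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch: training PPO on a gym-compatible exploration environment.
# Assumes the classic gym reset/step API and that the environment has been
# registered under the (hypothetical) ID "MarsExplorer-v0".
import gym
from stable_baselines3 import PPO

env = gym.make("MarsExplorer-v0")          # hypothetical registered ID
model = PPO("MlpPolicy", env, verbose=1)   # off-the-shelf PPO, default hyperparameters
model.learn(total_timesteps=1_000_000)     # train against procedurally generated terrains

# Roll out the learned exploration policy on a freshly generated terrain.
obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
```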

Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 226
Author(s):  
Wenzel Pilar von Pilchau ◽  
Anthony Stein ◽  
Jörg Hähner

State-of-the-art Deep Reinforcement Learning algorithms such as DQN and DDPG rely on a replay buffer, known as Experience Replay. By default, the buffer contains only the experiences gathered at runtime. We propose a method called Interpolated Experience Replay that uses stored (real) transitions to create synthetic ones that assist the learner. In this first approach to the field, we limit ourselves to discrete and non-deterministic environments and use a simple, equally weighted average of the rewards in combination with the observed follow-up states. We demonstrate a significantly improved overall mean reward in comparison to a DQN agent with vanilla Experience Replay on the discrete and non-deterministic FrozenLake8x8-v0 environment.
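A minimal sketch of the interpolation idea described above, assuming a tabular, discrete setting: for a given (state, action) pair, the rewards of all stored real transitions are averaged with equal weights and paired with each follow-up state observed so far. Class and method names are illustrative and not taken from the paper's implementation.

```python
# Sketch of an interpolated replay buffer for discrete, non-deterministic
# environments (e.g. FrozenLake): synthetic transitions reuse observed
# follow-up states together with the equally weighted average reward.
from collections import defaultdict
import numpy as np

class InterpolatedBuffer:
    def __init__(self):
        # (state, action) -> list of (reward, next_state, done) real transitions
        self.real = defaultdict(list)

    def store(self, state, action, reward, next_state, done):
        self.real[(state, action)].append((reward, next_state, done))

    def synthesize(self, state, action):
        """Create synthetic transitions for one (state, action) pair."""
        transitions = self.real.get((state, action), [])
        if not transitions:
            return []
        avg_reward = np.mean([r for r, _, _ in transitions])   # equally weighted average
        follow_ups = {(s, d) for _, s, d in transitions}        # observed follow-up states
        return [(state, action, avg_reward, s, d) for s, d in follow_ups]
```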


1984 ◽  
Vol 58 (2) ◽  
pp. 419-425 ◽  
Author(s):  
Bruce A. Thyer ◽  
Sadi Irvine ◽  
Cathleen A. Santa

Research on the control and maintenance of exercise by chronic schizophrenics has been relatively neglected, despite widespread knowledge of the adverse consequences of sedentary living. The present study evaluated the effects of simple contingency-management procedures designed to encourage exercising by two psychotic residents living in a sheltered group home. For both subjects, the ABAB experimental analysis demonstrated the effectiveness of the intervention. Improved levels of exercise were maintained at follow-up and the contingency-management system was implemented as a regular part of the group home's rehabilitation program for all residents.


Author(s):  
David Massimo ◽  
Francesco Ricci

Recommender Systems (RSs) are often assessed in off-line settings by measuring the system's precision in predicting the observed users' ratings or choices. But when a precise RS goes on-line, the generated recommendations can be perceived as only marginally useful because they lack novelty. The underlying problem is that it is hard to build an RS that can correctly generalise from the analysis of the users' observed behaviour and identify the essential characteristics of novel and yet relevant recommendations. In this paper we address the above-mentioned issue by considering four RSs that try to excel on different target criteria: precision, relevance and novelty. Two state-of-the-art RSs follow a classical Nearest Neighbour approach, while the other two are based on Inverse Reinforcement Learning. The Nearest Neighbour RSs optimise precision; one IRL-based RS tries to identify the characteristics of POIs that make them relevant, and the other, a novel RS introduced here, is similar but also tries to recommend popular POIs. In an off-line experiment we discover that the precision-oriented RSs achieve their precision essentially by recommending quite popular POIs, while the novel RS can be tuned to achieve a desired level of precision at the cost of losing part of the base IRL system's capability to generate novel and yet relevant recommendations. In the on-line study we discover that the recommendations of two of the systems are liked more than those produced by the base IRL system; the rationale is the large percentage of novel recommendations the latter produces, which are difficult to appreciate. However, the novel RS excels in recommending items that are both novel and liked by the users.
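The off-line assessment discussed above could, for instance, combine a precision@k score against held-out choices with a popularity-based novelty score. The following sketch only illustrates that precision/novelty trade-off; the exact metrics, cut-offs, and popularity definitions used in the paper may differ.

```python
# Illustrative off-line evaluation metrics: precision@k and a simple
# popularity-based novelty@k (1 minus relative interaction frequency).
from collections import Counter

def precision_at_k(recommended, relevant, k=10):
    """Fraction of the top-k recommendations found among the user's held-out items."""
    top_k = recommended[:k]
    return len(set(top_k) & set(relevant)) / k

def novelty_at_k(recommended, interaction_log, k=10):
    """Mean 'unpopularity' of the top-k items over a global interaction log."""
    counts = Counter(interaction_log)
    total = len(interaction_log)
    return sum(1.0 - counts[item] / total for item in recommended[:k]) / k
```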


Science ◽  
2019 ◽  
Vol 364 (6443) ◽  
pp. 859-865 ◽  
Author(s):  
Max Jaderberg ◽  
Wojciech M. Czarnecki ◽  
Iain Dunning ◽  
Luke Marris ◽  
Guy Lever ◽  
...  

Reinforcement learning (RL) has shown great success in increasingly complex single-agent environments and two-player turn-based games. However, the real world contains multiple agents, each learning and acting independently to cooperate and compete with other agents. We used a tournament-style evaluation to demonstrate that an agent can achieve human-level performance in a three-dimensional multiplayer first-person video game, Quake III Arena in Capture the Flag mode, using only pixels and game points scored as input. We used a two-tier optimization process in which a population of independent RL agents are trained concurrently from thousands of parallel matches on randomly generated environments. Each agent learns its own internal reward signal and rich representation of the world. These results indicate the great potential of multiagent reinforcement learning for artificial intelligence research.
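A highly simplified, self-contained sketch of a two-tier scheme in the spirit described above: an inner loop in which each agent improves against its own internal reward, and an outer, population-level loop that propagates mutated reward weights from stronger to weaker agents. All class and method names, the selection rule, and the toy scoring stub are illustrative assumptions, not the paper's actual training procedure.

```python
# Toy two-tier (population-based) training loop. The RL inner loop is
# replaced by a noisy scoring stub so the outer loop is runnable on its own;
# in the real setting each agent would run full RL updates from thousands of
# parallel matches against its own internal reward signal.
import random

class Agent:
    def __init__(self, n_weights=4):
        self.reward_weights = [random.random() for _ in range(n_weights)]
        self.mean_game_points = 0.0

    def play_and_update(self):
        # Stand-in for concurrent RL on parallel matches: record a noisy score.
        self.mean_game_points = sum(self.reward_weights) + random.gauss(0.0, 0.1)

def evolve(population, generations=10, mutation_std=0.05):
    for _ in range(generations):
        for agent in population:                     # inner tier: per-agent learning
            agent.play_and_update()
        # Outer tier: rank by game points; the weaker half inherits mutated
        # internal reward weights from the stronger half.
        population.sort(key=lambda a: a.mean_game_points, reverse=True)
        half = len(population) // 2
        for weak, strong in zip(population[half:], population[:half]):
            weak.reward_weights = [w + random.gauss(0.0, mutation_std)
                                   for w in strong.reward_weights]
    return population

agents = evolve([Agent() for _ in range(8)])
```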


Author(s):  
Fabio Burlon ◽  
Diego Micheli ◽  
Michele Simonato ◽  
Riccardo Furlanetto

Pumps used in professional appliances process a solution of water, soil residues and detergents. These affect the vapor pressure, viscosity and rheology of the solution, mainly due to the presence of surfactants and polymers. Only a few studies have been found on how these substances influence pump performance. Therefore, an experimental analysis has been carried out with aqueous solutions of a detergent component, Polyox WSR 301, in the concentration range of 100–7000 ppm, to evaluate their influence on pump performance and cavitation. Some properties of the solutions have been preliminarily characterized with a rheometer. Then, each solution has been tested in a dedicated test rig to compare the performance curves of a centrifugal pump used in professional warewashing machines with those obtained with pure water. A non-intrusive method, based on the investigation of high-frequency vibration and noise signals, has been developed to detect cavitation at its early stage of inception. It was observed that the polymer mitigates the vibrations of the cavitating pump, reducing the acceleration to less than one g. The analysis has provided the data necessary for the subsequent development of a control strategy for pump operation in professional appliances.
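A non-intrusive, vibration-based cavitation indicator of the kind described above might, in its simplest form, compare the band-limited RMS of an accelerometer signal against a baseline measured in non-cavitating conditions. The band edges, threshold factor, and function names below are illustrative assumptions, not values from the paper.

```python
# Sketch of a high-frequency vibration indicator for incipient cavitation.
import numpy as np
from scipy.signal import butter, sosfilt

def band_rms(signal, fs, f_lo=10e3, f_hi=20e3):
    """RMS of the accelerometer signal within a high-frequency band [f_lo, f_hi] Hz."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    filtered = sosfilt(sos, signal)
    return np.sqrt(np.mean(filtered ** 2))

def cavitation_flag(signal, baseline_rms, fs, factor=2.0):
    """Flag incipient cavitation when the band-limited RMS exceeds the baseline by `factor`."""
    return band_rms(signal, fs) > factor * baseline_rms
```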

