Quadrupedal robots trot into the wild

Sehoon Ha

doi:10.1126/scirobotics.abe5218

Differential reinforcement encoding along the hippocampal long axis helps resolve the explore–exploit dilemma

Nature Communications ◽

10.1038/s41467-020-18864-0 ◽

2020 ◽

Vol 11 (1) ◽

Author(s):

Alexandre Y. Dombrovski ◽

Beatriz Luna ◽

Michael N. Hallquist

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Differential Reinforcement ◽

Cognitive Maps ◽

Learning Task ◽

Natural Environments ◽

Reward Prediction Error ◽

Reward Function ◽

Reward Prediction ◽

Reward Information

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Here we report that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation on a reinforcement learning task with a spatially structured reward function. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.

On the Technological Instantiation of a Biomimetic Leg Concept for Agile Quadrupedal Locomotion

Journal of Mechanisms and Robotics ◽

10.1115/1.4028306 ◽

2015 ◽

Vol 7 (3) ◽

Cited By ~ 3

Author(s):

Elena Garcia ◽

Juan C. Arevalo ◽

Manuel Cestari ◽

Daniel Sanz-Merodio

Keyword(s):

Experimental Evaluation ◽

Complex Terrain ◽

Legged Locomotion ◽

Natural Environments ◽

Quadrupedal Locomotion ◽

Optimum Performance ◽

Natural Complex ◽

Quadruped Robots ◽

Locomotion System ◽

Leg Design

The legged locomotion system of biological quadrupeds has proven to be the most efficient in natural, complex terrain. Particularly, horses' legs have been evolved to provide speed, endurance, and strength superior to any other animal of equal size. Quadruped robots, emulating their biological counterparts, could become the best choice for field missions in complex or natural environments; however, they should be provided with optimum performance against mobility, payload, and endurance. The design of the leg mechanism is of paramount importance to achieve the targeted performance, and in order to design a leg mechanism able to provide the robot with such agile capabilities nature is the best source for inspiration. In this work, key principles underlying horse legs' power capabilities have been extracted and translated to a biomimetic leg concept. Afterwards, a real prototype has been designed following the biomimetic concept proposed. A key element in the biomimetic concept is the multifunctionality of the natural musculotendinous system, which has been mimicked by combining series elastic actuation and passive elements. This work provides an assessment of the benefits that bio-inspired solutions can provide versus the purely engineering approaches. The experimental evaluation of the bio-inspired prototype shows an improvement on the performance compared to a leg design based on purely engineering principles.

Learning quadrupedal locomotion over challenging terrain

Science Robotics ◽

10.1126/scirobotics.abc5986 ◽

2020 ◽

Vol 5 (47) ◽

pp. eabc5986 ◽

Cited By ~ 2

Author(s):

Joonho Lee ◽

Jemin Hwangbo ◽

Lorenz Wellhausen ◽

Vladlen Koltun ◽

Marco Hutter

Keyword(s):

Neural Network ◽

Reinforcement Learning ◽

Legged Locomotion ◽

Natural Environments ◽

State Machines ◽

Robust Controller ◽

Locomotion Control ◽

Animal Locomotion ◽

Quadrupedal Locomotion ◽

Motion Primitives

Legged locomotion can extend the operational domain of robots to some of the most challenging environments on Earth. However, conventional controllers for legged locomotion are based on elaborate state machines that explicitly trigger the execution of motion primitives and reflexes. These designs have increased in complexity but fallen short of the generality and robustness of animal locomotion. Here, we present a robust controller for blind quadrupedal locomotion in challenging natural environments. Our approach incorporates proprioceptive feedback in locomotion control and demonstrates zero-shot generalization from simulation to natural environments. The controller is trained by reinforcement learning in simulation. The controller is driven by a neural network policy that acts on a stream of proprioceptive signals. The controller retains its robustness under conditions that were never encountered during training: deformable terrains such as mud and snow, dynamic footholds such as rubble, and overground impediments such as thick vegetation and gushing water. The presented work indicates that robust locomotion in natural environments can be achieved by training in simple domains.

Slope Handling for Quadruped Robots Using Deep Reinforcement Learning and Toe Trajectory Planning

2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) ◽

10.1109/iros45743.2020.9341645 ◽

2020 ◽

Author(s):

Athanasios S. Mastrogeorgiou ◽

Yehia S. Elbahrawy ◽

Andres Kecskemethy ◽

Evangelos G. Papadopoulos

Keyword(s):

Reinforcement Learning ◽

Trajectory Planning ◽

Quadruped Robots

Elman Backpropagation as Reinforcement for Simple Recurrent Networks

Neural Computation ◽

10.1162/neco.2007.19.11.3108 ◽

2007 ◽

Vol 19 (11) ◽

pp. 3108-3131 ◽

Cited By ~ 12

Author(s):

André Grüning

Keyword(s):

Reinforcement Learning ◽

Language Processing ◽

Time Series Prediction ◽

Natural Environments ◽

Recurrent Networks ◽

Major Drawback ◽

Bp Network ◽

Learning Tasks ◽

Network Simulations ◽

Simple Recurrent Networks

Simple recurrent networks (SRNs) in symbolic time-series prediction (e.g., language processing models) are frequently trained with gradient descent-based learning algorithms, notably with variants of backpropagation (BP). A major drawback for the cognitive plausibility of BP is that it is a supervised scheme in which a teacher has to provide a fully specified target answer. Yet agents in natural environments often receive summary feedback about the degree of success or failure only, a view adopted in reinforcement learning schemes. In this work, we show that for SRNs in prediction tasks for which there is a probability interpretation of the network's output vector, Elman BP can be reimplemented as a reinforcement learning scheme for which the expected weight updates agree with the ones from traditional Elman BP. Network simulations on formal languages corroborate this result and show that the learning behaviors of Elman backpropagation and its reinforcement variant are very similar also in online learning tasks.

Differential reinforcement encoding along the hippocampal long axis helps resolve the explore/exploit dilemma

10.1101/2020.01.02.893255 ◽

2020 ◽

Author(s):

Alexandre Y. Dombrovski ◽

Beatriz Luna ◽

Michael N. Hallquist

Keyword(s):

Reinforcement Learning ◽

Prediction Error ◽

Differential Reinforcement ◽

Cognitive Maps ◽

Learning Task ◽

Natural Environments ◽

Reward Prediction Error ◽

Reward Function ◽

Reward Prediction ◽

Reward Information

Abstract When making decisions, should one exploit known good options or explore potentially better alternatives? Exploration of spatially unstructured options depends on the neocortex, striatum, and amygdala. In natural environments, however, better options often cluster together, forming structured value distributions. The hippocampus binds reward information into allocentric cognitive maps to support navigation and foraging in such spaces. Using a reinforcement learning task with a spatially structured reward function, we show that human posterior hippocampus (PH) invigorates exploration while anterior hippocampus (AH) supports the transition to exploitation. These dynamics depend on differential reinforcement representations in the PH and AH. Whereas local reward prediction error signals are early and phasic in the PH tail, global value maximum signals are delayed and sustained in the AH body. AH compresses reinforcement information across episodes, updating the location and prominence of the value maximum and displaying goal cell-like ramping activity when navigating toward it.

Effect of Iron Limitation on the Ultrastructure of Agmenellum Quadruplicatum

Proceedings, annual meeting, Electron Microscopy Society of America ◽

10.1017/s0424820100098551 ◽

1981 ◽

Vol 39 ◽

pp. 410-411

Author(s):

L. P. Hardie ◽

D. L. Balkwill ◽

S. E. Stevens

Keyword(s):

Growth Medium ◽

Green Alga ◽

Natural Habitat ◽

Iron Limitation ◽

Natural Environments ◽

Blue Green Alga ◽

Marine Cyanobacterium ◽

Natural Systems ◽

Thin Sectioning ◽

Agmenellum Quadruplicatum

Agmenellum quadruplicatum is a unicellular, non-nitrogen-fixing, marine cyanobacterium (blue-green alga). The ultrastructure of this organism, when grown in the laboratory with all necessary nutrients, has been characterized thoroughly. In contrast, little is known of its ultrastructure in the specific nutrient-limiting conditions typical of its natural habitat. Iron is one of the nutrients likely to limit this organism in such natural environments. It is also of great importance metabolically, being required for both photosynthesis and assimilation of nitrate. The purpose of this study was to assess the effects (if any) of iron limitation on the ultrastructure of A. quadruplicatum. It was part of a broader endeavor to elucidate the ultrastructure of cyanobacteria in natural systems. Actively growing cells were placed in a growth medium containing 1% of its usual iron. The cultures were then sampled periodically for 10 days and prepared for thin sectioning TEM to assess the effects of iron limitation.

Using Personal Cell Phones for Ecological Momentary Assessment

European Psychologist ◽

10.1027/1016-9040/a000127 ◽

2013 ◽

Vol 18 (1) ◽

pp. 3-11 ◽

Cited By ~ 42

Author(s):

Emmanuel Kuntsche ◽

Florian Labhart

Keyword(s):

Data Collection ◽

Ecological Momentary Assessment ◽

Communication Systems ◽

Cell Phones ◽

Personal Digital Assistants ◽

Natural Environments ◽

Mobility Patterns ◽

Standard Data ◽

Ecological Momentary ◽

Momentary Assessment

Ecological Momentary Assessment (EMA) is a way of collecting data in people’s natural environments in real time and has become very popular in social and health sciences. The emergence of personal digital assistants has led to more complex and sophisticated EMA protocols but has also highlighted some important drawbacks. Modern cell phones combine the functionalities of advanced communication systems with those of a handheld computer and offer various additional features to capture and record sound, pictures, locations, and movements. Moreover, most people own a cell phone, are familiar with the different functions, and always carry it with them. This paper describes ways in which cell phones have been used for data collection purposes in the field of social sciences. This includes automated data capture techniques, for example, geolocation for the study of mobility patterns and the use of external sensors for remote health-monitoring research. The paper also describes cell phones as efficient and user-friendly tools for prompt manual data collection, that is, by asking participants to produce or to provide data. This can either be done by means of dedicated applications or by simply using the web browser. We conclude that cell phones offer a variety of advantages and have a great deal of potential for innovative research designs, suggesting they will be among the standard data collection devices for EMA in the coming years.
