Multiagent reinforcement learning with the partly high-dimensional state space

2006 ◽  
Vol 37 (9) ◽  
pp. 22-31 ◽  
Author(s):  
Kazuyuki Fujita ◽  
Hiroshi Matsuo

2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Wei Dai ◽  
Wei Wang ◽  
Zhongtian Mao ◽  
Ruwen Jiang ◽  
Fudong Nian ◽  
...  

The main objective of multiagent reinforcement learning is to achieve a globally optimal policy. Evaluating the value function is difficult when the state space is high-dimensional. Therefore, we transform the multiagent reinforcement learning problem into a distributed optimization problem with constraint terms. In this problem, all agents share the state and action spaces, but each agent observes only its own local reward. We then propose a distributed optimization algorithm with fractional-order dynamics to solve this problem. Moreover, we prove the convergence of the proposed algorithm and illustrate its effectiveness with a numerical example.
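
The distributed setting described above, where each agent sees only its own local term of a shared objective, can be illustrated with a small sketch. It uses ordinary (integer-order) distributed gradient descent with consensus averaging, which is a common baseline and not the fractional-order dynamics proposed in the abstract. The quadratic local costs, the mixing matrix `W`, and all names are assumptions chosen for illustration.

```python
# Sketch only: integer-order distributed gradient descent with consensus
# averaging. It is NOT the fractional-order algorithm from the abstract.

def distributed_gradient_descent(targets, weights, step=0.05, iters=500):
    """Each agent i privately holds f_i(x) = 0.5 * (x - targets[i])**2.

    The agents jointly minimise sum_i f_i(x), whose minimiser is the mean of
    the targets, while exchanging only their current estimates.
    """
    n = len(targets)
    x = [0.0] * n  # one local estimate per agent
    for _ in range(iters):
        # Consensus step: average the neighbours' estimates with mixing weights.
        mixed = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
        # Local step: each agent uses only the gradient of its own cost.
        x = [mixed[i] - step * (x[i] - targets[i]) for i in range(n)]
    return x

# Hypothetical 3-agent network with a doubly stochastic mixing matrix.
W = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]
local_targets = [1.0, 2.0, 6.0]  # illustrative stand-ins for local rewards
estimates = distributed_gradient_descent(local_targets, W)
```

With a constant step size, the network average of the estimates reaches the global minimiser (here 3.0), while the individual estimates settle in a small neighbourhood of it. Removing that residual bias requires a diminishing step size or a gradient-tracking variant.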


2010 ◽  
Vol 30 (2) ◽  
pp. 192-215 ◽  
Author(s):  
Alexander Shkolnik ◽  
Michael Levashov ◽  
Ian R. Manchester ◽  
Russ Tedrake

A motion planning algorithm is described for bounding over rough terrain with the LittleDog robot. Unlike walking gaits, bounding is highly dynamic and cannot be planned with quasi-steady approximations. LittleDog is modeled as a planar five-link system with a 16-dimensional state space; computing a plan over rough terrain in this high-dimensional state space that respects the kinodynamic constraints due to underactuation and motor limits is extremely challenging. Rapidly Exploring Random Trees (RRTs) are known for fast kinematic path planning in high-dimensional configuration spaces in the presence of obstacles, but search efficiency degrades rapidly with the addition of challenging dynamics. A computationally tractable planner for bounding was developed by modifying the RRT algorithm with: (1) motion primitives to reduce the dimensionality of the problem; (2) Reachability Guidance, which dynamically changes the sampling distribution and distance metric to address differential constraints and discontinuous motion primitive dynamics; and (3) sampling with a Voronoi bias in a lower-dimensional “task space” for bounding. Short trajectories were demonstrated to work on the robot; however, open-loop bounding is inherently unstable. A feedback controller based on transverse linearization was implemented and shown in simulation to stabilize perturbations in the presence of noise and time delays.
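
For readers unfamiliar with the baseline that the abstract modifies, the following is a minimal sketch of a plain kinematic RRT for a point in the unit square. It includes none of the paper's extensions (no motion primitives, no Reachability Guidance, no task-space sampling), and the obstacle, step size, and function names are assumptions made for illustration.

```python
import math
import random

def rrt(start, goal, is_free, step=0.05, goal_tol=0.05, max_iters=3000, seed=0):
    """Grow a tree from start inside the unit square.

    Returns (nodes, parents, path); path is None if the goal was not reached.
    Only new nodes are collision-checked, not the edges between nodes.
    """
    rng = random.Random(seed)
    nodes, parents = [start], [None]
    for _ in range(max_iters):
        sample = (rng.random(), rng.random())
        # Nearest existing node under the Euclidean metric (linear scan).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Steer: move at most `step` from the nearest node toward the sample.
        scale = min(step, d) / d
        new = (near[0] + scale * (sample[0] - near[0]),
               near[1] + scale * (sample[1] - near[1]))
        if not is_free(new):
            continue
        nodes.append(new)
        parents.append(i_near)
        if math.dist(new, goal) <= goal_tol:
            # Walk parent links back to the root to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parents[i]
            return nodes, parents, path[::-1]
    return nodes, parents, None

def outside_disc(p):
    """Hypothetical obstacle: a disc of radius 0.2 centred in the square."""
    return math.dist(p, (0.5, 0.5)) > 0.2

nodes, parents, path = rrt((0.1, 0.1), (0.9, 0.9), outside_disc)
```

Uniform sampling combined with nearest-neighbour extension is what gives the RRT its Voronoi bias toward unexplored regions. The abstract's point is that this bias stops being effective once the Euclidean metric no longer reflects what the dynamics can actually reach.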


2021 ◽  
Vol 31 (5) ◽  
Author(s):  
Jacob Vorstrup Goldman ◽  
Sumeetpal S. Singh

We propose a novel blocked version of the continuous-time bouncy particle sampler of Bouchard-Côté et al. (J Am Stat Assoc 113(522):855–867, 2018) which is applicable to any differentiable probability density. This alternative implementation is motivated by blocked Gibbs sampling for state-space models (Singh et al. in Biometrika 104(4):953–969, 2017) and leads to significant improvement in terms of effective sample size per second, and furthermore, allows for significant parallelization of the resulting algorithm. The new algorithms are particularly efficient for latent state inference in high-dimensional state-space models, where blocking in both space and time is necessary to avoid degeneracy of MCMC. The efficiency of our blocked bouncy particle sampler, in comparison with both the standard implementation of the bouncy particle sampler and the particle Gibbs algorithm of Andrieu et al. (J R Stat Soc Ser B Stat Methodol 72(3):269–342, 2010), is illustrated numerically for both simulated data and a challenging real-world financial dataset.
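
As background, the standard (unblocked) bouncy particle sampler that the abstract builds on can be sketched for a standard Gaussian target, where bounce times have a closed form. This is a minimal illustration under that assumption and does not implement the blocked variant; the function names, the refresh rate, and the choice of test statistic are all hypothetical.

```python
import math
import random

def time_to_bounce(x, v, rng):
    """Exact first bounce time for the target with U(x) = |x|^2 / 2.

    Along the ray x + t*v the bounce rate is max(0, a + b*t) with a = <v, x>
    and b = <v, v>, so the integrated rate can be inverted in closed form.
    """
    a = sum(vi * xi for vi, xi in zip(v, x))
    b = sum(vi * vi for vi in v)
    e = rng.expovariate(1.0)
    return (-a + math.sqrt(max(a, 0.0) ** 2 + 2.0 * b * e)) / b

def bounce(x, v):
    """Reflect v in the hyperplane orthogonal to grad U(x) = x."""
    gg = sum(xi * xi for xi in x)
    gv = sum(xi * vi for xi, vi in zip(x, v))
    return [vi - 2.0 * gv / gg * xi for vi, xi in zip(v, x)]

def bps_second_moment(dim=2, n_events=5000, refresh_rate=1.0, seed=0):
    """Time-average of x[0]**2 along the piecewise-linear trajectory."""
    rng = random.Random(seed)
    x = [0.0] * dim
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    total_time, integral = 0.0, 0.0
    for _ in range(n_events):
        t_bounce = time_to_bounce(x, v, rng)
        t_refresh = rng.expovariate(refresh_rate)
        t = min(t_bounce, t_refresh)
        # Integrate (x0 + s*v0)^2 exactly over the segment s in [0, t].
        integral += x[0] ** 2 * t + x[0] * v[0] * t ** 2 + v[0] ** 2 * t ** 3 / 3.0
        total_time += t
        x = [xi + t * vi for xi, vi in zip(x, v)]
        if t_bounce < t_refresh:
            v = bounce(x, v)  # deterministic reflection off the gradient
        else:
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # velocity refresh
    return integral / total_time

estimate = bps_second_moment()
```

The time-average should approach 1, the variance of each coordinate under the target. For general densities the bounce time has no closed form and is usually simulated by thinning.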


2011 ◽  
Vol 11 (3&4) ◽  
pp. 313-325
Author(s):  
Warner A. Miller

An increase in the dimension of state space for quantum key distribution (QKD) can decrease its fidelity requirements while also increasing its bandwidth. A significant obstacle for QKD with qu$d$its ($d\geq 3$) has been an efficient and practical quantum state sorter for photons whose complex fields are modulated in both amplitude and phase. We propose such a sorter based on a multiplexed thick hologram, constructed, e.g., from photo-thermal refractive (PTR) glass. We validate this approach using coupled-mode theory with parameters consistent with PTR glass to simulate a holographic sorter. The model assumes a three-dimensional state space spanned by three tilted plane waves. The utility of such a sorter for broader quantum information processing applications can be substantial.
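
The opening claim, that higher dimension both relaxes the fidelity requirement and raises the bandwidth, can be made concrete with a short calculation. The sketch below assumes a commonly cited asymptotic key-rate expression for a two-basis, BB84-style protocol in dimension $d$, namely $r = \log_2 d - 2h_d(Q)$ with $h_d(Q) = -Q\log_2\frac{Q}{d-1} - (1-Q)\log_2(1-Q)$. This formula is not taken from the abstract, and the function names are hypothetical.

```python
import math

def qudit_key_rate(d, q):
    """Asymptotic secret bits per sifted photon for two-basis BB84 in dimension d.

    Assumed formula (not from the abstract): r = log2(d) - 2*h_d(q), where q is
    the symbol error rate and h_d is the d-ary entropy of the error pattern.
    """
    if q == 0.0:
        return math.log2(d)
    h = -q * math.log2(q / (d - 1)) - (1.0 - q) * math.log2(1.0 - q)
    return math.log2(d) - 2.0 * h

def max_error_rate(d, tol=1e-9):
    """Largest q with a positive rate, found by bisection on [0, (d-1)/d]."""
    lo, hi = 0.0, (d - 1) / d
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if qudit_key_rate(d, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return lo

# Compare qubits (d = 2) with qutrits (d = 3).
thresholds = {d: max_error_rate(d) for d in (2, 3)}
```

Under this assumed formula, the tolerable error rate for $d = 2$ is the familiar value of roughly 11%, and it increases for $d = 3$, while the error-free rate grows from 1 bit to $\log_2 3$ bits per photon.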


2016 ◽  
Vol 11 (3) ◽  
pp. 350-374 ◽  
Author(s):  
Chris Westbury

There is a distinction in scientific explanation between the explanandum, statements describing the empirical phenomenon to be explained, and the explanans, statements describing the evidence that allow one to predict that phenomenon. To avoid tautology, these sets of statements must refer to distinct domains. A scientific explanation of semantics must be grounded in explanans that appeal to entities from non-semantic domains. I consider as examples eight candidate domains (including affect, lexical or sub-word co-occurrence, mental simulation, and associative learning) that could ground semantics. Following Wittgenstein (1954), I propose that adjudicating between these different domains is difficult because of the reification of a word’s ‘meaning’ as an atomistic unit. If we abandon the idea of the meaning of a word as being an atomistic unit and instead think of word meaning as a set of dynamic and disparate embodied states unified by a shared label, many apparent problems associated with identifying a meaning’s ‘true’ explanans disappear. Semantics can be considered as sets of weighted constraints that are individually sufficient for specifying and labeling a subjectively recognizable location in the high-dimensional state space defined by our neural activity.

