On Monte Carlo Tree Search and Reinforcement Learning

2017, Vol. 60, pp. 881-936
Author(s): Tom Vodopivec, Spyridon Samothrakis, Branko Šter

Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not been thoroughly studied yet. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, for which the traditional MCTS is only one of the variants. We confirm that planning methods inspired by RL in conjunction with online search demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.
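
The abstract leaves the adaptation implicit, so here is a minimal sketch of the core idea under my own assumptions: keep UCT-style selection, but replace the Monte Carlo averaging backup with a TD-style bootstrapping update along the simulated path. The `Node` layout, the learning rate `alpha`, and the `tail_estimate` argument are illustrative choices, not the paper's exact algorithm.

```python
import math


class Node:
    """Search-tree node with an incrementally updated value estimate."""
    def __init__(self):
        self.children = {}   # action -> Node
        self.visits = 0
        self.value = 0.0


def ucb_select(node, c=1.4):
    """Standard UCT selection over the node's expanded children."""
    return max(node.children, key=lambda a:
               node.children[a].value
               + c * math.sqrt(math.log(node.visits + 1)
                               / (node.children[a].visits + 1)))


def td_backup(path, rewards, tail_estimate, alpha=0.1, gamma=1.0):
    """TD(0)-style backup along one simulated path: each node bootstraps
    from its successor's current estimate instead of waiting for the
    final Monte Carlo return."""
    next_value = tail_estimate                    # value beyond the tree
    for node, reward in zip(reversed(path), reversed(rewards)):
        target = reward + gamma * next_value      # bootstrapped TD target
        node.visits += 1
        node.value += alpha * (target - node.value)
        next_value = node.value                   # freshly updated estimate
```

Note that with `alpha = 1.0 / node.visits` and the full simulation return as the target, this update collapses to the usual MCTS running average; in that sense traditional MCTS is one point in a family of RL-style tree searches.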

2020, Vol. 11 (40), pp. 10959-10972
Author(s): Xiaoxue Wang, Yujie Qian, Hanyu Gao, Connor W. Coley, Yiming Mo, ...

A new MCTS variant, guided by a reinforcement learning value network and a solvent prediction model, proposes shorter synthesis routes that use greener solvents.
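
The summary is terse; one plausible reading is that states in the search tree are scored by a learned value network rather than (or blended with) random rollouts. The sketch below is a guess at that mechanism; `value_net.predict`, `rollout_fn`, and `lam` are assumed names, not the authors' API.

```python
def evaluate_leaf(state, value_net, rollout_fn=None, lam=1.0):
    """Score a leaf state with a learned value network, optionally
    blended with a rollout return. lam=1.0 uses the network alone."""
    v = value_net.predict(state)      # learned estimate, assumed in [0, 1]
    if rollout_fn is None or lam >= 1.0:
        return v
    r = rollout_fn(state)             # simulated return, assumed in [0, 1]
    return lam * v + (1.0 - lam) * r
```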


Author(s): Hendrik Baier, Michael Kaisers

This paper addresses the challenge of online generalization in tree search. We propose Multiple Estimator Monte Carlo Tree Search (ME-MCTS), with a two-fold contribution. First, we introduce a formalization of online generalization that can represent existing techniques such as "history heuristics", "RAVE", or "OMA": contextual action-value estimators, or abstractors, that generalize across specific contexts. Second, we incorporate recent advances in estimator averaging that enable guiding the search by combining the online action-value estimates of any number of such abstractors or similar estimators. Unlike previous work, which usually proposed a single abstractor for either the selection or the rollout phase of MCTS simulations, our approach focuses on combining multiple estimators and applies them to all move choices in a simulation. As the MCTS tree itself is just another value estimator (unbiased, but without abstraction), this blurs the traditional distinction between action choices inside and outside of the tree. Experiments with three abstractors in four board games show significant improvements of ME-MCTS over MCTS with a single abstractor, both for MCTS with random rollouts and for MCTS with static evaluation functions. Although our experiments used deterministic, fully observable games, ME-MCTS naturally extends to more challenging settings.
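
The abstract does not give the averaging formula, so the following is a rough sketch under stated assumptions: each estimator (the tree node, a RAVE table, a history table, ...) exposes `value(action)` and `visits(action)`, and estimates are combined with a simple visit-count-weighted average as a stand-in for the paper's estimator-averaging scheme.

```python
import math


def combined_value(action, estimators):
    """Visit-count-weighted average of several online action-value
    estimates (e.g. tree node, RAVE table, history heuristic)."""
    num = den = 0.0
    for est in estimators:
        n = est.visits(action)
        if n > 0:
            num += n * est.value(action)
            den += n
    return num / den if den > 0 else 0.0


def select_move(actions, estimators, total_visits, c=1.4):
    """Use the combined estimate for every move choice, inside the tree
    and during rollouts alike, plus a UCB-style exploration bonus."""
    def score(a):
        n = sum(e.visits(a) for e in estimators)
        return (combined_value(a, estimators)
                + c * math.sqrt(math.log(total_visits + 1) / (n + 1)))
    return max(actions, key=score)
```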


2013, Vol. 48, pp. 841-883
Author(s): A. Guez, D. Silver, P. Dayan

Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte Carlo tree search. Our approach avoids expensive applications of Bayes' rule within the search tree by sampling models from current beliefs, and performs this sampling lazily. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
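
The way the method sidesteps Bayes' rule inside the tree is root sampling: each simulation draws a single model from the current posterior at the root and then runs entirely within that fixed model. A minimal sketch, with `posterior.sample()`, `model.step()`, and `policy` as assumed interfaces (and eager rather than lazy sampling):

```python
def simulate_once(root_state, posterior, policy, gamma=0.95, horizon=50):
    """One root-sampling simulation: sample a model from the posterior
    once, then simulate through that fixed model. No belief update is
    ever performed inside the search tree."""
    model = posterior.sample()       # lazy variants defer sampling parts
                                     # of the model until they are needed
    state, ret, discount = root_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward, done = model.step(state, action)
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret                       # backed up into the tree as usual
```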

