Forced ε-Greedy, an Expansion to the ε-Greedy Action Selection Method

2021 ◽  
Author(s):  
George Angelopoulos ◽  
Dimitris Metafas

Reinforcement Learning methods such as Q-Learning make use of action selection methods to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper, Q-Learning is applied to the board game Dominion, and Forced ε-greedy, an expansion of the ε-greedy action selection method, is introduced. As shown in this paper, the Forced ε-greedy method succeeds in accelerating the training process and optimizing its results, especially as the complexity of the task grows.
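The abstract does not describe the Forced ε-greedy mechanism itself; as background, the baseline ε-greedy rule it extends can be sketched as follows (function and argument names are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy: with probability epsilon pick a uniformly
    random action (exploration), otherwise pick the action with the
    highest Q-value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the rule is purely greedy; raising `epsilon` trades exploitation for exploration.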

2017 ◽  
Vol 6 (2) ◽  
pp. 57 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Noritaka Shigei ◽  
Syunki Makino ◽  
Hiromi Miyajima ◽  
Yohtaro Miyanishi ◽  
...  

Many studies have addressed the security of cloud computing. Although data encryption is a typical approach, it incurs high computational complexity for encrypting and decrypting data. Therefore, safe systems for distributed processing of secure data have attracted attention, and many studies have been conducted on them. Secure multiparty computation (SMC) is one such method. Specifically, two learning methods for machine learning (ML) with SMC are known: one divides the learning data into several subsets and performs learning on them; the other divides each item of the learning data and performs learning. So far, most work on ML with SMC has targeted supervised and unsupervised learning, such as the BP and K-means methods; there appear to be no studies on reinforcement learning (RL) with SMC. This paper proposes learning methods with SMC for Q-learning, one of the typical methods for RL. The effectiveness of the proposed methods is shown by numerical simulation on the maze problem.
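The abstract does not detail the authors' protocol; as background, additive secret sharing, a basic SMC primitive that lets parties hold pieces of a Q-value without seeing it, can be sketched like this (all names and the two-party split are illustrative assumptions):

```python
import random

def share(value, rng=random):
    """Additively split a secret into two shares: neither share alone
    reveals the value, but their sum reconstructs it."""
    r = rng.uniform(-1e6, 1e6)
    return r, value - r

def reconstruct(s1, s2):
    """Recombine the two additive shares."""
    return s1 + s2

# Each party holds one share of a Q-value; a publicly known additive
# Q-learning increment can be applied to a single share without
# revealing the underlying value.
s1, s2 = share(0.5)
delta = 0.1  # e.g., alpha * (reward + gamma * max_q - q)
s1 += delta
assert abs(reconstruct(s1, s2) - 0.6) < 1e-9
```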


2006 ◽  
Vol 04 (06) ◽  
pp. 1071-1083 ◽  
Author(s):  
C. L. CHEN ◽  
D. Y. DONG ◽  
Z. H. CHEN

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in an RL system is represented as a quantum superposition state. The probability of each action eigenvalue is denoted by a probability amplitude, which is updated according to rewards, and action selection is carried out by observing the quantum state according to the collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be effectively applied to action selection and decision making by speeding up learning. The method also makes a good tradeoff between exploration and exploitation for RL, using the probabilistic characteristics of quantum theory.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Feng Ding ◽  
Guanfeng Ma ◽  
Zhikui Chen ◽  
Jing Gao ◽  
Peng Li

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks to alleviate some of these problems, but SAC still has issues. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned state-action estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improved performance. We evaluate the performance of Averaged-SAC on several games in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.
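The core idea, averaging the last k Q-estimates to damp overestimation, can be sketched in isolation from the full SAC machinery (class and method names are illustrative; the real method averages network outputs, not scalars):

```python
from collections import deque

class AveragedQ:
    """Keep the last k Q-estimates for a state-action pair and use their
    mean as the target value. Averaging damps the upward bias that a
    single noisy max-based estimate would carry into the target."""

    def __init__(self, k):
        self.snapshots = deque(maxlen=k)  # oldest estimate drops out

    def push(self, q_estimate):
        self.snapshots.append(q_estimate)

    def target(self):
        return sum(self.snapshots) / len(self.snapshots)
```

For example, with k = 3 and estimates 1.0, 2.0, 3.0, the target is their mean 2.0 rather than the latest (and possibly overestimated) 3.0.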


2021 ◽  
Author(s):  
Annapurna P Patil ◽  
SANJAY RAGHAVENDRA ◽  
Shruthi Srinarasi ◽  
Reshma Ram

Reinforcement Learning (RL) is the study of how Artificial Intelligence (AI) agents learn to make their own decisions in an environment to maximize the cumulative reward received. Although there has been notable progress in the application of RL to games, the category of ancient Indian games has remained almost untouched. Chowka Bhara is one such ancient Indian board game. This work aims at developing a Q-Learning-based RL Chowka Bhara player whose strategies and methodologies are obtained from three Strategic Players, viz. the Fast Player, Random Player, and Balanced Player. The experimental results show that the Q-Learning Player outperforms all three Strategic Players.
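The tabular Q-learning update at the heart of such a player is standard; a minimal sketch (state/action encodings and hyperparameter values are illustrative, not taken from the paper):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Q-values default to 0.0 for unseen state-action pairs.
Q = defaultdict(float)
```

Repeating this update over self-play games against the Strategic Players is how a tabular agent's policy would be trained.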


Author(s):  
Fumito Uwano ◽  
Keiki Takadama

This study discusses important factors for zero-communication multi-agent cooperation by comparing different modified reinforcement learning methods. The two learning methods used for comparison assign different goal selections in multi-agent cooperation tasks. The first, Profit Minimizing Reinforcement Learning (PMRL), forces agents to learn how to reach the farthest goal, and then directs the agent closest to the goal to that goal. The second, Yielding Action Reinforcement Learning (YARL), forces agents to learn through a Q-learning process; if agents come into conflict, the agent closest to the goal learns to reach the next closest goal. To compare the two methods, we designed experiments adjusting the following maze factors: (1) the location of the start point and goal; (2) the number of agents; and (3) the size of the maze. Intensive simulations on the maze problem for the agent cooperation task revealed that both methods successfully enable the agents to exhibit cooperative behavior, even when the size of the maze and the number of agents change. The PMRL mechanism always enables the agents to learn cooperative behavior, whereas the YARL mechanism makes the agents learn cooperative behavior within a small number of learning iterations. In zero-communication multi-agent cooperation, it is important that only agents in conflict cooperate with each other.
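The yielding rule described for YARL can be illustrated with a toy sketch: on a conflict, the agent closest to the contested goal is redirected to its next closest goal. Everything here, the 1-D positions, the distance measure, and all names, is an assumption for illustration only, not the paper's maze setup:

```python
def resolve_conflict(agents, goals, contested):
    """Toy rendition of the yielding rule: the agent closest to the
    contested goal yields and is assigned its next closest goal.

    agents: dict of agent name -> 1-D position (illustrative model)
    goals:  list of 1-D goal positions
    contested: the goal position both agents currently target
    """
    def dist(pos, goal):
        return abs(pos - goal)

    closest = min(agents, key=lambda name: dist(agents[name], contested))
    remaining = sorted((g for g in goals if g != contested),
                       key=lambda g: dist(agents[closest], g))
    return closest, remaining[0]  # yielding agent and its new goal
```

In the actual method this reassignment is learned through Q-learning rather than computed directly.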


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Bo Xin ◽  
Haixu Yu ◽  
You Qin ◽  
Qing Tang ◽  
Zhangqing Zhu

Analysing the training process of a Reinforcement Learning (RL) system, and deciding when to terminate it, have always been key issues in training an RL agent. In this paper, a new approach based on State Entropy and Exploration Entropy is proposed to analyse the training process. State Entropy denotes the uncertainty of the agent's action selection at each state it traverses, while Exploration Entropy denotes the action-selection uncertainty of the whole system. The action-selection uncertainty of a given state, or of the whole system, reflects the degree of exploration and the stage of the learning process for an agent. Exploration Entropy is thus a new criterion for analysing and managing the training process of RL. Theoretical analysis and experimental results illustrate that the curve of Exploration Entropy contains more information than existing analytical methods.
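Reading State Entropy as the Shannon entropy of a state's action-selection distribution, and Exploration Entropy as its aggregate over states, a minimal sketch looks like this (the mean as the aggregation rule is an assumption; the paper may weight states differently):

```python
import math

def state_entropy(action_probs):
    """Shannon entropy of one state's action-selection distribution:
    H = -sum(p * log p). Zero for a deterministic policy, maximal
    (log n) for a uniform one."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

def exploration_entropy(policy):
    """Aggregate action-selection uncertainty over all visited states,
    here taken as the mean State Entropy (illustrative choice).
    policy: dict of state -> list of action probabilities."""
    return sum(state_entropy(p) for p in policy.values()) / len(policy)
```

As training converges, each state's distribution sharpens toward a single action, so both curves fall toward zero, which is what makes them usable as a training-progress signal.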

