Forced ε-Greedy, an Expansion to the ε-Greedy Action Selection Method

2021 ◽  
Author(s):  
George Angelopoulos ◽  
Dimitris Metafas

Reinforcement Learning methods such as Q-Learning make use of action selection methods to train an agent to perform a task. As the complexity of the task grows, so does the time required to train the agent. In this paper, Q-Learning is applied to the board game Dominion, and Forced ε-greedy, an expansion of the ε-greedy action selection method, is introduced. As shown in this paper, the Forced ε-greedy method succeeds in accelerating the training process and optimizing its results, especially as the complexity of the task grows.
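The abstract does not describe the Forced ε-greedy mechanism itself; as background, the baseline ε-greedy rule it extends can be sketched as follows (function and argument names are illustrative):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """Standard epsilon-greedy: with probability epsilon pick a uniformly
    random action (exploration), otherwise pick the action with the
    highest Q-value (exploitation)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

With `epsilon = 0` the rule is purely greedy; raising `epsilon` trades exploitation for exploration.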

2017 ◽  
Vol 6 (2) ◽  
pp. 57 ◽  
Author(s):  
Hirofumi Miyajima ◽  
Noritaka Shigei ◽  
Syunki Makino ◽  
Hiromi Miyajima ◽  
Yohtaro Miyanishi ◽  
...  

Many studies have addressed the security of cloud computing. Although data encryption is a typical approach, it incurs high computational complexity for encrypting and decrypting data. Therefore, safe systems for distributed processing of secure data have attracted attention, and many studies have been conducted on them. Secure multiparty computation (SMC) is one such method. Specifically, two learning methods for machine learning (ML) with SMC are known: one divides the learning data into several subsets and performs learning on them; the other divides each item of the learning data and performs learning. So far, most work on ML with SMC has targeted supervised and unsupervised learning, such as the BP and K-means methods; there appear to be no studies on reinforcement learning (RL) with SMC. This paper proposes learning methods with SMC for Q-learning, one of the typical methods for RL. The effectiveness of the proposed methods is shown by numerical simulation on the maze problem.
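The abstract does not detail the authors' protocol; as background, additive secret sharing, a basic SMC primitive that lets parties hold pieces of a Q-value without seeing it, can be sketched like this (all names and the two-party split are illustrative assumptions):

```python
import random

def share(value, rng=random):
    """Additively split a secret into two shares: neither share alone
    reveals the value, but their sum reconstructs it."""
    r = rng.uniform(-1e6, 1e6)
    return r, value - r

def reconstruct(s1, s2):
    """Recombine the two additive shares."""
    return s1 + s2

# Each party holds one share of a Q-value; a publicly known additive
# Q-learning increment can be applied to a single share without
# revealing the underlying value.
s1, s2 = share(0.5)
delta = 0.1  # e.g., alpha * (reward + gamma * max_q - q)
s1 += delta
assert abs(reconstruct(s1, s2) - 0.6) < 1e-9
```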


2006 ◽  
Vol 04 (06) ◽  
pp. 1071-1083 ◽  
Author(s):  
C. L. CHEN ◽  
D. Y. DONG ◽  
Z. H. CHEN

This paper proposes a novel action selection method based on quantum computation and reinforcement learning (RL). Inspired by the advantages of quantum computation, the state/action in an RL system is represented as a quantum superposition state. The probability of each action eigenvalue is denoted by a probability amplitude, which is updated according to rewards, and action selection is carried out by observing the quantum state according to the collapse postulate of quantum measurement. The results of simulated experiments show that quantum computation can be effectively applied to action selection and decision making by speeding up learning. The method also makes a good tradeoff between exploration and exploitation for RL, using the probabilistic characteristics of quantum theory.


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Feng Ding ◽  
Guanfeng Ma ◽  
Zhikui Chen ◽  
Jing Gao ◽  
Peng Li

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional and large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks to alleviate some of these problems, but SAC still has issues. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging the previously learned state-action estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improved performance. We evaluate the performance of Averaged-SAC on several games in the MuJoCo environment. The experimental results show that the Averaged-SAC algorithm effectively improves the performance of the SAC algorithm and the stability of the training process.
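The core idea, averaging the last k Q-estimates to damp overestimation, can be sketched in isolation from the full SAC machinery (class and method names are illustrative; the real method averages network outputs, not scalars):

```python
from collections import deque

class AveragedQ:
    """Keep the last k Q-estimates for a state-action pair and use their
    mean as the target value. Averaging damps the upward bias that a
    single noisy max-based estimate would carry into the target."""

    def __init__(self, k):
        self.snapshots = deque(maxlen=k)  # oldest estimate drops out

    def push(self, q_estimate):
        self.snapshots.append(q_estimate)

    def target(self):
        return sum(self.snapshots) / len(self.snapshots)
```

For example, with k = 3 and estimates 1.0, 2.0, 3.0, the target is their mean 2.0 rather than the latest (and possibly overestimated) 3.0.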


2021 ◽  
Author(s):  
Annapurna P Patil ◽  
SANJAY RAGHAVENDRA ◽  
Shruthi Srinarasi ◽  
Reshma Ram

Reinforcement Learning (RL) is the study of how Artificial Intelligence (AI) agents learn to make their own decisions in an environment to maximize the cumulative reward received. Although there has been notable progress in the application of RL to games, the category of ancient Indian games has remained almost untouched. Chowka Bhara is one such ancient Indian board game. This work aims at developing a Q-Learning-based RL Chowka Bhara player whose strategies and methodologies are obtained from three Strategic Players, viz. the Fast Player, Random Player, and Balanced Player. The experimental results show that the Q-Learning Player outperforms all three Strategic Players.
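The tabular Q-learning update at the heart of such a player is standard; a minimal sketch (state/action encodings and hyperparameter values are illustrative, not taken from the paper):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(s_next, a2)] for a2 in actions), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Q-values default to 0.0 for unseen state-action pairs.
Q = defaultdict(float)
```

Repeating this update over self-play games against the Strategic Players is how a tabular agent's policy would be trained.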


Author(s):  
Fumito Uwano ◽  
Keiki Takadama

This study discusses important factors for zero-communication multi-agent cooperation by comparing different modified reinforcement learning methods. The two learning methods used for comparison assign different goal selections in multi-agent cooperation tasks. The first, Profit Minimizing Reinforcement Learning (PMRL), forces agents to learn how to reach the farthest goal, and then directs the agent closest to the goal to that goal. The second, Yielding Action Reinforcement Learning (YARL), forces agents to learn through a Q-learning process; if agents come into conflict, the agent closest to the goal learns to reach the next closest goal. To compare the two methods, we designed experiments adjusting the following maze factors: (1) the location of the start point and goal; (2) the number of agents; and (3) the size of the maze. Intensive simulations on the maze problem for the agent cooperation task revealed that both methods successfully enable the agents to exhibit cooperative behavior, even when the size of the maze and the number of agents change. The PMRL mechanism always enables the agents to learn cooperative behavior, whereas the YARL mechanism makes the agents learn cooperative behavior within a small number of learning iterations. In zero-communication multi-agent cooperation, it is important that only agents in conflict cooperate with each other.
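The yielding rule described for YARL can be illustrated with a toy sketch: on a conflict, the agent closest to the contested goal is redirected to its next closest goal. Everything here, the 1-D positions, the distance measure, and all names, is an assumption for illustration only, not the paper's maze setup:

```python
def resolve_conflict(agents, goals, contested):
    """Toy rendition of the yielding rule: the agent closest to the
    contested goal yields and is assigned its next closest goal.

    agents: dict of agent name -> 1-D position (illustrative model)
    goals:  list of 1-D goal positions
    contested: the goal position both agents currently target
    """
    def dist(pos, goal):
        return abs(pos - goal)

    closest = min(agents, key=lambda name: dist(agents[name], contested))
    remaining = sorted((g for g in goals if g != contested),
                       key=lambda g: dist(agents[closest], g))
    return closest, remaining[0]  # yielding agent and its new goal
```

In the actual method this reassignment is learned through Q-learning rather than computed directly.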


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Bo Xin ◽  
Haixu Yu ◽  
You Qin ◽  
Qing Tang ◽  
Zhangqing Zhu

Analysing the training process of a Reinforcement Learning (RL) system, and deciding when to terminate it, have always been key issues in training an RL agent. In this paper, a new approach based on State Entropy and Exploration Entropy is proposed to analyse the training process. State Entropy denotes the uncertainty of the agent's action selection at each state it traverses, while Exploration Entropy denotes the action-selection uncertainty of the whole system. The action-selection uncertainty of a given state, or of the whole system, reflects the degree of exploration and the stage of the learning process for an agent. Exploration Entropy is thus a new criterion for analysing and managing the training process of RL. Theoretical analysis and experimental results illustrate that the curve of Exploration Entropy contains more information than existing analytical methods.
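Reading State Entropy as the Shannon entropy of a state's action-selection distribution, and Exploration Entropy as its aggregate over states, a minimal sketch looks like this (the mean as the aggregation rule is an assumption; the paper may weight states differently):

```python
import math

def state_entropy(action_probs):
    """Shannon entropy of one state's action-selection distribution:
    H = -sum(p * log p). Zero for a deterministic policy, maximal
    (log n) for a uniform one."""
    return -sum(p * math.log(p) for p in action_probs if p > 0)

def exploration_entropy(policy):
    """Aggregate action-selection uncertainty over all visited states,
    here taken as the mean State Entropy (illustrative choice).
    policy: dict of state -> list of action probabilities."""
    return sum(state_entropy(p) for p in policy.values()) / len(policy)
```

As training converges, each state's distribution sharpens toward a single action, so both curves fall toward zero, which is what makes them usable as a training-progress signal.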

