Reinforcement Distribution in Continuous State Action Space Fuzzy Q–Learning: A Novel Approach

Q-Learning in Continuous State-Action Space with Noisy and Redundant Inputs by Using a Selective Desensitization Neural Network

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2015.p0825 ◽

2015 ◽

Vol 19 (6) ◽

pp. 825-832 ◽

Cited By ~ 2

Author(s):

Takaaki Kobayashi ◽

Takeshi Shibuya ◽

Masahiko Morita

Keyword(s):

Neural Network ◽

Real World ◽

Value Function ◽

Action Space ◽

Sensor Noise ◽

Q Learning ◽

State Action ◽

Continuous State ◽

Real World Applications ◽

Function Approximator

When applying reinforcement learning (RL) algorithms such as Q-learning to real-world applications, we must consider the influence of sensor noise. The simplest way to reduce the influence of such noise is to add other types of sensors, but this enlarges the state space and is likely to increase redundancy. Conventional value-function approximators used for RL in continuous state-action space do not deal appropriately with such situations. The selective desensitization neural network (SDNN) has high generalization ability and robustness against noise and redundant input. We therefore propose an SDNN-based value-function approximator for Q-learning in continuous state-action space, and evaluate its performance in terms of robustness against redundant input and sensor noise. Results show that our proposal is strongly robust against noise and redundant input and enables the agent to take better actions by using additional inputs without degrading learning efficiency. These properties are eminently advantageous in real-world applications such as robotic systems.

Q-learning in continuous state-action space with redundant dimensions by using a selective desensitization neural network

2014 Joint 7th International Conference on Soft Computing and Intelligent Systems (SCIS) and 15th International Symposium on Advanced Intelligent Systems (ISIS) ◽

10.1109/scis-isis.2014.7044714 ◽

2014 ◽

Cited By ~ 1

Author(s):

Takaaki Kobayashi ◽

Takeshi Shibuya ◽

Masahiko Morita

Keyword(s):

Neural Network ◽

Action Space ◽

Q Learning ◽

State Action ◽

Continuous State

Online Tuning of a PID Controller with a Fuzzy Reinforcement Learning MAS for Flow Rate Control of a Desalination Unit

Electronics ◽

10.3390/electronics8020231 ◽

2019 ◽

Vol 8 (2) ◽

pp. 231 ◽

Cited By ~ 2

Author(s):

Panagiotis Kofinas ◽

Anastasios I. Dounis

Keyword(s):

Reinforcement Learning ◽

Flow Rate ◽

Pid Controller ◽

Hybrid Control ◽

Q Learning ◽

State Action ◽

Continuous State ◽

Multi Agent ◽

Flow Rate Control ◽

Online Tuning

This paper proposes a hybrid Ziegler-Nichols (Z-N) fuzzy reinforcement learning MAS (Multi-Agent System) approach for online tuning of a Proportional Integral Derivative (PID) controller that controls the flow rate of a desalination unit. The PID gains are set by the Z-N method and are then adapted online through the fuzzy Q-learning MAS. Fuzzy Q-learning is introduced in each agent to cope with the continuous state-action space. The global state of the MAS is defined by the value of the error and the derivative of the error. The MAS consists of three agents, and the output signal of each agent defines the percentage change of one gain. Each gain can be increased or reduced by 0% to 100% of its initial value. The simulation results highlight the performance of the suggested hybrid control strategy through comparison with the conventional PID controller tuned by Z-N.
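
The gain-adaptation step described in this abstract can be illustrated with a minimal Python sketch. It assumes the classic closed-loop Ziegler-Nichols PID rule (Kp = 0.6 Ku, Ti = Tu/2, Td = Tu/8) and assumes that each percentage change is applied relative to the initial Z-N gain. The names `PIDGains`, `ziegler_nichols_pid` and `adapt_gains` are hypothetical and are not taken from the paper, which may compute or apply the changes differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class PIDGains:
    kp: float
    ki: float
    kd: float


def ziegler_nichols_pid(ku: float, tu: float) -> PIDGains:
    """Classic Z-N PID rule from the ultimate gain ku and ultimate period tu."""
    kp = 0.6 * ku
    ti = tu / 2.0   # integral time
    td = tu / 8.0   # derivative time
    return PIDGains(kp=kp, ki=kp / ti, kd=kp * td)


def adapt_gains(initial: PIDGains, pct_changes: Sequence[float],
                max_pct: float = 100.0) -> PIDGains:
    """Apply one signed percentage change per gain (assumed: relative to the
    initial value), clamped to +/- max_pct as stated in the abstract."""
    def scale(gain: float, pct: float) -> float:
        pct = max(-max_pct, min(max_pct, pct))
        return gain * (1.0 + pct / 100.0)

    d_kp, d_ki, d_kd = pct_changes
    return PIDGains(kp=scale(initial.kp, d_kp),
                    ki=scale(initial.ki, d_ki),
                    kd=scale(initial.kd, d_kd))


if __name__ == "__main__":
    base = ziegler_nichols_pid(ku=10.0, tu=2.0)
    # Stand-ins for the three agents' outputs (percent change per gain).
    print(adapt_gains(base, (50.0, -100.0, 150.0)))
```

In this sketch the three values passed to `adapt_gains` stand in for the outputs of the three agents; a reduction of 100% drives a gain to zero, and requests beyond the stated range are clamped.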

Adaptive Object Tracking via Multi-Angle Analysis Collaboration

Sensors ◽

10.3390/s18113606 ◽

2018 ◽

Vol 18 (11) ◽

pp. 3606 ◽

Cited By ~ 1

Author(s):

Wanli Xue ◽

Zhiyong Feng ◽

Chao Xu ◽

Zhaopeng Meng ◽

Chengwei Zhang

Keyword(s):

Object Tracking ◽

Learning Algorithm ◽

Action Space ◽

Selection Strategy ◽

Multiple Perspectives ◽

Strategic Framework ◽

Practical Applications ◽

Q Learning ◽

State Action ◽

Speed And Accuracy

Although tracking research has achieved excellent performance from a mathematical angle, it is still meaningful to analyze tracking problems from multiple perspectives. This motivation not only promotes the independence of tracking research but also increases the flexibility of practical applications. This paper presents a tracking framework based on multi-dimensional state–action space reinforcement learning, termed multi-angle analysis collaboration tracking (MACT). MACT comprises a basic tracking framework and a strategic framework that assists it. In particular, the strategic framework is extensible and currently includes a feature selection strategy (FSS) and a movement trend strategy (MTS). These strategies are abstracted from the multi-angle analysis of tracking problems (the observer’s attention and the object’s motion). The content of the analysis corresponds to specific actions in the multi-dimensional action space. Concretely, the tracker, regarded as an agent, is trained with the Q-learning algorithm and an ϵ-greedy exploration strategy, where we adopt a customized reward function to encourage robust object tracking. Numerous comparative experimental evaluations on the OTB50 benchmark demonstrate the effectiveness of the strategies and the improvement in speed and accuracy of the MACT tracker.
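
The training scheme named in this abstract, Q-learning with ϵ-greedy exploration, can be sketched in its generic tabular form. This is only an illustration of the two standard ingredients: the state and action encodings, the reward and the function names `epsilon_greedy` and `q_update` are assumptions, and the abstract does not give MACT's multi-dimensional action space or customized reward function.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """With probability epsilon pick a random action index, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return q_row.index(max(q_row))


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (td_target - q[s][a])
    return q[s][a]


if __name__ == "__main__":
    rng = random.Random(0)
    q = [[0.0, 0.0], [1.0, 2.0]]   # hypothetical table: 2 states x 2 actions
    a = epsilon_greedy(q[0], epsilon=0.1, rng=rng)
    q_update(q, s=0, a=a, r=1.0, s_next=1)
    print(q)
```

A larger `epsilon` makes the agent explore more; practical systems usually decay it over training, which this sketch leaves out.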

Swarm Reinforcement Learning Methods for Problems with Continuous State-action Space

Transactions of the Society of Instrument and Control Engineers ◽

10.9746/sicetr.48.790 ◽

2012 ◽

Vol 48 (11) ◽

pp. 790-798

Author(s):

Hitoshi IIMA ◽

Yasuaki KUROE

Keyword(s):

Reinforcement Learning ◽

Action Space ◽

Learning Methods ◽

State Action ◽

Continuous State

Continuous State-Action Space Advantage-Learning Using Interval Analysis and Neural Networks

AIAA Guidance, Navigation and Control Conference and Exhibit ◽

10.2514/6.2007-6522 ◽

2007 ◽

Cited By ~ 2

Author(s):

E. Weerdt ◽

Q.P. Chu ◽

J.A. Mulder

Keyword(s):

Neural Networks ◽

Interval Analysis ◽

Action Space ◽

State Action ◽

Continuous State

Fuzzy Q-learning in continuous state and action space

The Journal of China Universities of Posts and Telecommunications ◽

10.1016/s1005-8885(09)60495-7 ◽

2010 ◽

Vol 17 (4) ◽

pp. 100-109 ◽

Cited By ~ 2

Author(s):

Ming-liang XU ◽

Wen-bo XU

Keyword(s):

Action Space ◽

Q Learning ◽

Continuous State

Safe Exploration of State and Action Spaces in Reinforcement Learning

Journal of Artificial Intelligence Research ◽

10.1613/jair.3761 ◽

2012 ◽

Vol 45 ◽

pp. 515-564 ◽

Cited By ~ 20

Author(s):

J. Garcia ◽

F. Fernandez

Keyword(s):

Reinforcement Learning ◽

Learning System ◽

Action Space ◽

High Dimensional ◽

State Action ◽

Continuous State ◽

Additional Challenge ◽

Efficient Exploration ◽

Action Spaces ◽

Selection Of

In this paper, we consider the important problem of safe exploration in reinforcement learning. While reinforcement learning is well-suited to domains with complex transition dynamics and high-dimensional state-action spaces, an additional challenge is posed by the need for safe and efficient exploration. Traditional exploration techniques are not particularly useful for solving dangerous tasks, where the trial and error process may lead to the selection of actions whose execution in some states may result in damage to the learning system (or any other system). Consequently, when an agent begins an interaction with a dangerous and high-dimensional state-action space, an important question arises; namely, that of how to avoid (or at least minimize) damage caused by the exploration of the state-action space. We introduce the PI-SRL algorithm which safely improves suboptimal albeit robust behaviors for continuous state and action control tasks and which efficiently learns from the experience gained from the environment. We evaluate the proposed method in four complex tasks: automatic car parking, pole-balancing, helicopter hovering, and business management.

The New Geometric “State-Action” Space Representation for Q-Learning Algorithm for Protein Structure Folding Problem

Cybernetics and Computer Technologies ◽

10.34229/2707-451x.20.3.6 ◽

2020 ◽

pp. 59-73

Author(s):

S. Chornozhuk

Keyword(s):

Protein Structure ◽

State Space ◽

Learning Algorithm ◽

Action Space ◽

Space Representation ◽

Q Learning ◽

State Action ◽

State Space Representation ◽

Advantages And Disadvantages ◽

Learning Techniques

Introduction. Spatial protein structure folding is an important and topical problem in computational biology. Considering the mathematical model of the task, it can easily be concluded that finding an optimal protein conformation in a three-dimensional grid is an NP-hard problem. Therefore, reinforcement learning techniques such as the Q-learning approach can be used to solve the problem. The article proposes a new geometric “state-action” space representation which differs significantly from all alternative representations used for this problem. The purpose of the article is to analyze existing approaches to representing the state and action spaces for the Q-learning algorithm applied to the protein structure folding problem, reveal their advantages and disadvantages, and propose the new geometric “state-space” representation. A further goal is to compare the existing and proposed approaches, draw conclusions, and describe possible future research steps. Result. The proposed algorithm is compared with others on the basis of 10 known chains with a length of 48, first proposed in [16]. For each of the chains, the Q-learning algorithm with the proposed “state-space” representation outperformed the same Q-learning algorithm with the alternative existing “state-space” representations in terms of both the average and the minimal energy values of the resulting conformations. Moreover, many existing representations are used for 2D protein structure prediction. However, during the experiments both the existing and the proposed representations were slightly changed or developed to solve the problem in 3D, which is a more computationally demanding task. Conclusion. The quality of the Q-learning algorithm with the proposed geometric “state-action” space representation has been experimentally confirmed. Consequently, further research is shown to be promising. Moreover, several steps of possible future research, such as combining the proposed approach with deep learning techniques, have already been suggested. Keywords: Spatial protein structure, combinatorial optimization, relative coding, machine learning, Q-learning, Bellman equation, state space, action space, basis in 3D space.

Fuzzy Q-Learning Agent for Online Tuning of PID Controller for DC Motor Speed Control

Algorithms ◽

10.3390/a11100148 ◽

2018 ◽

Vol 11 (10) ◽

pp. 148 ◽

Cited By ~ 2

Author(s):

Panagiotis Kofinas ◽

Anastasios I. Dounis

Keyword(s):

Pid Controller ◽

Dc Motor ◽

Proportional Integral Derivative ◽

Motor Speed ◽

Initial Value ◽

Q Learning ◽

State Action ◽

Learning Agent ◽

Continuous State ◽

Online Tuning

This paper proposes a hybrid Ziegler-Nichols (Z-N) reinforcement learning approach for online tuning of the parameters of a Proportional Integral Derivative (PID) controller that controls the speed of a DC motor. The PID gains are set by the Z-N method and are then adapted online through the fuzzy Q-Learning agent. The fuzzy Q-Learning agent is used instead of conventional Q-Learning in order to deal with the continuous state-action space. The fuzzy Q-Learning agent defines its state according to the value of the error. The output signal of the agent consists of three output variables, each of which defines the percentage change of one gain. Each gain can be increased or decreased by 0% to 50% of its initial value. Through this method, the gains of the controller are adjusted online through interaction with the environment. Expert knowledge is not required during the setup process. The simulation results highlight the performance of the proposed control strategy. After the exploration phase, the settling time is reduced in the steady states. In the transient states, the response has smaller-amplitude oscillations and reaches the equilibrium point faster than with the conventional PID controller.
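
To show how a fuzzy Q-learning agent can handle a continuous error signal, the sketch below follows one common formulation of fuzzy Q-learning: triangular membership functions give normalized firing strengths, each rule holds q-values over a small set of candidate outputs, and both the continuous output and the temporal-difference update are weighted by the firing strengths. The membership centers, the candidate percentage changes, the single output (the paper uses three) and all function names are assumptions for illustration, since the abstract does not specify them.

```python
def triangular_memberships(x, centers):
    """Firing strengths of triangular fuzzy sets on sorted centers (they sum to 1)."""
    mu = [0.0] * len(centers)
    if x <= centers[0]:
        mu[0] = 1.0
    elif x >= centers[-1]:
        mu[-1] = 1.0
    else:
        for i in range(len(centers) - 1):
            lo, hi = centers[i], centers[i + 1]
            if lo <= x <= hi:
                w = (x - lo) / (hi - lo)
                mu[i], mu[i + 1] = 1.0 - w, w
                break
    return mu


def blend(phi, values):
    """Firing-strength-weighted sum."""
    return sum(p * v for p, v in zip(phi, values))


def global_action(phi, actions, choice):
    """Continuous output: blend of the candidate action chosen in each rule."""
    return blend(phi, [actions[k] for k in choice])


def fql_step(q, centers, e, choice, reward, e_next, alpha=0.1, gamma=0.9):
    """Update the q-value of each rule's chosen action, weighted by its firing strength."""
    phi = triangular_memberships(e, centers)
    phi_next = triangular_memberships(e_next, centers)
    q_sa = blend(phi, [q[i][k] for i, k in enumerate(choice)])
    v_next = blend(phi_next, [max(row) for row in q])
    delta = reward + gamma * v_next - q_sa
    for i, k in enumerate(choice):
        q[i][k] += alpha * delta * phi[i]
    return delta


if __name__ == "__main__":
    centers = [-1.0, 0.0, 1.0]     # assumed fuzzy sets on the error
    actions = [-50.0, 0.0, 50.0]   # assumed candidate percentage changes
    q = [[0.0] * len(actions) for _ in centers]
    choice = [0, 1, 2]             # action index picked in each rule
    phi = triangular_memberships(0.5, centers)
    print(global_action(phi, actions, choice))
    fql_step(q, centers, e=0.5, choice=choice, reward=1.0, e_next=0.0)
    print(q)
```

The per-rule `choice` would normally come from an exploration policy such as ϵ-greedy; it is fixed here to keep the sketch deterministic.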
