A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm

Symmetry ◽  
2021 ◽  
Vol 13 (6) ◽  
pp. 1057
Author(s):  
Lieping Zhang ◽  
Liu Tang ◽  
Shenglan Zhang ◽  
Zhengzhong Wang ◽  
Xianhao Shen ◽  
...  

To address several problems of the traditional Q-learning algorithm, such as heavily repeated and unbalanced exploration, the reinforcement-exploration strategy was used to replace the decayed ε-greedy strategy of the traditional Q-learning algorithm, yielding a novel self-adaptive reinforcement-exploration Q-learning (SARE-Q) algorithm. First, the concept of a behavior utility trace was introduced, and the probability of each action being chosen was adjusted according to this trace so as to improve exploration efficiency. Second, the decay of the exploration factor ε was designed in two phases: the first phase centers on exploration, while the second shifts the focus from exploration to exploitation, with the exploration rate dynamically adjusted according to the success rate. Finally, by maintaining a list of state visit counts, the exploration factor of the current state is adaptively adjusted according to the number of times the state has been visited. A symmetric grid-map environment was built on the OpenAI Gym platform to run simulation experiments comparing the Q-learning, self-adaptive Q-learning (SA-Q), and SARE-Q algorithms. The experimental results show that the proposed algorithm has clear advantages over the other two algorithms in the average number of turns, the average success rate, and the number of runs that find the shortest planned route.
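
A minimal sketch of the three exploration mechanisms described above, in plain Python/NumPy on a toy 10x10 grid state space. The phase-switch point, decay constants, and the softmax over the behavior utility trace are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

n_states, n_actions = 100, 4              # e.g. a 10x10 grid with 4 moves
Q = np.zeros((n_states, n_actions))
utility_trace = np.zeros((n_states, n_actions))  # behavior utility trace
visit_counts = np.zeros(n_states)                # list of state visit counts
success_rate = 0.0                  # fraction of recent successful episodes
alpha, gamma = 0.1, 0.95
EPS_HI, EPS_LO, PHASE_SWITCH = 0.9, 0.1, 200     # assumed constants

def epsilon(state, episode):
    """Two-phase decay of epsilon, adapted per state by its visit count."""
    if episode < PHASE_SWITCH:            # phase 1: centred on exploration
        base = EPS_HI
    else:                                 # phase 2: shift to exploitation,
        base = EPS_LO + (EPS_HI - EPS_LO) * (1.0 - success_rate)
    # explore less in states that have already been visited often
    return base / (1.0 + visit_counts[state])

def select_action(state, episode):
    visit_counts[state] += 1
    if np.random.rand() < epsilon(state, episode):
        # explore, but bias the draw by the behavior utility trace so that
        # previously useful actions are sampled more often
        prefs = utility_trace[state] - utility_trace[state].max()
        probs = np.exp(prefs) / np.exp(prefs).sum()
        return int(np.random.choice(n_actions, p=probs))
    return int(np.argmax(Q[state]))

def update(s, a, r, s_next):
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    utility_trace[s, a] = 0.9 * utility_trace[s, a] + r   # decayed utility
```

Here `success_rate` would be refreshed once per episode from a sliding window of episode outcomes; that bookkeeping is omitted for brevity.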

Filomat ◽  
2018 ◽  
Vol 32 (5) ◽  
pp. 1797-1807 ◽  
Author(s):  
Niu Guochen ◽  
Xu Kailu

In order to realize path planning for a continuum robot inspecting defects in the aircraft fuel tank compartment, an approach based on Q-learning and the three-segment method was proposed, planning robot postures that satisfy both the robot's inherent constraints and the spatial structure constraints. First, a simulation model of the aircraft fuel tank was established, and the workspace was rasterized to reduce computational complexity. Second, the Q-learning algorithm was applied to generate a path from the initial point to the target. Using the target guided angle and the three-segment method, the joint variables corresponding to each transition point on the path were obtained. Finally, the robot reached the target by progressively updating its joint variables. Simulation experiments were carried out, and the results verified the effectiveness and feasibility of the algorithm.
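
A minimal sketch of the second stage, Q-learning path planning over a rasterized workspace. The grid layout, rewards, and hyper-parameters are illustrative assumptions; the fuel-tank model, target guided angle, and the three-segment conversion to joint variables are omitted:

```python
import numpy as np

grid = np.zeros((10, 10), dtype=int)      # rasterized workspace: 0 free, 1 obstacle
grid[4, 2:8] = 1                          # an assumed obstacle row
start, goal = (0, 0), (9, 9)
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

Q = np.zeros((10, 10, 4))
alpha, gamma, eps = 0.1, 0.95, 0.2

def step(pos, a):
    r, c = pos[0] + actions[a][0], pos[1] + actions[a][1]
    if not (0 <= r < 10 and 0 <= c < 10) or grid[r, c]:
        return pos, -10.0, False          # blocked: stay put, penalty
    if (r, c) == goal:
        return (r, c), 100.0, True
    return (r, c), -1.0, False            # step cost favours short paths

for episode in range(2000):
    pos = start
    for _ in range(500):                  # cap episode length
        a = np.random.randint(4) if np.random.rand() < eps \
            else int(np.argmax(Q[pos]))
        nxt, reward, done = step(pos, a)
        Q[pos][a] += alpha * (reward + gamma * Q[nxt].max() - Q[pos][a])
        pos = nxt
        if done:
            break
# a greedy roll-out of Q now yields the transition points; each would then
# be mapped to joint variables via the three-segment method.
```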


Author(s):  
Shuhuan Wen ◽  
Xueheng Hu ◽  
Zhen Li ◽  
Hak Keung Lam ◽  
Fuchun Sun ◽  
...  

Purpose This paper aims to propose a novel active SLAM framework that achieves obstacle avoidance and autonomous navigation in indoor environments. Design/methodology/approach The improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle-avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. The localization of the robot is based on the FastSLAM algorithm. Findings Simulation results for obstacle avoidance using the traditional Q-learning algorithm, the optimized Q-learning algorithm, and the FOQL algorithm are compared. The simulation results show that the improved FOQL algorithm learns faster than the other two algorithms. To verify the simulation results, the FOQL algorithm is implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle-avoidance algorithm is feasible and effective. Originality/value The improved fuzzy optimized Q-Learning (FOQL) algorithm is used to solve the robot's obstacle-avoidance problem in the environment. To reduce the motion deviation of the robot, a fractional-order controller is designed. To verify the simulation results, the FOQL algorithm is implemented on a NAO robot, and the experimental results demonstrate that the improved fuzzy optimized Q-Learning obstacle-avoidance algorithm is feasible and effective.
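
A minimal sketch of the fuzzy Q-learning idea behind an FOQL-style controller: a continuous range reading is fuzzified into membership degrees over discrete near/medium/far states, and the Q-update is weighted by those degrees. The membership functions and parameters are illustrative assumptions, not the paper's design; the fractional-order controller and FastSLAM localization are omitted:

```python
import numpy as np

n_fuzzy_states, n_actions = 3, 3   # near/medium/far x turn-left/straight/turn-right
Q = np.zeros((n_fuzzy_states, n_actions))
alpha, gamma = 0.2, 0.9

def memberships(distance):
    """Triangular membership degrees for (near, medium, far); distance in metres."""
    near = max(0.0, 1.0 - distance / 0.5)
    far = min(1.0, max(0.0, (distance - 0.5) / 0.5))
    medium = max(0.0, 1.0 - near - far)
    mu = np.array([near, medium, far])
    return mu / mu.sum()

def act(distance, eps=0.1):
    mu = memberships(distance)
    q_mix = mu @ Q                    # fuzzy-weighted action values
    if np.random.rand() < eps:
        return np.random.randint(n_actions), mu
    return int(np.argmax(q_mix)), mu

def update(mu, a, r, next_distance):
    mu_next = memberships(next_distance)
    target = r + gamma * (mu_next @ Q).max()
    # distribute the TD error across fuzzy states by membership degree
    Q[:, a] += alpha * mu * (target - mu @ Q[:, a])
```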


2019 ◽  
Vol 34 ◽  
Author(s):  
Mao Li ◽  
Yi Wei ◽  
Daniel Kudenko

Abstract One way to address the low sample efficiency of reinforcement learning (RL) is to employ human expert demonstrations to speed up the RL process (RL from demonstration, or RLfD). Research so far has focused on demonstrations from a single expert. However, little attention has been given to the case where demonstrations are collected from multiple experts whose expertise may vary across different aspects of the task. In such scenarios, the demonstrations are likely to contain conflicting advice in many parts of the state space. We propose a two-level Q-learning algorithm in which the RL agent not only learns the policy for deciding on the optimal action but also learns to select the most trustworthy expert according to the current state. Thus, our approach removes the traditional assumption that demonstrations come from a single source and are mostly conflict-free. We evaluate our technique on three different domains, and the results show that the state-of-the-art RLfD baseline fails to converge or performs similarly to conventional Q-learning. In contrast, the performance of our algorithm increases as more experts are involved in the learning process, and the proposed approach handles demonstration conflicts well.
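
A minimal sketch of the two-level idea, assuming tabular Q-learning and a fixed table of per-state expert demonstrations. The trust update and the way demonstrations bias the action choice are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

n_states, n_actions, n_experts = 50, 4, 3
Q_low = np.zeros((n_states, n_actions))   # low level: action values
Q_high = np.zeros((n_states, n_experts))  # high level: per-state expert trust
# demo[e, s] = action expert e demonstrated in state s (-1 if none);
# random placeholder demonstrations for the sake of a runnable sketch
demo = np.random.randint(-1, n_actions, size=(n_experts, n_states))
alpha, gamma, eps = 0.1, 0.95, 0.1

def select_action(s):
    if np.random.rand() < eps:
        return np.random.randint(n_actions)
    e = int(np.argmax(Q_high[s]))         # most trustworthy expert here
    if demo[e, s] >= 0:
        return int(demo[e, s])            # follow the trusted expert
    return int(np.argmax(Q_low[s]))       # no demonstration: act greedily

def update(s, a, r, s_next):
    td = r + gamma * Q_low[s_next].max() - Q_low[s, a]
    Q_low[s, a] += alpha * td
    # credit every expert whose demonstration matched the taken action,
    # so trust grows in states where the expert's advice pays off
    for e in range(n_experts):
        if demo[e, s] == a:
            Q_high[s, e] += alpha * td
```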


2009 ◽  
Vol 28 (12) ◽  
pp. 3268-3270
Author(s):  
Chao WANG ◽  
Jing GUO ◽  
Zhen-qiang BAO

Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks over a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due dates. In doing so, the number of checks is reduced and fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP)-based approach, and with airline estimates for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL for solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
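
A minimal PyTorch sketch of the Deep Q-learning component. The state features, action space (which aircraft's check to slot next), and reward are illustrative assumptions; the paper's exact formulation, data, and constraints are not reproduced here:

```python
import torch
import torch.nn as nn

n_features = 32     # e.g. per-aircraft usage, due dates, hangar capacity
n_aircraft = 45     # action: which aircraft's check to schedule next

class DQN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_aircraft),    # one Q-value per action
        )
    def forward(self, x):
        return self.net(x)

policy, target = DQN(), DQN()
target.load_state_dict(policy.state_dict())
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
gamma = 0.99

def td_step(s, a, r, s_next, done):
    """One TD update on a batch of transitions (done is 0.0/1.0 floats);
    the reward could e.g. penalize slack between check and due date."""
    q = policy(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        y = r + gamma * target(s_next).max(dim=1).values * (1.0 - done)
    loss = nn.functional.smooth_l1_loss(q, y)
    opt.zero_grad(); loss.backward(); opt.step()
    return float(loss)
```

The target network would be synchronized with the policy network every few thousand steps, the usual DQN stabilization trick.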

