The need for improved reinforcement learning techniques in intelligent agents

Reinforcement learning is a major tool for realizing intelligent agents that can adapt autonomously to their environment. Combined with deep models, reinforcement learning has shown great potential in complex tasks such as playing games from pixels. However, current reinforcement learning techniques still require a huge amount of interaction data, which can result in unbearable cost in real-world applications. In this article, we share our understanding of the problem and discuss possible ways to reduce the sample cost of reinforcement learning from the perspectives of exploration, optimization, environment modeling, experience transfer, and abstraction. We also discuss some challenges in real-world applications, in the hope of inspiring future research.


Optimal Policies for Quantum Markov Decision Processes

International Journal of Automation and Computing, 2021
DOI: 10.1007/s11633-021-1278-z

Author(s): Ming-Sheng Ying, Yuan Feng, Sheng-Gang Ying

Keyword(s): Decision Making, Reinforcement Learning, Quantum Systems, Sequential Decision Making, Mathematical Framework, Sequential Decision, Learning Techniques, Optimal Policies, Markov Decision, Programming Algorithms

Abstract: A Markov decision process (MDP) offers a general framework for modelling sequential decision making in which outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDPs, namely quantum MDPs (qMDPs), which can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies of qMDPs in the finite-horizon case. The results obtained in this paper provide useful mathematical tools for reinforcement learning techniques applied to the quantum world.
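To illustrate what finite-horizon dynamic programming means, here is a minimal sketch of the *classical* (non-quantum) analogue: backward induction over a small tabular MDP. It is not the qMDP algorithm from the paper, which operates on quantum states rather than probability distributions. The data layout (`P[s][a]` as a list of `(probability, next_state)` pairs, `R[s][a]` as an expected reward) and the function names `evaluate_policy` and `backward_induction` are hypothetical choices made for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tabular MDP layout (assumption for illustration only):
# P[s][a] -> list of (probability, next_state); R[s][a] -> expected reward.
Transitions = Dict[int, Dict[int, List[Tuple[float, int]]]]
Rewards = Dict[int, Dict[int, float]]
Policy = List[Dict[int, int]]  # policy[t][s] = action taken at step t in state s


def evaluate_policy(P: Transitions, R: Rewards, policy: Policy, horizon: int) -> Dict[int, float]:
    """Finite-horizon policy evaluation by backward recursion (classical MDP)."""
    V = {s: 0.0 for s in P}  # value with zero steps remaining
    for t in reversed(range(horizon)):
        # Bellman expectation backup for the fixed action policy[t][s].
        V = {
            s: R[s][policy[t][s]] + sum(p * V[s2] for p, s2 in P[s][policy[t][s]])
            for s in P
        }
    return V


def backward_induction(P: Transitions, R: Rewards, horizon: int) -> Tuple[Dict[int, float], Policy]:
    """Return optimal start values and a time-dependent optimal policy."""
    V = {s: 0.0 for s in P}
    policy: List[Optional[Dict[int, int]]] = [None] * horizon
    for t in reversed(range(horizon)):
        # Q-values for every state-action pair, using values of step t + 1.
        Q = {s: {a: R[s][a] + sum(p * V[s2] for p, s2 in P[s][a]) for a in P[s]} for s in P}
        # Greedy (Bellman optimality) choice at step t.
        policy[t] = {s: max(Q[s], key=Q[s].get) for s in P}
        V = {s: Q[s][policy[t][s]] for s in P}
    return V, policy  # type: ignore[return-value]


if __name__ == "__main__":
    P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]}, 1: {0: [(1.0, 1)], 1: [(1.0, 0)]}}
    R = {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.0}}
    V, pi = backward_induction(P, R, horizon=2)
    print(V, pi)  # {0: 2.5, 1: 4.0} [{0: 1, 1: 0}, {0: 1, 1: 0}]
```

The same backward recursion structure (values at step t computed from values at step t + 1) underlies both policy evaluation and optimal-policy search; the optimal version simply replaces the fixed action with a maximization.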


A Survey of Applying Reinforcement Learning Techniques to Multicast Routing

2019 IEEE 10th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), 2019
DOI: 10.1109/uemcon47517.2019.8993014
Cited by ~1

Author(s): Ola Ashour, Marc St-Hilaire, Thomas Kunz, Maoyu Wang

Keyword(s): Reinforcement Learning, Multicast Routing, Learning Techniques


Optimizing time warp simulation with reinforcement learning techniques

2007 Winter Simulation Conference, 2007
DOI: 10.1109/wsc.2007.4419650
Cited by ~9

Author(s): Jun Wang, Carl Tropper

Keyword(s): Reinforcement Learning, Time Warp, Learning Techniques


Intelligent Agents with Reinforcement Learning and Fuzzy logic for Intention commitment Modeling

Sixth International Conference on Intelligent Systems Design and Applications, 2006
DOI: 10.1109/isda.2006.253731
Cited by ~1

Author(s): Prasanna Lokuge, Damminda Alahakoon

Keyword(s): Fuzzy Logic, Reinforcement Learning, Intelligent Agents


On collaborative reinforcement learning to optimize the redistribution of critical medical supplies throughout the COVID-19 pandemic

Journal of the American Medical Informatics Association, 2020
DOI: 10.1093/jamia/ocaa324

Author(s): Bryan P Bednarski, Akash Deep Singh, William M Jones

Keyword(s): Public Health, Reinforcement Learning, Medical Equipment, Census Bureau, Learning Models, Public Health Emergencies, Medical Supplies, Learning Techniques, Disease Impact, Random States

Abstract

Objective: This work investigates how reinforcement learning and deep learning models can facilitate the near-optimal redistribution of medical equipment in order to bolster public health responses to future crises similar to the COVID-19 pandemic.

Materials and Methods: The system presented is simulated with disease impact statistics from the Institute for Health Metrics and Evaluation (IHME), the Centers for Disease Control and Prevention, and the Census Bureau [1, 2, 3]. We present a robust pipeline for data preprocessing, future demand inference, and a redistribution algorithm that can be adopted across broad scales and applications.

Results: The reinforcement learning redistribution algorithm demonstrates performance optimality ranging from 93% to 95%. Performance improves consistently as more randomly selected states participate in the exchange, with average shortage reductions rising from 78.74% (± 30.8) in simulations with 5 states to 93.50% (± 0.003) with 50 states.

Conclusion: These findings bolster confidence that reinforcement learning techniques can reliably guide resource allocation in future public health emergencies.
