discounted criterion
Recently Published Documents

TOTAL DOCUMENTS: 19 (five years: 2)
H-INDEX: 5 (five years: 0)

Author(s):  
Xiaoteng Ma ◽  
Xiaohang Tang ◽  
Li Xia ◽  
Jun Yang ◽  
Qianchuan Zhao

Most reinforcement learning algorithms optimize the discounted criterion, which helps accelerate convergence and reduce the variance of estimates. Although the discounted criterion is appropriate for certain tasks, such as finance-related problems, many engineering problems treat future rewards equally and prefer a long-run average criterion. In this paper, we study the reinforcement learning problem with the long-run average criterion. First, we develop a unified trust region theory covering both the discounted and the average criterion; under the average criterion, a novel performance bound within the trust region is derived via Perturbation Analysis (PA) theory. Second, we propose a practical algorithm named Average Policy Optimization (APO), which improves value estimation with a novel technique named Average Value Constraint. To the best of our knowledge, ours is the first work to study the trust region approach under the average criterion, and it extends the reinforcement learning framework beyond the discounted criterion. Finally, experiments are conducted on the MuJoCo continuous-control benchmark. On most tasks, APO outperforms discounted PPO, which demonstrates the effectiveness of our approach.
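
The contrast between the two criteria is easy to make concrete. The sketch below (our illustration, not the authors' APO implementation; the chain, rewards, and discount factor are made up) evaluates a fixed policy on a small finite MDP under both the discounted and the long-run average criterion:

```python
# A minimal sketch (our illustration, not the authors' APO code): evaluate a
# fixed policy on a small, made-up finite MDP under both criteria.
import numpy as np

# Hypothetical 3-state chain induced by some fixed policy.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7]])  # P[s, s'] = transition probability
r = np.array([1.0, 0.0, 2.0])    # expected one-step reward in each state

# Discounted criterion: solve (I - gamma P) v = r for the value vector v.
gamma = 0.99
v = np.linalg.solve(np.eye(3) - gamma * P, r)

# Long-run average criterion: rho = mu @ r, where mu is the stationary
# distribution (left eigenvector of P for eigenvalue 1, normalized).
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmax(np.real(w))])
mu = mu / mu.sum()
rho = mu @ r

print("discounted values :", v)
print("average reward    :", rho)
# For unichain models, (1 - gamma) * v -> rho as gamma -> 1, which is the
# bridge between the two criteria that a unified theory can exploit.
print("(1 - gamma) * v   :", (1 - gamma) * v)
```

The final print illustrates the standard link between the criteria: for unichain models, (1 - gamma) times the discounted value approaches the average reward as gamma tends to 1.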


2021 ◽  
Vol 229 ◽  
pp. 01047
Author(s):  
Abdellatif Semmouri ◽  
Mostafa Jourhmane ◽  
Bahaa Eddine Elbaghazaoui

In this paper, we consider constrained optimization of discrete-time Markov Decision Processes (MDPs) with finite state and action spaces, which accumulate both a reward and costs at each decision epoch. We study the problem of finding a policy that maximizes the expected total discounted reward subject to the constraints that the expected total discounted costs do not exceed given values. To compute an optimal or a nearly optimal stationary policy, we investigate a decomposition of the state space into strongly communicating classes. The discounted criterion has applications in many areas, such as forest management, energy-consumption management, finance, communication systems (mobile networks), and artificial intelligence.
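
One standard computational route for this kind of constrained problem (our illustration; the paper's contribution is the decomposition method, not this LP) is the classical occupation-measure linear program for discounted MDPs, sketched here with made-up data:

```python
# A hedged sketch (our illustration; the paper's contribution is the
# decomposition method, not this LP): the classical occupation-measure
# linear program for a constrained discounted MDP, with made-up data.
import numpy as np
from scipy.optimize import linprog

nS, nA, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a, s'] transition kernel
r = rng.uniform(0.0, 1.0, size=(nS, nA))       # reward r(s, a)
c = rng.uniform(0.0, 1.0, size=(nS, nA))       # cost c(s, a)
mu0 = np.full(nS, 1.0 / nS)                    # initial state distribution
d = 5.0                                        # budget on total discounted cost

# Variables: x(s, a) = expected discounted number of visits to (s, a),
# flattened to a vector of length nS * nA.
# Flow constraints: sum_a x(s', a) - gamma * sum_{s, a} P[s, a, s'] x(s, a)
#                   = mu0(s')   for every state s'.
A_eq = np.zeros((nS, nS * nA))
for sp in range(nS):
    for s in range(nS):
        for a in range(nA):
            A_eq[sp, s * nA + a] = float(s == sp) - gamma * P[s, a, sp]
b_eq = mu0

# Objective: maximize discounted reward; constraint: expected total
# discounted cost must not exceed the budget d.
res = linprog(-r.reshape(-1),
              A_ub=c.reshape(1, -1), b_ub=[d],
              A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
x = res.x.reshape(nS, nA)

# Recover a (possibly randomized) stationary policy: pi(a|s) proportional to
# x(s, a); the row sums are positive because mu0 is positive everywhere.
pi = x / x.sum(axis=1, keepdims=True)
print("optimal discounted reward:", -res.fun)
print("policy:\n", pi)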


2002 ◽  
Vol 39 (2) ◽  
pp. 233-250 ◽  
Author(s):  
Xianping Guo ◽  
Weiping Zhu

In this paper, we consider denumerable-state continuous-time Markov decision processes with (possibly unbounded) transition and reward rates and a general action space under the discounted criterion. We provide a set of conditions weaker than those previously known and prove the existence of optimal stationary policies within the class of all (possibly randomized) Markov policies. The results are illustrated by birth-and-death processes with controlled immigration, for which our conditions are satisfied whereas the earlier conditions fail to hold.
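
For reference, in this continuous-time setting the discounted criterion replaces the discounted sum by a discounted integral; in standard notation (our rendering, with discount rate alpha > 0):

```latex
% Discounted criterion for a continuous-time MDP under a policy \pi,
% with discount rate \alpha > 0 (standard notation; our rendering).
V^{\pi}(s) = \mathbb{E}^{\pi}_{s}\!\left[\int_{0}^{\infty}
  e^{-\alpha t}\, r(x_{t}, a_{t})\, dt\right],
\qquad
V^{*}(s) = \sup_{\pi} V^{\pi}(s).
```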


2001 ◽  
Vol 15 (4) ◽  
pp. 557-564 ◽  
Author(s):  
Rolando Cavazos-Cadena ◽  
Raúl Montes-de-Oca

This article concerns Markov decision chains with finite state and action spaces, in which a control policy is graded via the expected total-reward criterion associated with a nonnegative reward function. Within this framework, a classical theorem guarantees the existence of an optimal stationary policy whenever the optimal value function is finite, a result obtained via a limit process using the discounted criterion. The objective of this article is to present an alternative approach, based entirely on the properties of the expected total-reward index, to establish such an existence result.
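
The limit process the abstract alludes to is the classical one (our rendering): with nonnegative rewards, the discounted optimal values increase to the total-reward optimal value as the discount factor tends to one from below.

```latex
% The classical limit process (our rendering): with nonnegative rewards,
% the discounted optimal values increase to the total-reward optimal value
% as the discount factor \alpha tends to 1 from below.
V_{\alpha}(s) = \sup_{\pi}\,
  \mathbb{E}^{\pi}_{s}\!\left[\sum_{t=0}^{\infty} \alpha^{t}\, r(x_{t}, a_{t})\right],
\qquad
V(s) = \lim_{\alpha \uparrow 1} V_{\alpha}(s).
```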

