Strong 0-discount optimal policies in a Markov decision process with a Borel state space

1995 ◽  
Vol 42 (1) ◽  
pp. 93-108 ◽  
Author(s):  
A. A. Yushkevich


Author(s):  
Hyungseok Song ◽  
Hyeryung Jang ◽  
Hai H. Tran ◽  
Se-eun Yoon ◽  
Kyunghwan Son ◽  
...  

We consider the Markov Decision Process (MDP) of selecting a subset of items at each step, termed the Select-MDP (S-MDP). The large state and action spaces of S-MDPs make them intractable for typical reinforcement learning (RL) algorithms, especially when the number of items is huge. In this paper, we present a deep RL algorithm that addresses this issue by adopting the following key ideas. First, we convert the original S-MDP into an Iterative Select-MDP (IS-MDP), which is equivalent to the S-MDP in terms of optimal actions. The IS-MDP decomposes the joint action of selecting K items simultaneously into K iterative selections, reducing the number of actions at the expense of an exponential increase in the number of states. Second, we overcome this state-space explosion by exploiting a special symmetry in IS-MDPs with novel weight-shared Q-networks, which provably maintain sufficient expressive power. Various experiments demonstrate that our approach works well even when the item space is large and that it scales to environments with item spaces different from those used in training.
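As an illustration of the second idea, the following is a minimal PyTorch sketch of a weight-shared Q-network reused across the K iterative selections; the architecture, layer sizes, and pooling scheme are illustrative assumptions, not the authors' exact design.

```python
import torch
import torch.nn as nn

class SharedItemQNet(nn.Module):
    """One Q-network scores every remaining item; the same weights are
    reused at each of the K selection steps, and permutation symmetry is
    respected by encoding items identically and mean-pooling a context.
    (Sketch only; layer sizes and pooling are illustrative assumptions.)"""

    def __init__(self, item_dim: int, hidden: int = 64):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(item_dim, hidden), nn.ReLU())
        self.q_head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, items: torch.Tensor, selected: torch.Tensor):
        # items: (N, item_dim); selected: (N,) with 1.0 for already-chosen items
        h = self.encode(items)                          # (N, hidden)
        ctx = (h * selected.unsqueeze(-1)).mean(dim=0)  # permutation-invariant pool
        q = self.q_head(torch.cat([h, ctx.expand(h.shape[0], -1)], dim=-1))
        return q.squeeze(-1).masked_fill(selected.bool(), float("-inf"))

def select_k(net: SharedItemQNet, items: torch.Tensor, k: int):
    """Decompose the joint selection of k items into k iterative argmax picks."""
    selected = torch.zeros(items.shape[0])
    for _ in range(k):
        with torch.no_grad():
            selected[net(items, selected).argmax()] = 1.0
    return selected.nonzero().flatten().tolist()
```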


2010 ◽  
Vol 190 (1) ◽  
pp. 289-309 ◽  
Author(s):  
Lars Relund Nielsen ◽  
Erik Jørgensen ◽  
Søren Højsgaard

2019 ◽  
Vol 11 (7) ◽  
pp. 2060
Author(s):  
Yu Wu ◽  
Bo Zeng ◽  
Siming Huang

In this paper, a home service problem is studied in which a capacitated vehicle collects customers' parcels in one pick-up tour. We consider a situation where customers who have scheduled their services in advance may call to cancel their appointments, and customers without appointments must also be visited if they request service, as long as capacity allows. To handle these changes as they occur over the tour, a dynamic strategy is needed to guide the vehicle to visit customers efficiently. Aiming to minimize the vehicle's total expected travel distance, we model this problem as a multi-dimensional Markov Decision Process (MDP) with a finite but exponentially large state space. We solve this MDP exactly via dynamic programming, whose computational complexity is exponential. To keep this complexity from growing further, we develop a fast method for looking up the record of an already-examined state. Although such lookup schemes generally waste a large amount of memory, by exploiting critical structural properties of the state space we obtain an O(1) lookup method without any wasted memory. Computational experiments demonstrate the effectiveness of our model and the developed solution method. For larger instances, two well-performing heuristics are proposed.
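The following sketch illustrates the generic idea behind an O(1) lookup of an already-examined state's record: encode each state as a perfect integer index into a flat table, so every lookup is one array access. It uses a deterministic Held-Karp-style pickup tour as a stand-in; the paper's stochastic cancellation model and its memory-saving structural properties are not reproduced here.

```python
import math

def solve(n: int, dist):
    """dist[i][j]: travel distance between locations 0..n-1; location 0 is the depot."""
    NOT_SET = math.inf
    # Perfect index: state = visited_mask * n + current_location, so checking
    # an already-examined state's record is a single array access.
    value = [NOT_SET] * ((1 << n) * n)

    def V(mask: int, cur: int) -> float:
        idx = mask * n + cur
        if value[idx] != NOT_SET:          # O(1) lookup, no hashing
            return value[idx]
        if mask == (1 << n) - 1:           # all customers visited:
            best = dist[cur][0]            # return to the depot
        else:
            best = min(dist[cur][j] + V(mask | (1 << j), j)
                       for j in range(n) if not mask & (1 << j))
        value[idx] = best
        return best

    return V(1, 0)                         # start at the depot
```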


Author(s):  
Takeshi Tateyama ◽  
Seiichi Kawata ◽  
Yoshiki Shimomura ◽  
...  

The k-certainty exploration method, an efficient reinforcement learning algorithm, cannot be applied directly to environments whose state space is continuous, because the continuous state space must first be converted into a discrete one. Our purpose is to construct discrete semi-Markov decision process (SMDP) models of such environments by using growing cell structures to autonomously partition the continuous state space and then using the k-certainty exploration method to construct the SMDP models. A multiagent k-certainty exploration method is then used to improve exploration efficiency. Mobile robot simulations demonstrate our proposal's usefulness and efficiency.
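A minimal sketch of the k-certainty bookkeeping on an already-discretized state space follows: every state-action pair must be tried at least k times before its empirical transition statistics are trusted. The uniform-grid discretize function is a stand-in assumption for the adaptive growing-cell-structure partitioning used in the paper.

```python
import random
from collections import defaultdict

K = 3  # certainty threshold: a pair (s, a) is "k-certain" after K visits

count = defaultdict(int)                       # visit counts of (state, action)
model = defaultdict(lambda: defaultdict(int))  # (s, a) -> next-state counts

def discretize(x: float, cell: float = 0.1) -> int:
    """Uniform-grid stand-in for growing-cell-structure partitioning."""
    return int(x // cell)

def choose_action(s: int, actions) -> int:
    # Prefer actions that are not yet k-certain in state s.
    uncertain = [a for a in actions if count[(s, a)] < K]
    return random.choice(uncertain if uncertain else list(actions))

def record(s: int, a: int, s_next: int) -> None:
    count[(s, a)] += 1
    model[(s, a)][s_next] += 1  # empirical transition statistics for the SMDP model
```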


1993 ◽  
Vol 7 (3) ◽  
pp. 369-385 ◽  
Author(s):  
Kyle Siegrist

We consider N sites (N ≤ ∞), each of which may be either occupied or unoccupied. Time is discrete, and at each time unit a set of occupied sites may attempt to capture a previously unoccupied site. The attempt will be successful with a probability that depends on the number of sites making the attempt, in which case the new site will also be occupied. A benefit is gained when new sites are occupied, but capture attempts are costly. The problem of optimal occupation is formulated as a Markov decision process in which the admissible actions are occupation strategies and the cost is a function of the strategy and the number of occupied sites. A partial order on the state-action pairs is used to obtain a comparison result for stationary policies and qualitative results concerning monotonicity of the value function for the n-stage problem (n ≤ ∞). The optimal policies are partially characterized when the cost depends on the action only through the total number of occupation attempts made.
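The following is a minimal sketch of n-stage value iteration for a simplified finite-N version of this model, where the state is the number of occupied sites and the action is the number of occupied sites attempting a capture; the success probability, benefit, and cost values below are illustrative assumptions, not the paper's specification.

```python
N, STAGES = 10, 20          # number of sites and planning horizon
b, c = 1.0, 0.3             # benefit per newly occupied site, cost per attempt

def p(t: int) -> float:
    """Illustrative capture probability, increasing in the number of attackers."""
    return 0.0 if t == 0 else t / (t + 1.0)

V = [0.0] * (N + 1)         # terminal values: V_0(m) = 0
for _ in range(STAGES):     # backward induction over the n-stage problem
    V_next = [0.0] * (N + 1)
    for m in range(N + 1):  # m = number of currently occupied sites
        best = V[m]         # t = 0: make no capture attempt
        if 0 < m < N:       # need at least one attacker and one unoccupied target
            for t in range(1, m + 1):
                best = max(best, p(t) * (b + V[m + 1])
                                 + (1 - p(t)) * V[m] - c * t)
        V_next[m] = best
    V = V_next
print(V)                    # expected net benefit from each starting state
```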

