Value Function Approximation
Recently Published Documents


TOTAL DOCUMENTS: 90 (five years: 6)
H-INDEX: 14 (five years: 0)

Author(s):  
Shuang Wu ◽  
Jingyu Zhao ◽  
Guangjian Tian ◽  
Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal factors or spatial factors such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead; in particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
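The decoupling structure the abstract mentions can be illustrated with a minimal sketch: instead of solving the joint MDP over all arms (exponential state space), each arm's MDP is solved independently and the joint value is approximated by the collection of per-arm values, giving cost linear in the number of arms. This is a generic illustration of the decoupling idea, not the paper's attention-based approximator; all function names and the toy transition model are assumptions.

```python
import numpy as np

def solve_arm_value(P, r, gamma=0.95, tol=1e-8):
    """Value iteration for a single arm's small MDP.

    P: (2, S, S) transition matrices for actions passive=0, active=1.
    r: (2, S) per-action rewards.  Returns the arm's value function (S,).
    """
    S = P.shape[1]
    V = np.zeros(S)
    while True:
        Q = r + gamma * (P @ V)        # (2, S): one row of Q-values per action
        V_new = Q.max(axis=0)          # greedy over the two actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def decoupled_rmab_value(arms, gamma=0.95):
    """Approximate the joint RMAB value by solving each arm separately:
    work grows linearly in the number of arms, not exponentially."""
    return [solve_arm_value(P, r, gamma) for P, r in arms]
```

Each additional arm adds one small value-iteration solve, which is the source of the linear time complexity the abstract claims for the full method.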


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xin Pan ◽  
Hanqi Wen ◽  
Ziwei Wang ◽  
Jie Song ◽  
Xing Lin Feng

Purpose
Digital healthcare has become one of the most important Internet applications in recent years, and digital platforms act as interfaces between patients and physicians. Although these technologies enhance patient convenience, they create new challenges in platform management. For instance, on physician rating websites, information overload negatively influences patients' decision-making when selecting a physician. This scenario calls for an automated mechanism that provides real-time rankings of physicians. Motivated by an online healthcare platform, this study develops a method to deliver physician rankings on platforms by considering patients' browse behaviors and the capacities of service resources.

Design/methodology/approach
The authors use a probabilistic model to explicitly capture the browse behaviors of patients. Since the large volume of information in digital systems makes the dynamic ranking problem intractable to solve exactly, the authors design a ranking-with-value-approximation algorithm that combines a greedy ranking policy with value function approximation methods.

Findings
The approximation methods prove effective for ranking optimization on the digital healthcare system, mainly because the model incorporates patient behaviors and patient availability.

Originality/value
To the best of the authors' knowledge, this is one of the first studies to present solutions to the dynamic physician ranking problem. The ranking algorithms can also help platforms improve system and operational performance.
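A greedy ranking policy combined with a value approximation can be sketched as follows: each rank slot has an examination probability (patients browse top positions more), each physician has a remaining capacity, and the slot is filled with the physician maximizing immediate match score plus an approximate value of the capacity left over. This is a simplified illustration of the general technique; the browse model, scoring, and all names here are assumptions, not the paper's actual formulation.

```python
def greedy_rank(scores, capacity, value_approx, positions):
    """Fill rank slots greedily.

    scores: {physician: match score}, capacity: {physician: remaining slots},
    value_approx: remaining capacity -> approximate future value,
    positions: per-slot examination probabilities, typically descending.
    """
    remaining = dict(capacity)
    ranking = []
    for p_examine in positions:
        best, best_val = None, float("-inf")
        for doc, s in scores.items():
            if doc in ranking or remaining[doc] <= 0:
                continue
            # immediate value plus approximate value-to-go of reduced capacity
            val = p_examine * (s + value_approx(remaining[doc] - 1))
            if val > best_val:
                best, best_val = doc, val
        if best is None:
            break
        ranking.append(best)
        remaining[best] -= 1
    return ranking
```

The value_approx term is what lets the greedy policy account for physician availability rather than ranking purely by score.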


2020 ◽  
Vol 54 (4) ◽  
pp. 1016-1033 ◽  
Author(s):  
Marlin W. Ulmer

An increasing number of e-commerce retailers offer same-day delivery. To deliver the ordered goods, providers dynamically dispatch a fleet of vehicles transporting the goods from the warehouse to the customers. In many cases, retailers offer several delivery-deadline options, from four-hour delivery down to next-hour delivery. Because of these deadlines, vehicles often deliver only a few orders per trip; the overall number of orders served within the delivery horizon is small and the revenue is low. As a result, many companies currently struggle to conduct same-day delivery cost-efficiently. In this paper, we show how dynamic pricing can substantially increase both revenue and the number of customers served the same day. To this end, we present an anticipatory pricing and routing policy (APRP) that incentivizes customers to select delivery-deadline options that are efficient for the fleet to fulfill, maintaining the fleet's flexibility to serve more future orders. We model the joint pricing and routing problem as a Markov decision process (MDP). Applying APRP requires the state-dependent opportunity cost per customer and option. To this end, we use a guided offline value function approximation (VFA) based on state-space aggregation. The VFA approximates the opportunity cost for every state and delivery option with respect to the fleet's flexibility. As an offline method, APRP can determine suitable prices instantly when a customer orders. In an extensive computational study, we compare APRP with a fixed-price policy and with conventional temporal and geographical pricing policies. APRP outperforms the benchmark policies significantly, yielding both higher revenue and more customers served the same day.
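The combination of state-space aggregation and opportunity-cost pricing described above can be sketched with a small lookup-table VFA: detailed states are mapped to coarse buckets, values are learned offline by incremental averaging, and at order time the price is the base price plus the estimated loss in value-to-go from accepting the option. The state encoding (time, fleet slack), the bucket sizes, and all names are illustrative assumptions, not the paper's actual aggregation scheme.

```python
def aggregate(state):
    """Map a detailed state (minutes into horizon, fleet slack in minutes)
    to a coarse key; the VFA stores one estimate per aggregated state."""
    t, slack = state
    return (t // 60, min(slack // 30, 4))   # hour bucket, capped slack bucket

class AggregatedVFA:
    def __init__(self):
        self.values = {}   # aggregated state -> estimated value-to-go
        self.counts = {}

    def update(self, state, observed_value):
        """Incremental averaging over offline simulated trajectories."""
        k = aggregate(state)
        n = self.counts.get(k, 0) + 1
        v = self.values.get(k, 0.0)
        self.values[k] = v + (observed_value - v) / n
        self.counts[k] = n

    def value(self, state):
        return self.values.get(aggregate(state), 0.0)

def price(vfa, state_if_accept, state_if_reject, base_price):
    """Opportunity cost of an option = value-to-go forgone by accepting it."""
    oc = vfa.value(state_if_reject) - vfa.value(state_if_accept)
    return base_price + max(oc, 0.0)
```

Because the table is built offline, pricing at order time is a pair of dictionary lookups, which matches the abstract's point that APRP can quote prices instantly.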


2020 ◽  
Vol 54 (4) ◽  
pp. 998-1015 ◽  
Author(s):  
Carlos Lagos ◽  
Felipe Delgado ◽  
Mathias A. Klapp

Unexpected aircraft maintenance tasks can force expensive changes to an airline's operation; critical tasks may even cancel scheduled flights. Despite this, the challenge of scheduling aircraft maintenance operations under uncertainty has received limited attention in the scientific literature. We study a dynamic airline maintenance scheduling problem that decides, each day, the set of aircraft to maintain and the set of pending tasks to execute on each aircraft. The objective is to minimize the expected cost of expired maintenance tasks over the operating horizon. To increase flexibility and reduce costs, we integrate maintenance scheduling with tail assignment decisions. We formulate the problem as a Markov decision process and design dynamic policies based on approximate dynamic programming, including value function approximation, rolling-horizon techniques, and a hybrid of the two that delivers the best results. In a case study based on the LATAM airline, we show the value of dynamic optimization by testing our best policies against a simple airline decision rule and against a deterministic relaxation with perfect future information. We suggest scheduling tasks that require fewer resources first, to increase utilization of residual maintenance capacity. Finally, we observe strong economies of scale when maintenance resources are shared between multiple airlines.
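The daily decision step and the smaller-tasks-first insight can be illustrated with a one-day scheduling sketch: tasks are packed into the day's maintenance capacity in order of increasing resource need, and unscheduled tasks incur an approximate deferral cost (the one-step stand-in for the expiry cost a VFA would estimate). The task fields, capacity units, and cost function are assumptions for illustration, not the paper's model.

```python
def schedule_day(tasks, capacity, deferral_cost):
    """Choose pending tasks to execute today under a capacity limit.

    tasks: list of {"id", "hours", "days_left"} dicts.
    capacity: maintenance hours available today.
    deferral_cost: task -> approximate cost of postponing it one day
                   (e.g., rising as the expiry deadline approaches).
    """
    # heuristic from the study: smaller tasks first, to soak up
    # residual maintenance capacity that large tasks would strand
    order = sorted(tasks, key=lambda t: t["hours"])
    chosen, used = [], 0
    for t in order:
        if used + t["hours"] <= capacity:
            chosen.append(t["id"])
            used += t["hours"]
    deferred = sum(deferral_cost(t) for t in order if t["id"] not in chosen)
    return chosen, deferred
```

A full ADP policy would replace the fixed deferral_cost with a learned value function and re-run this step inside a rolling horizon, but the greedy packing already shows why low-resource tasks raise capacity utilization.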

