Value Function Approximation
Recently Published Documents


TOTAL DOCUMENTS: 90 (five years: 6)
H-INDEX: 14 (five years: 0)

Author(s):  
Shuang Wu ◽  
Jingyu Zhao ◽  
Guangjian Tian ◽  
Jun Wang

The restless multi-armed bandit (RMAB) problem is a generalization of the multi-armed bandit with non-stationary rewards. Its optimal solution is intractable because the state and action spaces grow exponentially with the number of arms. Existing approximation approaches, e.g., Whittle's index policy, have difficulty capturing either temporal factors or spatial factors such as the impact of other arms. We propose capturing both factors with the attention mechanism, which has achieved great success in deep learning. Our state-aware value function approximation solution comprises an attention-based value function approximator and a Bellman equation solver. The attention-based coordination module captures both spatial and temporal factors for arm coordination. The Bellman equation solver exploits the decoupling structure of RMABs to obtain solutions with significantly reduced computational overhead; in particular, the time complexity of our approximation is linear in the number of arms. Finally, we illustrate the effectiveness and investigate the properties of the proposed method with numerical experiments.
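The decoupling structure the abstract mentions can be illustrated with a minimal sketch: instead of solving the joint MDP over all arms (exponential state space), each arm's MDP is solved independently and the joint value is approximated by the collection of per-arm values, giving cost linear in the number of arms. This is a generic illustration of the decoupling idea, not the paper's attention-based approximator; all function names and the toy transition model are assumptions.

```python
import numpy as np

def solve_arm_value(P, r, gamma=0.95, tol=1e-8):
    """Value iteration for a single arm's small MDP.

    P: (2, S, S) transition matrices for actions passive=0, active=1.
    r: (2, S) per-action rewards.  Returns the arm's value function (S,).
    """
    S = P.shape[1]
    V = np.zeros(S)
    while True:
        Q = r + gamma * (P @ V)        # (2, S): one row of Q-values per action
        V_new = Q.max(axis=0)          # greedy over the two actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

def decoupled_rmab_value(arms, gamma=0.95):
    """Approximate the joint RMAB value by solving each arm separately:
    work grows linearly in the number of arms, not exponentially."""
    return [solve_arm_value(P, r, gamma) for P, r in arms]
```

Each additional arm adds one small value-iteration solve, which is the source of the linear time complexity the abstract claims for the full method.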


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Xin Pan ◽  
Hanqi Wen ◽  
Ziwei Wang ◽  
Jie Song ◽  
Xing Lin Feng

Purpose
Digital healthcare has become one of the most important Internet applications in recent years, and digital platforms act as interfaces between patients and physicians. Although these technologies enhance patient convenience, they create new challenges in platform management. For instance, on physician rating websites, information overload negatively influences patients' decision-making when selecting a physician. This scenario calls for an automated mechanism that provides real-time rankings of physicians. Motivated by an online healthcare platform, this study develops a method to deliver physician rankings on platforms by considering patients' browse behaviors and the capacities of service resources.

Design/methodology/approach
The authors use a probabilistic model to explicitly capture the browse behaviors of patients. Since the large volume of information in digital systems makes the dynamic ranking problem intractable to solve exactly, the authors design a ranking-with-value-approximation algorithm that combines a greedy ranking policy with value function approximation methods.

Findings
The approximation methods prove effective for ranking optimization on the digital healthcare system, mainly because the model incorporates patient behaviors and patient availability.

Originality/value
To the best of the authors' knowledge, this is one of the first studies to present solutions to the dynamic physician ranking problem. The ranking algorithms can also help platforms improve system and operational performance.
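A greedy ranking policy combined with a value approximation can be sketched as follows: each rank slot has an examination probability (patients browse top positions more), each physician has a remaining capacity, and the slot is filled with the physician maximizing immediate match score plus an approximate value of the capacity left over. This is a simplified illustration of the general technique; the browse model, scoring, and all names here are assumptions, not the paper's actual formulation.

```python
def greedy_rank(scores, capacity, value_approx, positions):
    """Fill rank slots greedily.

    scores: {physician: match score}, capacity: {physician: remaining slots},
    value_approx: remaining capacity -> approximate future value,
    positions: per-slot examination probabilities, typically descending.
    """
    remaining = dict(capacity)
    ranking = []
    for p_examine in positions:
        best, best_val = None, float("-inf")
        for doc, s in scores.items():
            if doc in ranking or remaining[doc] <= 0:
                continue
            # immediate value plus approximate value-to-go of reduced capacity
            val = p_examine * (s + value_approx(remaining[doc] - 1))
            if val > best_val:
                best, best_val = doc, val
        if best is None:
            break
        ranking.append(best)
        remaining[best] -= 1
    return ranking
```

The value_approx term is what lets the greedy policy account for physician availability rather than ranking purely by score.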


2020 ◽  
Vol 54 (4) ◽  
pp. 1016-1033 ◽  
Author(s):  
Marlin W. Ulmer

An increasing number of e-commerce retailers offer same-day delivery. To deliver the ordered goods, providers dynamically dispatch a fleet of vehicles transporting the goods from the warehouse to the customers. In many cases, retailers offer several delivery-deadline options, from four-hour delivery down to next-hour delivery. Because of these deadlines, vehicles often deliver only a few orders per trip; the overall number of orders served within the delivery horizon is small and the revenue is low. As a result, many companies currently struggle to conduct same-day delivery cost-efficiently. In this paper, we show how dynamic pricing can substantially increase both revenue and the number of customers served the same day. To this end, we present an anticipatory pricing and routing policy (APRP) that incentivizes customers to select delivery-deadline options that are efficient for the fleet to fulfill, maintaining the fleet's flexibility to serve more future orders. We model the joint pricing and routing problem as a Markov decision process (MDP). Applying APRP requires the state-dependent opportunity cost per customer and option. To this end, we use a guided offline value function approximation (VFA) based on state-space aggregation. The VFA approximates the opportunity cost for every state and delivery option with respect to the fleet's flexibility. As an offline method, APRP can determine suitable prices instantly when a customer orders. In an extensive computational study, we compare APRP with a fixed-price policy and with conventional temporal and geographical pricing policies. APRP outperforms the benchmark policies significantly, yielding both higher revenue and more customers served the same day.
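The combination of state-space aggregation and opportunity-cost pricing described above can be sketched with a small lookup-table VFA: detailed states are mapped to coarse buckets, values are learned offline by incremental averaging, and at order time the price is the base price plus the estimated loss in value-to-go from accepting the option. The state encoding (time, fleet slack), the bucket sizes, and all names are illustrative assumptions, not the paper's actual aggregation scheme.

```python
def aggregate(state):
    """Map a detailed state (minutes into horizon, fleet slack in minutes)
    to a coarse key; the VFA stores one estimate per aggregated state."""
    t, slack = state
    return (t // 60, min(slack // 30, 4))   # hour bucket, capped slack bucket

class AggregatedVFA:
    def __init__(self):
        self.values = {}   # aggregated state -> estimated value-to-go
        self.counts = {}

    def update(self, state, observed_value):
        """Incremental averaging over offline simulated trajectories."""
        k = aggregate(state)
        n = self.counts.get(k, 0) + 1
        v = self.values.get(k, 0.0)
        self.values[k] = v + (observed_value - v) / n
        self.counts[k] = n

    def value(self, state):
        return self.values.get(aggregate(state), 0.0)

def price(vfa, state_if_accept, state_if_reject, base_price):
    """Opportunity cost of an option = value-to-go forgone by accepting it."""
    oc = vfa.value(state_if_reject) - vfa.value(state_if_accept)
    return base_price + max(oc, 0.0)
```

Because the table is built offline, pricing at order time is a pair of dictionary lookups, which matches the abstract's point that APRP can quote prices instantly.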


2020 ◽  
Vol 54 (4) ◽  
pp. 998-1015 ◽  
Author(s):  
Carlos Lagos ◽  
Felipe Delgado ◽  
Mathias A. Klapp

Unexpected aircraft maintenance tasks can force expensive changes to an airline's operation; critical tasks may even cancel scheduled flights. Despite this, the challenge of scheduling aircraft maintenance operations under uncertainty has received limited attention in the scientific literature. We study a dynamic airline maintenance scheduling problem that decides, each day, the set of aircraft to maintain and the set of pending tasks to execute on each aircraft. The objective is to minimize the expected cost of expired maintenance tasks over the operating horizon. To increase flexibility and reduce costs, we integrate maintenance scheduling with tail assignment decisions. We formulate the problem as a Markov decision process and design dynamic policies based on approximate dynamic programming, including value function approximation, rolling-horizon techniques, and a hybrid of the two that delivers the best results. In a case study based on the LATAM airline, we show the value of dynamic optimization by testing our best policies against a simple airline decision rule and against a deterministic relaxation with perfect future information. We suggest scheduling tasks that require fewer resources first, to increase utilization of residual maintenance capacity. Finally, we observe strong economies of scale when maintenance resources are shared between multiple airlines.
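The daily decision step and the smaller-tasks-first insight can be illustrated with a one-day scheduling sketch: tasks are packed into the day's maintenance capacity in order of increasing resource need, and unscheduled tasks incur an approximate deferral cost (the one-step stand-in for the expiry cost a VFA would estimate). The task fields, capacity units, and cost function are assumptions for illustration, not the paper's model.

```python
def schedule_day(tasks, capacity, deferral_cost):
    """Choose pending tasks to execute today under a capacity limit.

    tasks: list of {"id", "hours", "days_left"} dicts.
    capacity: maintenance hours available today.
    deferral_cost: task -> approximate cost of postponing it one day
                   (e.g., rising as the expiry deadline approaches).
    """
    # heuristic from the study: smaller tasks first, to soak up
    # residual maintenance capacity that large tasks would strand
    order = sorted(tasks, key=lambda t: t["hours"])
    chosen, used = [], 0
    for t in order:
        if used + t["hours"] <= capacity:
            chosen.append(t["id"])
            used += t["hours"]
    deferred = sum(deferral_cost(t) for t in order if t["id"] not in chosen)
    return chosen, deferred
```

A full ADP policy would replace the fixed deferral_cost with a learned value function and re-run this step inside a rolling horizon, but the greedy packing already shows why low-resource tasks raise capacity utilization.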

