Approximate Dynamic Programming for Military Medical Evacuation Dispatching Policies

Author(s):  
Phillip R. Jenkins ◽  
Matthew J. Robbins ◽  
Brian J. Lunday

Military medical planners must consider how aerial medical evacuation (MEDEVAC) assets will be dispatched when preparing for and supporting high-intensity combat operations. The dispatching authority seeks to dispatch MEDEVAC assets to prioritized requests for service such that battlefield casualties are effectively and efficiently transported to nearby medical-treatment facilities. We formulate and solve a discounted, infinite-horizon Markov decision process (MDP) model of the MEDEVAC dispatching problem. Because the high dimensionality and uncountable state space of our MDP model render classical dynamic programming solution methods intractable, we instead apply approximate dynamic programming (ADP) solution methods to produce high-quality dispatching policies relative to the currently practiced closest-available dispatching policy. We develop, test, and compare two distinct ADP solution techniques, both of which utilize an approximate policy iteration (API) algorithmic framework. The first algorithm uses least-squares temporal differences (LSTD) learning for policy evaluation, whereas the second algorithm uses neural network (NN) learning. We construct a notional yet representative planning scenario based on high-intensity combat operations in southern Azerbaijan to demonstrate the applicability of our MDP model and to compare the efficacies of our proposed ADP solution techniques. We generate 30 problem instances via a designed experiment to examine how selected problem and algorithmic features affect the quality of the solutions attained by our ADP policies. Results show that the policies determined by the NN-API and LSTD-API algorithms significantly outperform the closest-available benchmark policy in 27 (90%) and 24 (80%) of the problem instances examined, respectively. Moreover, the NN-API policies significantly outperform the LSTD-API policies in every problem instance examined. Compared with the closest-available policy for the baseline problem instance, the NN-API policy decreases the average response time for urgent (i.e., life-threatening) requests by 39 minutes. These research models, methodologies, and results inform the implementation and modification of current and future MEDEVAC tactics, techniques, and procedures, as well as the design and purchase of future aerial MEDEVAC assets.
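The LSTD policy-evaluation step named in the abstract can be sketched as follows. This is a minimal illustration of generic LSTD, not the authors' implementation: the function name `lstd_weights`, the feature map `phi`, the sample format, and the small ridge term are all assumptions made for the sketch.

```python
import numpy as np

def lstd_weights(samples, phi, gamma=0.95):
    """LSTD policy evaluation: fit weights w with V(s) ~= phi(s) @ w
    from transitions (s, r, s_next) collected under a fixed policy."""
    k = phi(samples[0][0]).shape[0]
    A = np.zeros((k, k))
    b = np.zeros(k)
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)  # accumulate phi (phi - gamma phi')^T
        b += r * f
    # Tiny ridge term keeps the solve well-posed when features are sparse.
    return np.linalg.solve(A + 1e-8 * np.eye(k), b)
```

Within an API loop, the fitted value function would then drive a policy-improvement step over the dispatching actions.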

Author(s):  
Tohid Sardarmehni ◽  
Ali Heydari

Approximate dynamic programming, also known as reinforcement learning, is applied to the optimal control of antilock brake systems (ABS) in ground vehicles. As an accurate and control-oriented model of the brake system, a quarter-vehicle model with a hydraulic brake system is selected. Because of the switching nature of the ABS hydraulic brake system, an optimal switching solution is generated by minimizing a performance index that penalizes the braking distance and forces the vehicle velocity to zero while preventing wheel lock-up. Toward this objective, a value iteration algorithm is selected for 'learning' the infinite-horizon solution. Artificial neural networks, as powerful function approximators, are utilized for approximating the value function. The training is conducted offline using least squares. Once trained, the converged neural network is used to determine optimal decisions for the actuators on the fly. Numerical simulations show that this approach is very promising while having a low real-time computational burden; hence, it outperforms many existing solutions in the literature.
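The offline value-iteration-with-least-squares scheme described above can be sketched on a toy problem. The 1-D system below is a stand-in for the quarter-vehicle model, and the network (fixed random hidden layer, output layer fit by least squares each sweep) is an assumption chosen so that every training step is a linear solve; none of the dynamics, costs, or sizes come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D system standing in for the quarter-vehicle brake model:
# x' = clip(x + dt*u), a finite switching set of control actions,
# and a cost that penalizes distance from the target state x = 0.
DT, GAMMA = 0.1, 0.9
ACTIONS = np.array([-1.0, 0.0, 1.0])

def step(x, u):
    return np.clip(x + DT * u, -1.0, 1.0)

def cost(x, u):
    return x ** 2 + 0.01 * u ** 2

# One-hidden-layer network with fixed random hidden weights, so each
# value-iteration sweep trains only the output layer by least squares.
W_in = rng.normal(size=(1, 32))
b_in = rng.normal(size=32)

def features(x):
    return np.tanh(np.atleast_2d(x).T @ W_in + b_in)

xs = np.linspace(-1.0, 1.0, 201)   # sampled states for offline training
w_out = np.zeros(32)
for _ in range(100):               # value-iteration sweeps
    q = np.stack([cost(xs, u) + GAMMA * (features(step(xs, u)) @ w_out)
                  for u in ACTIONS])
    targets = q.min(axis=0)        # Bellman backup at each sampled state
    w_out, *_ = np.linalg.lstsq(features(xs), targets, rcond=None)

def policy(x):
    """Greedy switching decision computed on the fly from the trained net."""
    q = [cost(x, u) + GAMMA * float(features(step(x, u))[0] @ w_out)
         for u in ACTIONS]
    return float(ACTIONS[int(np.argmin(q))])
```

Once the sweeps converge, only the cheap greedy lookup in `policy` runs online, which mirrors the low real-time burden claimed in the abstract.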


2012 ◽  
Vol 26 (4) ◽  
pp. 581-591 ◽  
Author(s):  
D. Roubos ◽  
S. Bhulai

We consider the problem of dynamic multi-skill routing in call centers. Calls from different customer classes are offered to the call center according to a Poisson process. The agents are grouped into pools according to their heterogeneous skill sets that determine the calls that they can handle. Each pool of agents serves calls with independent exponentially distributed service times. Arriving calls that cannot be served directly are placed in a buffer that is dedicated to the customer class. We obtain nearly optimal dynamic routing policies that scale well with the problem instance and can be computed online. The algorithm is based on approximate dynamic programming techniques. In particular, we perform one-step policy improvement using a polynomial approximation to relative value functions. We compare the performance of this method with decomposition techniques. Numerical experiments demonstrate that our method outperforms leading routing policies and has close to optimal performance.
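The one-step policy improvement described above can be sketched generically: approximate each pool's relative value function by a polynomial, then route an arriving call to the eligible pool whose value increases the least. The pool names, skill sets, and quadratic coefficients below are illustrative placeholders, not values fitted as in the paper.

```python
# Illustrative quadratic approximations V_p(n) ~= a_p*n**2 + b_p*n to the
# relative value of having n calls at pool p under a simple base policy.
COEFFS = {"pool_A": (0.5, 1.0), "pool_B": (0.2, 2.0)}
# Hypothetical skill sets: which pools can serve each customer class.
SKILLS = {"class_1": ["pool_A", "pool_B"], "class_2": ["pool_B"]}

def v(pool, n):
    a, b = COEFFS[pool]
    return a * n * n + b * n

def route(call_class, occupancy):
    """One-step policy improvement: route an arriving call to the eligible
    pool whose approximate relative value increases the least."""
    eligible = SKILLS[call_class]
    return min(eligible, key=lambda p: v(p, occupancy[p] + 1) - v(p, occupancy[p]))
```

Because each decision only compares a handful of polynomial evaluations, the improved policy can be computed online, as the abstract notes.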


2020 ◽  
Vol 124 ◽  
pp. 105032
Author(s):  
Ying Chen ◽  
Feng Liu ◽  
Jay M. Rosenberger ◽  
Victoria C.P. Chen ◽  
Asama Kulvanitchaiyanunt ◽  
...  

Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 187
Author(s):  
Aaron Barbosa ◽  
Elijah Pelofske ◽  
Georg Hahn ◽  
Hristo N. Djidjev

Quantum annealers, such as the device built by D-Wave Systems, Inc., offer a way to compute solutions of NP-hard problems that can be expressed in Ising or quadratic unconstrained binary optimization (QUBO) form. Although such solutions are typically of very high quality, problem instances are usually not solved to optimality due to imperfections of current-generation quantum annealers. In this contribution, we aim to understand some of the factors contributing to the hardness of a problem instance and to use machine learning models to predict the accuracy of the D-Wave 2000Q annealer for solving specific problems. We focus on the maximum clique problem, a classic NP-hard problem with important applications in network analysis, bioinformatics, and computational chemistry. By training a machine learning classification model on basic problem characteristics, such as the number of edges in the graph, and annealing parameters, such as the D-Wave chain strength, we are able to rank certain features in order of their contribution to the solution hardness, and we present a simple decision tree that predicts whether a problem will be solvable to optimality with the D-Wave 2000Q. We extend these results by training a machine learning regression model that predicts the clique size found by the D-Wave annealer.
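The kind of simple decision tree mentioned above can be sketched with a hand-rolled depth-1 tree (a decision stump) trained on a single feature. The feature choice (graph edge density), the data values, and the learned threshold are toy assumptions for illustration; they are not the paper's features or results.

```python
def fit_stump(feature, labels):
    """Depth-1 decision tree: exhaustively search for the threshold (and
    direction) that best separates solvable from unsolvable instances."""
    best = (None, True, 0.0)  # (threshold, predict_true_if_below, accuracy)
    for t in sorted(set(feature)):
        for below in (True, False):
            preds = [(x <= t) == below for x in feature]
            acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
            if acc > best[2]:
                best = (t, below, acc)
    return best

def predict(x, threshold, below):
    return (x <= threshold) == below

# Toy stand-in data: edge density of the problem graph vs. whether the
# annealer reached the optimal clique (values are illustrative only).
density = [0.10, 0.15, 0.25, 0.30, 0.55, 0.60, 0.75, 0.90]
solved = [True, True, True, True, False, False, False, False]
threshold, below, accuracy = fit_stump(density, solved)
```

A deeper tree over several features (edge count, chain strength, and so on) follows the same recipe of recursively picking the best separating split.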

