Adaptive aggregation methods for infinite horizon dynamic programming

Approximate dynamic programming, also known as reinforcement learning, is applied to the optimal control of antilock brake systems (ABS) in ground vehicles. A quarter-vehicle model with a hydraulic brake system is selected as an accurate, control-oriented model of the brake system. Because the hydraulic brake system of an ABS is switching in nature, an optimal switching solution is generated by minimizing a performance index that penalizes the braking distance and drives the vehicle velocity to zero while preventing wheel lock-up. Toward this objective, a value iteration algorithm is selected for ‘learning’ the infinite horizon solution. Artificial neural networks, as powerful function approximators, are used to approximate the value function. Training is conducted offline using least squares. Once trained, the converged neural network is used to determine optimal actuator decisions on the fly. Numerical simulations show that the approach is very promising and has a low real-time computational burden, and hence outperforms many existing solutions in the literature.
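The loop described in the abstract (value iteration, an approximate value function, offline least-squares training, then greedy online decisions) can be illustrated on a toy problem. The sketch below is not the authors' ABS model or their neural network: it assumes a hypothetical scalar state x in [0, 1], two hypothetical switching modes with dynamics x' = DECAY[a] * x, a made-up stage cost, and a quadratic polynomial in place of the network so that each fit is a single linear least-squares solve. Fitted value iteration of this kind is not guaranteed to converge in general.

```python
import numpy as np

GAMMA = 0.9                      # discount factor (assumed)
DECAY = np.array([0.9, 0.5])     # hypothetical per-mode dynamics: x' = DECAY[a] * x
EFFORT = np.array([0.0, 0.05])   # hypothetical per-mode actuation cost


def features(x):
    """Quadratic polynomial features [1, x, x^2], stacked on the last axis."""
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)


def q_values(x, w):
    """Cost-to-go of each mode: stage cost plus discounted approximate value."""
    x = np.asarray(x, dtype=float)
    next_x = x[..., None] * DECAY        # successor state under each mode
    stage = x[..., None] ** 2 + EFFORT   # state penalty plus actuation cost
    return stage + GAMMA * (features(next_x) @ w)


def fitted_value_iteration(n_samples=200, n_iters=100):
    """Offline training: repeat a Bellman backup followed by a least-squares fit."""
    xs = np.linspace(0.0, 1.0, n_samples)   # sampled training states
    phi = features(xs)
    w = np.zeros(phi.shape[1])              # start from V = 0
    for _ in range(n_iters):
        targets = q_values(xs, w).min(axis=-1)             # Bellman backup
        w, *_ = np.linalg.lstsq(phi, targets, rcond=None)  # least-squares fit
    return w


def greedy_action(x, w):
    """Online decision: pick the mode with the smallest approximate cost-to-go."""
    return int(np.argmin(q_values(x, w)))


if __name__ == "__main__":
    w = fitted_value_iteration()
    print("weights:", w)
    print("mode at x = 1.0:", greedy_action(1.0, w))
```

Only `greedy_action` runs online, and it costs one feature evaluation per mode, which is the sense in which the real-time burden stays low once training is done.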

Dynamic macro I: Infinite horizon models

Introduction to Computational Economics Using Fortran, 2018
DOI: 10.1093/oso/9780198804390.003.0014

Author(s): Hans Fehr, Fabian Kindermann

Keyword(s): Dynamic Programming, Infinite Horizon, Planning Horizon, Macroeconomic Models, Total Utility, The World, Famous Argument, Infinite Planning Horizon

In this chapter we apply the principles of dynamic programming to some standard macroeconomic models. For now we stay in the world of infinite horizon models, which, as in the previous chapter, are populated by one or several households with an infinite planning horizon. There are several justifications for such an assumption. Besides simplicity, altruism is probably the most famous argument in favour of infinite horizon models. Assume that in a period t there is one generation that dies with certainty after this period. The utility this generation derives from its own consumption is u(·). Yet each generation is altruistic towards its descendants. Consequently, the total utility of the generation is U_t = u(·) + βU_{t+1}, where β ≤ 1 can be interpreted as the degree of altruism. All generations together then form a dynasty.
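The link between the altruism recursion and an infinite planning horizon can be made explicit by substituting the recursion into itself. The derivation below is a sketch under assumptions not stated in the abstract: the argument of u is written as consumption c_t, β is taken to be strictly less than 1, and utility is assumed bounded so that the remainder term vanishes.

```latex
\begin{align*}
U_t &= u(c_t) + \beta U_{t+1} \\
    &= u(c_t) + \beta u(c_{t+1}) + \beta^2 U_{t+2} \\
    &= \sum_{s=0}^{T-1} \beta^s u(c_{t+s}) + \beta^T U_{t+T} \\
    &\xrightarrow{\;T \to \infty\;} \sum_{s=0}^{\infty} \beta^s u(c_{t+s})
      \qquad \text{if } \beta < 1 \text{ and } U \text{ is bounded.}
\end{align*}
```

Under these assumptions the dynasty behaves like a single household that discounts the utility of future generations at rate β over an infinite horizon. With β = 1 the sum need not converge, so that boundary case requires separate treatment.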

Robust Dynamic Programming for Discounted Infinite-Horizon Markov Decision Processes with Uncertain Stationary Transition Matrice

2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007
DOI: 10.1109/adprl.2007.368175

Cited By ~ 17

Author(s): Baohua Li, Jennie Si

Keyword(s): Dynamic Programming, Markov Decision Processes, Infinite Horizon, Decision Processes, Transition Matrice, Markov Decision, Stationary Transition
