Time reduction of the Dynamic Programming computation in the case of hybrid vehicle

2017 ◽  
Vol 53 ◽  
pp. S213-S227 ◽  
Author(s):  
Emmanuel Vinot


Author(s):  
Rajit Johri ◽  
Ashwin Salvi ◽  
Zoran Filipi

This paper proposes a self-learning approach to developing optimal power management with multiple objectives, e.g. minimizing fuel consumption and transient engine-out NOx and particulate matter emissions for a series hydraulic hybrid vehicle. Addressing multiple objectives is particularly relevant for a diesel-powered hydraulic hybrid, since it has been shown that managing engine transients can significantly reduce real-world emissions. The problem is formulated as an infinite-time-horizon stochastic sequential decision-making (Markovian) problem. It is computationally intractable for conventional Dynamic Programming due to the large number of states and complex modeling issues. The paper therefore proposes an online self-learning neural controller based on the fundamental principles of Neuro-Dynamic Programming (NDP) and reinforcement learning. The controller learns from its interactions with the environment and improves its performance over time. It seeks to minimize multiple objectives and continues to evolve until a global solution is achieved. The control law is a stationary full-state feedback based on five states and can be implemented directly. The controller's performance is then evaluated in the Engine-in-the-Loop (EIL) facility.
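The self-learning idea described above can be illustrated with a minimal sketch: tabular Q-learning that minimizes a weighted multi-objective cost and converges to a stationary full-state-feedback policy. The state/action spaces, cost terms, and dynamics below are illustrative placeholders, not the paper's 5-state hydraulic-hybrid model.

```python
import random

STATES = range(4)   # e.g. coarse bins of accumulator state of charge (illustrative)
ACTIONS = range(3)  # e.g. discrete engine power levels (illustrative)

def cost(s, a):
    fuel = a * 1.0            # higher power -> more fuel (toy surrogate)
    nox = abs(a - s) * 0.5    # power/state mismatch stands in for transient NOx
    return fuel + nox         # weighted multi-objective cost

def step(s, a):
    # Toy dynamics: state drifts toward the commanded power level.
    return min(3, max(0, s + (1 if a > s else -1)))

def q_learning(episodes=2000, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    s = 0
    for _ in range(episodes):
        # epsilon-greedy exploration over the (cost-minimizing) Q-values
        if rng.random() < eps:
            a = rng.choice(list(ACTIONS))
        else:
            a = min(ACTIONS, key=lambda a_: Q[(s, a_)])
        s2 = step(s, a)
        target = cost(s, a) + gamma * min(Q[(s2, a_)] for a_ in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # temporal-difference update
        s = s2
    # Stationary full-state-feedback policy: best action per state.
    return {s_: min(ACTIONS, key=lambda a_: Q[(s_, a_)]) for s_ in STATES}

policy = q_learning()
print(policy)
```

The controller improves online from interaction alone, with no explicit model required at run time; the paper's NDP controller replaces this toy Q-table with a neural approximation over its five continuous states.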


2020 ◽  
Vol 34 (02) ◽  
pp. 1684-1691
Author(s):  
Shenghe Xu ◽  
Shivendra S. Panwar ◽  
Murali Kodialam ◽  
T.V. Lakshman

In this paper, we propose a general framework for combining deep neural networks (DNNs) with dynamic programming to solve combinatorial optimization problems. For problems that can be broken into smaller subproblems and solved by dynamic programming, we train a set of neural networks to replace the value or policy functions at each decision step. Two variants of the neural-network-approximated dynamic programming (NDP) method are proposed: in the value-based NDP method, the networks learn to estimate the value of each choice at the corresponding step, while in the policy-based NDP method the DNNs estimate only the best decision at each step. Training starts from the smallest problem size, and a new DNN for the next size is trained to cooperate with the previous DNNs. After all the DNNs are trained, the networks are fine-tuned together to further improve overall performance. We test NDP on the linear sum assignment problem, the traveling salesman problem, and the talent scheduling problem. Experimental results show that NDP achieves considerable reductions in computation time on hard problems with reasonable performance loss. In general, NDP can be applied to reducible combinatorial optimization problems to reduce computation time.
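The value-based variant can be sketched on the linear sum assignment problem: at step i, worker i is greedily assigned to the unused job that minimizes immediate cost plus an estimated cost-to-go. In the paper these estimators are trained DNNs; to keep this sketch runnable without a training loop, the "estimator" here is stood in by the exact recursive DP cost-to-go, so only the greedy decode structure reflects the method.

```python
def exact_cost_to_go(C, i, used):
    # Stand-in for a trained per-step value network: exact DP value of
    # assigning workers i..n-1 to the jobs not yet in `used`.
    n = len(C)
    if i == n:
        return 0.0
    return min(C[i][j] + exact_cost_to_go(C, i + 1, used | {j})
               for j in range(n) if j not in used)

def ndp_decode(C, value_fn):
    # Value-based NDP decode: at each step, pick the job whose immediate
    # cost plus estimated cost-to-go is smallest.
    n = len(C)
    used, assignment, total = frozenset(), [], 0.0
    for i in range(n):
        j = min((j for j in range(n) if j not in used),
                key=lambda j: C[i][j] + value_fn(C, i + 1, used | {j}))
        assignment.append(j)
        total += C[i][j]
        used = used | {j}
    return assignment, total

# Small illustrative cost matrix: C[i][j] = cost of giving job j to worker i.
C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
assignment, total = ndp_decode(C, exact_cost_to_go)
print(assignment, total)  # with exact values the decode recovers the optimum
```

Swapping `exact_cost_to_go` for a learned approximator is what trades optimality for speed: the decode then runs in polynomial time, with performance depending on the estimator's accuracy, which matches the paper's reported trade-off.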


2012 ◽  
Vol 58 (2/3/4) ◽  
pp. 367 ◽  
Author(s):  
Maxime Debert ◽  
Thomas Miro Padovani ◽  
Guillaume Colin ◽  
Yann Chamaillard ◽  
Lino Guzzella
