Intelligent Ramp Control for Incident Response Using Dyna-Q Architecture

2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Chao Lu ◽  
Yanan Zhao ◽  
Jianwei Gong

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under congestion caused by incidents. However, existing applications are limited to single-agent tasks and are based on Q-learning, which has inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q-based multiagent reinforcement learning (MARL) system named Dyna-MARL has been developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain the benefits of both. The performance of Dyna-MARL is tested on a simulated motorway segment in the UK with real traffic data collected during AM peak hours. Test results compared with isolated RL and non-controlled situations show that Dyna-MARL achieves superior performance in improving traffic operation: increasing total throughput and reducing total travel time and CO2 emissions. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel time of road users from different on-ramps.
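The Dyna-Q idea the system builds on — updating values from real transitions while also replaying simulated transitions from a learned model — can be sketched in tabular form. This is a generic toy chain task, not the paper's motorway setting; all states, rewards, and parameter values here are illustrative:

```python
import random

# Tabular Dyna-Q on a toy 5-state chain (illustrative only; the paper's
# Dyna-MARL operates on a simulated motorway segment, not this toy task).
N_STATES, ACTIONS = 5, [0, 1]          # action 0: step left, 1: step right
ALPHA, GAMMA, EPS, PLANNING_STEPS = 0.1, 0.95, 0.1, 10

def step(s, a):
    """Deterministic chain: reward 1.0 only on reaching the rightmost state."""
    s2 = max(0, min(N_STATES - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                              # learned model: (s, a) -> (s', r)
random.seed(0)

def greedy_action(s):
    best = max(Q[(s, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(s, a)] == best])

for episode in range(50):
    s = 0
    for _ in range(200):                # step cap keeps the toy run bounded
        if s == N_STATES - 1:
            break
        a = random.choice(ACTIONS) if random.random() < EPS else greedy_action(s)
        s2, r = step(s, a)
        # Direct (model-free) Q-learning update from the real transition
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        model[(s, a)] = (s2, r)         # model learning
        # Planning: replay simulated transitions drawn from the learned model
        for _ in range(PLANNING_STEPS):
            ps, pa = random.choice(list(model))
            ps2, pr = model[(ps, pa)]
            Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
        s = s2

# After training, the greedy policy should step right from every non-terminal state.
policy = [greedy_action(s) for s in range(N_STATES - 1)]
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real transition is reused many times through the learned model, which speeds up value propagation.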

Energies ◽  
2020 ◽  
Vol 13 (23) ◽  
pp. 6255
Author(s):  
Ki-Beom Lee ◽  
Mohamed A. Ahmed ◽  
Dong-Ki Kang ◽  
Young-Chon Kim

This paper proposes an optimal route and charging station selection (RCS) algorithm based on model-free deep reinforcement learning (DRL) to overcome the uncertainty of traffic conditions and dynamically arriving charging requests. The proposed DRL-based RCS algorithm aims to minimize the total travel time of electric vehicle (EV) charging requests from origin to destination by selecting the optimal route and charging station, considering dynamically changing traffic conditions and unknown future requests. We formulate this RCS problem as a Markov decision process with unknown transition probabilities. A deep Q network with function approximation is adopted to find the optimal EV charging station (EVCS) selection policy. To obtain the feature states for each EVCS, we define a traffic preprocessing module, a charging preprocessing module, and a feature extraction module. The proposed DRL-based RCS algorithm is compared with conventional strategies such as minimum distance, minimum travel time, and minimum waiting time. Performance is evaluated in terms of travel time, waiting time, charging time, driving time, and distance under various distributions and numbers of EV charging requests.
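The station-selection loop can be sketched with a linear value head standing in for the paper's deep Q network. Everything below is an assumption for illustration: the features, reward, and station count are invented, and each charging request is treated as a one-step episode, so no discounting appears:

```python
import numpy as np

# Simplified RCS decision loop: a *linear* Q head per station stands in for
# the paper's Deep Q network (features and reward are illustrative).
rng = np.random.default_rng(1)
K, D = 3, 3                      # candidate EVCSs, features per station
w = np.zeros((K, D))             # one linear Q head per station
ALPHA, EPS = 0.05, 0.2

def q_values(feats):
    """Q(s, a) = w_a . phi_a(s): one score per candidate station."""
    return np.array([w[a] @ feats[a] for a in range(K)])

for _ in range(3000):
    # Fake per-station features: [driving time, waiting time, charging time]
    feats = rng.uniform(0.2, 1.0, size=(K, D))
    q = q_values(feats)
    a = int(rng.integers(K)) if rng.random() < EPS else int(np.argmax(q))
    r = -feats[a].sum() + rng.normal(0, 0.01)   # reward = negative total time
    w[a] += ALPHA * (r - q[a]) * feats[a]       # one-step TD update

# The learned policy should prefer the station with the smallest total time.
test_feats = np.array([[0.9, 0.9, 0.9], [0.3, 0.3, 0.3], [0.6, 0.6, 0.6]])
best = int(np.argmax(q_values(test_feats)))
```

A real DQN replaces the linear heads with a neural network and adds a replay buffer and target network; the epsilon-greedy selection and TD target have the same shape.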


2016 ◽  
Vol 28 (4) ◽  
pp. 371-381 ◽  
Author(s):  
Chao Lu ◽  
Jie Huang ◽  
Jianwei Gong

Reinforcement learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, the discount rate, and the action selection parameter, on algorithm performance. Two indices, for learning speed and convergence stability, were used to measure algorithm performance, and on this basis a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and the action selection parameter had more remarkable impacts on algorithm performance. Based on the analysis, suggestions are provided on how to select suitable parameter values to achieve superior performance.
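The three parameters under analysis appear directly in the standard Q-learning update and epsilon-greedy rule, shown here in generic textbook form rather than as the paper's exact agent:

```python
import random

def select_action(Q, s, actions, epsilon):
    """Action selection parameter: epsilon trades exploration vs. exploitation."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

def q_update(Q, s, a, r, s2, actions, alpha, gamma):
    """alpha (learning rate) scales the step size toward the TD target;
    gamma (discount rate) weights the estimated future return."""
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q[(s, a)]
```

A large alpha speeds learning but destabilizes convergence, while epsilon controls how often the agent deviates from its current best action — the trade-off the paper's two performance indices quantify.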


Author(s):  
Shuoyan Xu ◽  
Tae J. Kwon

Snowplowing is an indispensable part of winter road maintenance because it improves drivers' mobility and safety. Most prior studies, however, limit their focus to operational aspects without considering interaction with road users. This paper aims to minimize travel time in the traffic system while still maintaining the efficiency of plowing activities. A k-truck plowing-with-precedence model was formulated to link road users with snowplow activities. A tabu search algorithm was then applied to optimize the precedence order of snowplow routes. Furthermore, discrete-event simulation was used to quantify the effectiveness of the proposed method under different scenarios and fleet sizes, and to illustrate the potential benefit of integrating both the operators' and users' perspectives. The methodological framework developed herein can be used to design a routing strategy that improves snowplow truck performance by reducing both plowing completion time and road users' total travel time.
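The tabu search step can be sketched on a toy route-ordering problem. The cost function below is a stand-in (a weighted completion-position cost); the paper's k-truck precedence model evaluates road users' travel time via discrete-event simulation instead:

```python
# Minimal tabu search over the precedence order of plowing routes.
# Neighborhood: adjacent swaps; tabu list bans recently used moves;
# the aspiration rule re-allows a tabu move that beats the best cost so far.

def swapped(order, m):
    o = order[:]
    o[m[0]], o[m[1]] = o[m[1]], o[m[0]]
    return o

def cost(order, weights):
    """Routes scheduled earlier finish sooner; heavy routes should go first."""
    return sum((pos + 1) * weights[route] for pos, route in enumerate(order))

def tabu_search(weights, iters=50, tabu_len=2):
    n = len(weights)
    cur = list(range(n))
    best, best_c = cur[:], cost(cur, weights)
    tabu = []                            # recently used moves are banned
    for _ in range(iters):
        cands = []
        for m in ((i, i + 1) for i in range(n - 1)):
            c = cost(swapped(cur, m), weights)
            # Aspiration: a tabu move is allowed if it beats the best so far
            if m not in tabu or c < best_c:
                cands.append((c, m))
        c, m = min(cands)                # best admissible move, even if worsening
        cur = swapped(cur, m)
        tabu.append(m)
        if len(tabu) > tabu_len:
            tabu.pop(0)
        if c < best_c:
            best, best_c = cur[:], c
    return best, best_c
```

Accepting the best admissible move even when it worsens the current solution is what lets tabu search escape local optima that plain hill climbing would get stuck in.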


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Siyuan Ding ◽  
Shengxiang Li ◽  
Guangyi Liu ◽  
Ou Li ◽  
Ke Ke ◽  
...  

The exponential explosion of joint actions and massive data collection are two main challenges for multiagent reinforcement learning algorithms with centralized training. To overcome these problems, we propose in this paper a model-free and fully decentralized actor-critic multiagent reinforcement learning algorithm based on message diffusion. The agents are assumed to be placed in a time-varying communication network. Each agent makes only limited observations of the global state and joint actions and therefore needs to obtain and share information with others over the network. In the proposed algorithm, agents hold local estimates of the global state and joint actions and update them using local observations and the messages received from neighbors. Under the hypothesis of global value decomposition, the gradient of the global objective function with respect to an individual agent is derived. The convergence of the proposed algorithm with linear function approximation is guaranteed by stochastic approximation theory. In experiments, the proposed algorithm was applied to a multiagent passive localization task and achieved superior performance compared with state-of-the-art algorithms.
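The message-diffusion building block — each agent holding a local estimate and repeatedly averaging it with its neighbors' — can be sketched as follows. This illustrates only the consensus mechanism, not the paper's full actor-critic updates, and the network and estimates are invented:

```python
import numpy as np

def diffusion_step(estimates, adjacency):
    """One communication round: each agent averages its own estimate with
    those of its neighbors (uniform weights)."""
    n = len(estimates)
    out = np.empty_like(estimates)
    for i in range(n):
        nbrs = [j for j in range(n) if adjacency[i][j]] + [i]
        out[i] = estimates[nbrs].mean(axis=0)
    return out

rng = np.random.default_rng(0)
est = rng.normal(size=(4, 3))           # 4 agents, 3-dim local estimates
ring = np.array([[0, 1, 0, 1],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [1, 0, 1, 0]])         # agents arranged on a ring of 4
avg = est.mean(axis=0)                  # network-wide average (preserved here,
                                        # since the mixing matrix is doubly stochastic)
for _ in range(50):
    est = diffusion_step(est, ring)
# After enough rounds, all agents' estimates converge to the common average.
```

On a time-varying network, the adjacency matrix would change between rounds; consensus still holds under standard connectivity assumptions, which is what the stochastic approximation analysis relies on.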


2012 ◽  
Vol 22 (2) ◽  
pp. 117-123 ◽  
Author(s):  
Kostandina Veljanovska ◽  
Kristi M. Bombol ◽  
Tomaž Maher

An appropriately designed motorway access control can decrease the total travel time spent in the system by up to 30% and consequently improve the safety of merging operations. To date, implemented traffic-responsive motorway access control systems have been of a local or regulatory type and not truly adaptive. Traffic flow can, however, be influenced positively by numerous intelligent transportation system (ITS) techniques. This paper presents a contemporary approach: the design of an optimal and adaptive closed-loop multiple motorway access control strategy. The proposed methodology uses the artificial intelligence technique known as reinforcement learning (RL) with multiple agents and applies the Q-learning algorithm. One segment of the motorway network with three lanes in each direction and three motorway entries was designed, with detectors and traffic signals placed at the entries (ramps). Traffic flows and occupancy on the main line, as well as traffic demand at the motorway entries, were taken as input model variables. The output variables were the travel speed on the corridor, the total travel time, and the total stop time. The VISSIM micro-simulator and direct programming of the simulator functions were used to implement the RL technique, with the peak hour chosen as the simulation period. The model was tested in two phases and its effectiveness was compared to ALINEA. The proposed strategy was capable of responding to dynamic sensory inputs from a changing environment, required neither a model of the environment nor supervision, and changed its control policy in response to changes in the system's characteristics. It was confirmed that the strategy is truly adaptive and responds in real time to traffic demand on the corridor.
KEY WORDS: motorway access, traffic flows, control, strategy, artificial intelligence, Q-Learning, simulation
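ALINEA, the baseline the strategy is compared against, is a well-known local integral feedback metering law: r(k) = r(k-1) + K_R * (o_target - o_measured(k)), where o is the downstream occupancy. A minimal sketch, with parameter values that are illustrative rather than calibrated:

```python
def alinea(r_prev, occ_measured, occ_target=0.18, k_r=70.0,
           r_min=200.0, r_max=1800.0):
    """One ALINEA step: return the next ramp metering rate in veh/h.
    occ_* are downstream occupancies as fractions; k_r is the regulator gain;
    the result is clamped to the feasible metering range."""
    r = r_prev + k_r * (occ_target - occ_measured)
    return max(r_min, min(r_max, r))
```

Because ALINEA reacts only to its own downstream occupancy, it is local by construction — which is precisely the limitation the multi-agent RL strategy in the paper is designed to overcome.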


2019 ◽  
Author(s):  
Leor M Hackel ◽  
Jeffrey Jordan Berg ◽  
Björn Lindström ◽  
David Amodio

Do habits play a role in our social impressions? To investigate the contribution of habits to the formation of social attitudes, we examined the roles of model-free and model-based reinforcement learning in social interactions—computations linked in past work to habit and planning, respectively. Participants in this study learned about novel individuals in a sequential reinforcement learning paradigm, choosing financial advisors who led them to high- or low-paying stocks. Results indicated that participants relied on both model-based and model-free learning, such that each independently predicted choice during the learning task and self-reported liking in a post-task assessment. Specifically, participants liked advisors who could provide large future rewards as well as advisors who had provided them with large rewards in the past. Moreover, participants varied in their use of model-based and model-free learning strategies, and this individual difference influenced the way in which learning related to self-reported attitudes: among participants who relied more on model-free learning, model-free social learning related more to post-task attitudes. We discuss implications for attitudes, trait impressions, and social behavior, as well as the role of habits in a memory systems model of social cognition.
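The two valuation systems the study contrasts are commonly formalized as follows — a generic sketch of the standard hybrid model from this literature, not the authors' fitted model; the advisor and stock names are illustrative:

```python
def model_free_update(q_mf, advisor, reward, alpha=0.3):
    """Habit-like system: cache a running average of past rewards per advisor."""
    q_mf[advisor] += alpha * (reward - q_mf[advisor])

def model_based_value(transitions, stock_values, advisor):
    """Planning system: expected value over the advisor's known
    advisor -> stock transition probabilities."""
    return sum(p * stock_values[s] for s, p in transitions[advisor].items())

def hybrid_value(q_mf_val, q_mb_val, w):
    """w (fit per participant) weights model-based vs. model-free contributions."""
    return w * q_mb_val + (1 - w) * q_mf_val
```

The per-participant weight w is what captures the individual differences the study reports: participants with a low w (more model-free) showed attitudes that tracked cached reward history more closely.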


2021 ◽  
Vol 11 (11) ◽  
pp. 4948
Author(s):  
Lorenzo Canese ◽  
Gian Carlo Cardarilli ◽  
Luca Di Nunzio ◽
Rocco Fazzolari ◽  
Daniele Giardino ◽  
...  

In this review, we present an analysis of the most used multi-agent reinforcement learning algorithms. Starting with the single-agent reinforcement learning algorithms, we focus on the most critical issues that must be taken into account in their extension to multi-agent scenarios. The analyzed algorithms were grouped according to their features. We present a detailed taxonomy of the main multi-agent approaches proposed in the literature, focusing on their related mathematical models. For each algorithm, we describe the possible application fields, while pointing out its pros and cons. The described multi-agent algorithms are compared in terms of the most important characteristics for multi-agent reinforcement learning applications—namely, nonstationarity, scalability, and observability. We also describe the most common benchmark environments used to evaluate the performances of the considered methods.

