Micro-LED backlight module by deep reinforcement learning and micro-macro-hybrid environment control agent

2021 ◽  
Author(s):  
Che-Hsuan Huang ◽  
Yu-Tang Cheng ◽  
Yung-Chi Tsao ◽  
Xinke Liu ◽  
Hao-Chung Kuo


2021 ◽  
Vol 11 (7) ◽  
pp. 3257 ◽
Author(s):  
Chen-Huan Pi ◽  
Wei-Yuan Ye ◽  
Stone Cheng

In this paper, a novel control strategy is presented for reinforcement learning with disturbance compensation to solve the problem of quadrotor positioning under external disturbance. The proposed control scheme applies a trained neural-network-based reinforcement learning agent to control the quadrotor, and its output is mapped directly to the four actuators in an end-to-end manner. The scheme constructs a disturbance observer to estimate the external forces exerted on the three axes of the quadrotor, such as wind gusts in an outdoor environment. By introducing a disturbance compensator into the neural network control agent, tracking accuracy and robustness increased significantly in both indoor and outdoor experiments. The experimental results indicate that the proposed control strategy is highly robust to external disturbances: compensation improved control accuracy and reduced positioning error by 75%. To the best of our knowledge, this study is the first to achieve quadrotor positioning control through low-level reinforcement learning by using a global positioning system in an outdoor environment.
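
A minimal sketch of the observer-plus-compensation idea described above, assuming a first-order disturbance observer and a generic `policy` callable standing in for the trained network (the gains, mass, and interfaces here are illustrative, not the authors'):

```python
import numpy as np

class DisturbanceObserver:
    """Estimates the external force on each axis from the model/measurement gap."""
    def __init__(self, mass=1.0, gain=5.0, dt=0.01):
        self.mass, self.gain, self.dt = mass, gain, dt
        self.d_hat = np.zeros(3)  # estimated disturbance force (x, y, z)

    def update(self, accel_measured, thrust_world):
        # Acceleration predicted by the nominal model (thrust plus gravity).
        accel_nominal = thrust_world / self.mass + np.array([0.0, 0.0, -9.81])
        residual = accel_measured - accel_nominal
        # Low-pass the residual so sensor noise does not pass straight through.
        self.d_hat += self.gain * (self.mass * residual - self.d_hat) * self.dt
        return self.d_hat

def control_step(policy, obs, observer, accel_measured, thrust_world):
    d_hat = observer.update(accel_measured, thrust_world)
    # Feed the disturbance estimate to the trained agent alongside the state,
    # so its end-to-end output (four rotor commands) can compensate for it.
    return policy(np.concatenate([obs, d_hat]))

# Stand-in for the trained network: any callable returning four commands.
policy = lambda x: np.clip(x[:4], 0.0, 1.0)
obs = np.zeros(12)                      # position, velocity, attitude, rates
observer = DisturbanceObserver()
cmds = control_step(policy, obs, observer,
                    accel_measured=np.array([0.1, 0.0, -9.7]),
                    thrust_world=np.array([0.0, 0.0, 9.9]))
```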


2021 ◽  
Vol 2021 ◽  
pp. 1-12 ◽
Author(s):  
Zhipeng Li ◽  
Xiumei Wei ◽  
Xuesong Jiang ◽  
Yewen Pang

Coordinating the many processes in the process industry is difficult. We build a multiagent, distributed, hierarchical intelligent control model for manufacturing systems that integrate multiple production units, based on multiagent system technology. The model organically combines multiple intelligent agent modules and physical entities to form an intelligent control system with well-defined functions; it consists of a system management agent, workshop control agents, and equipment agents. For the task assignment problem under this model, we use reinforcement learning to improve a genetic algorithm for multiagent task scheduling, and we use the standard task scheduling datasets in OR-Library for simulation experiments. Experimental results show that the improved algorithm outperforms the baseline genetic algorithm on these benchmarks.
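
As a toy illustration of the RL-improved genetic algorithm idea (our own construction, not the paper's method), a one-state Q-learner can adapt the mutation rate of a GA on a small two-machine scheduling instance, rewarding whichever rate improves the best makespan:

```python
import random

random.seed(0)
JOBS = [3, 7, 2, 8, 4, 6]          # toy processing times
MACHINES = 2
MUTATION_RATES = [0.05, 0.2, 0.5]  # actions available to the RL agent

def makespan(assign):              # assign[i] = machine of job i
    loads = [0] * MACHINES
    for job, m in zip(JOBS, assign):
        loads[m] += job
    return max(loads)

q = {r: 0.0 for r in MUTATION_RATES}  # one-state Q-table over mutation rates
pop = [[random.randrange(MACHINES) for _ in JOBS] for _ in range(20)]
best = min(makespan(p) for p in pop)

for gen in range(200):
    # Epsilon-greedy choice of the mutation rate to apply this generation.
    rate = (random.choice(MUTATION_RATES) if random.random() < 0.1
            else max(q, key=q.get))
    pop.sort(key=makespan)
    parents = pop[:10]
    children = []
    for _ in range(10):
        a, b = random.sample(parents, 2)
        cut = random.randrange(len(JOBS))
        child = a[:cut] + b[cut:]                    # one-point crossover
        child = [1 - g if random.random() < rate else g for g in child]
        children.append(child)
    pop = parents + children
    new_best = min(makespan(p) for p in pop)
    q[rate] += 0.1 * ((best - new_best) - q[rate])   # improvement as reward
    best = min(best, new_best)

print("best makespan:", best)
```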


2013 ◽  
Vol 2013 ◽  
pp. 1-10 ◽  
Author(s):  
Lun-Hui Xu ◽  
Xin-Hai Xia ◽  
Qiang Luo

The urban traffic self-adaptive control problem is dynamic and uncertain, so the states of the traffic environment are hard to observe. An efficient agent that controls a single intersection can be discovered automatically via multiagent reinforcement learning. However, in the majority of previous works on this approach, each agent needed perfectly observed information when interacting with the environment and learned individually, with little coordination. This study casts traffic self-adaptive control as a multiagent Markov game problem. The design employs a traffic signal control agent (TSCA) for each signalized intersection that coordinates with neighboring TSCAs. A mathematical model for the TSCAs' interaction is built on a nonzero-sum Markov game, which is applied to let the TSCAs learn how to cooperate. A multiagent Markov game reinforcement learning approach is constructed on the basis of single-agent Q-learning. This method lets each TSCA learn to update its Q-values under joint actions and imperfect information. The convergence of the proposed algorithm is analyzed theoretically. Simulation results show that the proposed method is convergent and effective in a realistic traffic self-adaptive control setting.
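
A schematic of the joint-action Q-update such a TSCA could perform (our sketch; the paper's exact state, action, and reward definitions differ), indexing the Q-table by the agent's own state, its own action, and the neighbours' joint action:

```python
from collections import defaultdict
import random

ACTIONS = (0, 1)          # e.g., extend the current phase / switch phase
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

class TSCA:
    """Traffic signal control agent learning over joint actions."""
    def __init__(self):
        self.Q = defaultdict(float)  # key: (state, own_action, neighbour_actions)

    def act(self, state, neighbour_actions):
        if random.random() < EPS:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.Q[(state, a, neighbour_actions)])

    def update(self, s, a, neigh, reward, s_next, neigh_next):
        # Q-learning backup conditioned on the neighbours' joint action.
        best_next = max(self.Q[(s_next, a2, neigh_next)] for a2 in ACTIONS)
        key = (s, a, neigh)
        self.Q[key] += ALPHA * (reward + GAMMA * best_next - self.Q[key])

agent = TSCA()
a = agent.act(state=3, neighbour_actions=(0, 1))
agent.update(3, a, (0, 1), reward=-4.0, s_next=2, neigh_next=(1, 1))
```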


2005 ◽  
Vol 17 (2) ◽  
pp. 335-359 ◽  
Author(s):  
Jun Morimoto ◽  
Kenji Doya

This letter proposes a new reinforcement learning (RL) paradigm that explicitly takes into account input disturbance as well as modeling errors. The use of environmental models in RL is quite popular, both for off-line learning using simulations and for online action planning. However, the difference between the model and the real environment can lead to unpredictable, and often unwanted, results. Based on the theory of H∞ control, we consider a differential game in which a “disturbing” agent tries to make the worst possible disturbance while a “control” agent tries to make the best control input. The problem is formulated as finding a min-max solution of a value function that takes into account both the amount of reward and the norm of the disturbance. We derive online learning algorithms for estimating the value function and for calculating the worst disturbance and the best control with reference to the value function. We tested the paradigm, which we call robust reinforcement learning (RRL), on the control task of an inverted pendulum. In the linear domain, the policy and the value function learned by the online algorithms coincided with those derived analytically from linear H∞ control theory. For a fully nonlinear swing-up task, RRL achieved robust performance under changes in the pendulum weight and friction, while a standard reinforcement learning algorithm could not deal with these changes. We also applied RRL to the cart-pole swing-up task, and a robust swing-up policy was acquired.
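
The min-max backup at the heart of this paradigm can be illustrated with a tabular toy (our construction, not the letter's pendulum setup): the control action maximizes, and the disturbance minimizes, a target that adds a disturbance-norm penalty η·w² to the reward, which keeps the worst-case disturbance bounded:

```python
import numpy as np

np.random.seed(0)
N, ALPHA, GAMMA, ETA = 21, 0.2, 0.95, 2.0
GOAL = N // 2
ACTIONS = (-1, 0, 1)      # control agent's choices
DISTURB = (-1, 0, 1)      # disturbing agent's choices
V = np.zeros(N)           # tabular value function

def transition(s, u, w):
    s2 = int(np.clip(s + u + w, 0, N - 1))
    return s2, -abs(s2 - GOAL)           # reward: stay near the target state

for _ in range(2000):
    s = np.random.randint(N)
    for _ in range(30):
        def backup(u, w):
            s2, r = transition(s, u, w)
            # Reward plus disturbance-norm penalty plus discounted value.
            return r + ETA * w * w + GAMMA * V[s2]
        # Control maximizes the worst case; disturbance minimizes the backup.
        u = max(ACTIONS, key=lambda a: min(backup(a, d) for d in DISTURB))
        w = min(DISTURB, key=lambda d: backup(u, d))
        V[s] += ALPHA * (backup(u, w) - V[s])   # min-max TD update
        s, _ = transition(s, u, w)
```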


Electronics ◽  
2020 ◽  
Vol 9 (6) ◽  
pp. 996 ◽
Author(s):  
Wooseok Song ◽  
Woong Hyun Suh ◽  
Chang Wook Ahn

This paper proposes a Deep Reinforcement Learning (DRL)-based training method for spellcaster units in StarCraft II, one of the most representative Real-Time Strategy (RTS) games. During combat in StarCraft II, micro-controlling the various combat units is crucial to winning. Among the many combat units, spellcasters are among the components that most strongly influence combat results. Despite their importance in combat, training methods for carefully controlling spellcasters have not been thoroughly considered in related studies because of their complexity. We therefore propose a training method for spellcaster units in StarCraft II using the A3C algorithm. The main idea is to train two Protoss spellcaster units under three newly designed minigames, each representing a unique spell usage scenario, to use ‘Force Field’ and ‘Psionic Storm’ effectively. The trained agents achieve winning rates of more than 85% in each scenario. We present a new training method for spellcaster units that addresses a limitation of StarCraft II AI research, and we expect that it can be used to train other advanced, tactical units by applying transfer learning in more complex minigame scenarios or full game maps.
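
A compact sketch of the A3C objective a worker would optimize (assumed PyTorch with a generic network; the observation encoding, action space, and dummy rollout are placeholders, not the paper's StarCraft II interface):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=32, n_actions=8):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU())
        self.pi = nn.Linear(64, n_actions)   # action logits
        self.v = nn.Linear(64, 1)            # state value

    def forward(self, obs):
        h = self.body(obs)
        return self.pi(h), self.v(h).squeeze(-1)

def a3c_loss(net, obs, actions, returns, beta=0.01):
    logits, values = net(obs)
    dist = torch.distributions.Categorical(logits=logits)
    advantage = returns - values
    # Policy gradient weighted by the (detached) advantage.
    policy_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    value_loss = advantage.pow(2).mean()     # value regression toward returns
    entropy = dist.entropy().mean()          # bonus that encourages exploration
    return policy_loss + 0.5 * value_loss - beta * entropy

# Dummy rollout standing in for a 'Force Field' minigame trajectory.
net = ActorCritic()
obs = torch.randn(16, 32)
actions = torch.randint(0, 8, (16,))
returns = torch.randn(16)
a3c_loss(net, obs, actions, returns).backward()  # gradients flow into `net`
```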


2016 ◽  
Vol 28 (4) ◽  
pp. 371-381 ◽  
Author(s):  
Chao Lu ◽  
Jie Huang ◽  
Jianwei Gong

Reinforcement Learning (RL) has been proposed to deal with ramp control problems under dynamic traffic conditions; however, there is a lack of sufficient research on the behaviour and impacts of different learning parameters. This paper describes a ramp control agent based on the RL mechanism and thoroughly analyzes the influence of three learning parameters, namely the learning rate, the discount rate, and the action selection parameter, on algorithm performance. Two indices, for learning speed and convergence stability, were used to measure performance, and on this basis a series of simulation-based experiments were designed and conducted using a macroscopic traffic flow model. Simulation results showed that, compared with the discount rate, the learning rate and the action selection parameter had more remarkable impacts on performance. Based on this analysis, suggestions are provided on how to select parameter values that achieve superior performance.
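
To make the parameter roles concrete, here is a toy sweep (our stub environment, not the paper's macroscopic traffic model) over the learning rate and the epsilon-greedy action-selection parameter of a Q-learning ramp-metering agent, using episodes-to-threshold as a crude learning-speed index:

```python
import random

random.seed(1)

def run(alpha, gamma, eps, episodes=300):
    Q = [[0.0, 0.0] for _ in range(5)]  # 5 queue-length states, 2 metering actions
    first_hit = episodes                # learning-speed index: first "good" episode
    for ep in range(episodes):
        s, total = 2, 0.0
        for _ in range(30):
            if random.random() < eps:
                a = random.randrange(2)                 # explore
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1      # exploit
            # Stub dynamics: metering (a = 1) tends to shorten the on-ramp queue.
            drift = random.choice([-1, 0]) if a else random.choice([0, 1])
            s2 = max(0, min(4, s + drift))
            r = -s2                                     # penalize long queues
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s, total = s2, total + r
        if total > -20 and first_hit == episodes:
            first_hit = ep
    return first_hit

for alpha in (0.05, 0.5):
    for eps in (0.01, 0.3):
        print(f"alpha={alpha}, eps={eps}: first good episode {run(alpha, 0.9, eps)}")
```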

