A Case Study on Air Combat Decision Using Approximated Dynamic Programming

Mathematical Problems in Engineering ◽

10.1155/2014/183401 ◽

2014 ◽

Vol 2014 ◽

pp. 1-10 ◽

Cited By ~ 8

Author(s):

Yaofei Ma ◽

Xiaole Ma ◽

Xiao Song

Keyword(s):

Dynamic Programming ◽

State Space ◽

High Performance ◽

High Efficiency ◽

Policy Improvement ◽

Space Problem ◽

Reward Function ◽

Continuous State ◽

Air Combat ◽

Continuous Reward

As a continuous state space problem, air combat is difficult to solve with traditional dynamic programming (DP) over a discretized state space. This paper studies the approximated dynamic programming (ADP) approach to build a high-performance decision model for 1-versus-1 air combat, in which the iterative process for policy improvement is replaced by mass sampling from history trajectories and utility function approximation, making policy improvement highly efficient. A continuous reward function is also constructed to better guide the plane toward the “winner” state from any initial situation. According to our experiments, the plane is more offensive when following the policy derived from the ADP approach rather than the baseline Min-Max policy: the “time to win” is reduced greatly, but the cumulated probability of being killed by the enemy is higher. The reason is analyzed in this paper.

Hyperspace Neighbor Penetration Approach to Dynamic Programming for Model-Based Reinforcement Learning Problems with Slowly Changing Variables in a Continuous State Space

10.1109/iccma53594.2021.00018 ◽

2021 ◽

Author(s):

Vincent Zha ◽

Ivey Chiu

Keyword(s):

Dynamic Programming ◽

Reinforcement Learning ◽

State Space ◽

Learning Problems ◽

Model Based ◽

Continuous State Space ◽

Continuous State

Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/429 ◽

2021 ◽

Author(s):

Xiong Wang ◽

Riheng Jia

Keyword(s):

Contraction Mapping ◽

Mean Field ◽

Markov Analysis ◽

Discrete State ◽

Reward Function ◽

Continuous State ◽

State Evolution ◽

Continuous Reward ◽

Multi Agent ◽

The Mean

Mean field games facilitate the analysis of multi-armed bandits (MAB) with a large number of agents by approximating their interactions with an average effect. Existing mean field models for multi-agent MAB mostly assume a binary reward function, which leads to tractable analysis but is usually not applicable in practical scenarios. In this paper, we study the mean field bandit game with a continuous reward function. Specifically, we focus on deriving the existence and uniqueness of the mean field equilibrium (MFE), thereby guaranteeing the asymptotic stability of the multi-agent system. To accommodate the continuous reward function, we encode the learned reward into an agent state, which is in turn mapped to the agent's stochastic arm-playing policy and updated using realized observations. We show that the state evolution is upper semi-continuous, from which the existence of the MFE follows. As Markov analysis applies mainly to the discrete-state case, we transform the stochastic continuous state evolution into a deterministic ordinary differential equation (ODE). On this basis, we characterize a contraction mapping for the ODE to ensure a unique MFE for the bandit game. Extensive evaluations validate our MFE characterization and exhibit tight empirical regret for the MAB problem.

A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning

Mathematical Problems in Engineering ◽

10.1155/2019/4834516 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8

Author(s):

Xi-liang Chen ◽

Lei Cao ◽

Zhi-xiong Xu ◽

Jun Lai ◽

Chen-xi Li

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Maximum Entropy ◽

Learning Algorithm ◽

Action Space ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Continuous State Space ◽

Hot Start ◽

Continuous State

Inverse reinforcement learning (IRL) assumes that the demonstrations act optimally in an environment. In the past, most work on IRL needed to calculate optimal policies for different reward functions. However, this requirement is difficult to satisfy in tasks with large or continuous state spaces, let alone continuous action spaces. We propose a continuous maximum entropy deep inverse reinforcement learning algorithm for continuous state and continuous action spaces, which gains a deep understanding of the environment model by reconstructing the reward function from the demonstrations, and which uses a demonstration-based hot start mechanism to make the training process faster and better. We compare this new approach with well-known algorithms such as Maximum Entropy IRL, DDPG, and hot start DDPG. Empirical results on the classical control environment MountainCarContinuous-v0 in OpenAI Gym show that our approach is able to learn policies faster and better.

Hierarchical fuzzy ART for Q-learning and its application in air combat simulation

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962317500520 ◽

2017 ◽

Vol 08 (04) ◽

pp. 1750052 ◽

Cited By ~ 1

Author(s):

Yanan Zhou ◽

Yaofei Ma ◽

Xiao Song ◽

Guanghong Gong

Keyword(s):

State Space ◽

Function Approximation ◽

Value Function ◽

Early Stage ◽

Network Parameter ◽

Resonance Theory ◽

Value Function Approximation ◽

Fuzzy Art ◽

Continuous State ◽

Air Combat

Value function approximation plays an important role in reinforcement learning (RL) with continuous state space, which is widely used to build decision models in practice. Many traditional approaches require experienced designers to manually specify the form of the approximating function, leading to a rigid, non-adaptive representation of the value function. To address this problem, a novel Q-value function approximation method named ‘Hierarchical fuzzy Adaptive Resonance Theory’ (HiART) is proposed in this paper. HiART is based on the Fuzzy ART method and is an adaptive classification network that learns to segment the state space by classifying the training input automatically. HiART begins with a highly generalized structure in which the number of category nodes is limited, which speeds up the learning process at the early stage. The network is then refined gradually by creating attached sub-networks, and a layered network structure is formed during this process. With this adaptive structure, HiART reduces the dependence on expert experience when designing the network parameters. The effectiveness and adaptivity of HiART are demonstrated on the Mountain Car benchmark problem, where it achieves both fast learning and low computation time. Finally, a simulation example of the one-versus-one air combat decision problem illustrates the applicability of HiART.

High-Performance Hardware Interpolation Architecture for High Efficiency Video Coding Decoder

International Review on Computers and Software (IRECOS) ◽

10.15866/irecos.v11i9.9753 ◽

2016 ◽

Vol 11 (9) ◽

pp. 764

Author(s):

Lella Aicha Ayadi ◽

Nihel Neji ◽

Hassen Loukil ◽

Mouhamed Ali Ben Ayed ◽

Nouri Masmoudi

Keyword(s):

Video Coding ◽

High Performance ◽

High Efficiency ◽

High Efficiency Video Coding

Investigation on the La Replacement and Little Additive Modification of High-Performance Permanent Magnetic Strontium-Ferrite

Processes ◽

10.3390/pr9061034 ◽

2021 ◽

Vol 9 (6) ◽

pp. 1034

Author(s):

Ching-Chien Huang ◽

Chin-Chieh Mo ◽

Guan-Ming Chen ◽

Hsiao-Hsuan Hsu ◽

Guo-Jiun Shu

Keyword(s):

High Performance ◽

Single Phase ◽

Permanent Magnets ◽

High Efficiency ◽

Cobalt Content ◽

Preparation Condition ◽

Milling Process ◽

Experimental Parameters ◽

La Substitution ◽

Hard Magnets

In this work, an experiment was carried out to investigate the preparation conditions of anisotropic, Fe-deficient, M-type Sr ferrite with optimum magnetic and physical properties by changing experimental parameters, such as the La substitution amount and little additive modification during the fine milling process. The compositions of the calcined ferrites were chosen according to the stoichiometry LaₓSr₁₋ₓFe₁₂₋₂ₓO₁₉, where M-type single-phase calcined powder was synthesized with a composition of x = 0.30. The effect of CaCO₃, SiO₂, and Co₃O₄ inter-additives on the Sr ferrite was also discussed in order to obtain low-temperature sintered magnets. The magnetic properties of Br = 4608 Gauss, bHc = 3650 Oe, iHc = 3765 Oe, and (BH)max = 5.23 MGOe were obtained for Sr ferrite hard magnets with a low cobalt content of 1.7 wt%, which will eventually be used as high-end permanent magnets for high-efficiency motor applications in automobiles with Br > 4600 ± 50 G and iHc > 3600 ± 50 Oe.

New hole transport styrene polymers bearing highly π-extended conjugated side-chain moieties for high-performance solution-processable thermally activated delayed fluorescence OLEDs

Polymer Chemistry ◽

10.1039/d1py00026h ◽

2021 ◽

Vol 12 (11) ◽

pp. 1692-1699

Author(s):

Ji Hye Lee ◽

Jinhyo Hwang ◽

Chai Won Kim ◽

Amit Kumar Harit ◽

Han Young Woo ◽

...

Keyword(s):

High Performance ◽

High Efficiency ◽

Delayed Fluorescence ◽

Hole Transport ◽

Side Chain ◽

Solution Processed ◽

Thermally Activated Delayed Fluorescence ◽

Thermally Activated ◽

Turn On ◽

Solution Processable

New polystyrene-based polymers with high π-extended hole transport pendants were synthesized to obtain a low turn-on voltage and high efficiency in solution-processed green TADF-OLEDs.

Neighborhood Energy Modeling and Monitoring: A Case Study

Energies ◽

10.3390/en14123716 ◽

2021 ◽

Vol 14 (12) ◽

pp. 3716

Author(s):

Francesco Causone ◽

Rossano Scoccia ◽

Martina Pelle ◽

Paola Colombo ◽

Mario Motta ◽

...

Keyword(s):

High Performance ◽

High Efficiency ◽

Early Stage ◽

Energy Performance ◽

Monitoring Plan ◽

Performance Targets ◽

Carrier Energy ◽

Zero Carbon ◽

Energy Grid

Cities and nations worldwide are committing to energy-neutral and carbon-neutral objectives that imply a huge contribution from buildings. High-performance targets, either zero energy or zero carbon, are typically difficult for single buildings to reach, but groups of properly managed buildings might reach these ambitious goals. For this purpose we need tools and experience to model, monitor, manage and optimize buildings and their neighborhood-level systems. The paper describes the activities pursued for the deployment of an advanced energy management system for a multi-carrier energy grid of an existing neighborhood in the area of Milan. The activities included: (i) development of a detailed monitoring plan, (ii) deployment of the monitoring plan, (iii) development of a virtual model of the neighborhood and simulation of the energy performance. Comparisons against early-stage energy monitoring data proved promising, and the generation system showed high efficiency (EER equal to 5.84), to be further exploited.

Self-templating synthesis of prismatic-like N-doped carbon tubes embedded with Fe3O4 as a high-efficiency polysulfide-anchoring-conversion mediator for high performance lithium-sulfur batteries

Chemical Engineering Journal ◽

10.1016/j.cej.2020.128153 ◽

2021 ◽

Vol 410 ◽

pp. 128153

Author(s):

Shasha Xin ◽

Jing Li ◽

Hongtao Cui ◽

Yuanyuan Liu ◽

Huiying Wei ◽

...

Keyword(s):

High Performance ◽

High Efficiency ◽

Lithium Sulfur Batteries ◽

Templating Synthesis ◽

Lithium Sulfur

Iodine reduction for reproducible and high-performance perovskite solar cells and modules

Science Advances ◽

10.1126/sciadv.abe8130 ◽

2021 ◽

Vol 7 (10) ◽

pp. eabe8130

Author(s):

Shangshang Chen ◽

Xun Xiao ◽

Hangyu Gu ◽

Jinsong Huang

Keyword(s):

Solar Cells ◽

High Performance ◽

High Efficiency ◽

Electronic Materials ◽

Substantial Reduction ◽

Perovskite Solar Cells ◽

High Yield ◽

Operation Conditions ◽

Record Value ◽

Precursor Powders

Perovskite-based electronic materials and devices such as perovskite solar cells (PSCs) have notoriously bad reproducibility, which greatly impedes both fundamental understanding of their intrinsic properties and real-world applications. Here, we report that organic iodide perovskite precursors can be oxidized to I₂ even for carefully sealed precursor powders or solutions, which markedly deteriorates the performance and reproducibility of PSCs. Adding benzylhydrazine hydrochloride (BHC) as a reductant into degraded precursor solutions can effectively reduce the detrimental I₂ back to I⁻, accompanied by a substantial reduction of I₃⁻-induced charge traps in the films. BHC residuals in perovskite films further stabilize the PSCs under operation conditions. BHC improves the stabilized efficiency of the blade-coated p-i-n structure PSCs to a record value of 23.2% (22.62 ± 0.40% certified by National Renewable Energy Laboratory), and the high-efficiency devices have a very high yield. A stabilized aperture efficiency of 18.2% is also achieved on a 35.8-cm² mini-module.
