Deep-Reinforcement Learning-Based Co-Evolution in a Predator–Prey System

Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 773 ◽  
Author(s):  
Xueting Wang ◽  
Jun Cheng ◽  
Lei Wang

Understanding or estimating co-evolution processes is critical in ecology, but very challenging. Traditional methods struggle to capture the complex processes of evolution and to predict their consequences in nature. In this paper, we use deep-reinforcement-learning algorithms to endow organisms with learning ability, and simulate their evolution with a Monte Carlo simulation algorithm in a large-scale ecosystem. The combination of the two algorithms allows organisms to determine their behavior from experience gained through interaction with the environment, and to pass that experience on to their offspring. Our research showed that the predators’ reinforcement-learning ability contributed to the stability of the ecosystem and helped predators obtain a more reasonable pattern of coexistence with their prey. The reinforcement-learning effect of prey on their own population was weaker than that of predators and increased the risk of predator extinction; inconsistent learning periods and speeds between prey and predators aggravated that risk. The co-evolution of the two species resulted in smaller populations of both, owing to their potentially antagonistic evolutionary networks. If learnable predators and prey invade an ecosystem at the same time, the prey have the advantage. The proposed model thus illustrates the influence of a learning mechanism on a predator–prey ecosystem and demonstrates the feasibility of predicting behavior evolution in such an ecosystem using AI approaches.
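The paper couples deep RL with Monte Carlo simulation of a large ecosystem. As a much-reduced sketch of the experience-driven behavior update it describes, the following uses tabular Q-learning for a single predator chasing a stationary prey on a short 1-D track; the track, rewards, and hyperparameters are all illustrative assumptions, not the authors' setup.

```python
import random
from collections import defaultdict

# Illustrative assumptions (not the paper's model): a predator on a 1-D
# track of 6 cells learns, via tabular Q-learning, to move toward a
# stationary prey at cell 0.
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration
ACTIONS = [-1, +1]                  # move left / move right
PREY_POS, TRACK_LEN = 0, 5

def step(state, action):
    """One environment transition: catch reward at the prey, step penalty otherwise."""
    next_state = max(0, min(TRACK_LEN, state + action))
    reward = 1.0 if next_state == PREY_POS else -0.1
    return next_state, reward, next_state == PREY_POS

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = defaultdict(float)
    for _ in range(episodes):
        s = TRACK_LEN                        # start at the far end
        for _ in range(50):
            if rng.random() < EPS:           # epsilon-greedy exploration
                act = rng.choice(ACTIONS)
            else:
                act = max(ACTIONS, key=lambda a: Q[(s, a)])
            s2, r, done = step(s, act)
            best_next = max(Q[(s2, a)] for a in ACTIONS)
            Q[(s, act)] += ALPHA * (r + GAMMA * best_next - Q[(s, act)])
            s = s2
            if done:
                break
    return Q

Q = train()
# Greedy policy after training: from every cell, move toward the prey.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(1, TRACK_LEN + 1)}
```

The same update rule, with the Q-table replaced by a deep network and experience passed between generations, is the core of the learning mechanism the abstract refers to.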

2018 ◽  
Vol 30 (7) ◽  
pp. 1983-2004 ◽  
Author(s):  
Yazhou Hu ◽  
Bailu Si

We propose a neural network model for reinforcement learning to control a robotic manipulator with unknown parameters and dead zones. The model is composed of three networks: the state of the robotic manipulator is predicted by the state network, the action policy is learned by the action network, and the performance index of the action policy is estimated by a critic network. The three networks work together to optimize the performance index based on the reinforcement learning control scheme. The convergence of the learning methods is analyzed. Application of the proposed model to a simulated two-link robotic manipulator demonstrates the effectiveness and stability of the model.
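A minimal sketch of the division of labor among the three components, under heavy simplification: a 1-D "manipulator" with unknown linear dynamics x' = A·x + B·u, a state predictor that identifies (A, B) from interaction data, an action "network" reduced to a single feedback gain chosen from the learned model, and a rollout cost standing in for the critic's performance index. The plant values and learning rates are placeholder assumptions, not the authors' model.

```python
import random

A_TRUE, B_TRUE = 0.8, 0.5    # hidden plant parameters (unknown to the learner)
a, b = 0.0, 0.0              # state-network estimates of (A, B)
LR = 0.05
rng = random.Random(1)

# State network: identify the dynamics by SGD on the squared prediction error.
for _ in range(4000):
    x = rng.uniform(-1, 1)                 # random state for identification
    u = rng.uniform(-1, 1)                 # exploratory action
    x_next = A_TRUE * x + B_TRUE * u       # plant step (hidden dynamics)
    err = (a * x + b * u) - x_next         # prediction error
    a -= LR * err * x
    b -= LR * err * u

# Action network (one parameter, u = -k*x): the gain that the learned
# model predicts drives the next state to zero.
k = a / b

def rollout_cost(gain, steps=20):
    """Critic stand-in: accumulated squared state under u = -gain*x."""
    x, total = 1.0, 0.0
    for _ in range(steps):
        x = A_TRUE * x + B_TRUE * (-gain * x)
        total += x * x
    return total
```

After identification, `rollout_cost(k)` is near zero, i.e. the learned policy regulates the unknown plant; the paper's actual scheme trains all three pieces jointly as neural networks.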


2021 ◽  
Author(s):  
Tiantian Zhang ◽  
Xueqian Wang ◽  
Bin Liang ◽  
Bo Yuan

The powerful learning ability of deep neural networks enables reinforcement learning (RL) agents to learn competent control policies directly from high-dimensional, continuous environments. In theory, to achieve stable performance, neural networks assume i.i.d. inputs, which unfortunately does not hold in the general RL paradigm, where the training data are temporally correlated and non-stationary. This issue may lead to the phenomenon of "catastrophic interference" (a.k.a. "catastrophic forgetting") and a collapse in performance, as later training is likely to overwrite and interfere with previously learned good policies. In this paper, we introduce the concept of "context" into single-task RL and develop a novel scheme, termed Context Division and Knowledge Distillation (CDaKD) driven RL, to divide all states experienced during training into a series of contexts. Its motivation is to mitigate the aforementioned challenge of catastrophic interference in deep RL, thereby improving the stability and plasticity of RL models. At the heart of CDaKD is a value function, parameterized by a neural-network feature extractor shared across all contexts, and a set of output heads, each specializing in an individual context. In CDaKD, we exploit online clustering to achieve context division, and interference is further alleviated by a knowledge-distillation regularization term on the output layers for learned contexts. In addition, to obtain the context division effectively in high-dimensional state spaces (e.g., image inputs), we perform clustering in the lower-dimensional representation space of a randomly initialized convolutional encoder, which is fixed throughout training. Our results show that, with various replay-memory capacities, CDaKD can consistently improve the performance of existing RL algorithms on classic OpenAI Gym tasks and the more complex high-dimensional Atari tasks, while incurring only moderate computational overhead.
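The context-division step can be sketched in isolation: online k-means assigns each incoming state to a context, and each context owns a separate output head over a shared representation. This is a hedged toy illustration only (1-D states, tabular heads, the distillation regularizer omitted), not the CDaKD implementation.

```python
import random

class ContextRouter:
    """Assigns states to contexts via online k-means; each context id
    indexes its own output head, standing in for CDaKD's multi-head
    value function. The knowledge-distillation term is omitted here."""

    def __init__(self, n_contexts, dim=1, seed=0):
        rng = random.Random(seed)
        self.centers = [[rng.uniform(-1, 1) for _ in range(dim)]
                        for _ in range(n_contexts)]
        self.counts = [0] * n_contexts

    def assign(self, state, update=True):
        dists = [sum((c - s) ** 2 for c, s in zip(center, state))
                 for center in self.centers]
        i = dists.index(min(dists))
        if update:                       # count-based step size, as in
            self.counts[i] += 1          # standard online k-means
            lr = 1.0 / self.counts[i]
            self.centers[i] = [c + lr * (s - c)
                               for c, s in zip(self.centers[i], state)]
        return i

# Toy state stream drawn from two well-separated clumps.
rng = random.Random(0)
stream = ([[rng.gauss(-5.0, 0.3)] for _ in range(200)]
          + [[rng.gauss(5.0, 0.3)] for _ in range(200)])
rng.shuffle(stream)

router = ContextRouter(n_contexts=2, seed=3)
heads = [dict() for _ in range(2)]       # one tabular "head" per context
for s in stream:
    ctx = router.assign(s)
    heads[ctx][round(s[0], 1)] = 0.0     # placeholder per-context update
```

The two centers converge to the two clumps, so updates for one context never touch the other head, which is the mechanism CDaKD uses to limit interference.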


2015 ◽  
Vol 2015 ◽  
pp. 1-12
Author(s):  
Xubin Gao ◽  
Qiuhui Pan ◽  
Mingfeng He

This paper discusses the impact on human health of adding antibiotics to the feed of food animals. We use an established transmission rule for resistant bacteria, combined with a predator–prey system, to formulate a differential-equation model. The equations have three equilibrium points corresponding to three population-dynamics states under the influence of resistant bacteria. To analyze the stability of the equilibrium points quantitatively, we focus on the basic reproduction numbers, and then establish both the local and global stability of the equilibria using standard mathematical methods. Numerical results relate the model's properties to some interesting biological cases. Finally, we discuss the effect of the model's two main parameters, the proportion of antibiotics added to feed and the predation rate, and estimate the human-health impacts related to the amount of in-feed antibiotics used. We further propose an approach for preventing the large-scale spread of resistant bacteria and illustrate the necessity of controlling the amount of in-feed antibiotics used.
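The paper's specific equations are not reproduced in the abstract; as a hedged illustration of exploring equilibria of such a system numerically, this sketch integrates the classic Lotka–Volterra predator–prey system with forward Euler and records the trajectories, whose oscillation around the coexistence equilibrium is the kind of behavior stability analysis classifies. All parameter values are placeholders.

```python
# Classic Lotka–Volterra system (illustrative, not the paper's model):
#   dx/dt = x*(ALPHA - BETA*y)    prey growth minus predation
#   dy/dt = y*(DELTA*x - GAMMA)   predator growth minus death
ALPHA, BETA, DELTA, GAMMA = 1.0, 1.0, 1.0, 1.0   # coexistence equilibrium at (1, 1)
DT, STEPS = 0.001, 10000

x, y = 2.0, 1.0          # start away from the equilibrium
xs, ys = [x], [y]
for _ in range(STEPS):
    dx = x * (ALPHA - BETA * y)
    dy = y * (DELTA * x - GAMMA)
    x += DT * dx          # forward Euler step
    y += DT * dy
    xs.append(x)
    ys.append(y)
```

The trajectory orbits the equilibrium (1, 1): prey dips below it while predators rise above it, and both populations stay positive throughout the simulated window.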


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-16
Author(s):  
Feng Ding ◽  
Guanfeng Ma ◽  
Zhikui Chen ◽  
Jing Gao ◽  
Peng Li

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional, large-scale artificial-intelligence tasks. However, the insecurity and instability of DRL algorithms have an important impact on their performance. The Soft Actor-Critic (SAC) algorithm uses advanced functions to update the policy and value networks, which alleviates some of these problems, but SAC still has shortcomings. To reduce the error caused by overestimation in SAC, we propose a new SAC algorithm called Averaged-SAC. By averaging previously learned action-state estimates, it reduces the overestimation problem of soft Q-learning, thereby contributing to a more stable training process and improved performance. We evaluate Averaged-SAC on several tasks in the MuJoCo environment. The experimental results show that Averaged-SAC effectively improves both the performance of the SAC algorithm and the stability of the training process.
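The averaging idea can be demonstrated without any MuJoCo machinery. In the sketch below (illustrative assumptions: two equally valued actions, Gaussian estimation noise), taking a max over a single noisy Q estimate is biased upward, while taking the max over the mean of the last K snapshots shrinks that bias, which is the effect Averaged-SAC exploits.

```python
import random
import statistics

TRUE_Q = [1.0, 1.0]          # both actions equally good; max of true Q is 1.0
K = 10                       # number of snapshots averaged
rng = random.Random(0)

def noisy_estimate():
    """One noisy snapshot of the Q values for both actions."""
    return [q + rng.gauss(0, 0.5) for q in TRUE_Q]

snapshots = [noisy_estimate() for _ in range(1000)]

# Plain target: max over a single noisy estimate (biased upward).
plain = statistics.mean(max(s) for s in snapshots)

# Averaged target: max over the per-action mean of the last K snapshots.
averaged = statistics.mean(
    max(statistics.mean(s[a] for s in snapshots[i - K:i]) for a in range(2))
    for i in range(K, len(snapshots))
)

overestimate_plain = plain - max(TRUE_Q)
overestimate_avg = averaged - max(TRUE_Q)
```

Averaging K snapshots divides the noise variance by K before the max is taken, so the overestimation bias drops by roughly a factor of sqrt(K).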


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Zhenghua Zhang ◽  
Jin Qian ◽  
Chongxin Fang ◽  
Guoshu Liu ◽  
Quan Su

In adaptive traffic signal control (ATSC), reinforcement learning (RL) is a frontier research hotspot, and combining it with deep neural networks further enhances its learning ability. Distributed multiagent RL (MARL) can avoid the problems of centralized control by having each local RL agent observe only part of the complex traffic area. However, because the communication capabilities between agents are limited, the environment becomes only partially observable. This paper proposes multiagent reinforcement learning based on cooperative games (CG-MARL), which models each intersection as an agent. The method considers not only communication and coordination between agents but also the game among them. Each agent observes its own area to learn an RL strategy and value function; the Q-functions from different agents are then combined through a hybrid network, and each agent finally forms its own final Q-function over the entire large-scale transportation network. The results show that the proposed method is superior to traditional control methods.
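The abstract does not spell out the hybrid network, so the sketch below shows the value-mixing idea in its simplest (QMIX-style) form, as an assumed analogue: per-agent Q values are combined with nonnegative mixer weights, so the mixed value is monotonic in each agent's value and every agent's greedy local action is also jointly optimal. Numbers are placeholders.

```python
import itertools

# Per-agent Q values over two local actions (placeholder numbers).
q_agent = [
    {0: 1.0, 1: 3.0},   # agent 0 (e.g., one intersection)
    {0: 2.0, 1: 0.5},   # agent 1
]
mix_w = [0.7, 1.3]       # nonnegative mixer weights -> monotonic mixing

def q_tot(actions):
    """Joint value: weighted sum of per-agent Q values (the 'hybrid' step)."""
    return sum(w * q[a] for w, q, a in zip(mix_w, q_agent, actions))

# Decentralized greedy choice, one argmax per agent.
greedy_local = tuple(max(q, key=q.get) for q in q_agent)

# Centralized exhaustive search over joint actions.
best_joint = max(itertools.product([0, 1], repeat=2), key=q_tot)
```

Because the mixer is monotonic, `greedy_local` and `best_joint` coincide, which is what lets each intersection act on its own Q-function while still optimizing the network-wide value.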


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Jian Sun ◽  
Jie Li

The large scale, time variation, and diversity of physically coupled networked infrastructures such as the power grid and transportation systems complicate their controllers' design, implementation, and expansion. To tackle these challenges, we propose an online distributed reinforcement-learning control algorithm with a one-layer neural network for each subsystem (agent) to adapt to variations in the networked infrastructure. Each controller includes a critic network and an action network that approximate the strategy utility function and the desired control law, respectively. To avoid a large number of trials and to improve stability, the training of the action network introduces supervised-learning mechanisms into the reduction of the long-term cost. The stability of the control system with the learning algorithm is analyzed, and upper bounds on the tracking error and the neural-network weights are estimated. The effectiveness of the proposed controller is illustrated in simulation; the results also indicate stability under communication delays and disturbances.
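The supervised-learning ingredient can be isolated in a toy form: instead of learning purely from the long-term cost signal, the action "network" (here a single feedback gain) is regressed toward a supplied reference control law, which removes the need for many exploratory trials. The reference gain and plant are hypothetical placeholders, not the paper's system.

```python
import random

K_REF = 2.0                  # hypothetical reference (desired) feedback gain
k = 0.0                      # actor parameter: control law u = -k*x
LR = 0.1
rng = random.Random(7)

for _ in range(500):
    x = rng.uniform(-1, 1)               # sampled system state
    u = -k * x                           # actor's current action
    u_ref = -K_REF * x                   # action of the reference law
    # Supervised term: gradient of (u - u_ref)^2 with respect to k.
    grad = 2.0 * (u - u_ref) * (-x)
    k -= LR * grad                       # pull the actor toward the reference
```

In the paper this supervised term is blended with the critic-driven update; here it is shown alone, converging the gain to the reference value.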


Author(s):  
Hironobu Sone ◽  
Yoshinobu Tamura ◽  
Shigeru Yamada

Recently, open source software (OSS) has been adopted in various situations because of its quick delivery, cost reduction, and standardization of systems. Much OSS is developed under the distinctive development style known as the bazaar method, in which faults are detected and fixed by users and developers around the world, and the fixes are reflected in the next release. The fault-fixing time also tends to shorten as OSS development progresses. However, several large-scale open source projects have the problem that fault fixing takes a long time because those assigned to fix faults cannot handle the many fault reports quickly. Furthermore, imperfect fault fixing sometimes occurs because fixes are performed by various people in various environments. Therefore, OSS users and project managers need to know the stability of open source projects by grasping the fault-fixing time. In this paper, to assess the stability of large-scale open source projects, we derive the imperfect-fault-fixing probability and the transition probability distribution. For the derivation, we use a software reliability growth model based on the Wiener process, considering that the fault-fixing time in open source projects changes depending on various factors, such as the fault-reporting time and the assignees for fixing faults. In addition, we apply the proposed model to actual open source project data and examine its validity.
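The transition probability of a Wiener-process model has a Gaussian closed form, which can be checked against simulation. This sketch (drift and volatility are arbitrary placeholders, not estimates from the paper's project data) simulates drifted Brownian paths W(t) = mu·t + sigma·B(t) and compares the empirical probability P(W(T) > x) with the analytic value.

```python
import math
import random

MU, SIGMA = 1.0, 0.5          # placeholder drift and volatility
T, X = 4.0, 3.0               # horizon and threshold
N_STEPS, N_PATHS = 100, 20000
rng = random.Random(42)

def simulate_path():
    """One Euler–Maruyama path of the drifted Wiener process up to time T."""
    dt = T / N_STEPS
    w = 0.0
    for _ in range(N_STEPS):
        w += MU * dt + SIGMA * math.sqrt(dt) * rng.gauss(0, 1)
    return w

empirical = sum(simulate_path() > X for _ in range(N_PATHS)) / N_PATHS

# Closed form: W(T) ~ Normal(MU*T, SIGMA^2 * T).
z = (X - MU * T) / (SIGMA * math.sqrt(T))
analytic = 0.5 * math.erfc(z / math.sqrt(2))
```

The same closed form, with drift and volatility fitted to fault-report and fixing-time data, is the kind of transition probability distribution the paper derives.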


2020 ◽  
Vol 2020 ◽  
pp. 1-10
Author(s):  
Hangxing Ding ◽  
Song Chen ◽  
Shuai Chang ◽  
Guanghui Li ◽  
Lei Zhou

Underground caving can potentially lead to large-scale surface destruction. To test the safety conditions of surface construction projects near the circular surface subsidence zone at the Hemushan Iron Mine, this paper proposes an analytical model of the stability of the cylindrical caved space that takes the long-term strength of the surrounding rock mass, the in situ stress, and the impact of caved materials as inputs. The proposed model predicts the orientation and depth at which rock failure occurs and calculates the maximum depth of the undercut above which the surrounding rock mass of the caved space can remain stable for a long time. The model's prediction for the Hemushan Iron Mine reveals that the construction projects can maintain safe working conditions, a prediction also corroborated by records from Google Earth satellite images, demonstrating the model's validity for such analyses. Additionally, to prevent rock failure above the free surface of the caved materials, backfilling the subsidence zone with waste rock is suggested, and this measure has been implemented at the Hemushan Iron Mine. The monitoring results show that the measure helps protect the surrounding wall of the caved space from large-scale slip failure. This work thus not only provides a robust analytical model for predicting the stability of the rock around a cylindrical caved space but also introduces practical measures for mitigating the subsequent extension of surface subsidence after vertical caving.


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Manoj Kumar Singh ◽  
B. S. Bhadauria ◽  
Brajesh Kumar Singh

This paper studies the stability and bifurcation analysis of a Leslie–Gower predator–prey model with Michaelis–Menten type predator harvesting. It is shown that the proposed model exhibits bistability under certain parametric conditions. Dulac's criterion is adopted to obtain sufficient conditions for the global stability of the model. Moreover, the model exhibits different kinds of bifurcations (e.g., saddle-node, subcritical and supercritical Hopf, Bogdanov–Takens, and homoclinic bifurcations) as the model's parameter values vary. The analytical findings and numerical simulations reveal far richer and more complex dynamics in comparison with models with no harvesting or with constant-yield predator harvesting.
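One commonly used form of the Michaelis–Menten (nonlinear, saturating) harvesting term in such models is h(y) = q·E·y / (m1·E + m2·y); the abstract does not state the authors' exact functional form, so this sketch only illustrates the qualitative property that distinguishes it from constant-yield harvesting: roughly linear harvest at low predator density y and saturation at q·E/m2 for large y. All parameter values are placeholders.

```python
# Placeholder parameters: catchability q, effort E, saturation constants m1, m2.
Q, E, M1, M2 = 1.0, 2.0, 0.5, 1.0

def harvest(y):
    """Michaelis–Menten harvesting rate as a function of predator density y."""
    return Q * E * y / (M1 * E + M2 * y)

small = harvest(0.01)    # near-linear regime: roughly (Q/M1)*y
large = harvest(1000.0)  # saturated regime
cap = Q * E / M2         # saturation level as y -> infinity
```

This saturation is what makes the harvested model's dynamics richer than the constant-yield case: the harvest pressure self-limits at high density but stays proportionally strong at low density.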

