An Intelligent TCP Congestion Control Method Based on Deep Q Network

2021 · Vol 13 (10) · pp. 261
Author(s): Yinfeng Wang, Longxiang Wang, Xiaoshe Dong

To optimize data migration performance between different supercomputing centers in China, we present TCP-DQN, an intelligent TCP congestion control method based on DQN (Deep Q Network). The TCP congestion control process is abstracted as a partially observable Markov decision process in which an agent interacts with the network environment: the agent adjusts the size of the congestion window by observing characteristics of the network state, the environment feeds a reward back to the agent, and the agent tries to maximize the expected cumulative reward over an episode. We designed a weighted reward function to balance throughput against delay. Compared with traditional Q-learning, DQN uses two neural networks and experience replay to reduce the oscillation that may occur during gradient descent. We implemented TCP-DQN and compared it with mainstream congestion control algorithms such as CUBIC, HighSpeed, and NewReno. The results show that TCP-DQN achieves more than twice the throughput of the compared methods, while its latency remains close to theirs.
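The abstract describes a weighted throughput/delay reward, a small set of congestion-window actions, and DQN-style experience replay, but gives no concrete values. A minimal sketch of these pieces, with weights, actions, and buffer capacity as purely illustrative assumptions (not the paper's), might look like:

```python
import random
from collections import deque

def weighted_reward(throughput, delay, w_tp=1.0, w_delay=0.5):
    """Hypothetical weighted reward balancing throughput against delay.
    The paper's actual weights are not given; w_tp and w_delay are assumptions."""
    return w_tp * throughput - w_delay * delay

# Assumed action set on the congestion window (cwnd): the paper only says
# the agent "adjusts the size of the congestion window".
ACTIONS = {
    0: lambda cwnd: cwnd + 1,            # additive increase
    1: lambda cwnd: max(1, cwnd // 2),   # multiplicative decrease
    2: lambda cwnd: cwnd,                # hold
}

class ReplayBuffer:
    """Experience replay as in DQN: store transitions and sample random
    minibatches to de-correlate consecutive gradient updates."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buf.append((state, action, reward, next_state))

    def sample(self, k):
        return random.sample(self.buf, min(k, len(self.buf)))
```

In a full implementation these transitions would feed a neural network that estimates Q-values per action; the sketch only fixes the interface the abstract implies.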

2019 · Vol 14
Author(s): Tayyab Khan, Karan Singh, Kamlesh C. Purohit

Background: With the growing popularity of group communication applications such as file transfer, multimedia events, distance learning, email distribution, multiparty video conferencing, and teleconferencing, multicasting is a useful tool for efficient multipoint data distribution. The efficiency of a communication technique depends on parameters such as processing speed, buffer storage, and the amount of data flowing between nodes. If data exceeds the capacity of a link or node, it introduces congestion in the network. A series of multicast congestion control algorithms have been developed, but in heterogeneous network environments these approaches neither respond to nor reduce congestion quickly when network behavior changes.

Objective: Multicasting is a robust and efficient one-to-many (1:M) group communication technique that reduces communication cost, bandwidth consumption, processing time, and delay while offering reliability similar to regular unicast. This patent presents a novel and comprehensive congestion control method, the integrated multicast congestion control approach (ICMA), to reduce packet loss.

Methods: The proposed mechanism combines a leave-join and flow control mechanism with a proportional-integral-derivative (PID) controller that acts according to the congestion status. In the proposed approach, the PID controller computes the expected incoming rate at each router and feeds this rate back to the upstream routers of the multicast network so they can stabilize their local buffer occupancy.

Results: Simulation results in NS-2 show that the proposed approach outperforms existing methods in terms of delay, throughput, bandwidth utilization, and packet loss.

Conclusion: The proposed congestion control scheme provides better bandwidth utilization and throughput than other existing approaches. Moreover, we have discussed existing congestion control schemes and their research gaps. In future work, we plan to explore fairness and quality-of-service issues in multicast communication.
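The Methods section describes a PID controller that computes an expected incoming rate at each router and feeds it back upstream to stabilize buffer occupancy. The patent's gains and set-point are not given; a generic PID rate controller under assumed values might be sketched as:

```python
class PIDRateController:
    """Sketch of a PID controller regulating a router's buffer occupancy.
    The gains (kp, ki, kd) and the buffer set-point are illustrative
    assumptions, not values from the patent."""

    def __init__(self, kp=0.5, ki=0.1, kd=0.05, setpoint=50.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint       # target buffer occupancy (packets)
        self.integral = 0.0
        self.prev_error = 0.0

    def expected_rate(self, buffer_occupancy, base_rate, dt=1.0):
        """Rate to advertise to upstream routers so that local buffer
        occupancy moves toward the set-point: above the set-point the
        advertised rate drops below base_rate, below it the rate rises."""
        error = self.setpoint - buffer_occupancy
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        correction = (self.kp * error
                      + self.ki * self.integral
                      + self.kd * derivative)
        return max(0.0, base_rate + correction)
```

Each router would run one such controller and propagate the computed rate to its upstream neighbors, which is the feedback loop the abstract describes.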


2021
Author(s): Stav Belogolovsky, Philip Korsunsky, Shie Mannor, Chen Tessler, Tom Zahavy

We consider the task of inverse reinforcement learning in contextual Markov decision processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent; instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping so that the agent acts optimally even when encountering previously unseen contexts, a setting also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, comparing sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
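The abstract frames reward learning as a non-differentiable convex problem solved via subgradients. The paper's specific objective and subgradient construction are not reproduced here; a generic projected-subgradient step, demonstrated on a stand-in non-differentiable convex objective (an L1 distance, not the paper's), illustrates the optimization pattern:

```python
import numpy as np

def subgradient_step(w, subgrad, step, radius=1.0):
    """One projected subgradient step: move against a subgradient of a
    convex objective, then project back onto an L2 ball of the given
    radius (a common constraint on reward-weight vectors)."""
    w = w - step * subgrad
    norm = np.linalg.norm(w)
    if norm > radius:
        w = w * (radius / norm)
    return w

# Stand-in objective: f(w) = ||w - target||_1, convex but not
# differentiable at the optimum; a valid subgradient is sign(w - target).
target = np.array([0.3, -0.2])
w = np.zeros(2)
for t in range(1, 201):
    g = np.sign(w - target)
    w = subgradient_step(w, g, step=0.1 / t)
```

With a diminishing step size the iterates approach the minimizer despite the kink at the optimum, which is the behavior subgradient methods are chosen for.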

