Edge Caching for D2D Enabled Hierarchical Wireless Networks with Deep Reinforcement Learning

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Wenkai Li ◽  
Chenyang Wang ◽  
Ding Li ◽  
Bin Hu ◽  
Xiaofei Wang ◽  
...  

Edge caching is a promising method to deal with the traffic explosion problem in future networks. To satisfy user requests, contents can be proactively cached in proximity to users (e.g., at base stations or user devices). Recently, several learning-based edge caching optimizations have been discussed. However, most previous studies suffer from a dynamically and constantly expanding action and caching space, leading to impracticality and low efficiency. In this paper, we study the edge caching optimization problem by utilizing the Double Deep Q-network (Double DQN) learning framework to maximize the hit rate of user requests. First, we obtain the Device-to-Device (D2D) sharing model by considering both online and offline factors, and then formulate the optimization problem, which is proved to be NP-hard. The edge caching replacement problem is then modeled as a Markov decision process (MDP). Finally, an edge caching strategy based on Double DQN is proposed. Experimental results based on large-scale real-world traces show the effectiveness of the proposed framework.
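The key idea distinguishing Double DQN from vanilla DQN is decoupling action selection from action evaluation when forming the learning target. A minimal sketch of that target computation, using small tabular Q arrays in place of the paper's networks (states, actions, and all values here are hypothetical):

```python
import numpy as np

def double_dqn_target(q_online, q_target, reward, next_state, gamma=0.99):
    # Double DQN decouples selection from evaluation: the online network
    # picks the greedy next action, the target network scores it.
    best_action = int(np.argmax(q_online[next_state]))
    return reward + gamma * q_target[next_state, best_action]

# Toy example: 2 caching states, 3 cache-replacement actions.
q_online = np.array([[0.1, 0.5, 0.2],
                     [0.3, 0.1, 0.4]])
q_target = np.array([[0.2, 0.4, 0.1],
                     [0.2, 0.2, 0.5]])
target = double_dqn_target(q_online, q_target, reward=1.0, next_state=1, gamma=0.9)
```

Using the target network's value for the online network's greedy action mitigates the overestimation bias of taking a max over a single network's estimates.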

Author(s):  
Md Salik Parwez ◽  
Hasan Farooq ◽  
Ali Imran ◽  
Hazem Refai

This paper presents a novel scheme for spectral efficiency (SE) optimization through clustering of users. By clustering users with respect to their geographical concentration, we propose a solution for dynamic steering of the antenna beam, i.e., antenna azimuth and tilt optimization with respect to the most focal point in a cell that would maximize overall SE in the system. The proposed framework thus introduces the notion of elastic cells that can be a potential component of 5G networks. The proposed scheme decomposes the large-scale system-wide optimization problem into small-scale local sub-problems and thus provides a low-complexity solution for dynamic system-wide optimization. Every sub-problem involves clustering users to determine the focal point of the cell for a given user distribution in time and space, and determining new values of azimuth and tilt that would optimize overall system SE performance. To this end, we propose three user clustering algorithms to transform a given user distribution into the focal points used in the optimization: the first is based on the received signal-to-interference ratio (SIR) at the user; the second is based on the received signal level (RSL) at the user; the third and final one is based on the relative distances of users from the base stations. We also formulate and solve an optimization problem to determine the optimal radii of clusters. The performance of the proposed algorithms is evaluated through system-level simulations. A performance comparison against a benchmark with no elastic cells deployed shows that a gain in spectral efficiency of up to 25% is possible, depending on the user distribution in a cell.
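As a rough sketch of the third (distance-based) clustering criterion, one can take the centroid of users within a cluster radius of the base station as the focal point, and derive the steering azimuth from it (coordinates, radius, and the centroid rule are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def focal_point(user_xy, bs_xy, radius):
    # Distance-based clustering: users within `radius` of the base station
    # form the cluster; their centroid is the focal point toward which
    # azimuth and tilt would be steered.
    d = np.linalg.norm(user_xy - bs_xy, axis=1)
    members = user_xy[d <= radius]
    return bs_xy if len(members) == 0 else members.mean(axis=0)

users = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
bs = np.array([0.0, 0.0])
fp = focal_point(users, bs, radius=5.0)
azimuth = np.degrees(np.arctan2(fp[1] - bs[1], fp[0] - bs[0]))
```

The distant user at (10, 10) falls outside the cluster, so it does not pull the beam away from the local concentration of users.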


2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a state- or action-value-based machine learning method which approximately solves large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP) problems. A multi-step RL algorithm called Sarsa(λ, k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and equivalent to Sarsa(λ) if k is infinite. Sarsa(λ, k) adjusts its performance by tuning the value of k. Two forms of Sarsa(λ, k), forward-view Sarsa(λ, k) and backward-view Sarsa(λ, k), are constructed and proved equivalent in off-line updating.
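The building block of the forward view is the truncated k-step return. A minimal sketch of that component (this is the k-step target only, not the full λ-weighted combination the algorithm would use; the helper and its arguments are hypothetical):

```python
def k_step_sarsa_return(rewards, q_tail, gamma, k):
    # Forward-view k-step Sarsa target: sum the first k discounted rewards,
    # then bootstrap from Q at the k-th successor state-action pair.
    # k = 1 recovers one-step Sarsa; k -> infinity approaches the full return.
    steps = min(k, len(rewards))
    g = sum(gamma**i * r for i, r in enumerate(rewards[:steps]))
    return g + gamma**steps * q_tail

g = k_step_sarsa_return([1.0, 0.0, 2.0], q_tail=0.5, gamma=0.9, k=2)
```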


Author(s):  
A.V. Edelev ◽  
D.N. Karamov ◽  
I.A. Sidorov ◽  
D.V. Binh ◽  
N.H. Nam ◽  
...  

The paper investigates the large-scale penetration of renewable energy into the power system of Vietnam. The proposed approach formulates the optimization of operational decisions across different power generation technologies as a Markov decision process. It uses a stochastic base model together with a deterministic lookahead model. The first model applies stochastic search to optimize the operation of power sources. The second model captures the hourly variations of renewable energy over a year. The approach helps to find the optimal generation configuration under different market conditions.
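A minimal sketch of how a stochastic base model might score candidate generation configurations against sampled renewable scenarios (the configuration set, scenario representation, and cost function are all hypothetical placeholders, not the paper's model):

```python
import random

def stochastic_search(configs, scenarios, cost, iters=200, seed=0):
    # Sample candidate generation configurations and score each by its
    # average cost over sampled hourly renewable scenarios.
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iters):
        c = rng.choice(configs)
        avg = sum(cost(c, s) for s in scenarios) / len(scenarios)
        if avg < best_cost:
            best, best_cost = c, avg
    return best, best_cost

# Toy usage: pick the capacity level closest to the sampled renewable output.
best, best_cost = stochastic_search([0, 1, 2], [1.0, 1.0],
                                    cost=lambda c, s: (c - s) ** 2)
```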


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Sambuddha Ghosal ◽  
Bangyou Zheng ◽  
Scott C. Chapman ◽  
Andries B. Potgieter ◽  
David R. Jordan ◽  
...  

The yield of cereal crops such as sorghum (Sorghum bicolor L. Moench) depends on the distribution of crop heads in varying branching arrangements. Therefore, counting the head number per unit area is critical for plant breeders to correlate with the genotypic variation in a specific breeding field. However, measuring such phenotypic traits manually is an extremely labor-intensive process and suffers from low efficiency and human error. Moreover, the process is almost infeasible for large-scale breeding plantations or experiments. Machine learning approaches such as deep convolutional neural network (CNN) based object detectors are promising tools for efficient object detection and counting. However, a significant limitation of such deep learning approaches is that they typically require a massive amount of hand-labeled images for training, which is itself a tedious process. Here, we propose an active-learning-inspired, weakly supervised deep learning framework for sorghum head detection and counting from UAV-based images. We demonstrate that it is possible to significantly reduce human labeling effort without compromising final model performance (R² between human count and machine count is 0.88) by using a semitrained CNN model (i.e., trained with limited labeled data) to perform synthetic annotation. In addition, we visualize key features that the network learns. This improves trustworthiness by enabling users to better understand and trust the decisions that the trained deep learning model makes.
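The synthetic-annotation loop can be sketched as follows: a semitrained detector proposes detections on unlabeled images, and only confident ones are kept as pseudo-labels for the next training round. The detector interface, field names, and threshold below are illustrative assumptions:

```python
def synthetic_annotate(detector, unlabeled_images, conf_threshold=0.8):
    # Keep only high-confidence detections as pseudo-labels; low-confidence
    # images would be the natural candidates for human (active) labeling.
    pseudo_labels = []
    for img in unlabeled_images:
        boxes = [b for b in detector(img) if b["score"] >= conf_threshold]
        if boxes:
            pseudo_labels.append((img, boxes))
    return pseudo_labels

# Toy detector stub returning fixed detections per "image".
stub = lambda img: [{"box": (0, 0, 4, 4), "score": 0.9},
                    {"box": (5, 5, 9, 9), "score": 0.3}]
labels = synthetic_annotate(stub, ["img_a", "img_b"])
```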


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a state- or action-value-based machine learning method which solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function, such that minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply the on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
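A single on-line TD(λ) update with linear function approximation can be sketched as below; the flow shop state features themselves are omitted, and the feature vectors and hyperparameters are illustrative:

```python
import numpy as np

def td_lambda_step(w, z, phi, phi_next, reward, alpha=0.1, gamma=1.0, lam=0.9):
    # One on-line TD(lambda) step: compute the TD error under the linear
    # value estimate phi @ w, accumulate the eligibility trace, then move
    # the weights along the trace scaled by the TD error.
    delta = reward + gamma * (phi_next @ w) - (phi @ w)
    z = gamma * lam * z + phi
    return w + alpha * delta * z, z

w, z = np.zeros(2), np.zeros(2)
w, z = td_lambda_step(w, z, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                      reward=1.0, alpha=0.5, gamma=0.9, lam=0.8)
```

The eligibility trace z lets a reward propagate credit back to recently visited features, which matters in scheduling where the makespan consequence of a dispatch decision appears many steps later.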


2021 ◽  
Vol 12 (1) ◽  
pp. 272
Author(s):  
Bumjin Park ◽  
Cheongwoong Kang ◽  
Jaesik Choi

This paper addresses multi-robot task allocation, i.e., the assignment of multiple robots to tasks such that an objective function is maximized. The performance of existing meta-heuristic methods worsens as the number of robots or tasks increases. To tackle this problem, a novel Markov decision process formulation of multi-robot task allocation is presented for reinforcement learning. The proposed formulation sequentially allocates robots to tasks to minimize the total time taken to complete them. Additionally, we propose a deep reinforcement learning method to find the best allocation schedule for each problem. Our method adopts a cross-attention mechanism to compute the preference of robots over tasks. The experimental results show that the proposed method finds better solutions than meta-heuristic methods, especially on large-scale allocation problems.
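The cross-attention preference computation can be sketched with scaled dot-product attention: robot embeddings act as queries and task embeddings as keys, so each softmax row is one robot's preference distribution over tasks (embedding sizes and the single-head form are illustrative assumptions):

```python
import numpy as np

def cross_attention_preferences(robot_emb, task_emb):
    # Scaled dot-product cross-attention: rows are robots, columns are tasks;
    # each row of the output sums to 1 and ranks tasks for that robot.
    logits = robot_emb @ task_emb.T / np.sqrt(robot_emb.shape[1])
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

robots = np.random.default_rng(0).normal(size=(2, 4))  # 2 robots, dim 4
tasks = np.random.default_rng(1).normal(size=(3, 4))   # 3 tasks, dim 4
prefs = cross_attention_preferences(robots, tasks)     # shape (2, 3)
```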


2020 ◽  
Vol 13 (3) ◽  
pp. 93
Author(s):  
Shijun Wang ◽  
Baocheng Zhu ◽  
Chen Li ◽  
Mingzhe Wu ◽  
James Zhang ◽  
...  

In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in MDPs, we employ a Gaussian mixture model (GMM) and formulate it as a non-convex optimization problem on the Riemannian manifold of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
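The Wasserstein machinery has a simple closed form for the Gaussian components themselves; in one dimension, the 2-Wasserstein distance between two Gaussians depends only on their means and standard deviations. A minimal sketch of that building block (the GMM-level bound in the paper combines such component distances and is not reproduced here):

```python
import math

def w2_gaussian_1d(m1, s1, m2, s2):
    # Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2):
    # sqrt of the squared mean gap plus the squared std-deviation gap.
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

d = w2_gaussian_1d(0.0, 1.0, 3.0, 1.0)  # equal spreads: distance is the mean gap
```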


Author(s):  
Hao Liu ◽  
Ting Li ◽  
Renjun Hu ◽  
Yanjie Fu ◽  
Jingjing Gu ◽  
...  

Multi-modal transportation recommendation aims to recommend a travel plan that considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect these modes. The successful development of multi-modal transportation recommendation systems can help satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To this end, in this paper, we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding for transport modes to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD-pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multi-modal transportation recommendation. Indeed, our method has been deployed in one of the largest navigation apps to serve hundreds of millions of users, and extensive experimental results on real-world map query data demonstrate the enhanced performance of the proposed method for multi-modal transportation recommendation.
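The concurrency graph extraction step can be sketched as a co-occurrence count: each map query links its user, OD pair, and chosen transport mode, and edge weights accumulate across the log (field names and the exact edge set are hypothetical, not the paper's schema):

```python
from collections import Counter

def build_transport_graph(query_log):
    # Weighted co-occurrence edges among users, OD pairs, and transport
    # modes; repeated queries strengthen the corresponding edges.
    edges = Counter()
    for q in query_log:
        u, od, mode = q["user"], q["od"], q["mode"]
        edges[(u, od)] += 1
        edges[(od, mode)] += 1
        edges[(u, mode)] += 1
    return edges

log = [{"user": "u1", "od": ("A", "B"), "mode": "bus"},
       {"user": "u1", "od": ("A", "B"), "mode": "subway"}]
g = build_transport_graph(log)
```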

