Edge Caching for D2D Enabled Hierarchical Wireless Networks with Deep Reinforcement Learning

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Wenkai Li ◽  
Chenyang Wang ◽  
Ding Li ◽  
Bin Hu ◽  
Xiaofei Wang ◽  
...  

Edge caching is a promising method to deal with the traffic explosion problem in future networks. To satisfy user requests, contents can be proactively cached in proximity to users (e.g., at base stations or user devices). Recently, several learning-based edge caching optimizations have been discussed. However, most previous studies suffer from a dynamically and constantly expanding action and caching space, leading to impracticality and low efficiency. In this paper, we study the edge caching optimization problem by utilizing the Double Deep Q-network (Double DQN) learning framework to maximize the hit rate of user requests. First, we obtain the Device-to-Device (D2D) sharing model by considering both online and offline factors, and then formulate the optimization problem, which is proved to be NP-hard. The edge caching replacement problem is then modeled as a Markov decision process (MDP). Finally, an edge caching strategy based on Double DQN is proposed. Experimental results based on large-scale real-world traces show the effectiveness of the proposed framework.
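The key idea distinguishing Double DQN from vanilla DQN is decoupling action selection from action evaluation when forming the learning target. A minimal sketch of that target computation, using small tabular Q arrays in place of the paper's networks (states, actions, and all values here are hypothetical):

```python
import numpy as np

def double_dqn_target(q_online, q_target, reward, next_state, gamma=0.99):
    # Double DQN decouples selection from evaluation: the online network
    # picks the greedy next action, the target network scores it.
    best_action = int(np.argmax(q_online[next_state]))
    return reward + gamma * q_target[next_state, best_action]

# Toy example: 2 caching states, 3 cache-replacement actions.
q_online = np.array([[0.1, 0.5, 0.2],
                     [0.3, 0.1, 0.4]])
q_target = np.array([[0.2, 0.4, 0.1],
                     [0.2, 0.2, 0.5]])
target = double_dqn_target(q_online, q_target, reward=1.0, next_state=1, gamma=0.9)
```

Using the target network's value for the online network's greedy action mitigates the overestimation bias of taking a max over a single network's estimates.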

Author(s):  
Md Salik Parwez ◽  
Hasan Farooq ◽  
Ali Imran ◽  
Hazem Refai

This paper presents a novel scheme for spectral efficiency (SE) optimization through clustering of users. By clustering users with respect to their geographical concentration, we propose a solution for dynamic steering of the antenna beam, i.e., antenna azimuth and tilt optimization with respect to the most focal point in a cell that would maximize overall SE in the system. The proposed framework thus introduces the notion of elastic cells that can be a potential component of 5G networks. The proposed scheme decomposes the large-scale system-wide optimization problem into small-scale local sub-problems and thus provides a low-complexity solution for dynamic system-wide optimization. Every sub-problem involves clustering users to determine the focal point of the cell for a given user distribution in time and space, and determining new values of azimuth and tilt that would optimize overall system SE performance. To this end, we propose three user clustering algorithms to transform a given user distribution into the focal points used in the optimization: the first is based on the received signal-to-interference ratio (SIR) at the user; the second is based on the received signal level (RSL) at the user; the third and final one is based on the relative distances of users from the base stations. We also formulate and solve an optimization problem to determine the optimal radii of clusters. The performance of the proposed algorithms is evaluated through system-level simulations. A performance comparison against a benchmark with no elastic cells deployed shows that a gain in spectral efficiency of up to 25% is possible, depending on the user distribution in a cell.
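As a rough sketch of the third (distance-based) clustering criterion, one can take the centroid of users within a cluster radius of the base station as the focal point, and derive the steering azimuth from it (coordinates, radius, and the centroid rule are illustrative assumptions, not the paper's exact algorithm):

```python
import numpy as np

def focal_point(user_xy, bs_xy, radius):
    # Distance-based clustering: users within `radius` of the base station
    # form the cluster; their centroid is the focal point toward which
    # azimuth and tilt would be steered.
    d = np.linalg.norm(user_xy - bs_xy, axis=1)
    members = user_xy[d <= radius]
    return bs_xy if len(members) == 0 else members.mean(axis=0)

users = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]])
bs = np.array([0.0, 0.0])
fp = focal_point(users, bs, radius=5.0)
azimuth = np.degrees(np.arctan2(fp[1] - bs[1], fp[0] - bs[0]))
```

The distant user at (10, 10) falls outside the cluster, so it does not pull the beam away from the local concentration of users.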


2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a state- or action-value-based machine learning method which approximately solves large-scale Markov Decision Process (MDP) or Semi-Markov Decision Process (SMDP) problems. A multi-step RL algorithm called Sarsa(λ, k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and equivalent to Sarsa(λ) if k is infinite. Sarsa(λ, k) adjusts its performance by tuning the value of k. Two forms of Sarsa(λ, k), forward-view Sarsa(λ, k) and backward-view Sarsa(λ, k), are constructed and proved equivalent in off-line updating.
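The building block of the forward view is the truncated k-step return. A minimal sketch of that component (this is the k-step target only, not the full λ-weighted combination the algorithm would use; the helper and its arguments are hypothetical):

```python
def k_step_sarsa_return(rewards, q_tail, gamma, k):
    # Forward-view k-step Sarsa target: sum the first k discounted rewards,
    # then bootstrap from Q at the k-th successor state-action pair.
    # k = 1 recovers one-step Sarsa; k -> infinity approaches the full return.
    steps = min(k, len(rewards))
    g = sum(gamma**i * r for i, r in enumerate(rewards[:steps]))
    return g + gamma**steps * q_tail

g = k_step_sarsa_return([1.0, 0.0, 2.0], q_tail=0.5, gamma=0.9, k=2)
```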


Author(s):  
A.V. Edelev ◽  
D.N. Karamov ◽  
I.A. Sidorov ◽  
D.V. Binh ◽  
N.H. Nam ◽  
...  

The paper investigates the large-scale penetration of renewable energy into the power system of Vietnam. The proposed approach formulates the optimization of operational decisions across different power generation technologies as a Markov decision process. It uses a stochastic base model together with a deterministic lookahead model. The first model applies stochastic search to optimize the operation of power sources. The second model captures the hourly variations of renewable energy over a year. The approach helps to find the optimal generation configuration under different market conditions.
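A minimal sketch of how a stochastic base model might score candidate generation configurations against sampled renewable scenarios (the configuration set, scenario representation, and cost function are all hypothetical placeholders, not the paper's model):

```python
import random

def stochastic_search(configs, scenarios, cost, iters=200, seed=0):
    # Sample candidate generation configurations and score each by its
    # average cost over sampled hourly renewable scenarios.
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(iters):
        c = rng.choice(configs)
        avg = sum(cost(c, s) for s in scenarios) / len(scenarios)
        if avg < best_cost:
            best, best_cost = c, avg
    return best, best_cost

# Toy usage: pick the capacity level closest to the sampled renewable output.
best, best_cost = stochastic_search([0, 1, 2], [1.0, 1.0],
                                    cost=lambda c, s: (c - s) ** 2)
```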


2019 ◽  
Vol 2019 ◽  
pp. 1-14 ◽  
Author(s):  
Sambuddha Ghosal ◽  
Bangyou Zheng ◽  
Scott C. Chapman ◽  
Andries B. Potgieter ◽  
David R. Jordan ◽  
...  

The yield of cereal crops such as sorghum (Sorghum bicolor L. Moench) depends on the distribution of crop heads in varying branching arrangements. Therefore, counting the head number per unit area is critical for plant breeders to correlate with the genotypic variation in a specific breeding field. However, measuring such phenotypic traits manually is an extremely labor-intensive process and suffers from low efficiency and human error. Moreover, the process is almost infeasible for large-scale breeding plantations or experiments. Machine learning approaches such as deep convolutional neural network (CNN) based object detectors are promising tools for efficient object detection and counting. However, a significant limitation of such deep learning approaches is that they typically require a massive amount of hand-labeled images for training, which is itself a tedious process. Here, we propose an active-learning-inspired, weakly supervised deep learning framework for sorghum head detection and counting from UAV-based images. We demonstrate that it is possible to significantly reduce human labeling effort without compromising final model performance (R² between human count and machine count is 0.88) by using a semitrained CNN model (i.e., trained with limited labeled data) to perform synthetic annotation. In addition, we visualize key features that the network learns. This improves trustworthiness by enabling users to better understand and trust the decisions that the trained deep learning model makes.
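The synthetic-annotation loop can be sketched as follows: a semitrained detector proposes detections on unlabeled images, and only confident ones are kept as pseudo-labels for the next training round. The detector interface, field names, and threshold below are illustrative assumptions:

```python
def synthetic_annotate(detector, unlabeled_images, conf_threshold=0.8):
    # Keep only high-confidence detections as pseudo-labels; low-confidence
    # images would be the natural candidates for human (active) labeling.
    pseudo_labels = []
    for img in unlabeled_images:
        boxes = [b for b in detector(img) if b["score"] >= conf_threshold]
        if boxes:
            pseudo_labels.append((img, boxes))
    return pseudo_labels

# Toy detector stub returning fixed detections per "image".
stub = lambda img: [{"box": (0, 0, 4, 4), "score": 0.9},
                    {"box": (5, 5, 9, 9), "score": 0.3}]
labels = synthetic_annotate(stub, ["img_a", "img_b"])
```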


2013 ◽  
Vol 30 (05) ◽  
pp. 1350014 ◽  
Author(s):  
ZHICONG ZHANG ◽  
WEIPING WANG ◽  
SHOUYAN ZHONG ◽  
KAISHUN HU

Reinforcement learning (RL) is a state- or action-value-based machine learning method which solves large-scale multi-stage decision problems such as Markov Decision Process (MDP) and Semi-Markov Decision Process (SMDP) problems. We minimize the makespan of flow shop scheduling problems with an RL algorithm. We convert flow shop scheduling problems into SMDPs by constructing elaborate state features, actions, and the reward function, such that minimizing the accumulated reward is equivalent to minimizing the schedule objective function. We apply the on-line TD(λ) algorithm with linear gradient-descent function approximation to solve the SMDPs. To examine the performance of the proposed RL algorithm, computational experiments are conducted on benchmark problems in comparison with other scheduling methods. The experimental results support the efficiency of the proposed algorithm and illustrate that the RL approach is a promising computational approach for flow shop scheduling problems worthy of further investigation.
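A single on-line TD(λ) update with linear function approximation can be sketched as below; the flow shop state features themselves are omitted, and the feature vectors and hyperparameters are illustrative:

```python
import numpy as np

def td_lambda_step(w, z, phi, phi_next, reward, alpha=0.1, gamma=1.0, lam=0.9):
    # One on-line TD(lambda) step: compute the TD error under the linear
    # value estimate phi @ w, accumulate the eligibility trace, then move
    # the weights along the trace scaled by the TD error.
    delta = reward + gamma * (phi_next @ w) - (phi @ w)
    z = gamma * lam * z + phi
    return w + alpha * delta * z, z

w, z = np.zeros(2), np.zeros(2)
w, z = td_lambda_step(w, z, np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                      reward=1.0, alpha=0.5, gamma=0.9, lam=0.8)
```

The eligibility trace z lets a reward propagate credit back to recently visited features, which matters in scheduling where the makespan consequence of a dispatch decision appears many steps later.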


2021 ◽  
Vol 12 (1) ◽  
pp. 272
Author(s):  
Bumjin Park ◽  
Cheongwoong Kang ◽  
Jaesik Choi

This paper addresses multi-robot task allocation, i.e., the assignment of multiple robots to tasks such that an objective function is maximized. The performance of existing meta-heuristic methods worsens as the number of robots or tasks increases. To tackle this problem, a novel Markov decision process formulation of multi-robot task allocation is presented for reinforcement learning. The proposed formulation sequentially allocates robots to tasks to minimize the total time taken to complete them. Additionally, we propose a deep reinforcement learning method to find the best allocation schedule for each problem. Our method adopts a cross-attention mechanism to compute the preference of robots over tasks. The experimental results show that the proposed method finds better solutions than meta-heuristic methods, especially on large-scale allocation problems.
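The cross-attention preference computation can be sketched with scaled dot-product attention: robot embeddings act as queries and task embeddings as keys, so each softmax row is one robot's preference distribution over tasks (embedding sizes and the single-head form are illustrative assumptions):

```python
import numpy as np

def cross_attention_preferences(robot_emb, task_emb):
    # Scaled dot-product cross-attention: rows are robots, columns are tasks;
    # each row of the output sums to 1 and ranks tasks for that robot.
    logits = robot_emb @ task_emb.T / np.sqrt(robot_emb.shape[1])
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

robots = np.random.default_rng(0).normal(size=(2, 4))  # 2 robots, dim 4
tasks = np.random.default_rng(1).normal(size=(3, 4))   # 3 tasks, dim 4
prefs = cross_attention_preferences(robots, tasks)     # shape (2, 3)
```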


2020 ◽  
Vol 13 (3) ◽  
pp. 93
Author(s):  
Shijun Wang ◽  
Baocheng Zhu ◽  
Chen Li ◽  
Mingzhe Wu ◽  
James Zhang ◽  
...  

In this paper, we propose a general Riemannian proximal optimization algorithm with guaranteed convergence to solve Markov decision process (MDP) problems. To model policy functions in MDPs, we employ a Gaussian mixture model (GMM) and formulate it as a non-convex optimization problem on the Riemannian manifold of positive semidefinite matrices. For two given policy functions, we also provide a lower bound on policy improvement by using bounds derived from the Wasserstein distance of GMMs. Preliminary experiments show the efficacy of our proposed Riemannian proximal policy optimization algorithm.
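The Wasserstein machinery has a simple closed form for the Gaussian components themselves; in one dimension, the 2-Wasserstein distance between two Gaussians depends only on their means and standard deviations. A minimal sketch of that building block (the GMM-level bound in the paper combines such component distances and is not reproduced here):

```python
import math

def w2_gaussian_1d(m1, s1, m2, s2):
    # Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2):
    # sqrt of the squared mean gap plus the squared std-deviation gap.
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)

d = w2_gaussian_1d(0.0, 1.0, 3.0, 1.0)  # equal spreads: distance is the mean gap
```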


Author(s):  
Hao Liu ◽  
Ting Li ◽  
Renjun Hu ◽  
Yanjie Fu ◽  
Jingjing Gu ◽  
...  

Multi-modal transportation recommendation aims to recommend a travel plan that considers various transportation modes, such as walking, cycling, automobile, and public transit, and how to connect these modes. The successful development of multi-modal transportation recommendation systems can help satisfy the diversified needs of travelers and improve the efficiency of transport networks. However, existing transport recommender systems mainly focus on unimodal transport planning. To this end, in this paper, we propose a joint representation learning framework for multi-modal transportation recommendation based on a carefully constructed multi-modal transportation graph. Specifically, we first extract a multi-modal transportation graph from large-scale map query data to describe the concurrency of users, Origin-Destination (OD) pairs, and transport modes. Then, we provide effective solutions for the optimization problem and develop an anchor embedding for transport modes to initialize the embeddings of transport modes. Moreover, we infer user relevance and OD-pair relevance, and incorporate them to regularize the representation learning. Finally, we exploit the learned representations for online multi-modal transportation recommendation. Indeed, our method has been deployed in one of the largest navigation apps to serve hundreds of millions of users, and extensive experimental results on real-world map query data demonstrate the enhanced performance of the proposed method for multi-modal transportation recommendation.
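The concurrency graph extraction step can be sketched as a co-occurrence count: each map query links its user, OD pair, and chosen transport mode, and edge weights accumulate across the log (field names and the exact edge set are hypothetical, not the paper's schema):

```python
from collections import Counter

def build_transport_graph(query_log):
    # Weighted co-occurrence edges among users, OD pairs, and transport
    # modes; repeated queries strengthen the corresponding edges.
    edges = Counter()
    for q in query_log:
        u, od, mode = q["user"], q["od"], q["mode"]
        edges[(u, od)] += 1
        edges[(od, mode)] += 1
        edges[(u, mode)] += 1
    return edges

log = [{"user": "u1", "od": ("A", "B"), "mode": "bus"},
       {"user": "u1", "od": ("A", "B"), "mode": "subway"}]
g = build_transport_graph(log)
```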

