constrained markov decision process
Recently Published Documents


TOTAL DOCUMENTS

14
(FIVE YEARS 4)

H-INDEX

5
(FIVE YEARS 0)

Entropy ◽  
2021 ◽  
Vol 23 (8) ◽  
pp. 1084
Author(s):  
Lehan Wang ◽  
Jingzhou Sun ◽  
Yuxuan Sun ◽  
Sheng Zhou ◽  
Zhisheng Niu

Timely status updates are critical in remote control systems such as autonomous driving and the industrial Internet of Things, where timeliness requirements are usually context dependent. Accordingly, the Urgency of Information (UoI) has been proposed beyond the well-known Age of Information (AoI) by further including context-aware weights which indicate whether the monitored process is in an emergency. However, the optimal updating and scheduling strategies in terms of UoI remain open. In this paper, we propose a UoI-optimal updating policy for timely status information with resource constraint. We first formulate the problem in a constrained Markov decision process and prove that the UoI-optimal policy has a threshold structure. When the context-aware weights are known, we propose a numerical method based on linear programming. When the weights are unknown, we further design a reinforcement learning (RL)-based scheduling policy. The simulation reveals that the threshold of the UoI-optimal policy increases as the resource constraint tightens. In addition, the UoI-optimal policy outperforms the AoI-optimal policy in terms of average squared estimation error, and the proposed RL-based updating policy achieves a near-optimal performance without the advanced knowledge of the system model.


Author(s):  
Yongshuai Liu ◽  
Avishai Halev ◽  
Xin Liu

Reinforcement Learning (RL) algorithms have had tremendous success in simulated domains. These algorithms, however, often cannot be directly applied to physical systems, especially in cases where there are constraints to satisfy (e.g. to ensure safety or limit resource consumption). In standard RL, the agent is incentivized to explore any policy with the sole goal of maximizing reward; in the real world, however, ensuring satisfaction of certain constraints in the process is also necessary and essential. In this article, we overview existing approaches addressing constraints in model-free reinforcement learning. We model the problem of learning with constraints as a Constrained Markov Decision Process and consider two main types of constraints: cumulative and instantaneous. We summarize existing approaches and discuss their pros and cons. To evaluate policy performance under constraints, we introduce a set of standard benchmarks and metrics. We also summarize limitations of current methods and present open questions for future research.


2021 ◽  
Vol 9 ◽  
pp. 1213-1232
Author(s):  
Hou Pong Chan ◽  
Lu Wang ◽  
Irwin King

Abstract We study controllable text summarization, which allows users to gain control on a particular attribute (e.g., length limit) of the generated summaries. In this work, we propose a novel training framework based on Constrained Markov Decision Process (CMDP), which conveniently includes a reward function along with a set of constraints, to facilitate better summarization control. The reward function encourages the generation to resemble the human-written reference, while the constraints are used to explicitly prevent the generated summaries from violating user-imposed requirements. Our framework can be applied to control important attributes of summarization, including length, covered entities, and abstractiveness, as we devise specific constraints for each of these aspects. Extensive experiments on popular benchmarks show that our CMDP framework helps generate informative summaries while complying with a given attribute’s requirement.1


Author(s):  
Manxing Du ◽  
Redouane Sassioui ◽  
Georgios Varisteas ◽  
Radu State ◽  
Mats Brorsson ◽  
...  

2015 ◽  
Vol 2015 ◽  
pp. 1-10 ◽  
Author(s):  
Yingqi Yin ◽  
Fengye Hu ◽  
Ling Cen ◽  
Yu Du ◽  
Lu Wang

As an important part of the Internet of Things (IOT) and the special case of device-to-device (D2D) communication, wireless body area network (WBAN) gradually becomes the focus of attention. Since WBAN is a body-centered network, the energy of sensor nodes is strictly restrained since they are supplied by battery with limited power. In each data collection, only one sensor node is scheduled to transmit its measurements directly to the access point (AP) through the fading channel. We formulate the problem of dynamically choosing which sensor should communicate with the AP to maximize network lifetime under the constraint of fairness as a constrained markov decision process (CMDP). The optimal lifetime and optimal policy are obtained by Bellman equation in dynamic programming. The proposed algorithm defines the limiting performance in WBAN lifetime under different degrees of fairness constraints. Due to the defect of large implementation overhead in acquiring global channel state information (CSI), we put forward a distributed scheduling algorithm that adopts local CSI, which saves the network overhead and simplifies the algorithm. It was demonstrated via simulation that this scheduling algorithm can allocate time slot reasonably under different channel conditions to balance the performances of network lifetime and fairness.


Sign in / Sign up

Export Citation Format

Share Document