Deep Reinforcement Learning for Tropical Air Free-cooled Data Center Control

2021 ◽  
Vol 17 (3) ◽  
pp. 1-28
Author(s):  
Duc Van Le ◽  
Rongrong Wang ◽  
Yingbo Liu ◽  
Rui Tan ◽  
Yew-Wah Wong ◽  
...  

Air free-cooled data centers (DCs) have not existed in the tropical zone due to the unique challenges of year-round high ambient temperature and relative humidity (RH). The increasing availability of servers that can tolerate higher temperatures and RH, prompted by regulatory bodies' calls to raise DC temperature setpoints, sheds light on the feasibility of air free-cooled DCs in the tropics. However, due to the complex psychrometric dynamics, operating an air free-cooled DC in the tropics generally requires adaptive control of the supply air condition to maintain the computing performance and reliability of the servers. This article studies the problem of keeping the supply air temperature and RH in a free-cooled tropical DC below certain thresholds. To achieve this goal, we formulate the control problem as a Markov decision process and apply deep reinforcement learning (DRL) to learn a control policy that minimizes the cooling energy while satisfying the requirements on the supply air temperature and RH. We also develop a constrained DRL solution for further performance improvements. Extensive evaluation based on real data traces collected from an air free-cooled testbed, with comparisons among the unconstrained and constrained DRL approaches as well as two baseline approaches, shows the superior performance of our proposed solutions.
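
To make the control objective concrete, here is a minimal sketch of the kind of penalty-shaped reward such a formulation implies: minimize cooling energy while keeping the supply air temperature and RH below thresholds. The thresholds and penalty weights below are illustrative assumptions, not the authors' actual values, and the paper's constrained DRL variant treats the thresholds as explicit constraints rather than penalties.

```python
# Hypothetical reward shaping for the air free-cooled DC control MDP.
T_MAX = 30.0    # assumed supply air temperature threshold (deg C)
RH_MAX = 80.0   # assumed relative humidity threshold (%)

def reward(cooling_energy_kwh, supply_temp_c, supply_rh_pct,
           lambda_t=10.0, lambda_rh=10.0):
    """Negative energy cost plus hinge penalties for threshold violations."""
    penalty = (lambda_t * max(0.0, supply_temp_c - T_MAX)
               + lambda_rh * max(0.0, supply_rh_pct - RH_MAX))
    return -cooling_energy_kwh - penalty
```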

Author(s):  
Xiaoqiang Wang ◽  
Yali Du ◽  
Shengyu Zhu ◽  
Liangjun Ke ◽  
Zhitang Chen ◽  
...  

Discovering causal relations among a set of variables is a long-standing question in many empirical sciences. Recently, Reinforcement Learning (RL) has achieved promising results in causal discovery from observational data. However, searching the space of directed graphs and enforcing acyclicity by implicit penalties tend to be inefficient and restrict the existing RL-based method to small-scale problems. In this work, we propose a novel RL-based approach for causal discovery by incorporating RL into the ordering-based paradigm. Specifically, we formulate the ordering search problem as a multi-step Markov decision process, implement the ordering-generating process with an encoder-decoder architecture, and finally use RL to optimize the proposed model based on reward mechanisms designed for each ordering. A generated ordering is then processed with variable selection to obtain the final causal graph. We analyze the consistency and computational complexity of the proposed method and empirically show that a pretrained model can be exploited to accelerate training. Experimental results on both synthetic and real data sets show that the proposed method achieves much improved performance over the existing RL-based method.
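
As a rough illustration of the ordering-based pipeline described above, the sketch below generates an ordering (a random permutation standing in for the paper's encoder-decoder generator), prunes parents by a simple variable selection, and scores the result to produce an RL reward. The least-squares scoring and the coefficient threshold are illustrative assumptions; acyclicity holds by construction, since edges only point forward in the ordering.

```python
import numpy as np

def score_ordering(X, order, thresh=0.1):
    """Fit each variable on its predecessors in the ordering, keep strong
    coefficients as edges, and return (reward, edge list)."""
    n, _ = X.shape
    edges, sse = [], 0.0
    for i, v in enumerate(order):
        parents = order[:i]
        if not parents:
            sse += float(X[:, v] @ X[:, v])
            continue
        P = X[:, parents]
        beta = np.linalg.lstsq(P, X[:, v], rcond=None)[0]
        resid = X[:, v] - P @ beta
        sse += float(resid @ resid)
        edges += [(p, v) for p, b in zip(parents, beta) if abs(b) > thresh]
    return -sse / n, edges   # higher reward = better-fitting ordering

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 2] += 2.0 * X[:, 0]                       # ground-truth edge 0 -> 2
order = [int(i) for i in rng.permutation(4)]   # stand-in for the generator
reward, graph = score_ordering(X, order)
```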


Mathematics ◽  
2021 ◽  
Vol 9 (21) ◽  
pp. 2689
Author(s):  
Bruno Gašperov ◽  
Stjepan Begušić ◽  
Petra Posedel Šimović ◽  
Zvonko Kostanjčar

Market making is the process whereby a market participant, called a market maker, simultaneously and repeatedly posts limit orders on both sides of the limit order book of a security in order to both provide liquidity and generate profit. Optimal market making entails dynamic adjustment of bid and ask prices in response to the market maker's current inventory level and market conditions, with the goal of maximizing a risk-adjusted return measure. This problem is naturally framed as a Markov decision process, a discrete-time stochastic (inventory) control process. Reinforcement learning, a class of techniques based on learning from observations and used for solving Markov decision processes, lends itself particularly well to it. Recent years have seen a strong uptick in the popularity of such techniques in the field, fueled in part by a series of successes of deep reinforcement learning in other domains. The primary goal of this paper is to provide a comprehensive and up-to-date overview of the current state-of-the-art applications of (deep) reinforcement learning to optimal market making. The analysis indicates that reinforcement learning techniques provide superior risk-adjusted returns over more standard market making strategies, typically derived from analytical models.
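
A toy version of the underlying MDP may help fix ideas: the state carries the maker's inventory and the mid-price, the action is a pair of bid/ask offsets, and the reward is the change in mark-to-market wealth minus a quadratic inventory penalty (a simple risk adjustment). The fill model and all parameters are illustrative assumptions, not a calibrated market simulator.

```python
import random

def step(state, action, phi=0.01):
    """One market-making MDP transition.
    state = (inventory, cash, mid); action = (bid_offset, ask_offset) >= 0."""
    inv, cash, mid = state
    bid_off, ask_off = action
    wealth_before = cash + inv * mid
    # Tighter quotes fill more often (assumed fill probabilities).
    if random.random() < max(0.0, 0.5 - bid_off):
        inv, cash = inv + 1, cash - (mid - bid_off)
    if random.random() < max(0.0, 0.5 - ask_off):
        inv, cash = inv - 1, cash + (mid + ask_off)
    mid += random.gauss(0.0, 0.05)            # random-walk mid-price
    reward = (cash + inv * mid) - wealth_before - phi * inv ** 2
    return (inv, cash, mid), reward
```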


Buildings ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 237
Author(s):  
Aiman Albatayneh ◽  
Dariusz Alterman ◽  
Adrian Page ◽  
Behdad Moghtaderi

To design energy-efficient buildings, energy assessment programs need to be developed to determine the inside air temperature, so that the thermal comfort of occupants can be sustained. The internal temperatures could be calculated through computational fluid dynamics (CFD) analysis; however, CFD uses minuscule time steps (seconds or milliseconds), so a long-term simulation (i.e., weeks or months) requires excessive time to compute wind effects, even on high-performance personal computers. This paper examines a new method wherein the wind effect surrounding the buildings is integrated with the external air temperature to facilitate wind simulation in building analysis over long periods. This was done with the help of an equivalent temperature (known as Tnatural) at which still air produces the same convective heat loss as the actual air temperature combined with wind effects. This new external air temperature Tnatural can then be used to calculate the internal air temperature. With wind effects included, more than 90% of the results were within 0–3 °C of the temperatures observed in the real data (99% for the insulated cavity brick (InsCB), 91% for the cavity brick (CB), 93% for the insulated reverse brick veneer (InsRBV) and 94% for the insulated brick veneer (InsBV) modules). When wind effects were ignored, these figures dropped to 83–88%. Hence, including wind effects is essential for correctly simulating the thermal performance of the modules. Moreover, the simulation time is expected to fall below 1% of the original simulation time.
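
The equivalent-temperature idea can be sketched in a few lines: Tnatural is the still-air temperature that would produce the same convective heat loss as the actual air temperature under wind, so a long-time-step thermal simulation needs no explicit wind model. The wind-dependent convection correlation h = 5.7 + 3.8v (a common flat-plate form) and the still-air coefficient are assumptions for illustration; the paper's exact formulation may differ.

```python
def t_natural(t_air, t_surface, wind_speed, h_still=5.7):
    """Equivalent external air temperature that folds wind into temperature.

    Derived from h_wind * (t_air - t_surface) = h_still * (t_nat - t_surface).
    """
    h_wind = 5.7 + 3.8 * wind_speed      # W/(m^2 K), assumed correlation
    return t_surface + (h_wind / h_still) * (t_air - t_surface)

# Example: 25 degC air at 4 m/s past a 30 degC wall removes as much heat
# as still air at roughly 11.7 degC.
print(t_natural(25.0, 30.0, 4.0))
```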


Entropy ◽  
2020 ◽  
Vol 23 (1) ◽  
pp. 62
Author(s):  
Zhengwei Liu ◽  
Fukang Zhu

Thinning operators play an important role in the analysis of integer-valued autoregressive models, and the most widely used is binomial thinning. Inspired by the theory of extended Pascal triangles, a new thinning operator named extended binomial is introduced, which generalizes binomial thinning. Compared to the binomial thinning operator, the extended binomial thinning operator has two parameters and is more flexible in modeling. Based on the proposed operator, a new integer-valued autoregressive model is introduced, which can accurately and flexibly capture the dispersion features of count time series. Two-step conditional least squares (CLS) estimation is investigated for the innovation-free case, and conditional maximum likelihood estimation is also discussed. We also obtain the asymptotic property of the two-step CLS estimator. Finally, three overdispersed or underdispersed real data sets are considered to illustrate the superior performance of the proposed model.
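
For readers unfamiliar with thinning, the sketch below shows the standard one-parameter binomial thinning operator that the extended operator generalizes, inside an INAR(1) recursion with Poisson innovations. The two-parameter extended binomial operator built on extended Pascal triangles is specific to the paper and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def binomial_thinning(x, alpha):
    """alpha ∘ x: number of survivors among x Bernoulli(alpha) trials."""
    return rng.binomial(x, alpha)

def simulate_inar1(n, alpha=0.5, lam=2.0, x0=0):
    """X_t = alpha ∘ X_{t-1} + eps_t with Poisson(lam) innovations."""
    x, prev = np.empty(n, dtype=int), x0
    for t in range(n):
        prev = binomial_thinning(prev, alpha) + rng.poisson(lam)
        x[t] = prev
    return x

series = simulate_inar1(500)   # stationary mean lam / (1 - alpha) = 4
```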


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differentiable convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare their sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function that explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.
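
Under common apprenticeship-learning assumptions (reward linear in a context-dependent feature map, loss measured by the gap between expert and learner feature expectations), one subgradient step of such a convex formulation might look as follows. `solve_mdp` and the feature bookkeeping are placeholders for problem-specific components; this is a sketch, not the authors' exact algorithm.

```python
import numpy as np

def subgradient_step(W, contexts, expert_features, solve_mdp, lr=0.1):
    """One projected-subgradient update of the reward mapping W.

    expert_features[c]: empirical feature expectation of the expert in
    context c; solve_mdp(W, c): feature expectation of a policy that is
    optimal for the reward induced by W in context c.
    """
    g = np.zeros_like(W)
    for c in contexts:
        # Subgradient of the expectation-matching loss: learner minus expert.
        g += np.outer(c, solve_mdp(W, c) - expert_features[tuple(c)])
    W = W - lr * g / len(contexts)
    return W / max(1.0, np.linalg.norm(W))   # project onto the unit ball
```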


Author(s):  
Ming-Sheng Ying ◽  
Yuan Feng ◽  
Sheng-Gang Ying

Markov decision process (MDP) offers a general framework for modelling sequential decision making where outcomes are random. In particular, it serves as a mathematical framework for reinforcement learning. This paper introduces an extension of MDP, namely quantum MDP (qMDP), that can serve as a mathematical model of decision making about quantum systems. We develop dynamic programming algorithms for policy evaluation and for finding optimal policies for qMDPs in the finite-horizon case. The results obtained in this paper provide some useful mathematical tools for reinforcement learning techniques applied to the quantum world.
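
For orientation, here is the classical finite-horizon dynamic programming scheme (backward induction) that the qMDP algorithms extend; in the quantum setting, states become quantum states and actions super-operators. The tabular classical version is shown, not the paper's qMDP algorithm itself.

```python
import numpy as np

def backward_induction(P, R, H):
    """Finite-horizon DP for a classical MDP.
    P[a]: (S, S) transition matrix for action a; R: (S, A) rewards;
    H: horizon. Returns stage values V[t] and greedy policies pi[t]."""
    A, S = len(P), R.shape[0]
    V = np.zeros((H + 1, S))
    pi = np.zeros((H, S), dtype=int)
    for t in range(H - 1, -1, -1):      # work backwards from the horizon
        Q = np.stack([R[:, a] + P[a] @ V[t + 1] for a in range(A)], axis=1)
        V[t] = Q.max(axis=1)
        pi[t] = Q.argmax(axis=1)
    return V, pi
```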


2021 ◽  
Vol 10 (1) ◽  
pp. 21
Author(s):  
Omar Nassef ◽  
Toktam Mahmoodi ◽  
Foivos Michelinakis ◽  
Kashif Mahmood ◽  
Ahmed Elmokashfi

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate to improve energy consumption, delay, throughput or a combination of those metrics, depending on the user-end device and the application. Reinforcement learning utilising gradient descent and a genetic algorithm is adopted alongside machine learning and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the deep neural network in predicting intermediary environmental states; they also show the superior performance of the genetic reinforcement learning algorithm in terms of performance optimisation.
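
A stripped-down version of the genetic search over configurations might look like the sketch below: candidate configurations are scored by a fitness function (left abstract here, but in this setting it would blend predicted energy, delay and throughput) and evolved by selection, crossover and mutation. The encoding and all hyperparameters are illustrative assumptions.

```python
import random

def evolve(fitness, n_params, pop_size=20, generations=50, mut=0.1):
    """Maximise fitness over configurations encoded as floats in [0, 1]."""
    pop = [[random.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]             # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_params)    # one-point crossover
            child = [min(1.0, max(0.0, g + random.gauss(0.0, mut)))
                     for g in a[:cut] + b[cut:]]   # gaussian mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)
```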


2013 ◽  
Vol 461 ◽  
pp. 565-569 ◽  
Author(s):  
Fang Wang ◽  
Kai Xu ◽  
Qiao Sheng Zhang ◽  
Yi Wen Wang ◽  
Xiao Xiang Zheng

Brain-machine interfaces (BMIs) decode the cortical neural spikes of paralyzed patients to control external devices for the purpose of movement restoration. Neuroplasticity induced by conducting a relatively complex multistep task is helpful for improving the performance of a BMI system. Reinforcement learning (RL) allows the BMI system to interact with the environment and learn the task adaptively without a teacher signal, which is more appropriate for paralyzed patients. In this work, we propose to apply Q(λ)-learning to multistep goal-directed tasks using the user's neural activity. Neural data were recorded from M1 of a monkey manipulating a joystick in a center-out task. Compared with a supervised learning approach, significant BMI control was achieved, with correct directional decoding in 84.2% and 81% of the trials from naïve states. The results demonstrate that the BMI system was able to complete a task by interacting with the environment, indicating that RL-based methods have the potential to develop more natural BMI systems.
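
For reference, one step of tabular Watkins' Q(λ), the algorithm named above, is sketched below: eligibility traces spread the temporal-difference error across the multistep task, and traces are cut after exploratory (non-greedy) actions. The state and action encodings of the neural decoding task are abstracted away.

```python
import numpy as np

def q_lambda_update(Q, E, s, a, r, s2, a2, alpha=0.1, gamma=0.9, lam=0.8):
    """One Watkins' Q(lambda) step on Q-table Q and trace table E;
    a2 is the action actually taken in the next state s2."""
    a_star = int(np.argmax(Q[s2]))
    delta = r + gamma * Q[s2, a_star] - Q[s, a]
    E[s, a] += 1.0                       # accumulating eligibility trace
    Q += alpha * delta * E               # credit all recently visited pairs
    if a2 == a_star:
        E *= gamma * lam                 # decay traces after a greedy step
    else:
        E[:] = 0.0                       # cut traces after exploration
    return Q, E
```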


2015 ◽  
Vol 2015 ◽  
pp. 1-16
Author(s):  
Chao Lu ◽  
Yanan Zhao ◽  
Jianwei Gong

Reinforcement learning (RL) has shown great potential for motorway ramp control, especially under congestion caused by incidents. However, existing applications are limited to single-agent tasks and, being based on Q-learning, have inherent drawbacks for dealing with coordinated ramp control problems. To solve these problems, a Dyna-Q based multiagent reinforcement learning (MARL) system named Dyna-MARL is developed in this paper. Dyna-Q is an extension of Q-learning that combines model-free and model-based methods to obtain benefits from both sides. The performance of Dyna-MARL is tested on a simulated motorway segment in the UK with real traffic data collected during AM peak hours. The test results, compared with isolated RL and non-controlled situations, show that Dyna-MARL achieves superior performance in improving traffic operation with respect to increasing total throughput, reducing total travel time and cutting CO2 emissions. Moreover, with a suitable coordination strategy, Dyna-MARL can maintain a highly equitable motorway system by balancing the travel times of road users from different on-ramps.
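
The single-agent building block that Dyna-MARL extends is standard tabular Dyna-Q, sketched below: each real transition updates the Q-table directly (model-free) and a learned model, which is then replayed for a number of planning steps (model-based). The deterministic model and hyperparameters are the usual textbook simplifications, not the paper's multiagent system.

```python
import random

def dyna_q_step(Q, model, actions, s, a, r, s2,
                alpha=0.1, gamma=0.95, n_plan=10):
    def best(state):
        return max(Q.get((state, b), 0.0) for b in actions)

    # Direct RL update from the real transition (model-free part).
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (
        r + gamma * best(s2) - Q.get((s, a), 0.0))
    model[(s, a)] = (r, s2)              # learned (deterministic) model

    # Planning: replay simulated transitions (model-based part).
    for _ in range(n_plan):
        (ps, pa), (pr, ps2) = random.choice(list(model.items()))
        Q[(ps, pa)] = Q.get((ps, pa), 0.0) + alpha * (
            pr + gamma * best(ps2) - Q.get((ps, pa), 0.0))
    return Q, model
```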

