Comparing Reinforcement Learning Methods for Real-Time Optimization of a Chemical Process

Titus Quah; Derek Machalek; Kody M. Powell

doi:10.3390/pr8111497

Processes ◽

10.3390/pr8111497 ◽

2020 ◽

Vol 8 (11) ◽

pp. 1497

Author(s):

Titus Quah ◽

Derek Machalek ◽

Kody M. Powell

Keyword(s):

Reinforcement Learning ◽

Chemical Process ◽

Optimization Method ◽

Training Data ◽

Economic Optimization ◽

Artificial Neural Network (ANN) ◽

Policy Optimization ◽

Real Time Optimization ◽

Operational Data

One popular method for optimizing systems, referred to as ANN-PSO, uses an artificial neural network (ANN) to approximate the system and an optimization method such as particle swarm optimization (PSO) to select inputs. However, with recent developments in reinforcement learning, it is important to compare ANN-PSO to newer algorithms such as Proximal Policy Optimization (PPO). To investigate the performance and applicability of ANN-PSO and PPO, we compare their methodologies, apply them to steady-state economic optimization of a chemical process, and compare their results to conventional first-principles modeling with nonlinear programming (FP-NLP). Our results show that ANN-PSO and PPO achieve profits nearly as high as FP-NLP, with PPO achieving slightly higher profits than ANN-PSO. We also find that PPO has the fastest computational times, 10 and 10,000 times faster than FP-NLP and ANN-PSO, respectively. However, PPO requires more training data than ANN-PSO to converge to an optimal policy. This case study suggests that PPO has better performance, as it achieves higher profits and faster online computational times, whereas ANN-PSO has better applicability because it can train on historical operational data and has higher training efficiency.
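The ANN-PSO pattern described in this abstract (a learned surrogate of the system, searched by a particle swarm) can be illustrated with a minimal sketch. It is not the authors' implementation: `surrogate_profit` is a hypothetical quadratic standing in for a trained ANN, and the swarm size, inertia weight `w`, and acceleration coefficients `c1` and `c2` are assumed textbook values rather than settings from the paper.

```python
import random


def surrogate_profit(x):
    """Hypothetical stand-in for a trained ANN surrogate.

    Assumed for illustration only: profit peaks at inputs (2.0, -1.0).
    """
    return 10.0 - (x[0] - 2.0) ** 2 - (x[1] + 1.0) ** 2


def pso_maximize(f, bounds, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic global-best particle swarm optimization (maximization)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial positions inside the bounds, zero initial velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Clamp the new position to the allowed input range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_x, best_profit = pso_maximize(surrogate_profit,
                                   bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

In this arrangement the swarm is re-run against the surrogate every time inputs must be selected, whereas a trained policy only needs a forward pass, which is consistent with the abstract's remark that PPO has faster online computational times.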

Multi-Context Generation in Virtual Reality Environments Using Deep Reinforcement Learning

Volume 9: 40th Computers and Information in Engineering Conference (CIE) ◽

10.1115/detc2020-22624 ◽

2020 ◽

Author(s):

James Cunningham ◽

Christian Lopez ◽

Omar Ashour ◽

Conrad S. Tucker

Keyword(s):

Virtual Reality ◽

Reinforcement Learning ◽

Virtual Environments ◽

Probability Distributions ◽

Automatic Generation ◽

Grocery Store ◽

Training Data ◽

Learning Approaches ◽

Common Concept

In this work, a Deep Reinforcement Learning (RL) approach is proposed for Procedural Content Generation (PCG) that seeks to automate the generation of multiple related virtual reality (VR) environments for enhanced personalized learning. This allows the user to be exposed to multiple virtual scenarios that demonstrate a consistent theme, which is especially valuable in an educational context. RL approaches to PCG offer the advantage of not requiring training data, as opposed to other PCG approaches that rely on supervised learning. This work advances the state of the art in RL-based PCG by demonstrating the ability to generate a diversity of contexts in order to teach the same underlying concept. A case study is presented that demonstrates the feasibility of the proposed RL-based PCG method using examples of probability distributions in both manufacturing facility and grocery store virtual environments. The method demonstrated in this paper has the potential to enable the automatic generation of a variety of virtual environments that are connected by a common concept or theme.

Spatial Grammar-Based Recurrent Neural Network for Design Form and Behavior Optimization

Journal of Mechanical Design ◽

10.1115/1.4044398 ◽

2019 ◽

Vol 141 (12) ◽

Cited By ~ 2

Author(s):

Gary M. Stump ◽

Simon W. Miller ◽

Michael A. Yukish ◽

Timothy W. Simpson ◽

Conrad Tucker

Keyword(s):

Reinforcement Learning ◽

Degrees Of Freedom ◽

Parametric Design ◽

Control Policy ◽

Training Data ◽

High Performing ◽

First Case ◽

Spatial Grammar ◽

And Behavior

A novel method has been developed to optimize both the form and behavior of complex systems. The method uses spatial grammars embodied in character-recurrent neural networks (char-RNNs) to define the system, including actuator numbers and degrees of freedom; reinforcement learning to optimize actuator behavior; and physics-based simulation systems to determine performance and provide (re)training data for the char-RNN. Compared to parametric design optimization with fixed numbers of inputs, using grammars and char-RNNs allows for a more complex, combinatorially infinite design space. In the proposed method, the char-RNN is first trained to learn a spatial grammar that defines the assembly layout, component geometries, material properties, and arbitrary numbers and degrees of freedom of actuators. Next, generated designs are evaluated using a physics-based environment, with an inner optimization loop using reinforcement learning to determine the best control policy for the actuators. The resulting design is thus optimized for both form and behavior, generated by a char-RNN embodying a high-performing grammar. Two evaluative case studies are presented using the design of a modular sailing craft. The first case study optimizes the design without actuated surfaces, allowing the char-RNN to understand the semantics of high-performing designs. The second case study extends the first by incorporating controllable actuators that require an inner-loop behavioral optimization. The implications of the results are discussed along with ongoing and future work.

Research on Optimization of Ground-Coupled Heat Pump Systems under Specific Constraint Conditions

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.953-954.673 ◽

2014 ◽

Vol 953-954 ◽

pp. 673-679

Author(s):

Yang Yang Wang ◽

Ping Fang Hu ◽

Fei Lei ◽

Na Zhu ◽

Tian Hua Wu ◽

...

Keyword(s):

Heat Pump ◽

Optimization Problem ◽

Design Method ◽

Optimization Method ◽

Simulation Software ◽

Design Parameters ◽

Decision Variable ◽

Economic Optimization ◽

Ground Coupled Heat Pump

A design method for ground-coupled heat pump (GCHP) systems with specific constraint conditions is proposed. The total number of boreholes, borehole depth, borehole spacing, and average velocity of the fluid in the U-tube are considered as variables in the optimization problem. The four-variable optimization problem is transformed into one with a single decision variable. A case study, which includes different schemes for designing the GCHP system of an office building and the corresponding economic analysis, is performed with the aid of simulation software. The results show that optimal design parameters can be found in an economic optimization problem with specific constraint conditions. Additionally, design parameters may have a notable influence on the energy consumption of circulating pumps. The optimization method in this paper can serve as a reference for engineering designers.

Policy Optimization with Model-Based Explorations

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014675 ◽

2019 ◽

Vol 33 ◽

pp. 4675-4682 ◽

Cited By ~ 2

Author(s):

Feiyang Pan ◽

Qingpeng Cai ◽

An-Xiang Zeng ◽

Chun-Xiang Pan ◽

Qing Da ◽

...

Keyword(s):

Reinforcement Learning ◽

Optimization Method ◽

Monte Carlo Sampling ◽

New Technique ◽

Learning Methods ◽

Model Based ◽

Model Free ◽

Hand Model ◽

Target Values ◽

Policy Optimization

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games. However, these methods suffer from high variance and high sample complexity. On the other hand, model-based reinforcement learning methods that learn the transition dynamics are more sample efficient, but they often suffer from bias in the transition estimation. How to make use of both model-based and model-free learning is a central problem in reinforcement learning. In this paper, we present a new technique to address the tradeoff between exploration and exploitation, which regards the difference between model-free and model-based estimations as a measure of exploration value. We apply this new technique to the PPO algorithm and arrive at a new policy optimization method, named Policy Optimization with Model-based Explorations (POME). POME uses two components to predict the actions’ target values: a model-free one estimated by Monte Carlo sampling and a model-based one that learns a transition model and predicts the value of the next state. POME adds the error between these two target estimations as an additional exploration value for each state-action pair, i.e., it encourages the algorithm to explore states with larger target errors, which are hard to estimate. We compare POME with PPO on Atari 2600 games, and the results show that POME outperforms PPO on 33 of 49 games.
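The exploration value described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the POME algorithm itself: the abstract does not give the exact formula, so the use of the absolute difference between the two targets, the function names, and the toy numbers are all hypothetical. The model-free target is the discounted Monte Carlo return, and the model-based target is the reward plus the discounted value of the next state as predicted by a (here, pre-computed) transition model.

```python
def discounted_returns(rewards, gamma):
    """Model-free targets: discounted Monte Carlo return from each step."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def exploration_bonus(rewards, predicted_next_values, gamma):
    """Per-step disagreement between model-free and model-based targets.

    predicted_next_values[t] is assumed to be the value of the next state
    as predicted through a learned transition model (hypothetical input).
    """
    model_free = discounted_returns(rewards, gamma)
    model_based = [r + gamma * v for r, v in zip(rewards, predicted_next_values)]
    # Assumption: the "error" is taken as the absolute difference.
    return [abs(mf - mb) for mf, mb in zip(model_free, model_based)]


# Toy trajectory (made-up numbers for illustration).
rewards = [1.0, 0.0, 2.0]
predicted_next_values = [2.0, 2.0, 1.0]
bonus = exploration_bonus(rewards, predicted_next_values, gamma=0.5)
# bonus == [0.5, 0.0, 0.5]
```

Steps where the two estimates disagree (the first and third here) receive a larger bonus, which matches the abstract's description of steering exploration toward states whose targets are hard to estimate.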

Economic Optimization in the Non-Steady-State Periodic Orbit under Zone Model Predictive Control for the Chemical Process: A Case Study of a Heavy-Oil Fractionator

Industrial & Engineering Chemistry Research ◽

10.1021/acs.iecr.1c01168 ◽

2021 ◽

Author(s):

Xin Wan ◽

Xiong-Lin Luo

Keyword(s):

Steady State ◽

Periodic Orbit ◽

Model Predictive Control ◽

Predictive Control ◽

Heavy Oil ◽

Chemical Process ◽

Zone Model ◽

Economic Optimization

Using Data Augmentation Based Reinforcement Learning for Daily Stock Trading

Electronics ◽

10.3390/electronics9091384 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1384

Author(s):

Yuyu Yuan ◽

Wen Wen ◽

Jincui Yang

Keyword(s):

Reinforcement Learning ◽

Data Augmentation ◽

Training Data ◽

Stock Trading ◽

Data Set ◽

Stable Algorithm ◽

Q Learning ◽

Using Data ◽

Policy Optimization ◽

Sharpe Ratio

In algorithmic trading, an adequate training data set is key to making profits. However, daily stock trading data cannot meet the large data demand of reinforcement learning. To address this problem, we propose a framework named data augmentation based reinforcement learning (DARL), which uses minute-candle data (open, high, low, close) to train the agent. The agent is then used to guide daily stock trading. In this way, we can increase the number of data instances available for training by a factor of hundreds, which can substantially improve the reinforcement learning effect. However, not all stocks are suitable for this kind of trading. Therefore, we propose an access mechanism based on skewness and kurtosis to select stocks that can be traded properly using this algorithm. In our experiments, we find that proximal policy optimization (PPO) is the most stable algorithm for achieving high risk-adjusted returns. Deep Q-learning (DQN) and soft actor critic (SAC) can beat the market in terms of the Sharpe ratio.
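The two statistics named in the access mechanism can be computed as follows. This is only a sketch of the standard population formulas: the abstract does not say which series the statistics are computed on or which thresholds are used, so the sample data are made up and no selection rule is implemented.

```python
def central_moment(xs, k):
    """k-th central moment of a sample (population form, divides by n)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)


def skewness(xs):
    """Population skewness: m3 / m2^(3/2)."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis: m4 / m2^2 - 3 (zero for a normal distribution)."""
    return central_moment(xs, 4) / central_moment(xs, 2) ** 2 - 3.0


# Hypothetical return series: symmetric, so skewness is 0.
returns = [-1.0, 0.0, 1.0]
skew = skewness(returns)
kurt = excess_kurtosis(returns)
```

A screening rule would then compare `skew` and `kurt` against cut-offs, which would have to be taken from the paper itself.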

An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning

IEEE Transactions on Neural Networks and Learning Systems ◽

10.1109/tnnls.2020.3044196 ◽

2021 ◽

pp. 1-13

Author(s):

Wenjia Meng ◽

Qian Zheng ◽

Yue Shi ◽

Gang Pan

Keyword(s):

Reinforcement Learning ◽

Trust Region ◽

Optimization Method ◽

Policy Optimization

Deep Model-Based Reinforcement Learning via Estimated Uncertainty and Conservative Policy Optimization

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6177 ◽

2020 ◽

Vol 34 (04) ◽

pp. 6941-6948

Author(s):

Qi Zhou ◽

HouQiang Li ◽

Jie Wang

Keyword(s):

Reinforcement Learning ◽

Performance Improvement ◽

Optimization Method ◽

Asymptotic Performance ◽

Model Based ◽

Model Free ◽

Deep Model ◽

Conservative Policy ◽

Policy Optimization ◽

Novel Model

Model-based reinforcement learning algorithms tend to achieve higher sample efficiency than model-free methods. However, due to the inevitable errors of learned models, model-based methods struggle to achieve the same asymptotic performance as model-free methods. In this paper, we propose Policy Optimization with Model-Based Uncertainty (POMBU), a novel model-based approach that can effectively improve asymptotic performance using the uncertainty in Q-values. We derive an upper bound on the uncertainty, based on which we can approximate the uncertainty accurately and efficiently for model-based methods. We further propose an uncertainty-aware policy optimization algorithm that optimizes the policy conservatively to encourage performance improvement with high probability. This can significantly alleviate overfitting of the policy to inaccurate models. Experiments show that POMBU can outperform existing state-of-the-art policy optimization algorithms in terms of sample efficiency and asymptotic performance. Moreover, the experiments demonstrate the excellent robustness of POMBU compared to previous model-based approaches.

Economic optimization for the rehabilitation of co-located mixed assets

Canadian Journal of Civil Engineering ◽

10.1139/cjce-2016-0509 ◽

2017 ◽

Vol 44 (10) ◽

pp. 820-828 ◽

Cited By ~ 1

Author(s):

Dina A. Saad ◽

Tarek Hegazy

Keyword(s):

Implementation Strategies ◽

Optimization Method ◽

Economic Optimization ◽

Benefit Cost Analysis ◽

Fund Allocation ◽

Economic Justification ◽

Infrastructure Rehabilitation ◽

The Right ◽

Benefit Cost

Managing the rehabilitation of co-located infrastructure assets (pavements, pipelines, culverts, etc.) has become a major challenge for municipalities due to the varying rehabilitation requirements of these assets and the need for better coordination of rehabilitation works. Yet, most of the existing fund-allocation methods are not structured to address co-located infrastructure rehabilitation work in a systematic manner. This paper, therefore, extends the enhanced benefit-cost analysis (EBCA) optimization method that was developed earlier for a single asset type, to the case of co-located assets. The extended EBCA approach arrives at near-optimum funding decisions by achieving an equilibrium state at which fair and equitable allocations are made among all asset categories. Using a real case study consisting of bridges and culverts co-located in the right of way of a pavement network along with two different implementation strategies, EBCA proved to be able to arrive at near-optimum fund-allocations supported with a credible economic justification.

Deep Reinforcement Learning for Multiparameter Optimization in de novo Drug Design

10.26434/chemrxiv.7990910.v2 ◽

2019 ◽

Author(s):

Niclas Ståhl ◽

Göran Falkman ◽

Alexander Karlsson ◽

Gunnar Mathiason ◽

Jonas Boström

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

De Novo ◽

De Novo Drug Design ◽

Generative Process ◽

New Methods ◽

Multiparameter Optimization ◽

Long Short Term Memory ◽

New Compounds

In medicinal chemistry programs it is key to design and make compounds that are efficacious and safe. This is a long, complex and difficult multi-parameter optimization process, often including several properties with orthogonal trends. New methods for the automated design of compounds against profiles of multiple properties are thus of great value. Here we present a fragment-based reinforcement learning approach based on an actor-critic model, for the generation of novel molecules with optimal properties. The actor and the critic are both modelled with bidirectional long short-term memory (LSTM) networks. The AI method learns how to generate new compounds with desired properties by starting from an initial set of lead molecules and then improving these by replacing some of their fragments. A balanced binary tree based on the similarity of fragments is used in the generative process to bias the output towards structurally similar molecules. The method is demonstrated by a case study showing that 93% of the generated molecules are chemically valid, and a third satisfy the targeted objectives, while there were none in the initial set.
