Joint Strategy of Dynamic Ordering and Pricing for Competing Perishables with Q-Learning Algorithm

2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Jiangbo Zheng ◽  
Yanhong Gan ◽  
Ying Liang ◽  
Qingqing Jiang ◽  
Jiatai Chang

We use Machine Learning (ML) to study firms’ joint pricing and ordering decisions for perishables in a dynamic loop. The research assumption is as follows: at the beginning of each period, the retailer prices both the new and old products and determines how many new products to order, while at the end of each period, the retailer decides how much remaining inventory should be carried over to the next period. The objective is to determine a joint pricing, ordering, and disposal strategy to maximize the total expected discounted profit. We establish a decision model based on Markov processes and use the Q-learning algorithm to obtain a near-optimal policy. From numerical analysis, we find that (i) the optimal number of old products carried over to the next period depends on the upper quantitative bound for old inventory; (ii) the optimal prices for new products are positively related to potential demand but negatively related to the decay rate, while the optimal prices for old products have a positive relationship with both; and (iii) ordering decisions are unrelated to the quantity of old products. When the decay rate is low or the variable ordering cost is high, the optimal orders exhibit a trapezoidal decline as the quantity of new products increases.
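The decision loop described above can be sketched with a minimal tabular Q-learning agent on a toy perishable-inventory MDP. The state is the old inventory carried over, the action is the new-product order quantity, and the demand distribution, prices, and decay rate below are illustrative assumptions, not the paper's calibration:

```python
import random

# Toy perishable-inventory MDP: state = units of old inventory carried over
# (0..MAX_OLD); action = order quantity of new products (0..MAX_ORDER).
MAX_OLD, MAX_ORDER = 3, 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1          # learning rate, discount, exploration
PRICE, COST, DECAY = 5.0, 2.0, 0.5         # assumed selling price, order cost, decay rate

Q = {(s, a): 0.0 for s in range(MAX_OLD + 1) for a in range(MAX_ORDER + 1)}
rng = random.Random(0)

def step(old, order):
    """One period: sell against random demand, then decay the leftovers."""
    demand = rng.randint(0, MAX_ORDER)
    sold = min(old + order, demand)
    reward = PRICE * sold - COST * order
    leftover = old + order - sold
    nxt = min(int(leftover * (1 - DECAY)), MAX_OLD)   # perishing between periods
    return nxt, reward

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if rng.random() < EPS:
        action = rng.randint(0, MAX_ORDER)
    else:
        action = max(range(MAX_ORDER + 1), key=lambda a: Q[(state, a)])
    nxt, reward = step(state, action)
    # standard Q-learning update toward the Bellman target
    best_next = max(Q[(nxt, a)] for a in range(MAX_ORDER + 1))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = nxt

# Greedy ordering policy: best order quantity for each old-inventory level
policy = {s: max(range(MAX_ORDER + 1), key=lambda a: Q[(s, a)]) for s in range(MAX_OLD + 1)}
print(policy)
```

The paper's full model additionally prices both product vintages and chooses a disposal quantity each period; this sketch keeps only the order decision to show the update rule itself.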

2009 ◽  
Vol 28 (12) ◽  
pp. 3268-3270
Author(s):  
Chao WANG ◽  
Jing GUO ◽  
Zhen-qiang BAO

Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks for a specified time horizon. The checks are scheduled within an interval, and the goal is to schedule them as close as possible to their due date. In doing so, the number of checks is reduced, and the fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP) based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL in solving this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model with these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
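The "schedule close to the due date" objective can be illustrated with a deliberately simplified tabular stand-in for the paper's Deep Q-learning scheduler: one aircraft, state = days remaining until its check is due, actions = wait or schedule now. The reward shaping below (later scheduling earns more, a miss is penalized) is an assumption for illustration; fleet interactions and hangar capacity are omitted:

```python
import random

DUE = 5                                   # days until the check is due at episode start
ALPHA, GAMMA, EPS = 0.2, 0.95, 0.1
Q = {(d, a): 0.0 for d in range(DUE + 1) for a in (0, 1)}   # actions: 0 = wait, 1 = schedule
rng = random.Random(1)

def episode():
    d = DUE
    while True:
        a = rng.choice((0, 1)) if rng.random() < EPS else max((0, 1), key=lambda x: Q[(d, x)])
        if a == 1:                        # schedule now: reward grows as d shrinks
            r, done, nd = DUE - d, True, d
        elif d == 0:                      # waited past the due date: miss penalty
            r, done, nd = -10, True, d
        else:                             # wait one day
            r, done, nd = 0, False, d - 1
        target = r if done else r + GAMMA * max(Q[(nd, 0)], Q[(nd, 1)])
        Q[(d, a)] += ALPHA * (target - Q[(d, a)])
        if done:
            return
        d = nd

for _ in range(3000):
    episode()

# Learned greedy policy: wait while time remains, schedule exactly at the due date
policy = {d: max((0, 1), key=lambda a: Q[(d, a)]) for d in range(DUE + 1)}
print(policy)
```

The paper replaces this lookup table with a deep network over the full fleet state, but the Bellman target and epsilon-greedy loop are the same.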


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the learned policy. In an unknown environment, formulating hand-crafted rules to guide the UAV's action choices is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that the existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work, we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV is more likely to choose the optimal action under the policy learned by our algorithm than under that learned by the classic Q-learning algorithm in the agricultural plant protection environment. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.


2021 ◽  
Vol 58 (3) ◽  
pp. 102540
Author(s):  
Xiaoyu Duan ◽  
Shi Ying ◽  
Wanli Yuan ◽  
Hailong Cheng ◽  
Xiang Yin
