Adaptive Supply Chain: Demand–Supply Synchronization Using Deep Reinforcement Learning

Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 240
Author(s):  
Zhandos Kegenbekov ◽  
Ilya Jackson

Adaptive and highly synchronized supply chains can avoid cascading rise-and-fall inventory dynamics and mitigate ripple effects caused by operational failures. This paper demonstrates how a deep reinforcement learning agent based on the Proximal Policy Optimization (PPO) algorithm can synchronize inbound and outbound flows and support business continuity while operating in a stochastic and nonstationary environment, provided end-to-end visibility is available. PPO requires neither a hardcoded action space nor exhaustive hyperparameter tuning. These features, complemented by a straightforward supply chain environment, give rise to a general, task-unspecific approach to adaptive control in multi-echelon supply chains. The proposed approach is compared with the base-stock policy, a well-known method in classic operations research and inventory control theory that is prevalent in continuous-review inventory systems. The paper concludes that the proposed solution can perform adaptive control in complex supply chains, and it postulates fully fledged supply chain digital twins as a necessary infrastructural condition for scalable real-world applications.
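A minimal sketch (not the authors' implementation) of the setup described above: a gym-style two-echelon inventory environment with stochastic, nonstationary demand, controlled by a PPO agent via stable-baselines3. The echelon count, cost coefficients, demand process, lack of lead times, and training budget are all illustrative assumptions.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class SupplyChainEnv(gym.Env):
    """Toy two-echelon inventory control with stochastic, nonstationary demand."""
    def __init__(self, horizon=52):
        self.horizon = horizon
        # Observation: on-hand inventory and backlog at each echelon (end-to-end visibility).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: continuous order quantity per echelon (no hardcoded action space).
        self.action_space = spaces.Box(0.0, 20.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.inv = np.array([10.0, 10.0])   # on-hand inventory per echelon
        self.backlog = np.zeros(2)
        return self._obs(), {}

    def _obs(self):
        return np.concatenate([self.inv, self.backlog]).astype(np.float32)

    def step(self, action):
        # Nonstationary demand: seasonal mean plus noise (illustrative assumption).
        demand = max(0.0, self.np_random.normal(5 + 2 * np.sin(self.t / 8), 1.5))
        self.inv[1] += action[1]                 # upstream replenished from an external source
        shipped = min(self.inv[1], action[0])    # upstream fills the downstream order
        self.inv[1] -= shipped
        self.inv[0] += shipped
        sold = min(self.inv[0], demand + self.backlog[0])
        self.inv[0] -= sold
        self.backlog[0] = max(0.0, self.backlog[0] + demand - sold)
        # Reward: negative holding and backlog costs (coefficients are assumptions).
        reward = -(0.5 * self.inv.sum() + 2.0 * self.backlog.sum())
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon, False, {}

env = SupplyChainEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)
```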

Author(s):  
Afshin Oroojlooyjadid ◽  
MohammadReza Nazari ◽  
Lawrence V. Snyder ◽  
Martin Takáč

Problem definition: The beer game is widely used in supply chain management classes to demonstrate the bullwhip effect and the importance of supply chain coordination. The game is a decentralized, multiagent, cooperative problem that can be modeled as a serial supply chain network in which agents choose order quantities while cooperatively attempting to minimize the network’s total cost, although each agent only observes local information. Academic/practical relevance: Under some conditions, a base-stock replenishment policy is optimal. However, in a decentralized supply chain in which some agents act irrationally, there is no known optimal policy for an agent wishing to act optimally. Methodology: We propose a deep reinforcement learning (RL) algorithm to play the beer game. Our algorithm makes no assumptions about costs or other settings. As with any deep RL algorithm, training is computationally intensive, but once trained, the algorithm executes in real time. We propose a transfer-learning approach so that training performed for one agent can be adapted quickly for other agents and settings. Results: When playing with teammates who follow a base-stock policy, our algorithm obtains near-optimal order quantities. More important, it performs significantly better than a base-stock policy when other agents use a more realistic model of human ordering behavior. We observe similar results using a real-world data set. Sensitivity analysis shows that a trained model is robust to changes in the cost coefficients. Finally, applying transfer learning reduces the training time by one order of magnitude. Managerial implications: This paper shows how artificial intelligence can be applied to inventory optimization. Our approach can be extended to other supply chain optimization problems, especially those in which supply chain partners act in irrational or unpredictable ways. Our RL agent has been integrated into a new online beer game, which has been played more than 17,000 times by more than 4,000 people.
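For reference, the base-stock benchmark mentioned above can be sketched in a few lines: each agent orders up to a target level using only its local inventory position. The order-up-to level S and the example numbers are illustrative assumptions, not the paper's settings or the trained RL policy.

```python
def base_stock_order(on_hand, backlog, on_order, S):
    """Order-up-to-S policy using only local information."""
    inventory_position = on_hand - backlog + on_order
    return max(0, S - inventory_position)

# Example: a retailer with 4 units on hand, 2 backlogged, 6 in the pipeline,
# and an order-up-to level of 16 orders 8 units this period.
print(base_stock_order(on_hand=4, backlog=2, on_order=6, S=16))  # -> 8
```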


Author(s):  
Alexandar Angelus ◽  
Özalp Özer

Problem definition: We study how to optimally control a multistage supply chain in which each location can initiate multiple flows of product, including the reverse flow of orders. We also quantify the resulting value generated by reverse logistics and identify the drivers of that value. Academic/practical relevance: Reverse logistics has been gaining recognition in practice and theory for helping companies better match supply with demand, and thus reduce costs in their supply chains. Nevertheless, there remains a lack of clarity in practice and the research literature regarding precisely what in reverse logistics is so important, exactly how reverse logistics creates value, and what the drivers of that value are. Methodology: We first formulate a multistage inventory model to jointly optimize ordering decisions pertaining to regular, reverse, and expedited flows of product in a logistics supply chain, where the physical transformation of the product is completed at the most upstream location. With multiple product flows, the feasible region for the problem acquires multidimensional boundaries that lead to the curse of dimensionality. Next, we extend our analysis to product-transforming supply chains, in which product transformation is allowed to occur at each location. In such a system, it becomes necessary to keep track of both the location and stage of completion of each unit of inventory; thus, the number of state and decision variables increases with the square of the number of locations. Results: To solve the reverse logistics problem in logistics supply chains, we develop a different solution method that allows us to reduce the dimensionality of the feasible region and identify the structure of the optimal policy. We refer to this policy as a nested echelon base stock policy, as decisions for different product flows are sequentially nested within each other. We show that this policy renders the model analytically and numerically tractable. Our results provide actionable policies for firms to jointly manage the three different product flows in their supply chains and allow us to arrive at insights regarding the main drivers of the value of reverse logistics. One of our key findings is that, when it comes to the value generated by reverse logistics, demand variability (i.e., demand uncertainty across periods) matters more than demand volatility (i.e., demand uncertainty within each period). To analyze product-transforming supply chains, we first identify a policy that provides a lower bound on the total cost. Then, we establish a special decomposition of the objective cost function that allows us to propose a novel heuristic policy. We find that the performance gap of our heuristic policy relative to the lower-bounding policy averages less than 5% across a range of parameters and supply chain lengths. Managerial implications: Researchers can build on our methodology to study more complex reverse logistics settings, as well as tackle other inventory problems with multidimensional boundaries of the feasible region. Our insights can help companies involved in reverse logistics to better manage their orders for products, and better understand the value created by this capability and when (not) to invest in reverse logistics.
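As a point of reference for the policy structure described above, here is a minimal sketch of a plain echelon base-stock calculation for a serial chain; the paper's nested echelon base-stock policy extends this idea by sequentially nesting the decisions for regular, reverse, and expedited flows. The order-up-to levels and state values are illustrative assumptions.

```python
def echelon_orders(local_inventory, pipeline, S):
    """Each stage i orders up to S[i] based on its echelon inventory position:
    inventory at stage i plus everything downstream plus its inbound pipeline.
    Stage 0 is the most downstream location."""
    orders = []
    for i in range(len(S)):
        echelon_position = sum(local_inventory[: i + 1]) + pipeline[i]
        orders.append(max(0, S[i] - echelon_position))
    return orders

# Example: a three-stage chain with order-up-to levels 20, 45, and 80.
print(echelon_orders(local_inventory=[8, 15, 20], pipeline=[4, 6, 10], S=[20, 45, 80]))
# -> [8, 16, 27]
```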


2021 ◽  
Author(s):  
Alain Bensoussan ◽  
Suresh Sethi ◽  
Abdoulaye Thiam ◽  
Janos Turi

Author(s):  
Tor Schoenmeyr ◽  
Stephen C. Graves

Problem definition: We use the guaranteed service (GS) framework to investigate how to coordinate a multiechelon supply chain when two self-interested parties control different parts of the supply chain. For purposes of supply chain planning, we assume that each stage in a supply chain operates with a local base-stock policy and can provide guaranteed service to its customers, as long as the customer demand falls within certain bounds. Academic/practical relevance: The GS framework for supply chain inventory optimization has been deployed successfully in multiple industrial contexts with centralized control. In this paper, we show how to apply this framework to achieve coordination in a decentralized setting in which two parties control different parts of the supply chain. Methodology: The primary methodology is the analysis of a multiechelon supply chain under the assumptions of the GS model. Results: We find that the GS framework is naturally well suited for this decentralized decision making, and we propose a specific contract structure that facilitates such relationships. This contract is incentive compatible and has several other desirable properties. Under assumptions of complete and incomplete information, a reasonable negotiation process should lead the parties to contract terms that coordinate the supply chain. The contract is simpler than contracts proposed for coordination in the stochastic service (SS) framework. We also highlight the role of markup on the holding costs and some of the difficulties that this might cause in coordinating a decentralized supply chain. Managerial implications: The value of the paper is to show that a simple contract coordinates the chain when both parties plan with a GS model and framework; hence, we provide more evidence for the utility of this model. Furthermore, the simple coordinating contract matches reasonably well with practice; we observe that the most common contract terms include a per-unit wholesale price (possibly with a minimum order quantity and/or quantity discounts), along with a service time from order placement until delivery or until ready to ship. We also observe that firms need to pay a higher price if they want better service. What may differ from practice is the contract provision of a demand bound; our contract specifies that the supplier will provide GS as long as the buyer's orders are within the agreed-upon demand bound. This provision is essential so that each party can apply the GS framework for planning their supply chain. Of course, contracts have many other provisions for handling exceptions. Nevertheless, our research provides some validation for the GS model and the contracting practices we observe in practice.
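A minimal sketch of the guaranteed-service planning logic referenced above: a stage quotes a service time to its customer and plans a base stock that covers bounded demand over its net replenishment time. The particular demand-bound form (mean plus z standard deviations) and all numbers are illustrative assumptions, not the contract terms analyzed in the paper.

```python
import math

def gs_base_stock(mu, sigma, z, s_in, proc_time, s_out):
    """Base stock covering the demand bound over the net replenishment time."""
    tau = s_in + proc_time - s_out               # net replenishment time
    demand_bound = mu * tau + z * sigma * math.sqrt(tau)
    return demand_bound

# Example: a supplier quoting a 2-period service time, with a 1-period inbound
# service time and a 3-period processing time, plans against tau = 2 periods.
print(gs_base_stock(mu=100, sigma=20, z=1.64, s_in=1, proc_time=3, s_out=2))
```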


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4546
Author(s):  
Weiwei Zhao ◽  
Hairong Chu ◽  
Xikui Miao ◽  
Lihong Guo ◽  
Honghai Shen ◽  
...  

Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. To address the nonstationarity that arises in multi-agent reinforcement learning when learning agents change their strategies, the paper presents an improved multiagent reinforcement learning algorithm: multiagent joint proximal policy optimization (MAJPPO), which follows the centralized-learning, decentralized-execution paradigm. The algorithm uses moving-window averaging to provide each agent with a centralized state value function, so that the agents can collaborate more effectively, and it increases the total reward obtained by the multiagent system. To evaluate the algorithm, we apply MAJPPO to multi-UAV formation flight and the traversal of multiple-obstacle environments. To reduce control complexity, we use a six-degree-of-freedom, 12-state UAV dynamics model with an attitude control loop. The experimental results show that MAJPPO achieves better performance and better environmental adaptability.
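A minimal sketch of the moving-window averaging idea described above: each agent's local value estimate is averaged over a sliding window, and a centralized value is formed as the mean across agents. The window length, agent count, and toy values are illustrative assumptions, not the MAJPPO implementation.

```python
from collections import deque
import numpy as np

class WindowedCentralValue:
    def __init__(self, n_agents, window=10):
        # One sliding window of recent value estimates per agent.
        self.histories = [deque(maxlen=window) for _ in range(n_agents)]

    def update(self, per_agent_values):
        """Record each agent's local value estimate and return a centralized
        value: the mean over agents of their moving-window averages."""
        for hist, v in zip(self.histories, per_agent_values):
            hist.append(v)
        window_means = [np.mean(h) for h in self.histories]
        return float(np.mean(window_means))

central = WindowedCentralValue(n_agents=3, window=5)
for step_values in ([1.0, 0.5, 0.8], [1.2, 0.6, 0.9]):
    print(central.update(step_values))
```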

