Adaptive Supply Chain: Demand–Supply Synchronization Using Deep Reinforcement Learning

Algorithms ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 240
Author(s):  
Zhandos Kegenbekov ◽  
Ilya Jackson

Adaptive and highly synchronized supply chains can avoid cascading rise-and-fall inventory dynamics and mitigate ripple effects caused by operational failures. This paper demonstrates how a deep reinforcement learning agent based on the Proximal Policy Optimization (PPO) algorithm can synchronize inbound and outbound flows and support business continuity while operating in a stochastic and nonstationary environment, provided end-to-end visibility is available. PPO requires neither a hardcoded action space nor exhaustive hyperparameter tuning. These features, complemented by a straightforward supply chain environment, give rise to a general, task-unspecific approach to adaptive control in multi-echelon supply chains. The proposed approach is compared with the base-stock policy, a well-known method in classic operations research and inventory control theory that is prevalent in continuous-review inventory systems. The paper concludes that the proposed solution can perform adaptive control in complex supply chains, and it postulates fully fledged supply chain digital twins as a necessary infrastructural condition for scalable real-world applications.
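A minimal sketch (not the authors' implementation) of the setup described above: a gym-style two-echelon inventory environment with stochastic, nonstationary demand, controlled by a PPO agent via stable-baselines3. The echelon count, cost coefficients, demand process, lack of lead times, and training budget are all illustrative assumptions.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class SupplyChainEnv(gym.Env):
    """Toy two-echelon inventory control with stochastic, nonstationary demand."""
    def __init__(self, horizon=52):
        self.horizon = horizon
        # Observation: on-hand inventory and backlog at each echelon (end-to-end visibility).
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        # Action: continuous order quantity per echelon (no hardcoded action space).
        self.action_space = spaces.Box(0.0, 20.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.inv = np.array([10.0, 10.0])   # on-hand inventory per echelon
        self.backlog = np.zeros(2)
        return self._obs(), {}

    def _obs(self):
        return np.concatenate([self.inv, self.backlog]).astype(np.float32)

    def step(self, action):
        # Nonstationary demand: seasonal mean plus noise (illustrative assumption).
        demand = max(0.0, self.np_random.normal(5 + 2 * np.sin(self.t / 8), 1.5))
        self.inv[1] += action[1]                 # upstream replenished from an external source
        shipped = min(self.inv[1], action[0])    # upstream fills the downstream order
        self.inv[1] -= shipped
        self.inv[0] += shipped
        sold = min(self.inv[0], demand + self.backlog[0])
        self.inv[0] -= sold
        self.backlog[0] = max(0.0, self.backlog[0] + demand - sold)
        # Reward: negative holding and backlog costs (coefficients are assumptions).
        reward = -(0.5 * self.inv.sum() + 2.0 * self.backlog.sum())
        self.t += 1
        return self._obs(), reward, self.t >= self.horizon, False, {}

env = SupplyChainEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=50_000)
```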

Author(s):  
Afshin Oroojlooyjadid ◽  
MohammadReza Nazari ◽  
Lawrence V. Snyder ◽  
Martin Takáč

Problem definition: The beer game is widely used in supply chain management classes to demonstrate the bullwhip effect and the importance of supply chain coordination. The game is a decentralized, multiagent, cooperative problem that can be modeled as a serial supply chain network in which agents choose order quantities while cooperatively attempting to minimize the network’s total cost, although each agent only observes local information. Academic/practical relevance: Under some conditions, a base-stock replenishment policy is optimal. However, in a decentralized supply chain in which some agents act irrationally, there is no known optimal policy for an agent wishing to act optimally. Methodology: We propose a deep reinforcement learning (RL) algorithm to play the beer game. Our algorithm makes no assumptions about costs or other settings. As with any deep RL algorithm, training is computationally intensive, but once trained, the algorithm executes in real time. We propose a transfer-learning approach so that training performed for one agent can be adapted quickly for other agents and settings. Results: When playing with teammates who follow a base-stock policy, our algorithm obtains near-optimal order quantities. More important, it performs significantly better than a base-stock policy when other agents use a more realistic model of human ordering behavior. We observe similar results using a real-world data set. Sensitivity analysis shows that a trained model is robust to changes in the cost coefficients. Finally, applying transfer learning reduces the training time by one order of magnitude. Managerial implications: This paper shows how artificial intelligence can be applied to inventory optimization. Our approach can be extended to other supply chain optimization problems, especially those in which supply chain partners act in irrational or unpredictable ways. Our RL agent has been integrated into a new online beer game, which has been played more than 17,000 times by more than 4,000 people.
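For reference, the base-stock benchmark mentioned above can be sketched in a few lines: each agent orders up to a target level using only its local inventory position. The order-up-to level S and the example numbers are illustrative assumptions, not the paper's settings or the trained RL policy.

```python
def base_stock_order(on_hand, backlog, on_order, S):
    """Order-up-to-S policy using only local information."""
    inventory_position = on_hand - backlog + on_order
    return max(0, S - inventory_position)

# Example: a retailer with 4 units on hand, 2 backlogged, 6 in the pipeline,
# and an order-up-to level of 16 orders 8 units this period.
print(base_stock_order(on_hand=4, backlog=2, on_order=6, S=16))  # -> 8
```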


Author(s):  
Alexandar Angelus ◽  
Özalp Özer

Problem definition: We study how to optimally control a multistage supply chain in which each location can initiate multiple flows of product, including the reverse flow of orders. We also quantify the resulting value generated by reverse logistics and identify the drivers of that value. Academic/practical relevance: Reverse logistics has been gaining recognition in practice and theory for helping companies better match supply with demand, and thus reduce costs in their supply chains. Nevertheless, there remains a lack of clarity in practice and the research literature regarding precisely what in reverse logistics is so important, exactly how reverse logistics creates value, and what the drivers of that value are. Methodology: We first formulate a multistage inventory model to jointly optimize ordering decisions pertaining to regular, reverse, and expedited flows of product in a logistics supply chain, where the physical transformation of the product is completed at the most upstream location. With multiple product flows, the feasible region for the problem acquires multidimensional boundaries that lead to the curse of dimensionality. Next, we extend our analysis to product-transforming supply chains, in which product transformation is allowed to occur at each location. In such a system, it becomes necessary to keep track of both the location and stage of completion of each unit of inventory; thus, the number of state and decision variables increases with the square of the number of locations. Results: To solve the reverse logistics problem in logistics supply chains, we develop a different solution method that allows us to reduce the dimensionality of the feasible region and identify the structure of the optimal policy. We refer to this policy as a nested echelon base stock policy, as decisions for different product flows are sequentially nested within each other. We show that this policy renders the model analytically and numerically tractable. Our results provide actionable policies for firms to jointly manage the three different product flows in their supply chains and allow us to arrive at insights regarding the main drivers of the value of reverse logistics. One of our key findings is that, when it comes to the value generated by reverse logistics, demand variability (i.e., demand uncertainty across periods) matters more than demand volatility (i.e., demand uncertainty within each period). To analyze product-transforming supply chains, we first identify a policy that provides a lower bound on the total cost. Then, we establish a special decomposition of the objective cost function that allows us to propose a novel heuristic policy. We find that the performance gap of our heuristic policy relative to the lower-bounding policy averages less than 5% across a range of parameters and supply chain lengths. Managerial implications: Researchers can build on our methodology to study more complex reverse logistics settings, as well as tackle other inventory problems with multidimensional boundaries of the feasible region. Our insights can help companies involved in reverse logistics to better manage their orders for products, and better understand the value created by this capability and when (not) to invest in reverse logistics.
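As a point of reference for the policy structure described above, here is a minimal sketch of a plain echelon base-stock calculation for a serial chain; the paper's nested echelon base-stock policy extends this idea by sequentially nesting the decisions for regular, reverse, and expedited flows. The order-up-to levels and state values are illustrative assumptions.

```python
def echelon_orders(local_inventory, pipeline, S):
    """Each stage i orders up to S[i] based on its echelon inventory position:
    inventory at stage i plus everything downstream plus its inbound pipeline.
    Stage 0 is the most downstream location."""
    orders = []
    for i in range(len(S)):
        echelon_position = sum(local_inventory[: i + 1]) + pipeline[i]
        orders.append(max(0, S[i] - echelon_position))
    return orders

# Example: a three-stage chain with order-up-to levels 20, 45, and 80.
print(echelon_orders(local_inventory=[8, 15, 20], pipeline=[4, 6, 10], S=[20, 45, 80]))
# -> [8, 16, 27]
```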


2021 ◽  
Author(s):  
Alain Bensoussan ◽  
Suresh Sethi ◽  
Abdoulaye Thiam ◽  
Janos Turi

Author(s):  
Tor Schoenmeyr ◽  
Stephen C. Graves

Problem definition: We use the guaranteed service (GS) framework to investigate how to coordinate a multiechelon supply chain when two self-interested parties control different parts of the supply chain. For purposes of supply chain planning, we assume that each stage in a supply chain operates with a local base-stock policy and can provide guaranteed service to its customers, as long as the customer demand falls within certain bounds. Academic/practical relevance: The GS framework for supply chain inventory optimization has been deployed successfully in multiple industrial contexts with centralized control. In this paper, we show how to apply this framework to achieve coordination in a decentralized setting in which two parties control different parts of the supply chain. Methodology: The primary methodology is the analysis of a multiechelon supply chain under the assumptions of the GS model. Results: We find that the GS framework is naturally well suited for this decentralized decision making, and we propose a specific contract structure that facilitates such relationships. This contract is incentive compatible and has several other desirable properties. Under assumptions of complete and incomplete information, a reasonable negotiation process should lead the parties to contract terms that coordinate the supply chain. The contract is simpler than contracts proposed for coordination in the stochastic service (SS) framework. We also highlight the role of markup on the holding costs and some of the difficulties that this might cause in coordinating a decentralized supply chain. Managerial implications: The value of the paper is to show that a simple contract coordinates the chain when both parties plan with a GS model and framework; hence, we provide more evidence for the utility of this model. Furthermore, the simple coordinating contract matches reasonably well with practice; we observe that the most common contract terms include a per-unit wholesale price (possibly with a minimum order quantity and/or quantity discounts), along with a service time from order placement until delivery or until ready to ship. We also observe that firms need to pay a higher price if they want better service. What may differ from practice is the contract provision of a demand bound; our contract specifies that the supplier will provide GS as long as the buyer's orders are within the agreed-upon demand bound. This provision is essential so that each party can apply the GS framework for planning their supply chain. Of course, contracts have many other provisions for handling exceptions. Nevertheless, our research provides some validation for the GS model and the contracting practices we observe in practice.
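A minimal sketch of the guaranteed-service planning logic referenced above: a stage quotes a service time to its customer and plans a base stock that covers bounded demand over its net replenishment time. The particular demand-bound form (mean plus z standard deviations) and all numbers are illustrative assumptions, not the contract terms analyzed in the paper.

```python
import math

def gs_base_stock(mu, sigma, z, s_in, proc_time, s_out):
    """Base stock covering the demand bound over the net replenishment time."""
    tau = s_in + proc_time - s_out               # net replenishment time
    demand_bound = mu * tau + z * sigma * math.sqrt(tau)
    return demand_bound

# Example: a supplier quoting a 2-period service time, with a 1-period inbound
# service time and a 3-period processing time, plans against tau = 2 periods.
print(gs_base_stock(mu=100, sigma=20, z=1.64, s_in=1, proc_time=3, s_out=2))
```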


Sensors ◽  
2020 ◽  
Vol 20 (16) ◽  
pp. 4546
Author(s):  
Weiwei Zhao ◽  
Hairong Chu ◽  
Xikui Miao ◽  
Lihong Guo ◽  
Honghai Shen ◽  
...  

Multiple unmanned aerial vehicle (UAV) collaboration has great potential. To increase the intelligence and environmental adaptability of multi-UAV control, we study the application of deep reinforcement learning algorithms to multi-UAV cooperative control. To address the nonstationarity that arises in multi-agent reinforcement learning when learning agents change their strategies, the paper presents an improved multiagent reinforcement learning algorithm: multiagent joint proximal policy optimization (MAJPPO), which follows the centralized-learning, decentralized-execution paradigm. The algorithm uses moving-window averaging to provide each agent with a centralized state value function, so that the agents can collaborate more effectively, and it increases the total reward obtained by the multiagent system. To evaluate the algorithm, we apply MAJPPO to multi-UAV formation flight and the traversal of multiple-obstacle environments. To reduce control complexity, we use a six-degree-of-freedom, 12-state UAV dynamics model with an attitude control loop. The experimental results show that MAJPPO achieves better performance and better environmental adaptability.
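A minimal sketch of the moving-window averaging idea described above: each agent's local value estimate is averaged over a sliding window, and a centralized value is formed as the mean across agents. The window length, agent count, and toy values are illustrative assumptions, not the MAJPPO implementation.

```python
from collections import deque
import numpy as np

class WindowedCentralValue:
    def __init__(self, n_agents, window=10):
        # One sliding window of recent value estimates per agent.
        self.histories = [deque(maxlen=window) for _ in range(n_agents)]

    def update(self, per_agent_values):
        """Record each agent's local value estimate and return a centralized
        value: the mean over agents of their moving-window averages."""
        for hist, v in zip(self.histories, per_agent_values):
            hist.append(v)
        window_means = [np.mean(h) for h in self.histories]
        return float(np.mean(window_means))

central = WindowedCentralValue(n_agents=3, window=5)
for step_values in ([1.0, 0.5, 0.8], [1.2, 0.6, 0.9]):
    print(central.update(step_values))
```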

