Title: Scheduling a Flexible Manufacturing System: A reinforcement learning based approach
Author: Pennings, Casper (TU Delft Mechanical, Maritime and Materials Engineering)
Contributor: Keviczky, T. (mentor); Yorke-Smith, N. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Mechanical Engineering | Systems and Control
Date: 2023-06-12

Abstract: A flexible manufacturing system (FMS) has advantages over traditional manufacturing systems because it can handle unforeseen circumstances, such as changes in demand or component breakdowns, by re-routing. However, this flexibility increases the complexity of controlling such a system. Traditionally, the system model is simplified to reduce the solution space by removing intra-machine transportation complexities. This thesis explores how these complexities can be kept and accounted for during scheduling. A scheme is used in which short-term schedules are continuously calculated to determine the optimal schedule over the next time frame. The flexible job shop scheduling problem with transport (FJSPT) is used to represent the complexities of the FMS. Calculating partial schedules repeatedly requires a fast constructive search method, and the AlphaZero framework is identified as a fitting candidate. The FJSPT is translated into the reinforcement learning framework using a reduced action space, a graph neural network based state representation, and a normalized reward function. A naive normalization approach for the reward function is found to introduce problems with value function sensitivity, while other adaptive methods show fundamental flaws. A novel normalization method is introduced that uses min-max adaptive normalization and suboptimal node inclusion to improve the value function training data.
Implementing and training the algorithm shows that the method performs poorly in comparison to metaheuristic-based algorithms for the FJSPT. The value function fails to converge on the training data, even though convergence is critical for the self-improvement training of the algorithm. Future work should focus on developing a normalized value function that is sensitive to solution quality and is able to converge. Despite the challenges, the work provides insights into the complexities of implementing AlphaZero for combinatorial optimization.

Subject: Scheduling; Reinforcement Learning; Flexible manufacturing system; FJSPT; AlphaZero; MCTS
To reference this document use: http://resolver.tudelft.nl/uuid:e269c77e-f2b6-4b72-a818-66a85ce406a4
Part of collection: Student theses
Document type: master thesis
Rights: © 2023 Casper Pennings
Files: Thesis_Casper_Pennings.pdf (PDF, 3.54 MB)