Scheduling a Flexible Manufacturing System

A reinforcement learning based approach

Master Thesis (2023)

Author(s)

C.H. Pennings (TU Delft - Mechanical Engineering)

Contributor(s)

T. Keviczky – Mentor (TU Delft - Team Tamas Keviczky)

N. Yorke-Smith – Graduation committee member (TU Delft - Algorithmics)

Faculty

Mechanical Engineering

Reinforcement Learning, Scheduling, MCTS, Flexible manufacturing system, FJSPT, AlphaZero

To reference this document use

https://resolver.tudelft.nl/uuid:e269c77e-f2b6-4b72-a818-66a85ce406a4

More Info

Publication Year

2023

Language

English

Graduation Date

12-06-2023

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering, Systems and Control

Downloads counter

215

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A flexible manufacturing system (FMS) has advantages over traditional manufacturing systems because it can respond to unforeseen circumstances, such as changes in demand or component breakdowns, by re-routing. However, this flexibility makes such a system more complex to control. Traditionally, the system model is simplified to reduce the solution space by removing intra-machine transportation complexities. This thesis explores how these complexities can be retained and accounted for during scheduling. A scheme is used in which short-term schedules are continuously recalculated to determine the optimal schedule over the next timeframe. The flexible job shop scheduling problem with transport (FJSPT) is used to represent the complexities of the FMS. Calculating partial schedules repeatedly requires a fast constructive search method, and the AlphaZero framework is identified as a fitting candidate. The FJSPT is translated into the reinforcement learning framework using a reduced action space, a graph-neural-network-based state representation, and a normalized reward function. A naive normalization approach for the reward function is found to introduce problems in the sensitivity of the value function, while other adaptive methods show fundamental flaws. A novel normalization method is introduced that uses min-max adaptive normalization and suboptimal node inclusion to improve the value function training data. Implementing and training the algorithm shows that the method performs poorly compared with metaheuristic-based algorithms for the FJSPT. The value function is not able to converge on the training data, even though this is critical for the self-improvement training of the algorithm. Future work should focus on developing a normalized value function that is sensitive to solution quality and is able to converge. Despite the challenges, the work provides insights into the complexities of implementing AlphaZero for combinatorial optimization.
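
The idea of min-max adaptive reward normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the class name `MinMaxRewardNormalizer`, the use of makespan as the objective, and the reward range [-1, 1] are all assumptions made for illustration only. The sketch tracks the best and worst objective values seen so far and rescales each new value against that range.

```python
import math
from dataclasses import dataclass


@dataclass
class MinMaxRewardNormalizer:
    """Hypothetical helper: tracks the best and worst makespan seen so far."""

    best: float = math.inf    # shortest makespan observed (assumed objective)
    worst: float = -math.inf  # longest makespan observed

    def update(self, makespan: float) -> None:
        """Widen the tracked range to include a newly observed makespan."""
        self.best = min(self.best, makespan)
        self.worst = max(self.worst, makespan)

    def reward(self, makespan: float) -> float:
        """Map a makespan to [-1, 1]; shorter makespans score higher."""
        # With no range yet (no data, or a single distinct value), return 0.
        if not math.isfinite(self.best) or self.worst == self.best:
            return 0.0
        scaled = (self.worst - makespan) / (self.worst - self.best)
        scaled = min(1.0, max(0.0, scaled))  # clip values outside the range
        return 2.0 * scaled - 1.0


normalizer = MinMaxRewardNormalizer()
for observed in (120.0, 100.0, 140.0):
    normalizer.update(observed)
print(normalizer.reward(100.0), normalizer.reward(120.0), normalizer.reward(140.0))
```

Because the range adapts as new solutions are found, the same makespan can receive a different reward later in training. That is one reason a normalized value target can be hard to fit.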

Files

Thesis_Casper_Pennings.pdf

(pdf | 3.54 Mb)

License info not available