Scheduling a Flexible Manufacturing System: A reinforcement learning based approach

Pennings, Casper

Author

Pennings, Casper (TU Delft Mechanical, Maritime and Materials Engineering)

Contributor

Keviczky, T. (mentor)
Yorke-Smith, N. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Mechanical Engineering | Systems and Control

Date

2023-06-12

Abstract

A flexible manufacturing system (FMS) has advantages over traditional manufacturing systems thanks to its ability to handle unforeseen circumstances, such as changes in demand or component breakdowns, by re-routing. However, this flexibility increases the complexity of controlling such a system. Traditionally, the system model is simplified to reduce the solution space by removing the complexities of transportation between machines. This thesis explores how these complexities can be retained and accounted for during scheduling. A scheme is used in which short-term schedules are continuously calculated to determine the optimal schedule over the next time frame. The flexible job shop scheduling problem with transport (FJSPT) is used to represent the complexities of the FMS. Because part schedules must be calculated repeatedly, a fast constructive search method is needed; the AlphaZero framework is identified as a fitting candidate. The FJSPT is translated into the reinforcement learning framework using a reduced action space, a state representation based on a graph neural network, and a normalized reward function. A naive normalization approach for the reward function is found to introduce problems in the sensitivity of the value function, while other adaptive methods show fundamental flaws. A novel normalization method is introduced that combines adaptive min-max normalization with the inclusion of suboptimal nodes to improve the value function training data. Implementing and training the algorithm shows that the method performs poorly compared with metaheuristic-based algorithms for the FJSPT. The value function is not able to converge to the training data, although this convergence is critical for the self-improvement training of the algorithm. Future work should focus on developing a normalized value function that is sensitive to solution quality and able to converge. Despite these challenges, the work provides insights into the complexities of implementing AlphaZero for combinatorial optimization.
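The abstract mentions adaptive min-max normalization of the reward but does not give its form. The sketch below shows one common way such a scheme can work, assuming the objective is the makespan of a schedule and that rewards are mapped to [-1, 1], mirroring the value range AlphaZero uses for win/loss outcomes. The class name AdaptiveMinMaxNormalizer, the [-1, 1] target and the clipping rule are illustrative assumptions, not the thesis's implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class AdaptiveMinMaxNormalizer:
    """Maps a makespan to a reward in [-1, 1] using running min/max bounds.

    Hypothetical helper for illustration only; the normalization used in the
    thesis may differ.
    """

    c_min: float = math.inf   # best (shortest) makespan observed so far
    c_max: float = -math.inf  # worst (longest) makespan observed so far

    def update(self, makespan: float) -> None:
        # Widen the observed bounds so they include the new makespan.
        self.c_min = min(self.c_min, makespan)
        self.c_max = max(self.c_max, makespan)

    def reward(self, makespan: float) -> float:
        # With no spread yet (nothing seen, or all makespans equal),
        # return a neutral reward instead of dividing by zero.
        if self.c_max <= self.c_min:
            return 0.0
        # Position of the makespan within the observed range, clipped so
        # values outside the range still map into [-1, 1].
        x = (makespan - self.c_min) / (self.c_max - self.c_min)
        x = min(max(x, 0.0), 1.0)
        # Shorter makespan (better schedule) gives a reward closer to +1.
        return 1.0 - 2.0 * x


if __name__ == "__main__":
    norm = AdaptiveMinMaxNormalizer()
    for c in [120, 100, 150]:
        norm.update(c)
    print(norm.reward(100), norm.reward(125), norm.reward(150))
```

After observing makespans of 120, 100 and 150, a makespan of 100 maps to 1.0, 125 to 0.0 and 150 to -1.0. Because the bounds move whenever a new best or worst makespan is observed, the same schedule can receive different rewards at different points in training; this is one plausible way an adaptive scheme can make value-function targets unstable.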

Subject

Scheduling
Reinforcement Learning
Flexible manufacturing system
FJSPT
AlphaZero
MCTS

To reference this document use:

http://resolver.tudelft.nl/uuid:e269c77e-f2b6-4b72-a818-66a85ce406a4

Part of collection

Student theses

Document type

master thesis
