Optimizing a Robot Fleet Scheduling Model and Floorplan using Max-Plus Linear Algebra and Deep Q-Learning

Boelen, E.

Master thesis (2024)

Authors

E. Boelen Mechanical Engineering

Contributors

Ton Van Den Boom Team Ton van den Boom (graduation committee member)

Lucy Smeets Prime Vision (mentor)

Mart Ruijs Prime Vision (graduation committee member)

Azita Dabiri Team Azita Dabiri (graduation committee member)

Faculty

Mechanical Engineering

Reinforcement Learning, Discrete event systems, Max-plus algebra, Deep Q-learning

To reference this document use:

http://resolver.tudelft.nl/uuid:27b9a079-206e-47aa-aa7a-c9379a47acfe

Published Date

30-08-2024

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Machine automation is becoming increasingly widespread and advanced. One example is the use of robots by Prime Vision, which sorts parcels for postal services. Scheduling a fleet of robots that pick up and drop off many parcels while avoiding collisions, within a limited space and along predefined routes in a floorplan, is a complex problem.

This logistical challenge can be modelled effectively using max-plus linear algebra, which allows the route scheduling to be optimized, as was previously done by L. Smeets. The goal of this research is to improve the existing scheduling model and to use it to develop a reinforcement learning-based algorithm that determines the optimal floorplan for the parcel delivery robots. Two methods are applied to improve the existing scheduling model. First, nodes where no decisions are made are identified and removed. Second, certain constraints are removed to simplify the model.
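As a brief illustration of the max-plus setting mentioned above (not the thesis's actual scheduling model), the sketch below computes one step of a max-plus linear system x(k+1) = A ⊗ x(k), where (A ⊗ x)_i = max_j (A_ij + x_j). The matrix `A`, the vector `x0`, and the function name `maxplus_matvec` are hypothetical and chosen only for this example.

```python
import numpy as np

NEG_INF = -np.inf  # max-plus "zero" element: no connection between two events


def maxplus_matvec(A, x):
    """Max-plus matrix-vector product: result[i] = max_j (A[i, j] + x[j])."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    # Broadcasting adds x[j] to every entry of column j, then maximizes over j.
    return np.max(A + x[np.newaxis, :], axis=1)


# Hypothetical 2-event system: A[i, j] is the time needed between event j and event i.
A = np.array([[3.0, NEG_INF],
              [2.0, 4.0]])
x0 = np.array([0.0, 1.0])  # assumed event times in cycle k

x1 = maxplus_matvec(A, x0)  # event times in cycle k + 1
print(x1)
```

Here each event in the next cycle occurs as soon as all of its predecessors have finished, which is why the maximum replaces the usual sum and addition replaces multiplication.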

The results of the scheduler are used to compute a key performance indicator, which allows a reinforcement learning-based algorithm to identify the optimal floorplan for the robots. The reinforcement learning algorithm employed a deep Q-learning approach, with the neural network trained using various action-space approaches, tuned rewards, and hyperparameters. The epsilon-greedy method was applied to address the exploration vs. exploitation problem. While the scheduler improvements significantly reduced its computational cost, the neural network did not converge, and the potential causes are thoroughly discussed.
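The exploration vs. exploitation trade-off mentioned above can be sketched generically as follows. This is a minimal illustration of standard epsilon-greedy action selection, assuming a plain list of Q-values per action; the function name `epsilon_greedy` and the example values are hypothetical and do not come from the thesis.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the highest-valued one."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the largest estimated Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Hypothetical Q-value estimates for three candidate actions.
q = [0.1, 0.9, 0.3]
greedy_action = epsilon_greedy(q, epsilon=0.0)    # always exploits
explore_action = epsilon_greedy(q, epsilon=1.0)   # always explores
```

In practice epsilon is often decayed during training, so that the agent explores widely at first and exploits its learned estimates later; whether the thesis does so is not stated in the abstract.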

Files

Master_Thesis_-_Emma_Boelen.pd... (pdf)

License info not available

File under embargo until 30-08-2026