Reinforcement learning for order distribution in self-organizing logistics

Master Thesis (2021)

Author(s)

Y.C. de Vries (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Yorke-Smith – Mentor (TU Delft - Algorithmics)

Emir Demirovic – Mentor (TU Delft - Algorithmics)

Christian van Ommeren – Mentor (TNO)

Jan Pieter Paardekooper – Mentor (TNO)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Reinforcement Learning, Machine Learning, Attention, Container Transport, Deep Q-Learning, Self-organizing logistics

To reference this document use:

https://resolver.tudelft.nl/uuid:4c3d19c3-1a49-4757-9ecc-357199955f1c

Publication Year

2021

Language

English

Graduation Date

31-08-2021

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

With the increasing global demand for logistics, supply chains have grown considerably in volume over the last decades. Operating effectively within the capacity constraints of the carriers requires proper collaboration and optimized order allocation. Van Berkel Logistics transports containers by truck from the sea terminals in Rotterdam to inland customers and back. Planners currently solve this logistical planning problem by hand on a daily basis. This research investigates to what extent reinforcement learning can be applied to solve this container planning problem in an automated way. A simulation environment representing the container planning dynamics was constructed and, with the help of historical data, made as accurate as reasonably possible. Three reinforcement learning models, the OnePass, Iterative and Attention models, were developed and tested on their ability to learn to choose orders so that they are delivered as punctually as possible. A main challenge in constructing these models was designing them to cope with varying state and action spaces. In an experimental evaluation, the models learned to make better decisions over time and eventually performed similarly to the heuristic baseline in terms of total observed lateness. In terms of driven distance and the fraction of orders delivered on time, the OnePass and Iterative models outperformed the heuristic. Overall, the Iterative model showed the best performance and is able to learn scenarios as large as the real-life scenarios Van Berkel Logistics deals with. However, it also tends to be slower than the other models due to its iterative approach.
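The abstract does not say how the models handle a varying action space. One common way to do this, shown here purely as a hypothetical illustration and not as the OnePass, Iterative or Attention architecture from the thesis, is to score every candidate order with the same Q-function, so the number of open orders can change between decisions. The sketch below assumes a linear Q-function with a standard Q-learning update in place of a deep network; the `Order` fields, the feature vector, the reward and all hyperparameters are assumptions. It also computes the two evaluation metrics the abstract mentions, total lateness and fraction of orders on time, from hypothetical (finish time, deadline) pairs.

```python
import random
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order features; not the thesis's actual state encoding.
    distance_km: float   # driving distance for this container move
    slack_hours: float   # time left until the order's deadline


def featurize(truck_hours_left, order):
    # Combine truck and order information into one fixed-length vector.
    return [truck_hours_left, order.distance_km, order.slack_hours, 1.0]


def q_value(weights, truck_hours_left, order):
    # Linear stand-in for a deep Q-network: one score per (truck, order) pair.
    features = featurize(truck_hours_left, order)
    return sum(w * f for w, f in zip(weights, features))


def select_order(weights, truck_hours_left, orders, epsilon, rng):
    # Epsilon-greedy choice over however many orders are currently open,
    # so the action space can grow or shrink between decisions.
    if not orders:
        return None
    if rng.random() < epsilon:
        return rng.randrange(len(orders))
    scores = [q_value(weights, truck_hours_left, o) for o in orders]
    return max(range(len(orders)), key=lambda i: scores[i])


def td_update(weights, truck_hours_left, order, reward, next_q_max,
              alpha=0.01, gamma=0.95):
    # One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a').
    # Pass next_q_max=0.0 when the episode has ended.
    features = featurize(truck_hours_left, order)
    error = reward + gamma * next_q_max - q_value(weights, truck_hours_left, order)
    return [w + alpha * error * f for w, f in zip(weights, features)]


def lateness_metrics(completions):
    # completions: list of (finish_time, deadline) pairs in hours.
    # Returns (total lateness, fraction of orders finished on time).
    lateness = [max(0.0, finish - deadline) for finish, deadline in completions]
    on_time = sum(1 for late in lateness if late == 0.0)
    return sum(lateness), on_time / len(completions)


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, -0.1, 0.5, 0.0]  # hypothetical: prefer short, non-urgent... or just high slack
    orders = [Order(10.0, 2.0), Order(5.0, 8.0), Order(20.0, 1.0)]
    print(select_order(weights, 6.0, orders, epsilon=0.0, rng=rng))  # 1
    print(lateness_metrics([(5.0, 6.0), (9.0, 7.0), (4.0, 4.0)]))  # (2.0, 0.6666666666666666)
```

Because the same scoring function is applied to each order independently, adding or removing an order only changes the length of the score list, not the model's parameters. A real model would replace the linear `q_value` with a neural network and would likely need richer state features than the two assumed here.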

Files

MSc_Thesis_Yorick.pdf

(pdf | 6.42 Mb)

License info not available