Reinforcement Learning for the Discrete Dynamic Berth Allocation Problem

Evaluating a trained Discrete Dynamic Berth Allocation model on Berth breakdowns

Bachelor Thesis (2026)

Author(s)

T. Kuklys (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

C. March Moya – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Yorke-Smith – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Reinforcement Learning; Binary and partial berth breakdown; Breakdown severity; Dynamic berth allocation problem

To reference this document use

https://resolver.tudelft.nl/uuid:e7565970-bd38-41b4-91e2-c066cbd34590

Publication Year

2026

Language

English

Graduation Date

21-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The problem of scheduling vessels in a port as they arrive one by one is known as the Dynamic Berth Allocation Problem, and it is NP-hard. This paper analyses how berth breakdowns affect the scheduling and optimality of a trained Reinforcement Learning model by March Moya et al. for this problem. Several breakdown parameters, including frequency, severity, duration, and the probability of binary versus partial breakdowns, were examined independently and in combination with one another with respect to different scheduling heuristics. The breakdowns were injected dynamically into the model’s event loop so that it had no knowledge of upcoming breakdowns. Each experimental configuration was evaluated using ten random seeds, and the sample mean and standard deviation were computed. The results showed low variance across seeds and configurations. Breakdown frequency was the main factor limiting the model’s performance, reducing its advantage over the baseline WTSP heuristic from 19.6% to 6.4% in the most extreme cases. The other parameters did not produce significant model degradation.
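The evaluation protocol summarised in the abstract can be illustrated with a minimal Python sketch. It is not the thesis code: the function names, the parameter defaults, and the single-berth toy setting are all assumptions made for illustration. The reading of "advantage" as a relative cost reduction against a baseline is likewise an assumption, since the abstract does not define the metric. The sketch draws breakdowns step by step inside the loop, so nothing that consumes the trace up to a given step can see later breakdowns, and then aggregates a per-seed quantity over ten seeds.

```python
import random
import statistics


def capacity_trace(seed, steps=50, frequency=0.1, duration=5,
                   p_binary=0.5, severity=0.4):
    """Per-step capacity of one hypothetical berth (1.0 = fully working).

    Breakdowns are sampled inside the loop, one step at a time, so the
    values emitted up to step t never depend on breakdowns starting after t.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    trace, remaining, level = [], 0, 1.0
    for _ in range(steps):
        # A new breakdown can only start while the berth is working.
        if remaining == 0 and rng.random() < frequency:
            remaining = duration
            # Binary: berth fully closed. Partial: capacity cut by `severity`.
            level = 0.0 if rng.random() < p_binary else 1.0 - severity
        if remaining > 0:
            trace.append(level)
            remaining -= 1
        else:
            trace.append(1.0)
    return trace


def summarize(values):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    return statistics.mean(values), statistics.stdev(values)


def relative_advantage(model_cost, baseline_cost):
    """Percentage by which the model's cost undercuts the baseline's.

    Assumed definition; the abstract does not state how advantage is computed.
    """
    return 100.0 * (baseline_cost - model_cost) / baseline_cost


if __name__ == "__main__":
    # Ten seeds per configuration, mirroring the protocol in the abstract.
    lost = [sum(1.0 - c for c in capacity_trace(seed)) for seed in range(10)]
    mean, std = summarize(lost)
    print(f"lost capacity over 10 seeds: mean={mean:.2f}, std={std:.2f}")
```

Seeding a separate `random.Random` instance per run keeps each configuration reproducible and independent of any global random state, which is what makes a per-seed mean and standard deviation meaningful.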

License info not available