Mitigating dead- and live-locks in Machine Learning-based Multi-Agent Path Finding

Master Thesis (2025)
Author(s)

K.J. de Klerk (TU Delft - Mechanical Engineering)

Contributor(s)

Christian Pek – Graduation committee member (TU Delft - Robot Dynamics)

Tom van Groeningen – Mentor (Sioux Technologies)

Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Faculty
Mechanical Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When a robot navigates an environment crowded with obstacles and other robots, it can get stuck while trying to travel from point A to point B. If it gets stuck in place, this is called a deadlock; if it instead gets stuck in a pattern of repetitive movements, this is called a livelock. These locking situations are detrimental to the robot's efficiency, as it loses time and possibly energy, so they should be avoided as much as possible or recovered from as quickly as possible. Some planning algorithms are more prone to locking situations than others, in particular decentralised learning-based planners such as DCC, PRIMAL and MAPPER. This is caused by their lack of a mechanism that explicitly coordinates different agents and by their limited perception of the state of the environment. It would be beneficial if the tendency to become dead- or livelocked could be curbed with simple techniques that can be applied to any planner that works in largely the same manner. Hence, this research proposes and evaluates four methods: punishing actions that lead to an agent getting locked during training (Lock Punishment or LP), training on locking situations encountered during training to teach the model to recover from them (Targeted Training or TT), teaching an already trained model to recover from locking situations (Targeted Fine-tuning or TF), and placing obstacles that are visible only to an agent that has just entered a locking situation, to encourage it to move away from that location (Phantom Obstacles or PO). All of these methods are evaluated on the same benchmark to determine which individual technique or combination of techniques performs best. In the end, the most effective methods turned out to be PO and the combination of TT and PO. PO increases the mean success rate by 1.6% and reduces computation times by 3% compared to the baseline, while TT+PO lowers the mean Sum of Costs, Sum of Fuel, and Makespan relative to the baseline by 0.8%, 1.2% and 2.1%, respectively. Although LP+PO performs relatively poorly on most metrics, it reduces the number of locks encountered by the baseline by 66%, achieving the lowest number of locks overall.
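To make the two most central ideas concrete, the sketch below illustrates how a Lock Punishment penalty and Phantom Obstacles could be wired into a decentralised learning-based planner. This is a minimal illustration only: the lock-detection rule, thresholds, function names and observation encoding are all assumptions for the sake of example, not the implementation used in the thesis.

```python
# Illustrative sketch of Lock Punishment (LP) as a reward penalty and Phantom
# Obstacles (PO) injected into an agent's local observation. All names,
# thresholds and data structures are hypothetical.
from collections import deque

LOCK_WINDOW = 8          # hypothetical: number of recent positions to inspect
LOCK_PENALTY = -1.0      # hypothetical: extra penalty when a lock is detected
PHANTOM_LIFETIME = 5     # hypothetical: steps a phantom obstacle stays visible


def is_locked(history: deque) -> bool:
    """Detect a dead- or livelock from an agent's recent positions.

    Deadlock: the agent has not left a single cell.
    Livelock: the agent keeps oscillating between a few cells.
    """
    if len(history) < LOCK_WINDOW:
        return False
    return len(set(history)) <= 2  # stuck in place or bouncing between two cells


def shaped_reward(base_reward: float, history: deque) -> float:
    """Lock Punishment: subtract a penalty whenever the agent is detected as locked."""
    return base_reward + (LOCK_PENALTY if is_locked(history) else 0.0)


def add_phantom_obstacles(local_obs: dict, agent_pos, phantoms: dict, step: int) -> dict:
    """Phantom Obstacles: mark the locking cell as blocked in this agent's view only.

    `local_obs` maps grid cells to occupancy values; `phantoms` maps a cell to the
    step at which the phantom was placed so it expires after PHANTOM_LIFETIME.
    """
    phantoms[agent_pos] = step
    for cell, placed_at in list(phantoms.items()):
        if step - placed_at > PHANTOM_LIFETIME:
            del phantoms[cell]      # expired phantoms become passable again
        else:
            local_obs[cell] = 1     # 1 = obstacle in this sketch's encoding
    return local_obs


# Tiny usage example on a single agent oscillating between two cells (a livelock).
history, phantoms = deque(maxlen=LOCK_WINDOW), {}
for step, pos in enumerate([(2, 3), (2, 4)] * 4):
    history.append(pos)
    reward = shaped_reward(base_reward=-0.01, history=history)
    if is_locked(history):
        obs = add_phantom_obstacles({}, pos, phantoms, step)
print(reward, phantoms)
```

In this toy run the oscillation is detected in the final steps, the step reward is lowered by the penalty, and the locking cell is temporarily marked as an obstacle in that agent's own observation, nudging its policy to move elsewhere.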

Files

License info not available

File under embargo until 01-08-2025