Mitigating dead- and live-locks in Machine Learning-based Multi-Agent Path Finding

Master Thesis (2025)
Author(s)

K.J. de Klerk (TU Delft - Mechanical Engineering)

Contributor(s)

Christian Pek – Graduation committee member (TU Delft - Robot Dynamics)

Tom van Groeningen – Mentor (Sioux Technologies)

Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Faculty
Mechanical Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

When a robot navigates an environment crowded with obstacles and other robots, it can get stuck while trying to travel from point A to point B. If it gets stuck in place, this is called a deadlock; if it instead gets stuck in a pattern of repetitive movements, this is called a livelock. These locking situations are detrimental to the robot's efficiency, as it loses time and possibly energy, so they should be avoided as much as possible or recovered from as quickly as possible. Some planning algorithms are more prone to locking situations than others, in particular decentralised learning-based planners such as DCC, PRIMAL and MAPPER. This is caused by their lack of a mechanism that explicitly coordinates different agents and by their limited perception of the state of the environment. It would be beneficial if the tendency to become dead- or livelocked could be curbed with simple techniques that can be applied to any planner that works in largely the same manner. Hence, this research proposes and evaluates four methods: punishing actions that lead to an agent getting locked during training (Lock Punishment or LP), training on locking situations encountered during training to teach the model to recover from them (Targeted Training or TT), teaching an already trained model to recover from locking situations (Targeted Fine-tuning or TF), and placing obstacles that are visible only to an agent that has just entered a locking situation, to encourage it to move away from that location (Phantom Obstacles or PO). All of these methods are evaluated on the same benchmark to determine which individual technique or combination of techniques performs best. In the end, the most effective methods turned out to be PO and the combination of TT and PO. PO increases the mean success rate by 1.6% and reduces computation times by 3% compared to the baseline, while TT+PO lowers the mean Sum of Costs, Sum of Fuel, and Makespan relative to the baseline by 0.8%, 1.2% and 2.1%, respectively. Although LP+PO performs relatively poorly on most metrics, it reduces the number of locks encountered by the baseline by 66%, achieving the lowest number of locks overall.
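To make the two most central ideas concrete, the sketch below illustrates how a Lock Punishment penalty and Phantom Obstacles could be wired into a decentralised learning-based planner. This is a minimal illustration only: the lock-detection rule, thresholds, function names and observation encoding are all assumptions for the sake of example, not the implementation used in the thesis.

```python
# Illustrative sketch of Lock Punishment (LP) as a reward penalty and Phantom
# Obstacles (PO) injected into an agent's local observation. All names,
# thresholds and data structures are hypothetical.
from collections import deque

LOCK_WINDOW = 8          # hypothetical: number of recent positions to inspect
LOCK_PENALTY = -1.0      # hypothetical: extra penalty when a lock is detected
PHANTOM_LIFETIME = 5     # hypothetical: steps a phantom obstacle stays visible


def is_locked(history: deque) -> bool:
    """Detect a dead- or livelock from an agent's recent positions.

    Deadlock: the agent has not left a single cell.
    Livelock: the agent keeps oscillating between a few cells.
    """
    if len(history) < LOCK_WINDOW:
        return False
    return len(set(history)) <= 2  # stuck in place or bouncing between two cells


def shaped_reward(base_reward: float, history: deque) -> float:
    """Lock Punishment: subtract a penalty whenever the agent is detected as locked."""
    return base_reward + (LOCK_PENALTY if is_locked(history) else 0.0)


def add_phantom_obstacles(local_obs: dict, agent_pos, phantoms: dict, step: int) -> dict:
    """Phantom Obstacles: mark the locking cell as blocked in this agent's view only.

    `local_obs` maps grid cells to occupancy values; `phantoms` maps a cell to the
    step at which the phantom was placed so it expires after PHANTOM_LIFETIME.
    """
    phantoms[agent_pos] = step
    for cell, placed_at in list(phantoms.items()):
        if step - placed_at > PHANTOM_LIFETIME:
            del phantoms[cell]      # expired phantoms become passable again
        else:
            local_obs[cell] = 1     # 1 = obstacle in this sketch's encoding
    return local_obs


# Tiny usage example on a single agent oscillating between two cells (a livelock).
history, phantoms = deque(maxlen=LOCK_WINDOW), {}
for step, pos in enumerate([(2, 3), (2, 4)] * 4):
    history.append(pos)
    reward = shaped_reward(base_reward=-0.01, history=history)
    if is_locked(history):
        obs = add_phantom_obstacles({}, pos, phantoms, step)
print(reward, phantoms)
```

In this toy run the oscillation is detected in the final steps, the step reward is lowered by the penalty, and the locking cell is temporarily marked as an obstacle in that agent's own observation, nudging its policy to move elsewhere.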

Files

License info not available

File under embargo until 01-08-2025