Cooperative AI for Overcooked

Multi-Agent RL with Population-Based Training

Bachelor Thesis (2023)

Author(s)

I.N. Nestorov (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Robert Loftin – Mentor (TU Delft - Interactive Intelligence)

FA Oliehoek – Mentor (TU Delft - Interactive Intelligence)

K Hildebrandt – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Artificial Intelligence, Collaborative AI, Games

To reference this document use:

https://resolver.tudelft.nl/uuid:e8f04bca-cccd-4ba5-b5cf-23c52c795200

Publication Year

2023

Language

English

Graduation Date

28-06-2023

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In ad-hoc cooperative environments, artificial intelligence that takes supportive roles and collaborates with humans has proven to be of great benefit. This research evaluates population-based training (PBT) for reinforcement learning agents in a simplified version of the multiplayer game Overcooked. The evaluation considers the agents' performance when paired with a human proxy, as well as their learning curves on different layouts. The results show that both PBT and other self-play agents notably underperform compared to human proxies and agents trained using human data. Moreover, while including mutated agents improved sample efficiency in layouts with minimal collision risk, its effect on the final performance of PBT in those layouts was negligible. It did, however, improve performance in layouts where collisions were the primary limiting factor.

Files

Final_Paper_Ivan_Nestorov.pdf

(pdf | 0.727 MB)

License info not available