Smart Team Play: Utility of Population-Based Training for Cooperative AI in Overcooked

Bachelor Thesis (2022)
Author(s)

J.M. Moreira-Kanaley (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Robert Loftin – Mentor (TU Delft - Interactive Intelligence)

F.A. Oliehoek – Mentor (TU Delft - Interactive Intelligence)

Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Janaína Moreira-Kanaley
More Info
Publication Year
2022
Language
English
Graduation Date
27-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In an ad-hoc teamwork environment, artificial intelligence agents have the potential to take on supportive roles and complete tasks in collaboration with human players. This paper investigates the use of population-based training (PBT) for reinforcement learning agents in the multi-player game Overcooked. It also examines whether incorporating highly mutated agents, which introduce noise into the initial population, could enhance the final performance of PBT. To answer these questions, the learning curve of a selected PBT agent is first evaluated, and its final performance with a human proxy is then examined in different layouts of the game. The results show that PBT and other self-play agents tend to drastically underperform against the human proxy and agents that are trained on human data. Furthermore, while incorporating the mutated agents increased sample efficiency in layouts with a low risk of collisions, it had a negligible effect on the final performance of PBT with the human proxy.

Files

Research_Paper_Final.pdf
(pdf | 1.05 Mb)
License info not available