Smart Team Play: Utility of Population-Based Training for Cooperative AI in Overcooked

Bachelor Thesis (2022)

Author(s)

J.M. Moreira-Kanaley (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

R.T. Loftin – Mentor (TU Delft - Interactive Intelligence)

FA Oliehoek – Mentor (TU Delft - Interactive Intelligence)

Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)

Faculty

Electrical Engineering, Mathematics and Computer Science

Copyright

Artificial Intelligence, Collaborative AI, Cooperation, Interactive Intelligence, Games, Teamwork

To reference this document use:

https://resolver.tudelft.nl/uuid:396f42bd-b50e-47c5-8f41-2b2e6f9b7ab0

More Info

Publication Year

2022

Language

English

Graduation Date

27-06-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

In ad-hoc teamwork environments, artificial intelligence agents have the potential to take on supportive roles and complete tasks in collaboration with human players. This paper investigates the use of population-based training (PBT) for reinforcement learning agents in the multi-player game Overcooked. It also examines whether incorporating highly mutated agents, which introduce noise into the initial population, could enhance the final performance of PBT. To answer these questions, the learning curve of a selected PBT agent is first evaluated, and its final performance with a human proxy is then examined across different layouts of the game. The results indicate that PBT agents, like other self-play agents, tend to drastically underperform against the human proxy and against agents trained on human data. Furthermore, while incorporating the mutated agents increased sample efficiency in layouts with a low risk of collisions, it had a negligible effect on the final performance of PBT with the human proxy.

Files

Research_Paper_Final.pdf

(pdf | 1.05 Mb)

License info not available