Fictional Co-Play for Human-Agent Collaboration

Evaluating a state-of-the-art reinforcement learning technique for adaptability to human collaborators

Bachelor Thesis (2022)
Author(s)

N.A. Ordonez Cardenas (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Frans Oliehoek – Mentor (TU Delft - Interactive Intelligence)

R.T. Loftin – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Nathan Ordonez Cardenas
Publication Year
2022
Language
English
Graduation Date
27-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A longstanding problem in reinforcement learning is human-agent collaboration. Past research indicates that RL agents undergo a distributional shift when they start collaborating with human beings, so the goal is to create agents that can adapt. We build on prior work using the two-player Overcooked environment and reproduce a simplified version of the Fictitious Co-Play algorithm at a smaller scale of training, in order to confirm previously reported improvements, using Self-Play and Population-Based trained agents as the baselines for comparison. We find that the Fictitious Co-Play agent on average slightly outperforms both baseline algorithms when evaluated with a human proxy. We also find high cross-seed variance in performance, indicating the potential for further hyperparameter tuning.

Files

Research_paper_nathan.pdf
(pdf | 0.48 Mb)
License info not available