Training a Negotiating Agent through Self-Play

Bachelor Thesis (2022)
Author(s)

R. Jurševskis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

B.M. Renting – Mentor (TU Delft - Interactive Intelligence)

Pradeep Kumar Murukannaiah – Mentor (TU Delft - Interactive Intelligence)

X. Zhang – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Renāts Jurševskis
Publication Year
2022
Language
English
Graduation Date
23-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Repository including the implementation and raw results obtained during the research

https://github.com/brenting/negotiation_PPO/tree/testing-self-play
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Recent developments in applying reinforcement learning to cooperative environments, such as negotiation, have raised an important question: how well can a negotiating agent be trained through self-play? Previous research has applied self-play successfully in other settings, such as the games of chess and Go. This paper explores the use of self-play in training a negotiating agent and determines whether an agent can be trained successfully through self-play alone. The experimental results show that a training stage using self-play can match or even exceed an approach using a set of training opponents. Using multiple self-play opponents further improves the average utility by introducing more variance during training. In addition, combining self-play with training opponents yields a hybrid approach that performs better than either technique on its own.
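The core idea of the abstract, training against a frozen snapshot of the agent's own policy that is refreshed periodically, can be illustrated with a toy sketch. The thesis itself uses PPO (see the linked repository); the snippet below is not that implementation. It is a hypothetical, deliberately simplified stand-in: a single-issue split-the-pie negotiation where the "policy" is a weight vector over possible demands and "learning" just reinforces demands that led to an agreement. All names and the update rule are assumptions made for illustration; only the self-play structure (learner vs. frozen copy, periodic snapshot refresh) mirrors the technique described.

```python
import random

# Toy single-issue negotiation: each side demands a share of a pie of size 10.
# An agreement is reached when the two demands are compatible (sum <= PIE);
# the learner's utility is then its own demanded share, otherwise 0.
PIE = 10

def sample_demand(policy, rng):
    """Sample a demand in 0..PIE proportionally to the policy weights."""
    r = rng.random() * sum(policy)
    for demand, weight in enumerate(policy):
        r -= weight
        if r <= 0:
            return demand
    return PIE

def play_episode(learner, opponent, rng):
    """One negotiation round; returns the learner's demand and utility."""
    a = sample_demand(learner, rng)
    b = sample_demand(opponent, rng)
    return a, (a if a + b <= PIE else 0)

def train_self_play(episodes=5000, snapshot_every=500, seed=0):
    """Self-play loop: the learner trains against a frozen copy of itself,
    and the frozen opponent is refreshed every `snapshot_every` episodes."""
    rng = random.Random(seed)
    learner = [1.0] * (PIE + 1)   # learnable policy weights over demands
    opponent = list(learner)      # frozen self-play snapshot
    for t in range(episodes):
        demand, utility = play_episode(learner, opponent, rng)
        if utility > 0:
            # Crude stand-in for a policy-gradient update: reinforce
            # demands that produced an agreement, scaled by utility.
            learner[demand] += 0.1 * utility
        if (t + 1) % snapshot_every == 0:
            opponent = list(learner)  # refresh the frozen opponent
    return learner
```

The abstract's "multiple self-play opponents" variant would keep a pool of past snapshots and sample one per episode instead of a single frozen copy, which adds the variance the results attribute the improvement to.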
