Training a Negotiating Agent through Self-Play
R. Jurševskis (TU Delft - Electrical Engineering, Mathematics and Computer Science)
B.M. Renting – Mentor (TU Delft - Interactive Intelligence)
Pradeep Kumar Murukannaiah – Mentor (TU Delft - Interactive Intelligence)
X. Zhang – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Repository containing the implementation and the raw results obtained during the research:
https://github.com/brenting/negotiation_PPO/tree/testing-self-play
Abstract
Recent developments in applying reinforcement learning to cooperative environments, such as negotiation, have raised an important question: how well can a negotiating agent be trained through self-play? Previous research has applied self-play successfully in other settings, such as the games of chess and Go. This paper explores the use of self-play in training a negotiating agent and determines whether an agent can be trained successfully through self-play alone. The experimental results show that a training stage using self-play can match or even exceed an approach using a set of training opponents. Using multiple self-play opponents further improves the average utility by introducing more variance during training. In addition, combining self-play with training opponents yields a hybrid approach that performs better than either technique on its own.
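
To make the training setup concrete, below is a minimal sketch of how such a hybrid loop, mixing frozen self-play snapshots with fixed training opponents, could be structured. Everything in it is an illustrative assumption: the Agent class, run_negotiation, and all parameter values are hypothetical stand-ins, not the API or configuration of the linked repository.

    import copy
    import random


    class Agent:
        """Minimal stand-in for a PPO-based negotiating agent (hypothetical)."""

        def __init__(self):
            self.updates = 0

        def act(self, observation):
            return random.random()  # placeholder bidding decision

        def update(self, trajectory):
            self.updates += 1  # placeholder for a PPO policy/value update


    def run_negotiation(agent, opponent, rounds=10):
        """Hypothetical stand-in for one negotiation session; returns a rollout."""
        return [(agent.act(t), opponent.act(t)) for t in range(rounds)]


    def train_self_play(agent, fixed_opponents=(), episodes=2_000,
                        snapshot_every=200, pool_size=5, p_self_play=0.5):
        """Hybrid training loop mixing self-play snapshots with fixed opponents.

        Keeping several frozen snapshots, rather than only the latest copy,
        diversifies the opponents seen during training, which the abstract
        reports improves average utility.
        """
        pool = [copy.deepcopy(agent)]  # seed the pool with the initial policy
        for episode in range(episodes):
            # Hybrid approach: with probability p_self_play negotiate against
            # a frozen past self; otherwise against a fixed training opponent.
            if fixed_opponents and random.random() > p_self_play:
                opponent = random.choice(fixed_opponents)
            else:
                opponent = random.choice(pool)
            trajectory = run_negotiation(agent, opponent)
            agent.update(trajectory)
            if (episode + 1) % snapshot_every == 0:
                pool.append(copy.deepcopy(agent))  # freeze the current policy
                pool = pool[-pool_size:]  # keep only the most recent snapshots
        return agent


    if __name__ == "__main__":
        trained = train_self_play(Agent(), fixed_opponents=[Agent()])
        print(f"performed {trained.updates} updates")

Setting p_self_play to 1.0 recovers pure self-play and to 0.0 the pure training-opponent baseline, which is one way the three conditions compared in the paper could be expressed in a single loop.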