Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma
E.A. van der Toorn (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
J.G.H. Cockx – Graduation committee member (TU Delft - Programming Languages)
Canmanie T. Ponnambalam – Coach (TU Delft - Algorithmics)
Abstract
A recent advancement in Reinforcement Learning is the ability of agents to model their opponents. In this work, we go back to basics and test this capability in the Iterated Prisoner's Dilemma, a simple setting for studying multi-agent interaction. Using the self-modelling advantage actor-critic model, we set up a single-agent model that learns an embedding of its opponents without observing the opponents' actions directly. To verify that this technique is indeed capable of modelling opponents, we test its capacity to encode opponents and evaluate the trained model against several popular strategies. We find that the embedding does not have a positive effect on the reward and only increases the randomness of the model's behaviour.
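The abstract evaluates the trained agent against several popular Iterated Prisoner's Dilemma strategies. As a point of reference, the sketch below shows a minimal version of the iterated game and two such classic strategies (Tit-for-Tat and Always Defect), assuming the conventional payoff values T=5, R=3, P=1, S=0; the function names and the exact payoff numbers are illustrative and not taken from the thesis.

```python
# Minimal sketch of the Iterated Prisoner's Dilemma with two classic
# opponent strategies. Payoffs follow the standard T=5, R=3, P=1, S=0
# convention; the thesis may use different values.

COOPERATE, DEFECT = 0, 1

# PAYOFF[(my_action, their_action)] -> my reward for one round
PAYOFF = {
    (COOPERATE, COOPERATE): 3,  # mutual cooperation (R)
    (COOPERATE, DEFECT): 0,     # sucker's payoff (S)
    (DEFECT, COOPERATE): 5,     # temptation to defect (T)
    (DEFECT, DEFECT): 1,        # mutual defection (P)
}


def tit_for_tat(history):
    """Cooperate first, then copy the other player's previous move.

    `history` is a list of (own_action, other_action) pairs.
    """
    return COOPERATE if not history else history[-1][1]


def always_defect(history):
    """Defect on every round regardless of history."""
    return DEFECT


def play(agent_policy, opponent_policy, rounds=10):
    """Play `rounds` iterations and return the agent's total reward."""
    history = []  # (agent_action, opponent_action) pairs, agent's view
    total = 0
    for _ in range(rounds):
        a = agent_policy(history)
        # The opponent sees the mirrored history (its own moves first).
        b = opponent_policy([(o, s) for s, o in history])
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total


if __name__ == "__main__":
    # Always Defect against Tit-for-Tat earns 5 once, then 1 per round.
    print(play(always_defect, tit_for_tat, rounds=10))  # -> 14
```

In the thesis setting, the learned agent would take the place of `agent_policy`, receiving only its own observations and rewards rather than the opponent's actions directly.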