Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma

Abstract

A recent advancement in reinforcement learning is the ability to model opponents. In this work, we go back to basics and test this ability within the Iterated Prisoner's Dilemma, a simple setting for modelling multi-agent systems. Using the self-modelling advantage actor-critic model, we set up a single-agent model that encodes its opponents without directly requiring the opponents' actions. To verify that this technique is indeed capable of modelling opponents, we test its capacity to encode opponents and evaluate the trained model against several popular strategies. The embedding is found to have no positive effect on the reward; it only increases the randomness of the model.
