Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma

Bachelor thesis (2020)

Authors

E.A. van der Toorn (Electrical Engineering, Mathematics and Computer Science)

Contributors

N. Yorke-Smith, Algorithmics (supervisor 1)

J.G.H. Cockx, Programming Languages (supervisor 2)

C.T. Ponnambalam, Algorithmics (coach)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

http://resolver.tudelft.nl/uuid:75b0fdd7-22f4-447c-90b7-ce3fdce34e58

Published Date

22-06-2020

Language

English

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A recent advancement in Reinforcement Learning is the capability of modelling opponents. In this work, we go back to basics and test this capability within the Iterated Prisoner's Dilemma, a simple framework for modelling multi-agent systems. Using the self modelling advantage actor critic model, we set up a single-agent model that encodes its opponents without requiring direct access to the opponents' actions. To verify that this technique is indeed capable of modelling opponents, we test its capacity for encoding opponents and evaluate the trained model against several popular strategies. The embedding is found to have no positive effect on the reward; it only increases the randomness of the model.
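
For readers unfamiliar with the setting, the following is a minimal sketch of an Iterated Prisoner's Dilemma match between two fixed strategies. It is illustrative only and is not the thesis's implementation: the payoff values (the conventional 5, 3, 1, 0), the function names, and the choice of tit-for-tat, always-defect, and always-cooperate as example opponents are all assumptions, since the abstract does not specify them.

```python
# Illustrative sketch only: payoff values, names, and strategies are
# assumptions and are not taken from the thesis.
COOPERATE, DEFECT = "C", "D"

# Conventional payoffs: (my_move, their_move) -> my reward.
PAYOFF = {
    (COOPERATE, COOPERATE): 3,  # mutual cooperation
    (COOPERATE, DEFECT): 0,     # exploited by the opponent
    (DEFECT, COOPERATE): 5,     # exploiting the opponent
    (DEFECT, DEFECT): 1,        # mutual defection
}


def tit_for_tat(own_history, opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else COOPERATE


def always_defect(own_history, opponent_history):
    """Defect in every round."""
    return DEFECT


def always_cooperate(own_history, opponent_history):
    """Cooperate in every round."""
    return COOPERATE


def play(strategy_a, strategy_b, rounds=10):
    """Play a fixed number of rounds and return both total scores."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        # Each strategy sees its own history first, then the opponent's.
        move_a = strategy_a(history_a, history_b)
        move_b = strategy_b(history_b, history_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, always_defect))
    print(play(tit_for_tat, always_cooperate))
```

In this sketch the fixed strategies read the opponent's move history directly. The agent described in the abstract differs in that it is meant to encode its opponents without being given their actions directly.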

Files

Final_thesis.pdf

(.pdf | 0.813 Mb)