Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma

Bachelor Thesis (2020)
Author(s)

E.A. van der Toorn (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

N. Yorke-Smith – Mentor (TU Delft - Algorithmics)

J.G.H. Cockx – Graduation committee member (TU Delft - Programming Languages)

Canmanie T. Ponnambalam – Coach (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2020 Eric van der Toorn
More Info
Publication Year
2020
Language
English
Graduation Date
22-06-2020
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A recent advance in reinforcement learning is the ability to model opponents. In this work, we go back to basics and test this ability within the Iterated Prisoner's Dilemma, a simple method for modelling multi-agent systems. Using the self-modelling advantage actor-critic model, we set up a single-agent model that encodes its opponents without requiring direct access to the opponents' actions. To verify that this technique is indeed capable of modelling opponents, we test its capacity to encode opponents and evaluate the trained model against several popular strategies. The embedding is found to have no positive effect on the reward; it only increases the randomness of the model.
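For readers unfamiliar with the setting, the following is a minimal sketch of an Iterated Prisoner's Dilemma match between two fixed strategies. It is illustrative only and is not taken from the thesis: the payoff values (3, 0, 5, 1) are the conventional textbook ones, and the choice of tit-for-tat and always-defect as example strategies, as well as the names `PAYOFF`, `tit_for_tat`, `always_defect`, and `play_match`, are assumptions made here.

```python
from typing import Callable, List, Tuple

# Conventional payoff values (an assumption, not taken from the thesis).
# Key: (my_move, opponent_move) -> my reward; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): 3,  # mutual cooperation
    ("C", "D"): 0,  # I cooperate, opponent defects
    ("D", "C"): 5,  # I defect, opponent cooperates
    ("D", "D"): 1,  # mutual defection
}

# A strategy maps (own history, opponent history) to the next move.
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own_history: List[str], opp_history: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(own_history: List[str], opp_history: List[str]) -> str:
    """Defect in every round, regardless of history."""
    return "D"


def play_match(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play a fixed number of rounds and return the cumulative scores."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously, seeing only past rounds.
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        score_a += PAYOFF[(move_a, move_b)]
        score_b += PAYOFF[(move_b, move_a)]
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_match(tit_for_tat, always_defect, rounds=10))
    print(play_match(tit_for_tat, tit_for_tat, rounds=10))
```

A learning agent such as the one studied in the thesis would take the place of one of these fixed strategies. In this sketch each strategy reads the opponent's past moves directly, whereas the abstract describes a model that encodes its opponents without requiring their actions directly.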

Files

Final_thesis.pdf
(pdf | 0.813 MB)
License info not available