Knowing one’s opponents: Self Modeling Advantage Actor Critic for the Iterated Prisoner’s Dilemma
E.A. van der Toorn (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Mentor (TU Delft - Algorithmics)
J.G.H. Cockx – Graduation committee member (TU Delft - Programming Languages)
Canmanie T. Ponnambalam – Coach (TU Delft - Algorithmics)
Abstract
A recent advancement in Reinforcement Learning is the ability of agents to model their opponents. In this work, we go back to basics and test this capability in the Iterated Prisoner's Dilemma, a simple setting for studying multi-agent interaction. Using the self-modelling advantage actor-critic model, we set up a single-agent model that learns an embedding of its opponents without observing the opponents' actions directly. To verify that this technique is indeed capable of modelling opponents, we test its capacity to encode opponents and evaluate the trained model against several popular strategies. We find that the embedding does not have a positive effect on the reward and only increases the randomness of the model's behaviour.
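The abstract evaluates the trained agent against several popular Iterated Prisoner's Dilemma strategies. As a point of reference, the sketch below shows a minimal version of the iterated game and two such classic strategies (Tit-for-Tat and Always Defect), assuming the conventional payoff values T=5, R=3, P=1, S=0; the function names and the exact payoff numbers are illustrative and not taken from the thesis.

```python
# Minimal sketch of the Iterated Prisoner's Dilemma with two classic
# opponent strategies. Payoffs follow the standard T=5, R=3, P=1, S=0
# convention; the thesis may use different values.

COOPERATE, DEFECT = 0, 1

# PAYOFF[(my_action, their_action)] -> my reward for one round
PAYOFF = {
    (COOPERATE, COOPERATE): 3,  # mutual cooperation (R)
    (COOPERATE, DEFECT): 0,     # sucker's payoff (S)
    (DEFECT, COOPERATE): 5,     # temptation to defect (T)
    (DEFECT, DEFECT): 1,        # mutual defection (P)
}


def tit_for_tat(history):
    """Cooperate first, then copy the other player's previous move.

    `history` is a list of (own_action, other_action) pairs.
    """
    return COOPERATE if not history else history[-1][1]


def always_defect(history):
    """Defect on every round regardless of history."""
    return DEFECT


def play(agent_policy, opponent_policy, rounds=10):
    """Play `rounds` iterations and return the agent's total reward."""
    history = []  # (agent_action, opponent_action) pairs, agent's view
    total = 0
    for _ in range(rounds):
        a = agent_policy(history)
        # The opponent sees the mirrored history (its own moves first).
        b = opponent_policy([(o, s) for s, o in history])
        total += PAYOFF[(a, b)]
        history.append((a, b))
    return total


if __name__ == "__main__":
    # Always Defect against Tit-for-Tat earns 5 once, then 1 per round.
    print(play(always_defect, tit_for_tat, rounds=10))  # -> 14
```

In the thesis setting, the learned agent would take the place of `agent_policy`, receiving only its own observations and rewards rather than the opponent's actions directly.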