See Clearly, Act Intelligently: Transformers in Transparent Environments

Bachelor Thesis (2024)
Author(s)

O. Elamin (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J. He – Mentor (TU Delft - Sequential Decision Making)

Frans A Oliehoek – Mentor (TU Delft - Sequential Decision Making)

Mathijs M. De Weerdt – Graduation committee member (TU Delft - Algorithmics)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Traditionally, Recurrent Neural Networks (RNNs) have been used to predict the sequential dynamics of an environment. Following recent breakthroughs in Transformer models, Transformers used as world models have demonstrated improved performance and sample efficiency. The focus so far has been on partially observable environments, where their capabilities can be maximally utilised. In this paper, we investigate the conditions under which Transformers outperform RNNs in a fully observable environment whose states obey the Markov property. This provides insight into Transformers' generalisation and predictive capabilities. Specifically, our experiments explore the impact of model complexity and dataset size. We observed that Transformers did not outperform our baseline implementation when given up to 7000 episodes of trajectory data. We also observed that shorter sequence lengths had a negligible impact on model performance. We therefore recommend against using Transformers in these fully observable environments.

Files

Research_Project_18_.pdf
(pdf | 0.588 Mb)
License info not available