See Clearly, Act Intelligently: Transformers in Transparent Environments


Abstract

Recurrent Neural Networks (RNNs) have traditionally been used to predict the sequential dynamics of an environment. With recent advances in Transformer models, Transformers have demonstrated improved performance and sample efficiency as world models. This work has focused on partially observable environments, where their capabilities can be maximally utilised. In this paper, we investigate the conditions under which Transformers outperform RNNs in fully observable environments, where states obey the Markov property; this provides insight into Transformers' generalisation and predictive capabilities. Specifically, our experiments explore the impact of model complexity and dataset size. We observed that Transformers did not outperform our baseline implementation when given up to 7000 episodes of trajectory data. We also observed that shorter sequence lengths had a negligible impact on model performance, leading us to recommend against using Transformers in such fully observable environments.