Exploring Attention Mechanisms in Transformers for Data-Efficient Model-Based Reinforcement Learning

Bachelor Thesis (2025)
Author(s)

D. De Dios Allegue (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

FA Oliehoek – Mentor (TU Delft - Sequential Decision Making)

J. He – Mentor (TU Delft - Sequential Decision Making)

Michael Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key advancement in model-based Reinforcement Learning (RL) stems from Transformer-based
world models, which allow agents to plan effectively by learning an internal representation
of the environment. However, causal self-attention in Transformers can be computationally
redundant when most relevant information lies in only a few recent steps, making it
inefficient for environments with predominantly short-term memory dependencies. This paper
investigates integrating alternative attention mechanisms into world models to address
these limitations. We embed inductive biases via local attention and Gaussian adaptive
attention, aiming to guide the model’s focus towards more relevant elements of the
observation history. We evaluate these modified architectures in four environments on the
Atari 100k benchmark under partially observable conditions. Our results show that, in
environments where relevant information is contained within a specific recent window of
observations (i.e., a short-term memory dependency), tuning local or Gaussian adaptive
attention to that window lets them significantly outperform causal attention within a
limited number of interactions. In Pong, the best-performing Gaussian attention model raised
the mean return from -14.53 to -6.86, representing roughly a 53% improvement over the
baseline. The effectiveness of these mechanisms varies with the complexity and dynamism
of the influential variables within an environment, highlighting the importance of
appropriate prior selection and flexibility. This work highlights that leveraging
influence-based principles through inductive biases can lead to more data-efficient
attention mechanisms for world models, particularly when agents must learn from limited
environment interactions in diverse RL settings.
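
The difference between the three attention patterns named in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the function names, the additive-bias formulation, and the fixed `mean_lag` and `sigma` values are assumptions made here for illustration, and the Gaussian adaptive attention used in the thesis may be parameterised or learned differently. Each pattern is expressed as a bias added to the attention scores before the softmax, with minus infinity hiding a position entirely.

```python
import math


def attention_weights(scores, bias):
    """Row-wise softmax of (scores + bias); a bias of -inf hides a position."""
    weights = []
    for score_row, bias_row in zip(scores, bias):
        logits = [s + b for s, b in zip(score_row, bias_row)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(x - peak) for x in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def causal_bias(T):
    """Standard causal attention: step i may attend to every step j <= i."""
    return [[0.0 if j <= i else -math.inf for j in range(T)] for i in range(T)]


def local_bias(T, window):
    """Local attention (assumed form): only the last `window` steps are visible."""
    return [[0.0 if 0 <= i - j < window else -math.inf for j in range(T)]
            for i in range(T)]


def gaussian_bias(T, mean_lag, sigma):
    """Gaussian prior over the lag i - j (assumed form, fixed parameters).

    Past steps are down-weighted smoothly rather than cut off; future steps
    stay hidden so the pattern remains causal.
    """
    return [[-((i - j - mean_lag) ** 2) / (2 * sigma ** 2) if j <= i else -math.inf
             for j in range(T)] for i in range(T)]


# Uniform scores isolate the effect of each bias on the attention weights.
T = 6
scores = [[0.0] * T for _ in range(T)]

causal = attention_weights(scores, causal_bias(T))
local = attention_weights(scores, local_bias(T, window=2))
gauss = attention_weights(scores, gaussian_bias(T, mean_lag=1, sigma=1.0))

print("causal  :", [round(w, 3) for w in causal[-1]])
print("local   :", [round(w, 3) for w in local[-1]])  # [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]
print("gaussian:", [round(w, 3) for w in gauss[-1]])
```

With uniform scores, causal attention spreads its weight evenly over the whole history, the local variant spends all of it on the most recent `window` steps, and the Gaussian variant concentrates it around the chosen lag while still leaving a small weight on older steps. This is the sense in which such priors can steer a world model towards a recent window of observations.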

Files

RP_Paper_Daniel_De_Dios.pdf
(pdf | 1.87 Mb)
License info not available