Exploring Attention Mechanisms in Transformers for Data-Efficient Model-Based Reinforcement Learning

Bachelor Thesis (2025)
Author(s)

D. De Dios Allegue (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F.A. Oliehoek – Mentor (TU Delft - Sequential Decision Making)

J. He – Mentor (TU Delft - Sequential Decision Making)

M. Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key advancement in model-based Reinforcement Learning (RL) stems from Transformer-based world models, which allow agents to plan effectively by learning an internal representation of the environment. However, causal self-attention in Transformers can be computationally redundant when most of the relevant information lies in only a few recent steps. This makes it inefficient in environments with predominantly short-term memory dependencies. This paper investigates integrating alternative attention mechanisms into world models to address this limitation. We embed inductive biases via local attention and Gaussian adaptive attention, aiming to guide the model's focus towards the more relevant elements of the observation history. We evaluate these modified architectures in four environments from the Atari 100k benchmark under partially observable conditions. Our results show that, in environments where the relevant information is contained within a specific recent window of observations (i.e. a short-term memory dependency), local or Gaussian adaptive attention tuned to that window significantly outperforms causal attention within a limited number of interactions. In Pong, the best-performing Gaussian attention model raised the mean return from -14.53 to -6.86, roughly a 53% improvement over the baseline. The effectiveness of these mechanisms varies with the complexity and dynamism of the influential variables within an environment, underscoring the importance of appropriate prior selection and flexibility. This work highlights that leveraging influence-based principles through inductive biases can lead to more data-efficient attention mechanisms for world models, particularly when agents must learn from limited environment interactions in diverse RL settings.
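The abstract contrasts full causal attention with local and Gaussian-style attention over the observation history. The minimal sketch below expresses each mechanism as an additive bias on the attention logits before the softmax. It is a simplified illustration, not the thesis's implementation. The function names, the `window` parameter, and the Gaussian-over-temporal-distance form with centre `mu` and width `sigma` are hypothetical choices made for this example. The Gaussian adaptive attention used in the thesis may be formulated differently; for instance, its parameters may be learned rather than fixed.

```python
import math

NEG_INF = float("-inf")


def causal_bias(T):
    """Standard causal mask: step i may attend to every step j <= i."""
    return [[0.0 if j <= i else NEG_INF for j in range(T)] for i in range(T)]


def local_bias(T, window):
    """Sliding-window causal mask: step i attends only to the `window`
    most recent steps (i - window < j <= i), including itself."""
    return [[0.0 if i - window < j <= i else NEG_INF for j in range(T)]
            for i in range(T)]


def gaussian_bias(T, mu, sigma):
    """Causal mask plus a soft Gaussian prior over temporal distance d = i - j.
    Hypothetical formulation (an assumption, not the thesis's exact method):
    logits are shifted by -(d - mu)^2 / (2 * sigma^2), so distances near `mu`
    are favoured while all past steps remain reachable."""
    bias = []
    for i in range(T):
        row = []
        for j in range(T):
            if j > i:
                row.append(NEG_INF)  # never attend to the future
            else:
                d = i - j
                row.append(-((d - mu) ** 2) / (2.0 * sigma ** 2))
        bias.append(row)
    return bias


def softmax(xs):
    m = max(xs)  # finite, because the diagonal (j == i) is never masked
    exps = [math.exp(x - m) for x in xs]  # exp(-inf) == 0.0 for masked entries
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, bias):
    """Row-wise softmax of (query-key scores + bias)."""
    return [softmax([s + b for s, b in zip(srow, brow)])
            for srow, brow in zip(scores, bias)]


if __name__ == "__main__":
    T = 6
    # Uniform query-key scores isolate the effect of each bias.
    scores = [[0.0] * T for _ in range(T)]
    for name, bias in [("causal", causal_bias(T)),
                       ("local, window=2", local_bias(T, 2)),
                       ("gaussian, mu=0, sigma=1", gaussian_bias(T, 0.0, 1.0))]:
        last_row = attention_weights(scores, bias)[-1]
        print(name, [round(w, 3) for w in last_row])
```

With uniform scores, the causal mask spreads weight evenly over all six steps. The local mask puts all weight on the two most recent steps. The Gaussian bias decays smoothly with distance. The choice between a hard window and a soft Gaussian prior reflects the trade-off the abstract describes. A hard window is cheap but rigid. A soft prior keeps older steps reachable, which can matter when the relevant window is not known in advance.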

Files

RP_Paper_Daniel_De_Dios.pdf
(pdf | 1.87 Mb)
License info not available