Exploring Attention Mechanisms in Transformers for Data-Efficient Model-Based Reinforcement Learning

Bachelor Thesis (2025)

Author(s)

D. De Dios Allegue (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

FA Oliehoek – Mentor (TU Delft - Sequential Decision Making)

J. He – Mentor (TU Delft - Sequential Decision Making)

Michael Weinmann – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use:

https://resolver.tudelft.nl/uuid:b7ae3ab5-ed19-4015-9954-cb9bc9f500f1

Publication Year

2025

Language

English

Graduation Date

24-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

A key advancement in model-based Reinforcement Learning (RL) stems from Transformer-based
world models, which allow agents to plan effectively by learning an internal representation
of the environment. However, causal self-attention in Transformers can be computationally
redundant when most relevant information lies in only a few recent steps, making it
inefficient for environments with predominantly short-term memory dependencies. This paper
investigates integrating alternative attention mechanisms into world models to address
these limitations. We embed inductive biases via local attention and Gaussian adaptive
attention, aiming to guide the model's focus towards more relevant elements of the
observation history. We evaluate these modified architectures in four environments on the
Atari 100k benchmark under partially observable conditions. Our results show that, in
environments where relevant information is contained within a specific recent window of
observations (i.e., a short-term memory dependency), tuning local or Gaussian adaptive
attention to that window lets them significantly outperform causal attention within a
limited number of interactions. In Pong, the best-performing Gaussian attention model raised
the mean return from -14.53 to -6.86, representing roughly a 53% improvement over the
baseline. The effectiveness of these mechanisms varies with the complexity and dynamism
of the influential variables within an environment, highlighting the importance of
appropriate prior selection and flexibility. This work highlights that leveraging
influence-based principles through inductive biases can lead to more data-efficient
attention mechanisms for world models, particularly when agents must learn from limited
environment interactions in diverse RL settings.
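
The difference between the three attention variants named in the abstract can be illustrated with a small sketch. This is not the thesis implementation: the function names, the `window`, `mu`, and `sigma` parameters, and the choice to model the Gaussian variant as an additive bias over the lag (how many steps back a key lies) are all assumptions made for illustration only.

```python
import math

def causal_mask(T):
    """Position i may attend to every earlier-or-equal position j <= i."""
    return [[j <= i for j in range(T)] for i in range(T)]

def local_causal_mask(T, window):
    """Causal mask restricted to the `window` most recent steps (including i)."""
    return [[0 <= i - j < window for j in range(T)] for i in range(T)]

def gaussian_bias(T, mu, sigma):
    """Hypothetical additive bias on the logits: largest when the lag (i - j)
    equals `mu`, decaying with width `sigma`. An illustrative assumption,
    not the formulation used in the thesis."""
    return [[-0.5 * (((i - j) - mu) / sigma) ** 2 for j in range(T)]
            for i in range(T)]

def attention_weights(scores, mask, bias=None):
    """Row-wise softmax over the positions allowed by `mask`."""
    weights = []
    for i, row in enumerate(scores):
        logits = [
            (s + (bias[i][j] if bias is not None else 0.0))
            if mask[i][j] else -math.inf
            for j, s in enumerate(row)
        ]
        top = max(logits)  # finite: the diagonal is always allowed
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# With uniform scores, only the mask or bias shapes the weights.
T = 6
uniform = [[0.0] * T for _ in range(T)]
last_causal = attention_weights(uniform, causal_mask(T))[-1]
last_local = attention_weights(uniform, local_causal_mask(T, window=2))[-1]
last_gauss = attention_weights(
    uniform, causal_mask(T), gaussian_bias(T, mu=0.0, sigma=1.0)
)[-1]
```

In this sketch, plain causal attention spreads weight evenly over the whole history, the local mask places all weight inside the chosen window, and the Gaussian bias concentrates weight around the chosen lag while leaving older steps reachable.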

Files

RP_Paper_Daniel_De_Dios.pdf

(PDF, 1.87 MB)

License info not available