A key advancement in model-based Reinforcement Learning (RL) stems from Transformer-based world models, which allow agents to plan effectively by learning an internal representation of the environment. However, causal self-attention in Transformers can be computationally redundant when most relevant information lies in only a few recent steps, making it inefficient for environments with predominantly short-term memory dependencies. This paper investigates integrating alternative attention mechanisms into world models to address these limitations. We embed inductive biases via local attention and Gaussian adaptive attention, aiming to guide the model's focus towards more relevant elements of the observation history. We evaluate these modified architectures in four environments of the Atari 100k benchmark under partially observable conditions. Our results show that, in environments where relevant information is contained within a specific recent window of observations (i.e., a short-term memory dependency), tuning local or Gaussian adaptive attention to that window lets these mechanisms significantly outperform causal attention within a limited number of interactions. In Pong, the best-performing Gaussian attention model raised the mean return from −14.53 to −6.86, roughly a 53% improvement over the baseline. The effectiveness of these mechanisms varies with the complexity and dynamism of the influential variables within an environment, underscoring the importance of appropriate prior selection and flexibility. This work demonstrates that leveraging influence-based principles through inductive biases can lead to more data-efficient attention mechanisms for world models, particularly when agents must learn from limited environment interactions in diverse RL settings.
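
To make the two inductive biases concrete, the following minimal PyTorch sketch shows one way to realize them inside causal scaled dot-product attention: a hard local window that masks keys beyond a fixed number of past steps, and a Gaussian penalty on the attention logits that decays with the distance between query and key positions. This is an illustrative sketch under stated assumptions, not the paper's evaluated implementation; the function name and the sigma/window parameters are hypothetical.

import torch
import torch.nn.functional as F

def biased_causal_attention(q, k, v, sigma=None, window=None):
    """Causal attention with two optional inductive biases (illustrative sketch).

    - Local attention: keys more than `window` steps in the past are masked out.
    - Gaussian adaptive attention: logits receive a penalty -d^2 / (2 * sigma^2),
      where d is the query-key distance and sigma is a per-head width
      (which could be made learnable).

    Shapes: q, k, v are (batch, heads, seq, dim); sigma is (heads,).
    """
    seq = q.size(-2)
    logits = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (B, H, T, T)

    pos = torch.arange(seq, device=q.device)
    dist = (pos[:, None] - pos[None, :]).float()           # i - j; > 0 for past keys

    if sigma is not None:
        # Gaussian prior: nearby steps get ~0 penalty, distant steps are suppressed.
        logits = logits - dist.pow(2)[None, None] / (2 * sigma.view(1, -1, 1, 1) ** 2)

    mask = dist < 0                                        # causal: no future keys
    if window is not None:
        # Local prior: hard cutoff beyond the recent window.
        mask = mask | (dist > window)
    logits = logits.masked_fill(mask[None, None], float("-inf"))

    return F.softmax(logits, dim=-1) @ v

In this sketch, setting window to the (assumed known) memory horizon recovers local attention, while supplying a per-head sigma implements the Gaussian adaptive variant; both reduce the effective context the world model attends over, matching the short-term dependency setting described above.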