Search results | TU Delft Repositories

document

Think Too Fast Nor Too Slow: The Computational Trade-off Between Planning And Reinforcement Learning

Moerland, T.M. (author), Deichler, Anna (author), Baldi, S. (author), Broekens, D.J. (author), Jonker, C.M. (author)

Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a learning loop. However, the combination of planning and learning...

book chapter 2020

document

Joy, Distress, Hope, and Fear in Reinforcement Learning (Extended Abstract)

Jacobs, E.J. (author), Broekens, J. (author), Jonker, C.M. (author)

In this paper we present a mapping between joy, distress, hope and fear, and Reinforcement Learning primitives. Joy / distress is a signal that is derived from the RL update signal, while hope/fear is derived from the utility of the current state. Agent-based simulation experiments replicate psychological and behavioral dynamics of emotion...

conference paper 2014

document

Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning

Moerland, T.M. (author), Broekens, D.J. (author), Jonker, C.M. (author)

In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task environments. However, discriminative function approximators have...

conference paper 2017

document

Efficient exploration with Double Uncertain Value Networks

Moerland, T.M. (author), Broekens, D.J. (author), Jonker, C.M. (author)

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns ...

conference paper 2017

document

Designing interfaces for explicit preference elicitation: A user-centered investigation of preference representation and elicitation process

Pommeranz, A. (author), Broekens, J. (author), Wiggers, P. (author), Brinkman, W.P. (author), Jonker, C.M. (author)

Two problemsmay arisewhen an intelligent (recommender) system elicits users’ preferences. First, theremay be amismatch between the quantitative preference representations in most preference models and the users’ mental preference models. Giving exact numbers, e.g., such as “I like 30 days of vacation 2.5 times better than 28 days” is difficult...

journal article 2012

document

A Unifying Framework for Reinforcement Learning and Planning

Moerland, Thomas M. (author), Broekens, D.J. (author), Plaat, Aske (author), Jonker, C.M. (author)

Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem,...

journal article 2022

document

Emotion in reinforcement learning agents and robots: A survey

Moerland, T.M. (author), Broekens, D.J. (author), Jonker, C.M. (author)

This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation and action selection. Therefore, computational emotion models are...

journal article 2018

document

Model-based Reinforcement Learning: A Survey

Moerland, T.M. (author), Broekens, D.J. (author), Plaat, Aske (author), Jonker, C.M. (author)

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This survey is an integration of both fields, better known as model-based reinforcement learning. Model-based RL...

review 2023