T.M. Moerland | TU Delft Repository

Model-based Reinforcement Learning

A Survey

Review (2023) - Thomas M. Moerland (author) , D.J. Broekens (author) , Aske Plaat (author) , CM Jonker (author)

Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This survey is an integration of both fields, bett ...

The Intersection of Planning and Learning

Doctoral thesis (2021) - T.M. Moerland (author)

Intelligent sequential decision making is a key challenge in artificial intelligence. The problem, commonly formalized as a Markov Decision Process, is studied in two different research communities: planning and reinforcement learning. Departing from a fundamentally different ass ...

Intelligent sequential decision making is a key challenge in artificial intelligence. The problem, commonly formalized as a Markov Decision Process, is studied in two different research communities: planning and reinforcement learning. Departing from a fundamentally different assumption about the type of access to the environment, both research fields have developed their own solution approaches and conventions. The combination of both fields, known as model-based reinforcement learning, has recently shown state-of-the-art results, for example defeating human experts in classic board games like Chess and Go. Nevertheless, literature lacks an integrated view on 1) the similarities between planning and learning, and 2) the possible combinations of both. This dissertation aims to fill this gap. The first half of the book presents a conceptual answer to both questions. We first present a framework that disentangles the common algorithmic space of both fields, showing that they essentially face the same algorithmic design decisions. Moreover, we also present an overview of the different ways in which planning and learning can be combined in one algorithm. The second half of the dissertation provides experimental illustration of these ideas. We present several new combinations of planning and learning, such as a flexible method to learn stochastic dynamics models with neural networks, an extension of a successful planning-learning algorithm (AlphaZero) to deal with continuous action spaces, and a study of the empirical trade-off between planning and learning. Finally, we also illustrate the commonalities between both fields, by designing a new algorithm in one field based on inspiration from the other field. We conclude the thesis with an outlook for the planning-learning field as a whole. Altogether, the dissertation provides a broad theoretical and empirical view on the combination of planning and learning, which promises to be an important frontier in artificial intelligence research in the coming years.

Think Too Fast Nor Too Slow

The Computational Trade-off Between Planning And Reinforcement Learning

Book chapter (2020) - Thomas M. Moerland (author) , Anna Deichler (author) , S Baldi (author) , DJ Broekens (author) , Catholijn Jonker (author)

Planning and reinforcement learning are two key approaches to sequential decision making. Multi-step approximate real-time dynamic programming, a recently successful algorithm class of which AlphaZero [Silver et al., 2018] is an example, combines both by nesting planning within a ...

A Framework for Reinforcement Learning and Planning

Extended Abstract

Book chapter (2020) - T.M. Moerland (author) , D.J. Broekens (author) , C.M. Jonker (author)

Sequential decision making, commonly formalized as Markov Decision Process optimiza-tion, is a key challenge in artificial intelligence. Two successful approaches to MDP opti-mization are planning and reinforcement learning. Both research fields largely have their own research ...

Alpha zero in continuous action space

Extended abstract.

Abstract (2018) - T.M. Moerland (author) , Joost Broekens (author) , Aske Plaat (author) , Catholijn M. Jonker (author)

RRT-CoLearn

Towards kinodynamic planning without numerical trajectory optimization

Journal article (2018) - Wouter J. Wolfslag (author) , M. Bharatheesha (author) , T.M. Moerland (author) , Martijn Wisse (author)

Sampling-based kinodynamic planners, such as Rapidly-exploring Random Trees (RRTs), pose two fundamental challenges: computing a reliable (pseudo-)metric for the distance between two randomly sampled nodes, and computing a steering input to connect the nodes. The core of these ch ...

Emotion in reinforcement learning agents and robots

A survey

Journal article (2018) - T.M. Moerland (author) , DJ Broekens (author) , Catholijn Jonker (author)

This article provides the first survey of computational models of emotion in reinforcement learning (RL) agents. The survey focuses on agent/robot emotions, and mostly ignores human user emotions. Emotions are recognized as functional in decision-making by influencing motivation ...

Efficient exploration with Double Uncertain Value Networks

Conference paper (2017) - T.M. Moerland (author) , DJ Broekens (author) , Catholijn Jonker (author)

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), w ...

Learning Multimodal Transition Dynamics for Model-Based Reinforcement Learning

Conference paper (2017) - T.M. Moerland (author) , DJ Broekens (author) , Catholijn Jonker (author)

In this paper we study how to learn stochastic, multimodal transition dynamics in reinforcement learning (RL) tasks. We focus on evaluating transition function estimation, while we defer planning over this model to future work. Stochasticity is a fundamental property of many task ...

Fear and hope emerge from anticipation in model-based reinforcement learning

Conference paper (2016) - T.M. Moerland (author) , DJ Broekens (author) , Catholijn Jonker (author)

Social agents and robots will require both learning and emotional capabilities to successfully enter society. This paper connects both challenges, by studying models of emotion generation in sequential decision-making agents. Previous work in this field has focussed on model-free ...

Knowing what you don’t know

Novelty detection for action recognition in personal robots

Conference paper (2016) - T.M. Moerland (author) , A.C. Alargarsamy Balasubramanian (author) , M Rudinac (author) , Pieter P. Jonker (author)

Novelty detection is essential for personal robots to continuously learn and adapt in open environments. This paper specifically studies novelty detection in the context of action recognition. To detect unknown (novel) human action sequences we propose a new method called backgro ...