J.W. Böhmer | TU Delft Repository

Epistemic Monte Carlo Tree Search

Conference paper (2025) - Y. Oren (author) , Viliam Vadocz (author) , Matthijs T.J. Spaan (author) , Wendelin Böhmer (author)

The AlphaZero/MuZero (A/MZ) family of algorithms has achieved remarkable success across various challenging domains by integrating Monte Carlo Tree Search (MCTS) with learned models. Learned models introduce epistemic uncertainty, which is caused by learning from limited data and ...

To the Max

Reinventing Reward in Reinforcement Learning

Journal article (2024) - G. Veviurko (author) , J.W. Böhmer (author) , M. De Weerdt (author)

In reinforcement learning (RL), different reward functions can define the same optimal policy but result in drastically different learning performance. For some, the agent gets stuck with a suboptimal behavior, and for others, it solves the task efficiently. Choosing a good rewar ...

Value Improved Actor Critic Algorithms

Preprint (2024) - Yaniv Oren (author) , M.A. Zanger (author) , P.R. van der Vaart (author) , MTJ Spaan (author) , J.W. Böhmer (author)

Many modern reinforcement learning algorithms build on the actor-critic (AC) framework: iterative improvement of a policy (the actor) using policy improvement operators and iterative approximation of the policy's value (the critic). In contrast, the popular value-based algorithm ...

Diverse Projection Ensembles for Distributional Reinforcement Learning

Conference paper (2024) - M.A. Zanger (author) , Wendelin Böhmer (author) , Matthijs T. J. Spaan (author)

In contrast to classical reinforcement learning, distributional RL algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or arbitrarily complex, a common approach finds appro ...

Distributed multi-target tracking and active perception with mobile camera networks

Journal article (2024) - Sara Casao (author) , A. Serra Gomez (author) , Ana C. Murillo (author) , J.W. Böhmer (author) , Javier Alonso-Mora (author) , Eduardo Montijano (author)

Smart cameras are an essential component in surveillance and monitoring applications, and they have been typically deployed in networks of fixed camera locations. The addition of mobile cameras, mounted on robots, can overcome some of the limitations of static networks such as bl ...

Multi-Robot Local Motion Planning Using Dynamic Optimization Fabrics

Conference paper (2024) - Saray Bakker (author) , L. Knödler (author) , M. Spahn (author) , J.W. Böhmer (author) , Javier Alonso-Mora (author)

In this paper, we address the problem of real-time motion planning for multiple robotic manipulators that operate in close proximity. We build upon the concept of dynamic fabrics and extend them to multi-robot systems, referred to as Multi-Robot Dynamic Fabrics (MRDF). This geome ...

Learning scalable and efficient communication policies for multi-robot collision avoidance

Journal article (2023) - A. Serra Gomez (author) , Hai Zhu (author) , B.F. Ferreira de Brito (author) , Wendelin Böhmer (author) , Javier Alonso-Mora (author)

Decentralized multi-robot systems typically perform coordinated motion planning by constantly broadcasting their intentions to avoid collisions. However, the risk of collision between robots varies as they move and communication may not always be needed. This paper presents an ef ...

Active Classification of Moving Targets With Learned Control Policies

Journal article (2023) - A. Serra Gomez (author) , Eduardo Montijano (author) , J.W. Böhmer (author) , J. Alonso-Mora (author)

In this paper, we consider the problem where a drone has to collect semantic information to classify multiple moving targets. In particular, we address the challenge of computing control inputs that move the drone to informative viewpoints, position and orientation, when the info ...

The Role of Diverse Replay for Generalisation in Reinforcement Learning

Preprint (2023) - M.R. Weltevrede (author) , Matthijs T. J. Spaan (author) , J.W. Böhmer (author)

In reinforcement learning (RL), key components of many algorithms are the exploration strategy and replay buffer. These strategies regulate what environment data is collected and trained on and have been extensively studied in the RL literature. In this paper, we investigate the ...

Diverse Projection Ensembles for Distributional Reinforcement Learning

Conference paper (2023) - M.A. Zanger (author) , Wendelin Böhmer (author) , Matthijs T. J. Spaan (author)

In contrast to classical reinforcement learning, distributional reinforcement learning algorithms aim to learn the distribution of returns rather than their expected value. Since the nature of the return distribution is generally unknown a priori or arbitrarily complex, a common ...

E-MCTS: Deep Exploration in Model-Based Reinforcement Learning by Planning with Epistemic Uncertainty

Preprint (2023) - Y. Oren (author) , Matthijs T. J. Spaan (author) , Wendelin Böhmer (author)

One of the most well-studied and highly performing planning approaches used in Model-Based Reinforcement Learning (MBRL) is Monte-Carlo Tree Search (MCTS). Key challenges of MCTS-based MBRL methods remain dedicated deep exploration and reliability in the face of the unknown, and ...

Surrogate DC Microgrid Models for Optimization of Charging Electric Vehicles under Partial Observability

Journal article (2022) - G. Veviurko (author) , J.W. Böhmer (author) , Laurens Mackay (author) , Mathijs M. de de Weerdt (author)

Many electric vehicles (EVs) are using today’s distribution grids, and their flexibility can be highly beneficial for the grid operators. This flexibility can be best exploited by DC power networks, as they allow charging and discharging without extra power electronics and transf ...

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Conference paper (2021) - Shariq Iqbal (author) , Christian A. Schroeder de Witt (author) , Bei Peng (author) , Wendelin Böhmer (author) , Shimon Whiteson (author) , Fei Sha (author)

Real world multi-agent tasks often involve varying types and quantities of agents and non-agent entities; however, agents within these tasks rarely need to consider all others at all times in order to act effectively. Factored value function approaches have historically leveraged ...

Transient non-stationarity and generalisation in deep reinforcement learning

Conference paper (2021) - Maximilian Igl (author) , Gregory Farquhar (author) , Jelena Luketina (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments. For example, most RL algorithms collect new data throughout training, using a non-stationary behaviour policy. Due to the transience of this non-stationarity, it is often not explicitly add ...

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Conference paper (2021) - Tarun Gupta (author) , Anuj Mahajan (author) , Bei Peng (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities. While this enables easy decentralization of the learned policy, the restricted joint action value function can pre ...

FACMAC

Factored Multi-Agent Centralised Policy Gradients

Conference paper (2021) - Bei Peng (author) , Tabish Rashid (author) , Christian A. Schroeder de Witt (author) , Pierre-Alexandre Kamienny (author) , Philip H.S. Torr (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. Like MADDPG, a popular multi-agent actor-critic method, our approach uses deep deterministic polic ...

Reinforcement Learning for the Knapsack Problem

Book chapter (2021) - Jacopo Pierotti (author) , Maximilian Kronmueller (author) , Javier Alonso-Mora (author) , J. T. van Essen (author) , J.W. Böhmer (author)

Combinatorial optimization (CO) problems are at the heart of both practical and theoretical research. Due to their complexity, many problems cannot be solved via exact methods in reasonable time; hence, we resort to heuristic solution methods. In recent years, machine learning (M ...

My body is a cage: the role of morphology in graph-based incompatible control

Conference paper (2021) - Vitaly Kurin (author) , Maximilian Igl (author) , Tim Rocktäschel (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

Multitask Reinforcement Learning is a promising way to obtain models with better performance, generalisation, data efficiency, and robustness. Most existing work is limited to compatible settings, where the state and action space dimensions are the same across tasks. Graph Neural ...

Multitask Soft Option Learning

Conference paper (2020) - Maximilian Igl (author) , Andrew Gambardella (author) , J. He (author) , Nantas Nardelli (author) , N Siddharth (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference. MSOL extends the concept of options, using separate variational posteriors for each task, regularized by a shared prior. This “soft” version of options avoids seve ...

Deep residual reinforcement learning

Conference paper (2020) - Shangtong Zhang (author) , J.W. Böhmer (author) , Shimon Whiteson (author)

We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMi ...