Searched for: subject:"reinforcement learning"
(1 - 20 of 20)
Tang, Shi Yuan (author), Irissappane, Athirai A. (author), Oliehoek, F.A. (author), Zhang, Jie (author)
Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can lead to convergence to different local optima across runs, especially when the algorithm is sensitive to hyper...
journal article 2023
Peschl, M. (author), Zgonnikov, A. (author), Oliehoek, F.A. (author), Cavalcante Siebert, L. (author)
Inferring reward functions from demonstrations and from pairwise preferences are two promising approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, making it difficult to trade off different reward functions from multiple experts. We...
conference paper 2022
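The entry above centers on learning reward models from pairwise preferences. For reference, a minimal Bradley-Terry-style sketch of that general idea (a single linear reward model fitted to synthetic preferences; this is not the authors' multi-objective method, and all names and data here are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

def traj_features(traj):
    """Sum the per-step feature vectors of a trajectory."""
    return traj.sum(axis=0)

# Hypothetical data: trajectories as (T, d) arrays of step features;
# prefs[i] = 1.0 means trajectory a_i was preferred over b_i.
d = 4
pairs = [(rng.normal(size=(10, d)), rng.normal(size=(10, d))) for _ in range(200)]
true_w = np.array([1.0, -0.5, 0.0, 2.0])   # hidden "teacher" reward
prefs = np.array([float(traj_features(a) @ true_w > traj_features(b) @ true_w)
                  for a, b in pairs])

# Fit weights w by ascending the Bradley-Terry log-likelihood:
# P(a preferred over b) = sigmoid(w . (phi(a) - phi(b))).
w = np.zeros(d)
for _ in range(500):
    grad = np.zeros(d)
    for (a, b), y in zip(pairs, prefs):
        diff = traj_features(a) - traj_features(b)
        p = 1.0 / (1.0 + np.exp(-(w @ diff)))
        grad += (y - p) * diff
    w += 0.05 * grad / len(pairs)

print("recovered reward weights (up to scale):", np.round(w, 2))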
Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
conference paper 2022
Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...
conference paper 2022
Congeduti, E. (author), Oliehoek, F.A. (author)
Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize from experience, and learn from feedback to act optimally. These processes demand vast representation capacity, thus putting a burden on the agent's limited computational...
conference paper 2022
Suau, M. (author), He, J. (author), Congeduti, E. (author), Starre, R.A.N. (author), Czechowski, A.T. (author), Oliehoek, F.A. (author)
Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations....
journal article 2022
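Since this entry describes the standard trick of using an RNN to summarize the action-observation history, a bare-bones sketch of that mechanism may help. This is a generic DRQN-style Q-network in PyTorch, not the paper's architecture; all sizes are made up:

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim). The recurrent state acts as
        # a learned summary of the history, standing in for the
        # unobserved environment state.
        out, hT = self.gru(obs_seq, h0)
        return self.head(out), hT  # Q-values per timestep, final state

# Usage: feed the whole history; act greedily on the last step.
net = RecurrentQNet(obs_dim=8, n_actions=4)
obs = torch.randn(1, 20, 8)
q, h = net(obs)
action = q[0, -1].argmax().item()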
Celikok, M.M. (author), Oliehoek, F.A. (author), Kaski, Samuel (author)
Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled...
conference paper 2022
Castellini, Jacopo (author), Devlin, Sam (author), Oliehoek, F.A. (author), Savani, Rahul (author)
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We...
journal article 2022
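The credit-assignment problem named here is often illustrated with difference rewards: score each agent by the global return minus the return under a counterfactual default action. A toy sketch of that generic formulation, not the paper's proposed estimator (the team task below is made up):

def global_reward(actions):
    # Hypothetical team task: reward = number of distinct targets
    # covered; action 0 means "stay home" and covers nothing.
    return len({a for a in actions if a != 0})

def difference_rewards(actions, default=0):
    """Credit for agent i = G(joint action) - G(joint action with
    agent i's action replaced by the default)."""
    credits = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default
        credits.append(global_reward(actions) - global_reward(counterfactual))
    return credits

joint = [1, 1, 2]                 # agents 0 and 1 collide on target 1
print(difference_rewards(joint))  # [0, 0, 1]: only agent 2 adds value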
Tang, Shi Yuan (author), Oliehoek, F.A. (author), Irissappane, Athirai A. (author), Zhang, Jie (author)
The Cross-Entropy Method (CEM) is a gradient-free direct policy search method, offering greater stability and insensitivity to hyperparameter tuning. CEM bears similarity to population-based evolutionary methods but, rather than maintaining a population, it uses a distribution over candidate solutions (policies, in our case). Usually, a natural...
conference paper 2021
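Since the abstract describes the mechanism directly, here is a bare-bones CEM loop over policy parameters: sample candidates from a Gaussian, keep the elites, refit the distribution. The evaluate() objective is a made-up stand-in for a policy rollout return:

import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta):
    # Stand-in for a rollout return; here the objective peaks at theta*.
    theta_star = np.array([0.5, -1.0, 2.0])
    return -np.sum((theta - theta_star) ** 2)

d, pop, n_elite = 3, 50, 10
mu, sigma = np.zeros(d), np.ones(d)
for it in range(50):
    thetas = mu + sigma * rng.standard_normal((pop, d))
    returns = np.array([evaluate(t) for t in thetas])
    elites = thetas[np.argsort(returns)[-n_elite:]]   # top candidates
    mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3

print("best policy parameters:", np.round(mu, 2))  # converges to ~theta*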
Castellini, Jacopo (author), Oliehoek, F.A. (author), Devlin, Sam (author), Savani, Rahul (author)
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose...
conference paper 2021
Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)
Deep Reinforcement Learning (RL) is a promising technique for constructing intelligent agents, but it is not always easy to understand the learning process and the factors that impact it. To shed some light on this, we analyze the Latent State Representations (LSRs) that deep RL agents learn, and compare them to what such agents should...
conference paper 2021
Muench, C. (author), Oliehoek, F.A. (author), Gavrila, D. (author)
Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine...
journal article 2021
Mandersloot, A.V. (author), Oliehoek, F.A. (author), Czechowski, A.T. (author)
In this study, we investigate the effects of conditioning Independent Q-Learners (IQL) not solely on the individual action-observation history, but additionally on the plan-time sufficient statistic for Decentralized Partially Observable Markov Decision Processes. In doing so, we attempt to address a key shortcoming of IQL, namely that it is...
conference paper 2020
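For reference, plain IQL in its simplest (stateless) form on a two-agent coordination game; the paper's variant would additionally condition each learner on the plan-time sufficient statistic. This sketch is generic, with a made-up payoff matrix:

import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[4.0, 0.0],   # both agents must pick the same
                   [0.0, 2.0]])  # action to earn a team reward
Q = [np.zeros(2), np.zeros(2)]   # one stateless Q-table per agent
alpha, eps = 0.1, 0.2

for _ in range(5000):
    # Each learner acts eps-greedily on its own Q-values, treating
    # the other agent as part of the (nonstationary) environment.
    acts = [int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
            for q in Q]
    r = payoff[acts[0], acts[1]]
    for i in range(2):
        Q[i][acts[i]] += alpha * (r - Q[i][acts[i]])

print(np.round(Q[0], 2), np.round(Q[1], 2))  # both learners favor action 0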
Satsangi, Yash (author), Lim, Sungsu (author), Whiteson, Shimon (author), Oliehoek, F.A. (author), White, Martha (author)
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a...
conference paper 2020
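The uncertainty-dependent reward mentioned here can be made concrete in a few lines: take the negative Shannon entropy of the belief, so that sensing actions which sharpen the belief earn more. The belief vectors below are made-up numbers:

import numpy as np

def neg_entropy_reward(belief):
    b = belief[belief > 0]                # avoid log(0) for impossible states
    return float(np.sum(b * np.log(b)))   # = -H(belief)

belief_before = np.array([0.25, 0.25, 0.25, 0.25])  # maximally uncertain
belief_after  = np.array([0.70, 0.10, 0.10, 0.10])  # after a good sensing action

print(neg_entropy_reward(belief_before))  # -1.386 (low reward)
print(neg_entropy_reward(belief_after))   # -0.940 (higher reward)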
Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)
Recent years have seen a surge of algorithms and architectures for deep Reinforcement Learning (RL), many of which have shown remarkable success for various problems. Yet, little work has attempted to relate the performance of these algorithms and architectures to what the resulting deep RL agents actually learn, and whether...
abstract 2020
Suau, M. (author), Congeduti, E. (author), Starre, R.A.N. (author), Czechowski, A.T. (author), Oliehoek, F.A. (author)
…thousands, or even millions of state variables. Unfortunately, applying reinforcement learning algorithms to handle complex tasks becomes more and more challenging as the number of state variables increases. In this paper, we build on the concept of influence-based abstraction, which tries to tackle such scalability issues by decomposing large...
conference paper 2019
Katt, Sammie (author), Oliehoek, F.A. (author), Amato, Christopher (author)
Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to dealing with the exploration-exploitation trade-off, but such methods typically assume a fully observable environment. The few Bayesian RL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To...
conference paper 2019
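The Bayes-adaptive idea behind the BA-POMDP can be miniaturized: maintain Dirichlet counts over transition outcomes and act under the resulting posterior-mean model. A toy, fully observable sketch of that general mechanism, not the paper's algorithm (which additionally handles partial observability):

import numpy as np

n_states, n_actions = 3, 2
counts = np.ones((n_states, n_actions, n_states))  # Dirichlet(1) prior

def update(s, a, s_next):
    counts[s, a, s_next] += 1.0   # posterior update from experience

def expected_model():
    # Posterior-mean transition model T(s' | s, a).
    return counts / counts.sum(axis=2, keepdims=True)

update(0, 1, 2)
update(0, 1, 2)
print(np.round(expected_model()[0, 1], 2))  # mass shifts toward s' = 2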
Oliehoek, F.A. (author)
Designing "teams of intelligent agents that successfully coordinate and learn about their complex environments inhabited by other agents (such as humans)" is one of the major goals of AI, and it is the challenge that I aim to address in my research. In this paper I give an overview of some of the foundations, insights and challenges in this...
conference paper 2018
Oliehoek, F.A. (author), Spaan, M.T.J. (author), Terwijn, Bas (author), Robbel, Philipp (author), Messias, João V. (author)
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent...
journal article 2017