Search results | TU Delft Repositories

document

Teacher-apprentices RL (TARL): leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning

Tang, Shi Yuan (author), Irissappane, Athirai A. (author), Oliehoek, F.A. (author), Zhang, Jie (author)

Typically, a Reinforcement Learning (RL) algorithm focuses in learning a single deployable policy as the end product. Depending on the initialization methods and seed randomization, learning a single policy could possibly leads to convergence to different local optima across different runs, especially when the algorithm is sensitive to hyper...

journal article 2023

document

Safety Guarantees in Multi-agent Learning via Trapping Regions

Czechowski, A.T. (author), Oliehoek, F.A. (author)

One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy, when learning concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier...

journal article 2023

document

What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

MS Osika, Z. (author), Zatarain Salazar, J. (author), Roijers, Diederik M. (author), Oliehoek, F.A. (author), Murukannaiah, P.K. (author)

We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fields. We provide an overview of the advances on this topic,...

conference paper 2023

document

Safe Multi-agent Learning via Trapping Regions

Czechowski, A.T. (author), Oliehoek, F.A. (author)

One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy, when learning concurrently. This is in stark contrast to most single-agent environments, and sets a prohibitive barrier...

conference paper 2023

document

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Peschl, M. (author), Zgonnikov, A. (author), Oliehoek, F.A. (author), Cavalcante Siebert, L. (author)

Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the art methods typically focus on learning a single reward model, thus rendering it difficult to trade off different reward functions from multiple experts. We...

conference paper 2022

document

Online Planning in POMDPs with Self-Improving Simulators

He, J. (author), Suau, M. (author), Baier, Hendrik (author), Kaisers, Michael (author), Oliehoek, F.A. (author)

How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator...

conference paper 2022

document

Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

conference paper 2022

document

Influence-Augmented Local Simulators: a Scalable Solution for Fast Deep RL in Large Networked Systems

Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation being the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...

conference paper 2022

document

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Suau, M. (author), He, J. (author), Çelikok, Mustafa Mert (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to factorize large networked systems of many agents into...

conference paper 2022

document

A Cross-Field Review of State Abstraction for Markov Decision Processes

Congeduti, E. (author), Oliehoek, F.A. (author)

Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize the experience and learn from feedback to act optimally. These processes demand vast representation capacity, thus putting a burden on the agent’s limited computational...

conference paper 2022

document

BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs

Katt, Sammie (author), Nguyen, Hai (author), Oliehoek, F.A. (author), Amato, Christopher (author)

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with...

conference paper 2022

document

Multi Robot Surveillance and Planning in Limited Communication Environments

Inna Kedege, V. (author), Czechowski, A.T. (author), Stellingwerff, Ludo (author), Oliehoek, F.A. (author)

Distributed robots that survey and assist with search & rescue operations usually deal with unknown environments with limited communication. This paper focuses on distributed & cooperative multi-robot area coverage strategies of unknown environments, having constrained communication. Due to restricted communication there is...

conference paper 2022

document

Difference rewards policy gradients

Castellini, Jacopo (author), Devlin, Sam (author), Oliehoek, F.A. (author), Savani, Rahul (author)

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We...

journal article 2022

document

Model-Based Reinforcement Learning with State Abstraction: A Survey

Starre, R.A.N. (author), Loog, M. (author), Oliehoek, F.A. (author)

Model-based reinforcement learning methods are promising since they can increase sample efficiency while simultaneously improving generalizability. Learning can also be made more efficient through state abstraction, which delivers more compact models. Model-based reinforcement learning methods have been combined with learning abstract models to...

conference paper 2022

document

Influence-aware memory architectures for deep reinforcement learning in POMDPs

Suau, M. (author), He, J. (author), Congeduti, E. (author), Starre, R.A.N. (author), Czechowski, A.T. (author), Oliehoek, F.A. (author)

Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations....

journal article 2022

document

Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

Celikok, M.M. (author), Oliehoek, F.A. (author), Kaski, Samuel (author)

Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled...

conference paper 2022

document

Speeding up Deep Reinforcement Learning through Influence-Augmented Local Simulators

Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation being the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...

conference paper 2022

document

Learning Complex Policy Distribution with CEM Guided Adversarial Hypernetwork

Tang, Shi Yuan (author), Oliehoek, F.A. (author), Irissappane, Athirai A. (author), Zhang, Jie (author)

Cross-Entropy Method (CEM) is a gradient-free direct policy search method, which has greater stability and is insensitive to hyperparameter tuning. CEM bears similarity to population-based evolutionary methods, but, rather than using a population it uses a distribution over candidate solutions (policies in our case). Usually, a natural...

conference paper 2021

document

Difference Rewards Policy Gradients

Castellini, Jacopo (author), Oliehoek, F.A. (author), Devlin, Sam (author), Savani, Rahul (author)

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose...

conference paper 2021

document

Using Bisimulation Metrics to Analyze and Evaluate Latent State Representations

Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)

Deep Reinforcement Learning (RL) is a promising technique towards constructing intelligent agents, but it is not always easy to understand the learning process and the factors that impact it. To shed some light on this, we analyze the Latent State Representations (LSRs) that deep RL agents learn, and compare them to what such agents should...

conference paper 2021

Pages

Pages