Searched for: subject:"reinforcement learning"
(1 - 20 of 20)
Tang, Shi Yuan (author), Irissappane, Athirai A. (author), Oliehoek, F.A. (author), Zhang, Jie (author)
Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization method and seed randomization, learning a single policy can lead to convergence to different local optima across runs, especially when the algorithm is sensitive to hyper...
journal article 2023
Peschl, M. (author), Zgonnikov, A. (author), Oliehoek, F.A. (author), Cavalcante Siebert, L. (author)
Inferring reward functions from demonstrations and from pairwise preferences are two promising approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, making it difficult to trade off different reward functions from multiple experts. We...
conference paper 2022
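The entry above centers on learning reward models from pairwise preferences. For reference, a minimal Bradley-Terry-style sketch of that general idea (a single linear reward model fitted to synthetic preferences; this is not the authors' multi-objective method, and all names and data here are hypothetical):

import numpy as np

rng = np.random.default_rng(0)

def traj_features(traj):
    """Sum the per-step feature vectors of a trajectory."""
    return traj.sum(axis=0)

# Hypothetical data: trajectories as (T, d) arrays of step features;
# prefs[i] = 1.0 means trajectory a_i was preferred over b_i.
d = 4
pairs = [(rng.normal(size=(10, d)), rng.normal(size=(10, d))) for _ in range(200)]
true_w = np.array([1.0, -0.5, 0.0, 2.0])   # hidden "teacher" reward
prefs = np.array([float(traj_features(a) @ true_w > traj_features(b) @ true_w)
                  for a, b in pairs])

# Fit weights w by ascending the Bradley-Terry log-likelihood:
# P(a preferred over b) = sigmoid(w . (phi(a) - phi(b))).
w = np.zeros(d)
for _ in range(500):
    grad = np.zeros(d)
    for (a, b), y in zip(pairs, prefs):
        diff = traj_features(a) - traj_features(b)
        p = 1.0 / (1.0 + np.exp(-(w @ diff)))
        grad += (y - p) * diff
    w += 0.05 * grad / len(pairs)

print("recovered reward weights (up to scale):", np.round(w, 2))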
Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
conference paper 2022
Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...
conference paper 2022
Congeduti, E. (author), Oliehoek, F.A. (author)
Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize from experience, and learn from feedback to act optimally. These processes demand vast representation capacity, thus putting a burden on the agent's limited computational...
conference paper 2022
Suau, M. (author), He, J. (author), Congeduti, E. (author), Starre, R.A.N. (author), Czechowski, A.T. (author), Oliehoek, F.A. (author)
Due to its perceptual limitations, an agent may have too little information about the environment to act optimally. In such cases, it is important to keep track of the action-observation history to uncover hidden state information. Recent deep reinforcement learning methods use recurrent neural networks (RNN) to memorize past observations....
journal article 2022
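Since this entry describes the standard trick of using an RNN to summarize the action-observation history, a bare-bones sketch of that mechanism may help. This is a generic DRQN-style Q-network in PyTorch, not the paper's architecture; all sizes are made up:

import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, time, obs_dim). The recurrent state acts as
        # a learned summary of the history, standing in for the
        # unobserved environment state.
        out, hT = self.gru(obs_seq, h0)
        return self.head(out), hT  # Q-values per timestep, final state

# Usage: feed the whole history; act greedily on the last step.
net = RecurrentQNet(obs_dim=8, n_actions=4)
obs = torch.randn(1, 20, 8)
q, h = net(obs)
action = q[0, -1].argmax().item()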
Celikok, M.M. (author), Oliehoek, F.A. (author), Kaski, Samuel (author)
Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human and the AI as a sequential game where the agents are modelled...
conference paper 2022
Castellini, Jacopo (author), Devlin, Sam (author), Oliehoek, F.A. (author), Savani, Rahul (author)
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We...
journal article 2022
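The credit-assignment problem named here is often illustrated with difference rewards: score each agent by the global return minus the return under a counterfactual default action. A toy sketch of that generic formulation, not the paper's proposed estimator (the team task below is made up):

def global_reward(actions):
    # Hypothetical team task: reward = number of distinct targets
    # covered; action 0 means "stay home" and covers nothing.
    return len({a for a in actions if a != 0})

def difference_rewards(actions, default=0):
    """Credit for agent i = G(joint action) - G(joint action with
    agent i's action replaced by the default)."""
    credits = []
    for i in range(len(actions)):
        counterfactual = list(actions)
        counterfactual[i] = default
        credits.append(global_reward(actions) - global_reward(counterfactual))
    return credits

joint = [1, 1, 2]                 # agents 0 and 1 collide on target 1
print(difference_rewards(joint))  # [0, 0, 1]: only agent 2 adds value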
Tang, Shi Yuan (author), Oliehoek, F.A. (author), Irissappane, Athirai A. (author), Zhang, Jie (author)
The Cross-Entropy Method (CEM) is a gradient-free direct policy search method, offering greater stability and insensitivity to hyperparameter tuning. CEM bears similarity to population-based evolutionary methods but, rather than maintaining a population, it uses a distribution over candidate solutions (policies, in our case). Usually, a natural...
conference paper 2021
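Since the abstract describes the mechanism directly, here is a bare-bones CEM loop over policy parameters: sample candidates from a Gaussian, keep the elites, refit the distribution. The evaluate() objective is a made-up stand-in for a policy rollout return:

import numpy as np

rng = np.random.default_rng(0)

def evaluate(theta):
    # Stand-in for a rollout return; here the objective peaks at theta*.
    theta_star = np.array([0.5, -1.0, 2.0])
    return -np.sum((theta - theta_star) ** 2)

d, pop, n_elite = 3, 50, 10
mu, sigma = np.zeros(d), np.ones(d)
for it in range(50):
    thetas = mu + sigma * rng.standard_normal((pop, d))
    returns = np.array([evaluate(t) for t in thetas])
    elites = thetas[np.argsort(returns)[-n_elite:]]   # top candidates
    mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-3

print("best policy parameters:", np.round(mu, 2))  # converges to ~theta*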
Castellini, Jacopo (author), Oliehoek, F.A. (author), Devlin, Sam (author), Savani, Rahul (author)
Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose...
conference paper 2021
Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)
Deep Reinforcement Learning (RL) is a promising technique for constructing intelligent agents, but it is not always easy to understand the learning process and the factors that impact it. To shed some light on this, we analyze the Latent State Representations (LSRs) that deep RL agents learn, and compare them to what such agents should...
conference paper 2021
Muench, C. (author), Oliehoek, F.A. (author), Gavrila, D. (author)
Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine...
journal article 2021
Mandersloot, A.V. (author), Oliehoek, F.A. (author), Czechowski, A.T. (author)
In this study, we investigate the effects of conditioning Independent Q-Learners (IQL) not solely on the individual action-observation history, but additionally on the plan-time sufficient statistic for Decentralized Partially Observable Markov Decision Processes. In doing so, we attempt to address a key shortcoming of IQL, namely that it is...
conference paper 2020
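For reference, plain IQL in its simplest (stateless) form on a two-agent coordination game; the paper's variant would additionally condition each learner on the plan-time sufficient statistic. This sketch is generic, with a made-up payoff matrix:

import numpy as np

rng = np.random.default_rng(0)
payoff = np.array([[4.0, 0.0],   # both agents must pick the same
                   [0.0, 2.0]])  # action to earn a team reward
Q = [np.zeros(2), np.zeros(2)]   # one stateless Q-table per agent
alpha, eps = 0.1, 0.2

for _ in range(5000):
    # Each learner acts eps-greedily on its own Q-values, treating
    # the other agent as part of the (nonstationary) environment.
    acts = [int(rng.integers(2)) if rng.random() < eps else int(np.argmax(q))
            for q in Q]
    r = payoff[acts[0], acts[1]]
    for i in range(2):
        Q[i][acts[i]] += alpha * (r - Q[i][acts[i]])

print(np.round(Q[0], 2), np.round(Q[1], 2))  # both learners favor action 0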
Satsangi, Yash (author), Lim, Sungsu (author), Whiteson, Shimon (author), Oliehoek, F.A. (author), White, Martha (author)
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a...
conference paper 2020
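The uncertainty-dependent reward mentioned here can be made concrete in a few lines: take the negative Shannon entropy of the belief, so that sensing actions which sharpen the belief earn more. The belief vectors below are made-up numbers:

import numpy as np

def neg_entropy_reward(belief):
    b = belief[belief > 0]                # avoid log(0) for impossible states
    return float(np.sum(b * np.log(b)))   # = -H(belief)

belief_before = np.array([0.25, 0.25, 0.25, 0.25])  # maximally uncertain
belief_after  = np.array([0.70, 0.10, 0.10, 0.10])  # after a good sensing action

print(neg_entropy_reward(belief_before))  # -1.386 (low reward)
print(neg_entropy_reward(belief_after))   # -0.940 (higher reward)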
Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)
Recent years have seen a surge of algorithms and architectures for deep Reinforcement Learning (RL), many of which have shown remarkable success for various problems. Yet, little work has attempted to relate the performance of these algorithms and architectures to what the resulting deep RL agents actually learn, and whether...
abstract 2020
Suau, M. (author), Congeduti, E. (author), Starre, R.A.N. (author), Czechowski, A.T. (author), Oliehoek, F.A. (author)
…thousands, or even millions of state variables. Unfortunately, applying reinforcement learning algorithms to handle complex tasks becomes more and more challenging as the number of state variables increases. In this paper, we build on the concept of influence-based abstraction, which tries to tackle such scalability issues by decomposing large...
conference paper 2019
Katt, Sammie (author), Oliehoek, F.A. (author), Amato, Christopher (author)
Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to dealing with the exploration-exploitation trade-off, but such methods typically assume a fully observable environment. The few Bayesian RL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To...
conference paper 2019
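The Bayes-adaptive idea behind the BA-POMDP can be miniaturized: maintain Dirichlet counts over transition outcomes and act under the resulting posterior-mean model. A toy, fully observable sketch of that general mechanism, not the paper's algorithm (which additionally handles partial observability):

import numpy as np

n_states, n_actions = 3, 2
counts = np.ones((n_states, n_actions, n_states))  # Dirichlet(1) prior

def update(s, a, s_next):
    counts[s, a, s_next] += 1.0   # posterior update from experience

def expected_model():
    # Posterior-mean transition model T(s' | s, a).
    return counts / counts.sum(axis=2, keepdims=True)

update(0, 1, 2)
update(0, 1, 2)
print(np.round(expected_model()[0, 1], 2))  # mass shifts toward s' = 2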
Oliehoek, F.A. (author)
Designing "teams of intelligent agents that successfully coordinate and learn about their complex environments inhabited by other agents (such as humans)" is one of the major goals of AI, and it is the challenge that I aim to address in my research. In this paper I give an overview of some of the foundations, insights and challenges in this...
conference paper 2018
Oliehoek, F.A. (author), Spaan, M.T.J. (author), Terwijn, Bas (author), Robbel, Philipp (author), Messias, João V. (author)
This article describes the MultiAgent Decision Process (MADP) toolbox, a software library to support planning and learning for intelligent agents and multiagent systems in uncertain environments. Key features are that it supports partially observable environments and stochastic transition models; has unified support for single- and multiagent...
journal article 2017