Search results | TU Delft Repositories

Abstraction-Guided Policy Recovery from Expert Demonstrations

Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. It is unlikely that demonstrations will exhaustively cover the potential problem space, compromising the quality of automation when out-of-distribution states are...

conference paper 2021

Environment Shift Games: Are Multiple Agents the Solution, and not the Problem, to Non-Stationarity?

Mey, A. (author), Oliehoek, F.A. (author)

Machine learning and artificial intelligence models that interact with and in an environment will unavoidably have impact on this environment and change it. This is often a problem as many methods do not anticipate such a change in the environment and thus may start acting sub-optimally. Although efforts are made to deal with this problem, we...

conference paper 2021

PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Smit, Jordi (author), Ponnambalam, C.T. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is na...

conference paper 2021

Loss Bounds for Approximate Influence-Based Abstraction

Congeduti, E. (author), Mey, A. (author), Oliehoek, F.A. (author)

Sequential decision making techniques hold great promise to improve the performance of many real-world systems, but computational complexity hampers their principled application. Influence-based abstraction aims to gain leverage by modeling local subproblems together with the ‘influence’ that the rest of the system exerts on them. While computing...

conference paper 2021

Analysing factorizations of action-value networks for cooperative multi-agent reinforcement learning

Castellini, Jacopo (author), Oliehoek, F.A. (author), Savani, Rahul (author), Whiteson, Shimon (author)

Recent years have seen the application of deep reinforcement learning techniques to cooperative multi-agent systems, with great empirical success. However, given the lack of theoretical insight, it remains unclear what the employed neural networks are learning, or how we should enhance their learning power to address the problems on which...

journal article 2021

General-Sum Multi-Agent Continuous Inverse Optimal Control

Muench, C. (author), Oliehoek, F.A. (author), Gavrila, D. (author)

Modeling possible future outcomes of robot-human interactions is of importance in the intelligent vehicle and mobile robotics domains. Knowing the reward function that explains the observed behavior of a human agent is advantageous for modeling the behavior with Markov Decision Processes (MDPs). However, learning the rewards that determine...

journal article 2021

A sufficient statistic for influence in structured multiagent environments

Oliehoek, F.A. (author), Witwicki, Stefan (author), Kaelbling, Leslie P. (author)

Making decisions in complex environments is a key challenge in artificial intelligence (AI). Situations involving multiple decision makers are particularly complex, leading to computational intractability of principled solution methods. A body of work in AI has tried to mitigate this problem by trying to distill interaction to its essence:...

journal article 2021

ReproducedPapers.org: Openly Teaching and Structuring Machine Learning Reproducibility

Yildiz, B. (author), Hung, H.S. (author), Krijthe, J.H. (author), Liem, C.C.S. (author), Loog, M. (author), Migut, M.A. (author), Oliehoek, F.A. (author), Panichella, A. (author), Pawełczak, Przemysław (author), Picek, S. (author), de Weerdt, M.M. (author), van Gemert, J.C. (author)

We present ReproducedPapers.org: an open online repository for teaching and structuring machine learning reproducibility. We evaluate doing a reproduction project among students and the added value of an online reproduction repository among AI researchers. We use anonymous self-assessment surveys and obtained 144 responses. Results suggest...

conference paper 2021

Exploring the Effects of Conditioning Independent Q-Learners on the Sufficient Statistic for Dec-POMDPs

Mandersloot, A.V. (author), Oliehoek, F.A. (author), Czechowski, A.T. (author)

In this study, we investigate the effects of conditioning Independent Q-Learners (IQL) not solely on the individual action-observation history, but additionally on the sufficient plan-time statistic for Decentralized Partially Observable Markov Decision Processes. In doing so, we attempt to address a key shortcoming of IQL, namely that it is...

conference paper 2020

Maximizing Information Gain in Partially Observable Environments via Prediction Rewards

Satsangi, Yash (author), Lim, Sungsu (author), Whiteson, Shimon (author), Oliehoek, F.A. (author), White, Martha (author)

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a...

conference paper 2020

Plannable Approximations to MDP Homomorphisms: Equivariance under Actions

van der Pol, Elise (author), Kipf, Thomas (author), Oliehoek, F.A. (author), Welling, Max (author)

This work exploits action equivariance for representation learning in reinforcement learning. Equivariance under actions states that transitions in the input space are mirrored by equivalent transitions in latent space, while the map and transition functions should also commute. We introduce a contrastive loss function that enforces action...

conference paper 2020

Learning What to Attend to: Using Bisimulation Metrics to Explore and Improve Upon What a Deep Reinforcement Learning Agent Learns

Albers, N. (author), Suau, M. (author), Oliehoek, F.A. (author)

Recent years have seen a surge of algorithms and architectures for deep Reinforcement Learning (RL), many of which have shown remarkable success for various problems. Yet, little work has attempted to relate the performance of these algorithms and architectures to what the resulting deep RL agents actually learn, and whether...

abstract 2020

Influence-Augmented Online Planning for Complex Environments

He, J. (author), Suau, M. (author), Oliehoek, F.A. (author)

How can we plan efficiently in real time to control an agent in a complex environment that may involve many other agents? While existing sample-based planners have enjoyed empirical success in large POMDPs, their performance heavily relies on a fast simulator. However, real-world scenarios are complex in nature and their simulators are often...

journal article 2020

Abstraction-Guided Policy Recovery from Expert Demonstrations

Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

The goal in behavior cloning is to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. However, the available data is unlikely to exhaustively cover the potential problem space. As a result, the quality of automated decision making is compromised without elegant ways to handle the encountering of...

conference paper 2020

Analog Circuit Design with Dyna-Style Reinforcement Learning

Lee, W. (author), Oliehoek, F.A. (author)

conference paper 2020

Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control

Oren, Y. (author), Starre, R.A.N. (author), Oliehoek, F.A. (author)

Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy. However, the value of...

conference paper 2020

Alternating Maximization with Behavioral Cloning

Czechowski, A.T. (author), Oliehoek, F.A. (author)

The key difficulty of cooperative, decentralized planning lies in making accurate predictions about the behavior of one’s teammates. In this paper we introduce a planning method of Alternating maximization with Behavioural Cloning (ABC) – a trainable online decentralized planning algorithm based on Monte Carlo Tree Search (MCTS), combined with...

conference paper 2020

Sensor Data for Human Activity Recognition: Feature Representation and Benchmarking

Alves, Flavia (author), Gairing, Martin (author), Oliehoek, F.A. (author), Do, Thanh-Toan (author)

The field of Human Activity Recognition (HAR) focuses on obtaining and analysing data captured from monitoring devices (e.g. sensors). There is a wide range of applications within the field; for instance, assisted living, security surveillance, and intelligent transportation. In HAR, the development of Activity Recognition models is dependent...

conference paper 2020

Decentralized MCTS via Learned Teammate Models

Czechowski, A.T. (author), Oliehoek, F.A. (author)

Decentralized online planning can be an attractive paradigm for cooperative multi-agent systems, due to improved scalability and robustness. A key difficulty of such an approach lies in making accurate predictions about the decisions of other agents. In this paper, we present a trainable online decentralized planning algorithm based on...

conference paper 2020

A Research Agenda for Hybrid Intelligence: Augmenting Human Intellect With Collaborative, Adaptive, Responsible, and Explainable Artificial Intelligence

Akata, Zeynep (author), Dignum, M.V. (author), Hindriks, K.V. (author), Hung, H.S. (author), Jonker, C.M. (author), Neerincx, M.A. (author), Oliehoek, F.A. (author), van Riemsdijk, M.B. (author), Robbins-van Wynsberghe, A.L. (author)

We define hybrid intelligence (HI) as the combination of human and machine intelligence, augmenting human intellect and capabilities instead of replacing them and achieving goals that were unreachable by either humans or machines. HI is an important new research focus for artificial intelligence, and we set a research agenda for HI by...

journal article 2020
