Search results | TU Delft Repositories

document

Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

conference paper 2022

document

A Modern Perspective on Safe Automated Driving for Different Traffic Dynamics using Constrained Reinforcement Learning

Kamran, Danial (author), Simão, T. D. (author), Yang, Q. (author), Ponnambalam, C.T. (author), Fischer, Johannes (author), Spaan, M.T.J. (author), Lauer, Martin (author)

The use of reinforcement learning (RL) in real-world domains often requires extensive effort to ensure safe behavior. While this compromises the autonomy of the system, it might still be too risky to allow a learning agent to freely explore its environment. These strict impositions come at the cost of flexibility and applying them often relies...

conference paper 2022

document

Abstraction-Guided Policy Recovery from Expert Demonstrations

Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. It is unlikely that demonstrations will exhaustively cover the potential problem space, compromising the quality of automation when out-of-distribution states are...

conference paper 2021

document

PEBL: Pessimistic Ensembles for Offline Deep Reinforcement Learning

Smit, Jordi (author), Ponnambalam, C.T. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)

Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of tak- ing numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is na...

conference paper 2021

document

Interval Q-Learning: Balancing Deep and Wide Exploration

Neustroev, G. (author), Ponnambalam, C.T. (author), de Weerdt, M.M. (author), Spaan, M.T.J. (author)

Reinforcement learning requires exploration, leading to repeated execution of sub-optimal actions. Naive exploration techniques address this problem by changing gradually from exploration to exploitation. This approach employs a wide search resulting in exhaustive exploration and low sample-efficiency. More advanced search methods explore...

conference paper 2020

document

Abstraction-Guided Policy Recovery from Expert Demonstrations

Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

The goal in behavior cloning is to extract meaningful information from expertdemonstrations and reproduce the same behavior autonomously. However, theavailable data is unlikely to exhaustively cover the potential problem space. As aresult, the quality of automated decision making is compromised without elegantways to handle the encountering of...

conference paper 2020