Search results | TU Delft Repositories

document

Back to the Future: Solving Hidden Parameter MDPs with Hindsight

Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)

conference paper 2022

document

AlwaysSafe: Reinforcement Learning without Safety Constraint Violations during Training

Simão, T. D. (author), Jansen, Nils (author), Spaan, M.T.J. (author)

Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where...

conference paper 2021

document

Training and Transferring Safe Policies in Reinforcement Learning

Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)

Safety is critical to broadening the a lication of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being de loyed in the real world. However, the target reward might be unknown rior to de loyment. Reward-free RL addresses this roblem by training an agent without the reward to...

conference paper 2022

document

WCSAC: Worst-Case Soft Actor Critic for Safety-Constrained Reinforcement Learning

Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)

Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without...

conference paper 2021

document

Refined Risk Management in Safe Reinforcement Learning with a Distributional Safety Critic

Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)

Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different...

conference paper 2022

document

Reinforcement Learning by Guided Safe Exploration

Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)

Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once...

conference paper 2023

document

A Modern Perspective on Safe Automated Driving for Different Traffic Dynamics using Constrained Reinforcement Learning

Kamran, Danial (author), Simão, T. D. (author), Yang, Q. (author), Ponnambalam, C.T. (author), Fischer, Johannes (author), Spaan, M.T.J. (author), Lauer, Martin (author)

The use of reinforcement learning (RL) in real-world domains often requires extensive effort to ensure safe behavior. While this compromises the autonomy of the system, it might still be too risky to allow a learning agent to freely explore its environment. These strict impositions come at the cost of flexibility and applying them often relies...

conference paper 2022

document

Safety-constrained reinforcement learning with a distributional safety critic

Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)

Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects using a safety-cost signal separate from the reward and bounding the expected safety-cost is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set...

journal article 2022

document

Scalable Safe Policy Improvement via Monte Carlo Tree Search

Castellini, Alberto (author), Bianchi, Federico (author), Zorzi, Edoardo (author), Simão, Thiago D. (author), Farinelli, Alessandro (author), Spaan, M.T.J. (author)

Algorithms for safely improving policies are important to deploy reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB...

journal article 2023