Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once...
conference paper 2023
Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to...
conference paper 2022
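Note: reward-free RL, as named in the two abstracts above, separates training into a reward-free exploration phase and a later planning phase once a reward is revealed. A minimal schematic of that framework, in generic notation that is assumed here rather than taken from these papers:

  Explore: collect a dataset D of transitions (s, a, s') without observing any reward.
  Plan: once a reward r is revealed, compute \hat{\pi} \approx \arg\max_{\pi} \mathbb{E}_{\pi}\big[\sum_{t} \gamma^{t} r(s_t, a_t)\big] from a model estimated on D.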
Carr, Steven (author), Jansen, Nils (author), Bharadwaj, Suda (author), Spaan, M.T.J. (author), Topcu, Ufuk (author)
We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: to maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs...
conference paper 2021
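Note: one common way to phrase the pair of objectives named in this abstract is as a constrained problem (the notation below is a generic sketch, not the paper's formulation):

  \max_{\pi} \; \mathbb{E}_{\pi}[\text{value}] \quad \text{subject to} \quad \Pr_{\pi}(\varphi) \ge \beta,

where \varphi denotes the safety specification and \beta a required satisfaction probability.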
Simão, T. D. (author), Jansen, Nils (author), Spaan, M.T.J. (author)
Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where...
conference paper 2021
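Note: the CMDP mentioned in this abstract has a standard formulation in which safety is a cost constraint decoupled from the reward (the symbols R, C, \gamma, and d below are the usual generic notation, not taken from the paper):

  \max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Big] \quad \text{subject to} \quad \mathbb{E}_{\pi}\Big[\sum_{t=0}^{\infty} \gamma^{t} C(s_t, a_t)\Big] \le d,

so the agent maximizes expected discounted reward while keeping expected discounted cost below the budget d.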