Search results (1 - 20 of 82)

document
Yang, Q. (author), Spaan, M.T.J. (author)
Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration inevitably brings more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe, efficient exploration when the task is unknown. In this paper, we propose a...
conference paper 2023
document
Castellini, Alberto (author), Bianchi, Federico (author), Zorzi, Edoardo (author), Simão, Thiago D. (author), Farinelli, Alessandro (author), Spaan, M.T.J. (author)
Algorithms for safely improving policies are important for deploying reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB...
journal article 2023
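MCTS-SPIBB builds on the SPIBB idea of safe policy improvement from a fixed dataset. As a hedged illustration of that underlying constraint (a sketch of tabular SPIBB, not the paper's online MCTS variant; names are hypothetical), a policy may only deviate from the behavior policy on state-action pairs that the data supports well:

```python
import numpy as np

def spibb_step(q, pi_b, counts, n_wedge):
    """One SPIBB-style improvement step (sketch): keep the behavior
    policy's probabilities on state-action pairs observed fewer than
    n_wedge times, and reassign the remaining mass greedily w.r.t.
    the estimated Q-values."""
    pi = np.zeros_like(pi_b)
    for s in range(q.shape[0]):
        rare = counts[s] < n_wedge              # poorly supported pairs
        pi[s, rare] = pi_b[s, rare]             # bootstrap on the baseline there
        if (~rare).any():
            best = np.argmax(np.where(~rare, q[s], -np.inf))
            pi[s, best] += 1.0 - pi[s, rare].sum()
        else:
            pi[s] = pi_b[s]                     # no supported action: copy baseline
    return pi
```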
document
Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once...
conference paper 2023
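Since the target reward is unknown before deployment, exploration here must be driven by coverage while a separate safety signal keeps it in check. A minimal sketch of that combination, with hypothetical interfaces (propose, cost_critic, budget) that are not the paper's API:

```python
import random

def safe_reward_free_step(env, state, propose, cost_critic, budget):
    """Hypothetical exploration step (sketch): actions are proposed for
    coverage rather than reward, and a learned safety-cost critic vetoes
    proposals whose predicted cost exceeds the safety budget."""
    candidates = propose(state, k=8)            # exploration proposals
    safe = [a for a in candidates if cost_critic(state, a) <= budget]
    if not safe:                                # all risky: least-cost fallback
        safe = [min(candidates, key=lambda a: cost_critic(state, a))]
    return env.step(random.choice(safe))
```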
document
Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
conference paper 2022
document
Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...
conference paper 2022
document
Suau, M. (author), He, J. (author), Çelikok, Mustafa Mert (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to factorize large networked systems of many agents into...
conference paper 2022
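The factorization can be pictured as simulating one region of the network exactly while a learned model stands in for the rest, predicting only its influence on the region's boundary. A schematic sketch under that assumption (all names hypothetical):

```python
class LocalRegionSimulator:
    """Sketch of a factorized simulator: simulate one region of a large
    networked system exactly, and replace the rest of the network with a
    learned model of its effect on the region's boundary."""

    def __init__(self, region_dynamics, influence_model):
        self.region_dynamics = region_dynamics  # exact local transition fn
        self.influence_model = influence_model  # learned: history -> boundary input
        self.history = []

    def step(self, local_state, action):
        boundary = self.influence_model(self.history)  # approximate external effect
        next_state = self.region_dynamics(local_state, action, boundary)
        self.history.append((local_state, action))
        return next_state
```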
document
Los, J. (author), Schulte, F. (author), Gansterer, Margaretha (author), Hartl, Richard F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
Carriers can considerably reduce transportation costs and emissions when they collaborate, for example through a platform. Such gains, however, have only been investigated for relatively small problem instances with low numbers of carriers. We develop auction-based methods for large-scale dynamic collaborative pickup and delivery problems,...
journal article 2022
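A building block of such auction-based methods is a round in which each carrier bids its marginal cost for serving a newly arrived request and the lowest bid wins. A minimal sketch, assuming hypothetical marginal_insertion_cost and insert interfaces rather than the paper's actual mechanism:

```python
def auction_request(request, carriers):
    """One round of a decentralized transport auction (sketch): each
    carrier bids the marginal cost of inserting the new pickup-and-delivery
    request into its current routes; the cheapest carrier is assigned."""
    bids = {c: c.marginal_insertion_cost(request) for c in carriers}
    winner = min(bids, key=bids.get)
    winner.insert(request)
    return winner, bids[winner]
```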
document
Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to...
conference paper 2022
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different...
conference paper 2022
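A natural distribution-aware alternative to constraining only the mean is to constrain a tail risk measure such as CVaR over the episodic safety-cost. A minimal empirical estimator, as an illustration of the risk measure rather than the paper's exact method:

```python
import numpy as np

def cvar(cost_samples, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the worst alpha-fraction of
    sampled episodic safety-cost returns. Constraining this instead of
    the plain expectation guards against rare but severe violations."""
    samples = np.sort(np.asarray(cost_samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[-k:].mean()      # average of the k largest costs
```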
document
Los, J. (author), Schulte, F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
Collaboration in transportation is important to reduce costs and emissions, but carriers may have incentives to bid strategically in decentralized auction systems. We investigate the effect of the auction strategy on potential cheating benefits in a dynamic context, so that we can recommend a method with lower chances for...
conference paper 2022
document
Junges, Sebastian (author), Spaan, M.T.J. (author)
Markov decision processes are a ubiquitous formalism for modelling systems with non-deterministic and probabilistic behavior. Verification of these models is subject to the famous state space explosion problem. We alleviate this problem by exploiting a hierarchical structure with repetitive parts. This structure not only occurs naturally in...
conference paper 2022
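The gain from repetition is that a shared substructure needs to be analyzed only once, after which its solution is reused at every occurrence instead of re-verifying each copy. A schematic sketch with hypothetical interfaces, not the paper's algorithm:

```python
def evaluate_hierarchical(occurrences, sub_mdp, solve):
    """Sketch: solve the shared sub-MDP a single time, then look up the
    resulting values at each occurrence's entry state, avoiding repeated
    verification of identical copies in the large model."""
    exit_values = solve(sub_mdp)               # solved once
    return {occ: exit_values[occ.entry_state] for occ in occurrences}
```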
document
Los, J. (author), Schulte, F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
The trends of autonomous transportation and mobility on demand, combined with large numbers of requests, increasingly call for decentralized vehicle routing optimization. Multi-agent systems (MASs) allow modeling fully autonomous decentralized decision making, but are rarely considered in current decision support approaches. We propose a multi...
conference paper 2022
document
Kamran, Danial (author), Simão, T. D. (author), Yang, Q. (author), Ponnambalam, C.T. (author), Fischer, Johannes (author), Spaan, M.T.J. (author), Lauer, Martin (author)
The use of reinforcement learning (RL) in real-world domains often requires extensive effort to ensure safe behavior. While this compromises the autonomy of the system, it might still be too risky to allow a learning agent to freely explore its environment. These strict impositions come at the cost of flexibility and applying them often relies...
conference paper 2022
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects using a safety-cost signal separate from the reward and bounding the expected safety-cost is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set...
journal article 2022
document
Carr, Steven (author), Jansen, Nils (author), Bharadwaj, Suda (author), Spaan, M.T.J. (author), Topcu, Ufuk (author)
We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: To maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs...
conference paper 2021
document
Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. It is unlikely that demonstrations will exhaustively cover the potential problem space, compromising the quality of automation when out-of-distribution states are...
conference paper 2021
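At its core, behavior cloning reduces decision-making to supervised learning on expert (state, action) pairs, which is exactly why it degrades on out-of-distribution states. A minimal sketch in PyTorch, with illustrative architecture and hyperparameters:

```python
import torch
import torch.nn as nn

def behavior_clone(states, actions, n_actions, epochs=50):
    """Fit a policy network to expert (state, action) pairs by supervised
    learning. states: float tensor (N, d); actions: long tensor (N,).
    States unlike the demonstrations are where such a policy degrades."""
    policy = nn.Sequential(nn.Linear(states.shape[1], 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(policy(states), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```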
document
Smit, Jordi (author), Ponnambalam, C.T. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is na...
conference paper 2021
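A common offline-RL remedy for the failure mode sketched in this abstract is pessimism: discount bootstrapped values where the fixed data set gives little support. One ensemble-based sketch of that idea (an illustration, not necessarily this paper's method):

```python
import numpy as np

def pessimistic_target(q_ensemble, next_state, actions, kappa=1.0):
    """Penalize the bootstrapped target by the Q-ensemble's disagreement,
    which tends to be large for actions unsupported by the data set."""
    qs = np.array([q(next_state, a) for q in q_ensemble for a in actions])
    qs = qs.reshape(len(q_ensemble), len(actions))
    target = qs.mean(axis=0) - kappa * qs.std(axis=0)  # mean minus uncertainty
    return target.max()                                # conservative best action
```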
document
Simão, T. D. (author), Jansen, Nils (author), Spaan, M.T.J. (author)
Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where...
conference paper 2021
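The CMDP decoupling mentioned here has a standard schematic form (the textbook formulation, not anything specific to this paper): maximize expected return subject to a budget on expected cumulative safety-cost,

$$\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right] \quad \text{s.t.}\quad \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right] \le d,$$

where r is the reward, c the separate safety-cost signal, and d the safety budget.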
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without...
conference paper 2021
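The standard mechanism for enforcing such an expected-cost constraint is a Lagrangian relaxation; the abstract's point is that relying on this expectation alone can be hazardous. A sketch of the baseline multiplier update (names hypothetical):

```python
def lagrangian_update(lmbda, avg_cost, budget, lr=0.01):
    """Dual update for a cost constraint (sketch): the multiplier grows
    while the average safety-cost exceeds the budget, making the
    penalized reward r - lmbda * c increasingly conservative."""
    return max(0.0, lmbda + lr * (avg_cost - budget))
```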