Search results (1 - 20 of 82)

document
Yang, Q. (author), Spaan, M.T.J. (author)
Without an assigned task, a suitable intrinsic objective for an agent is to explore the environment efficiently. However, the pursuit of exploration inevitably brings more safety risks. An under-explored aspect of reinforcement learning is how to achieve safe, efficient exploration when the task is unknown. In this paper, we propose a...
conference paper 2023
document
Castellini, Alberto (author), Bianchi, Federico (author), Zorzi, Edoardo (author), Simão, Thiago D. (author), Farinelli, Alessandro (author), Spaan, M.T.J. (author)
Algorithms for safely improving policies are important for deploying reinforcement learning approaches in real-world scenarios. In this work, we propose an algorithm, called MCTS-SPIBB, that computes safe policy improvement online using a Monte Carlo Tree Search based strategy. We theoretically prove that the policy generated by MCTS-SPIBB...
journal article 2023
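MCTS-SPIBB builds on the SPIBB idea of safe policy improvement from a fixed dataset. As a hedged illustration of that underlying constraint (a sketch of tabular SPIBB, not the paper's online MCTS variant; names are hypothetical), a policy may only deviate from the behavior policy on state-action pairs that the data supports well:

```python
import numpy as np

def spibb_step(q, pi_b, counts, n_wedge):
    """One SPIBB-style improvement step (sketch): keep the behavior
    policy's probabilities on state-action pairs observed fewer than
    n_wedge times, and reassign the remaining mass greedily w.r.t.
    the estimated Q-values."""
    pi = np.zeros_like(pi_b)
    for s in range(q.shape[0]):
        rare = counts[s] < n_wedge              # poorly supported pairs
        pi[s, rare] = pi_b[s, rare]             # bootstrap on the baseline there
        if (~rare).any():
            best = np.argmax(np.where(~rare, q[s], -np.inf))
            pi[s, best] += 1.0 - pi[s, rare].sum()
        else:
            pi[s] = pi_b[s]                     # no supported action: copy baseline
    return pi
```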
document
Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, we train RL agents in a controlled environment, such as a laboratory, before deploying them in the real world. However, the real-world target task might be unknown prior to deployment. Reward-free RL trains an agent without the reward to adapt quickly once...
conference paper 2023
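Since the target reward is unknown before deployment, exploration here must be driven by coverage while a separate safety signal keeps it in check. A minimal sketch of that combination, with hypothetical interfaces (propose, cost_critic, budget) that are not the paper's API:

```python
import random

def safe_reward_free_step(env, state, propose, cost_critic, budget):
    """Hypothetical exploration step (sketch): actions are proposed for
    coverage rather than reward, and a learned safety-cost critic vetoes
    proposals whose predicted cost exceeds the safety budget."""
    candidates = propose(state, k=8)            # exploration proposals
    safe = [a for a in candidates if cost_critic(state, a) <= budget]
    if not safe:                                # all risky: least-cost fallback
        safe = [min(candidates, key=lambda a: cost_critic(state, a))]
    return env.step(random.choice(safe))
```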
document
Ponnambalam, C.T. (author), Kamran, Danial (author), Simão, T. D. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
conference paper 2022
document
Suau, M. (author), He, J. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitations are the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simulators of complicated systems that can run sufficiently fast for...
conference paper 2022
document
Suau, M. (author), He, J. (author), Çelikok, Mustafa Mert (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we show how to factorize large networked systems of many agents into...
conference paper 2022
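The factorization can be pictured as simulating one region of the network exactly while a learned model stands in for the rest, predicting only its influence on the region's boundary. A schematic sketch under that assumption (all names hypothetical):

```python
class LocalRegionSimulator:
    """Sketch of a factorized simulator: simulate one region of a large
    networked system exactly, and replace the rest of the network with a
    learned model of its effect on the region's boundary."""

    def __init__(self, region_dynamics, influence_model):
        self.region_dynamics = region_dynamics  # exact local transition fn
        self.influence_model = influence_model  # learned: history -> boundary input
        self.history = []

    def step(self, local_state, action):
        boundary = self.influence_model(self.history)  # approximate external effect
        next_state = self.region_dynamics(local_state, action, boundary)
        self.history.append((local_state, action))
        return next_state
```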
document
Los, J. (author), Schulte, F. (author), Gansterer, Margaretha (author), Hartl, Richard F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
Carriers can considerably reduce transportation costs and emissions when they collaborate, for example through a platform. Such gains, however, have only been investigated for relatively small problem instances with low numbers of carriers. We develop auction-based methods for large-scale dynamic collaborative pickup and delivery problems,...
journal article 2022
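A building block of such auction-based methods is a round in which each carrier bids its marginal cost for serving a newly arrived request and the lowest bid wins. A minimal sketch, assuming hypothetical marginal_insertion_cost and insert interfaces rather than the paper's actual mechanism:

```python
def auction_request(request, carriers):
    """One round of a decentralized transport auction (sketch): each
    carrier bids the marginal cost of inserting the new pickup-and-delivery
    request into its current routes; the cheapest carrier is assigned."""
    bids = {c: c.marginal_insertion_cost(request) for c in carriers}
    winner = min(bids, key=bids.get)
    winner.insert(request)
    return winner, bids[winner]
```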
document
Yang, Q. (author), Simão, T. D. (author), Jansen, Nils (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the application of reinforcement learning (RL). Often, RL agents are trained in a controlled environment, such as a laboratory, before being deployed in the real world. However, the target reward might be unknown prior to deployment. Reward-free RL addresses this problem by training an agent without the reward to...
conference paper 2022
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the real-world use of reinforcement learning (RL). Modeling the safety aspects using a safety-cost signal separate from the reward is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, the total safety-cost distribution of different...
conference paper 2022
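A natural distribution-aware alternative to constraining only the mean is to constrain a tail risk measure such as CVaR over the episodic safety-cost. A minimal empirical estimator, as an illustration of the risk measure rather than the paper's exact method:

```python
import numpy as np

def cvar(cost_samples, alpha=0.1):
    """Empirical CVaR_alpha: the mean of the worst alpha-fraction of
    sampled episodic safety-cost returns. Constraining this instead of
    the plain expectation guards against rare but severe violations."""
    samples = np.sort(np.asarray(cost_samples))
    k = max(1, int(np.ceil(alpha * len(samples))))
    return samples[-k:].mean()      # average of the k largest costs
```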
document
Los, J. (author), Schulte, F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
Collaboration in transportation is important to reduce costs and emissions, but carriers may have incentives to bid strategically in decentralized auction systems. We investigate the effect of the auction strategy on potential cheating benefits in a dynamic context, so that we can recommend a method with lower chances for...
conference paper 2022
document
Junges, Sebastian (author), Spaan, M.T.J. (author)
Markov decision processes are a ubiquitous formalism for modelling systems with non-deterministic and probabilistic behavior. Verification of these models is subject to the famous state space explosion problem. We alleviate this problem by exploiting a hierarchical structure with repetitive parts. This structure not only occurs naturally in...
conference paper 2022
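The gain from repetition is that a shared substructure needs to be analyzed only once, after which its solution is reused at every occurrence instead of re-verifying each copy. A schematic sketch with hypothetical interfaces, not the paper's algorithm:

```python
def evaluate_hierarchical(occurrences, sub_mdp, solve):
    """Sketch: solve the shared sub-MDP a single time, then look up the
    resulting values at each occurrence's entry state, avoiding repeated
    verification of identical copies in the large model."""
    exit_values = solve(sub_mdp)               # solved once
    return {occ: exit_values[occ.entry_state] for occ in occurrences}
```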
document
Los, J. (author), Schulte, F. (author), Spaan, M.T.J. (author), Negenborn, R.R. (author)
The trends of autonomous transportation and mobility on demand, combined with large numbers of requests, increasingly call for decentralized vehicle routing optimization. Multi-agent systems (MASs) allow modeling fully autonomous decentralized decision making, but are rarely considered in current decision support approaches. We propose a multi...
conference paper 2022
document
Kamran, Danial (author), Simão, T. D. (author), Yang, Q. (author), Ponnambalam, C.T. (author), Fischer, Johannes (author), Spaan, M.T.J. (author), Lauer, Martin (author)
The use of reinforcement learning (RL) in real-world domains often requires extensive effort to ensure safe behavior. While this compromises the autonomy of the system, it might still be too risky to allow a learning agent to freely explore its environment. These strict impositions come at the cost of flexibility and applying them often relies...
conference paper 2022
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safety is critical to broadening the real-world use of reinforcement learning. Modeling the safety aspects using a safety-cost signal separate from the reward and bounding the expected safety-cost is becoming standard practice, since it avoids the problem of finding a good balance between safety and performance. However, it can be risky to set...
journal article 2022
document
Carr, Steven (author), Jansen, Nils (author), Bharadwaj, Suda (author), Spaan, M.T.J. (author), Topcu, Ufuk (author)
We study planning problems where a controllable agent operates under partial observability and interacts with an uncontrollable opponent, also referred to as the adversary. The agent has two distinct objectives: To maximize an expected value and to adhere to a safety specification. Multi-objective partially observable stochastic games (POSGs...
conference paper 2021
document
Ponnambalam, C.T. (author), Oliehoek, F.A. (author), Spaan, M.T.J. (author)
Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. It is unlikely that demonstrations will exhaustively cover the potential problem space, compromising the quality of automation when out-of-distribution states are...
conference paper 2021
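At its core, behavior cloning reduces decision-making to supervised learning on expert (state, action) pairs, which is exactly why it degrades on out-of-distribution states. A minimal sketch in PyTorch, with illustrative architecture and hyperparameters:

```python
import torch
import torch.nn as nn

def behavior_clone(states, actions, n_actions, epochs=50):
    """Fit a policy network to expert (state, action) pairs by supervised
    learning. states: float tensor (N, d); actions: long tensor (N,).
    States unlike the demonstrations are where such a policy degrades."""
    policy = nn.Sequential(nn.Linear(states.shape[1], 64), nn.ReLU(),
                           nn.Linear(64, n_actions))
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(policy(states), actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```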
document
Smit, Jordi (author), Ponnambalam, C.T. (author), Spaan, M.T.J. (author), Oliehoek, F.A. (author)
Offline reinforcement learning (RL), or learning from a fixed data set, is an attractive alternative to online RL. Offline RL promises to address the cost and safety implications of taking numerous random or bad actions online, a crucial aspect of traditional RL that makes it difficult to apply in real-world problems. However, when RL is na...
conference paper 2021
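A common offline-RL remedy for the failure mode sketched in this abstract is pessimism: discount bootstrapped values where the fixed data set gives little support. One ensemble-based sketch of that idea (an illustration, not necessarily this paper's method):

```python
import numpy as np

def pessimistic_target(q_ensemble, next_state, actions, kappa=1.0):
    """Penalize the bootstrapped target by the Q-ensemble's disagreement,
    which tends to be large for actions unsupported by the data set."""
    qs = np.array([q(next_state, a) for q in q_ensemble for a in actions])
    qs = qs.reshape(len(q_ensemble), len(actions))
    target = qs.mean(axis=0) - kappa * qs.std(axis=0)  # mean minus uncertainty
    return target.max()                                # conservative best action
```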
document
Simão, T. D. (author), Jansen, Nils (author), Spaan, M.T.J. (author)
Deploying reinforcement learning (RL) involves major concerns around safety. Engineering a reward signal that allows the agent to maximize its performance while remaining safe is not trivial. Safe RL studies how to mitigate such problems. For instance, we can decouple safety from reward using constrained Markov decision processes (CMDPs), where...
conference paper 2021
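The CMDP decoupling mentioned here has a standard schematic form (the textbook formulation, not anything specific to this paper): maximize expected return subject to a budget on expected cumulative safety-cost,

$$\max_{\pi}\ \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,r(s_t,a_t)\right] \quad \text{s.t.}\quad \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty}\gamma^{t}\,c(s_t,a_t)\right] \le d,$$

where r is the reward, c the separate safety-cost signal, and d the safety budget.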
document
Yang, Q. (author), Simão, T. D. (author), Tindemans, Simon H. (author), Spaan, M.T.J. (author)
Safe exploration is regarded as a key priority area for reinforcement learning research. With separate reward and safety signals, it is natural to cast it as constrained reinforcement learning, where expected long-term costs of policies are constrained. However, it can be hazardous to set constraints on the expected safety signal without...
conference paper 2021
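The standard mechanism for enforcing such an expected-cost constraint is a Lagrangian relaxation; the abstract's point is that relying on this expectation alone can be hazardous. A sketch of the baseline multiplier update (names hypothetical):

```python
def lagrangian_update(lmbda, avg_cost, budget, lr=0.01):
    """Dual update for a cost constraint (sketch): the multiplier grows
    while the average safety-cost exceeds the budget, making the
    penalized reward r - lmbda * c increasingly conservative."""
    return max(0.0, lmbda + lr * (avg_cost - budget))
```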