E.M.P. Walraven | TU Delft Repository

Constrained multiagent Markov decision processes

A taxonomy of problems and algorithms

Journal article (2021) - Frits de Nijs, Erwin Walraven, Mathijs M. de Weerdt, Matthijs T.J. Spaan

In domains such as electric vehicle charging, smart distribution grids and autonomous warehouses, multiple agents share the same resources. When planning the use of these resources, agents need to deal with the uncertainty in these domains. Although several models and algorithms for such constrained multiagent planning problems under uncertainty have been proposed in the literature, it remains unclear when which algorithm can be applied. In this survey we conceptualize these domains and establish a generic problem class based on Markov decision processes. We identify and compare the conditions under which algorithms from the planning literature for problems in this class can be applied: whether constraints are soft or hard, whether agents are continuously connected, whether the domain is fully observable, whether a constraint is momentarily (instantaneous) or on a budget, and whether the constraint is on a single resource or on multiple. Further we discuss the advantages and disadvantages of these algorithms. We conclude by identifying open problems that are directly related to the conceptualized domains, as well as in adjacent research areas. ...

Planning under Uncertainty in Constrained and Partially Observable Environments

Doctoral thesis (2019) - Erwin Walraven, Matthijs Spaan, Cees Witteveen

Developing intelligent decision making systems in the real world requires planning algorithms which are able to deal with sources of uncertainty and constraints. An example can be found in smart distribution grids, in which planning can be used to decide when electric vehicles charge their batteries, such that the capacity limits of lines are respected at all times. In this particular example there can be uncertainty in the arrival time and charging demand of vehicles, and constraints follow directly from the capacity limits of the distribution grid to which vehicles are connected. Existing algorithms for planning under uncertainty subject to constraints are currently not suitable for these types of applications, and therefore this dissertation aims improve the applicability of these algorithms by advancing the state of the art in constrained multi-agent planning under uncertainty. The dissertation presents new algorithmic techniques for exact POMDP planning, finite-horizon POMDPs and POMDPs with constraints. Additionally, the dissertation shows how models for constrained planning can be used in smart distribution grids. ...

Point-Based Value Iteration for Finite-Horizon POMDPs

Journal article (2019) - Erwin Walraven, Matthijs T.J. Spaan

Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs. ...

Bootstrapping LPs in Value Iteration for Multi-Objective and Partially Observable MDPs

Conference paper (2018) - Diederik M. Roijers, Erwin Walraven, Matthijs T.J. Spaan

Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning. ...

Column Generation Algorithms for Constrained POMDPs

Journal article (2018) - Erwin Walraven, Matthijs Spaan

In several real-world domains it is required to plan ahead while there are finite resources available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which need to be taken into account while computing a plan. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems which include uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies which maximize expected reward, while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method to solve subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, but also in settings in which multiple independent agents share a global constraint. Experiments based on several domains show that our method outperforms the current state of the art. ...

Bounding the probability of resource constraint violations in multi-agent MDPs

Conference paper (2017) - Frits De Nijs, Erwin Walraven, Mathijs M. De Weerdt, Matthijs T.J. Spaan

Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding's inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners. ...

Accelerated Vector Pruning for Optimal POMDP Solvers

Conference paper (2017) - Erwin Walraven, Matthijs Spaan

Partially Observable Markov Decision Processes (POMDPs) are powerful models for planning under uncertainty in partially observable domains. However, computing optimal solutions for POMDPs is challenging because of the high computational requirements of POMDP solution algorithms. Several algorithms use a subroutine to prune dominated vectors in value functions, which requires a large number of linear programs (LPs) to be solved and it represents a large part of the total running time. In this paper we show how the LPs in POMDP pruning subroutines can be decomposed using a Benders decomposition. The resulting algorithm incrementally adds LP constraints and uses only a small fraction of the constraints. Our algorithm significantly improves the performance of existing pruning methods and the commonly used incremental pruning algorithm. Our new variant of incremental pruning is the fastest optimal pruning-based POMDP algorithm. ...

Planning Under Uncertainty for Aggregated Electric Vehicle Charging with Renewable Energy Supply

Conference paper (2016) - Erwin Walraven, Matthijs T. J. Spaan

Renewable energy sources introduce uncertainty regarding generated power in smart grids. For instance, power that is generated by wind turbines is time-varying and dependent on the weather. Electric vehicles will become increasingly important in the development of smart grids with a high penetration of renewables, because their flexibility makes it possible to charge their batteries when renewable supply is available. Charging of electric vehicles can be challenging, however, because of uncertainty in renewable supply and the potentially large number of vehicles involved. In this paper we propose a vehicle aggregation framework which uses Markov Decision Processes to control electric vehicles and deals with uncertainty in renewable supply. We present a grouping technique to address the scalability aspects of our framework. In experiments we show that the aggregation framework maximizes the profit of the aggregator, reduces cost of customers and reduces consumption of conventionally-generated power. ...

Planning under Uncertainty for Aggregated Electric Vehicle Charging using Markov Decision Processes

Conference paper (2016) - Erwin Walraven, Matthijs T. J. Spaan

The increasing penetration of renewable energy sources and electric vehicles raises important challenges related to the operation of electricity grids. For instance, the amount of power generated by wind turbines is time-varying and dependent on the weather, which makes it hard to match flexible electric vehicle demand and uncertain wind power supply. In this paper we propose a vehicle aggregation framework which uses Markov Decision Processes to control charging of multiple electric vehicles and deals with uncertainty in renewable supply. We present a grouping technique to address the scalability aspects of our framework. In experiments we show that the aggregation framework maximizes the profit of the aggregator while reducing usage of conventionally-generated power and cost of customers. ...

Traffic flow optimization

A reinforcement learning approach

Journal article (2016) - Erwin Walraven, Matthijs T. J. Spaan, Bram Bakker

Traffic congestion causes important problems such as delays, increased fuel consumption and additional pollution. In this paper we propose a new method to optimize traffic flow, based on reinforcement learning. We show that a traffic flow optimization problem can be formulated as a Markov Decision Process. We use Q-learning to learn policies dictating the maximum driving speed that is allowed on a highway, such that traffic congestion is reduced. An important difference between our work and existing approaches is that we take traffic predictions into account. A series of simulation experiments shows that the resulting policies significantly reduce traffic congestion under high traffic demand, and that inclusion of traffic predictions improves the quality of the resulting policies. Additionally, the policies are sufficiently robust to deal with inaccurate speed and density measurements. ...

Planning under Uncertainty with Weighted State Scenarios

Conference paper (2015) - Erwin Walraven, Matthijs T. J. Spaan

In many planning domains external factors are hard to model using a compact Markovian state. However, long-term dependencies between consecutive states of an environment might exist, which can be exploited during planning. In this paper we propose a scenario representation which enables agents to reason about sequences of future states. We show how weights can be assigned to scenarios, representing the likelihood that scenarios predict future states. Furthermore, we present a model based on a Partially Observable Markov Decision Process (POMDP) to reason about state scenarios during planning. In experiments we show how scenarios and our POMDP model can be used in the context of smart grids and stock markets, and we show that our approach outperforms other methods for decision making in these domains. ...

A Scenario State Representation for Scheduling Deferrable Loads under Wind Uncertainty

Conference paper (2015) - Erwin Walraven, Matthijs T. J. Spaan

Integration of renewable energy in power systems is a potential source of uncertainty, because renewable generation is variable and may depend on changing and highly uncertain weather conditions. In this paper we present and evaluate a new method to schedule power-demanding tasks with release times and deadlines under uncertainty, in order to balance demand and uncertain supply. The problem is considered as a multi-agent sequential decision making problem where agents have to deal with uncertainty. Our main contribution is a scenario state representation and an algorithm that computes a belief over future scenarios, rather than states. The algorithm is used to recompute the belief when new information becomes available. Experiments show that our method matches demand and uncertain supply to reduce grid power consumption, and outperforms an existing online consensus scheduling algorithm. ...

Planning under Uncertainty with Weighted State Scenarios

Conference paper (2015) - Erwin Walraven, Matthijs T. J. Spaan

External factors are hard to model using a Markovian state in several real-world planning domains. Although planning can be difficult in such domains, it may be possible to exploit long-term dependencies between states of the environment during planning. We introduce weighted state scenarios to model long-term sequences of states, and we use a model based on a Partially Observable Markov Decision Process to reason about scenarios during planning. Experiments show that our model outperforms other methods for decision making in two real-world domains. ...