F. de Nijs | TU Delft Repository

Constrained multiagent Markov decision processes

A taxonomy of problems and algorithms

Journal article (2021) - Frits de Nijs, Erwin Walraven, Mathijs M. de Weerdt, Matthijs T.J. Spaan

In domains such as electric vehicle charging, smart distribution grids and autonomous warehouses, multiple agents share the same resources. When planning the use of these resources, agents need to deal with the uncertainty in these domains. Although several models and algorithms for such constrained multiagent planning problems under uncertainty have been proposed in the literature, it remains unclear when which algorithm can be applied. In this survey we conceptualize these domains and establish a generic problem class based on Markov decision processes. We identify and compare the conditions under which algorithms from the planning literature for problems in this class can be applied: whether constraints are soft or hard, whether agents are continuously connected, whether the domain is fully observable, whether a constraint is momentarily (instantaneous) or on a budget, and whether the constraint is on a single resource or on multiple. Further we discuss the advantages and disadvantages of these algorithms. We conclude by identifying open problems that are directly related to the conceptualized domains, as well as in adjacent research areas. ...

Resource-constrained Multi-agent Markov Decision Processes

Doctoral thesis (2019) - Frits de Nijs

Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid. This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as back-up strategy for when a communication disruption prevents the use of dynamic allocations. ...

Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid. This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward. To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints. Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as back-up strategy for when a communication disruption prevents the use of dynamic allocations.

Multi-agent Planning Under Uncertainty for Capacity Management

Book chapter (2019) - Frits de Nijs, Mathijs M. de Weerdt, Matthijs T. J. Spaan

Demand response refers to the concept that power consumption should aim to match supply, instead of supply following demand. It is a key technology to enable the successful transition to an electricity system that incorporates more and more intermittent and uncontrollable renewable energy sources. For instance, loads such as heat pumps or charging of electric vehicles are potentially flexible and could be shifted in time to take advantage of renewable generation. Load shifting is most effective, however, when it is performed in a coordinated fashion to avoid merely shifting the peak instead of flattening it. In this chapter, we discuss multi-agent planning algorithms for capacity management to address this issue. Our methods focus in particular on addressing the challenges that result from the need to plan ahead into the future given uncertainty in supply and demand. We demonstrate that by decoupling the interactions of agents with the constraint, the resulting algorithms are able to compute effective demand response policies for hundreds of agents. ...

Capacity-aware Sequential Recommendations

Conference paper (2018) - Frits de Nijs, Georgios Theocharous, Nikos Vlassis, Mathijs M. de Weerdt, Matthijs T.J. Spaan

Personalized recommendations are increasingly important to engage users and guide them through large systems, for example when recommending points of interest to tourists visiting a popular city. To maximize long-term user experience, the system should consider issuing recommendations sequentially, since by observing the user's response to a recommendation, the system can update its estimate of the user's (latent) interests. However, as traditional recommender systems target individuals, their effect on a collective of users can unintentionally overload capacity. Therefore, recommender systems should not only consider the users' interests, but also the effect of recommendations on the available capacity.

The structure in such a constrained, multi-agent, partially observable decision problem can be exploited by a novel belief-space sampling algorithm which bounds the size of the state space by a limit on regret. By exploiting the stationary structure of the problem, our algorithm is significantly more scalable than existing approximate solvers. Moreover, by explicitly considering the information value of actions, this algorithm significantly improves the quality of recommendations over an extension of posterior sampling reinforcement learning to the constrained multi-agent case. We show how to decouple constraint satisfaction from sequential recommendation policies, resulting in algorithms which issue recommendations to thousands of agents while respecting constraints. ...

Preallocation and Planning under Stochastic Resource Constraints

Conference paper (2018) - Frits de Nijs, Matthijs T.J. Spaan, Mathijs M. de Weerdt

Resource constraints frequently complicate multi-agent planning problems. Existing algorithms for resource-constrained, multi-agent planning problems rely on the assumption that the constraints are deterministic. However, frequently resource constraints are themselves subject to uncertainty from external influences. Uncertainty about constraints is especially challenging when agents must execute in an environment where communication is unreliable, making on-line coordination difficult. In those cases, it is a significant challenge to find coordinated allocations at plan time depending on availability at run time. To address these limitations, we propose to extend algorithms for constrained multi-agent planning problems to handle stochastic resource constraints. We show how to factorize resource limit uncertainty and use this to develop novel algorithms to plan policies for stochastic constraints. We evaluate the algorithms on a search-and-rescue problem and on a power-constrained planning domain where the resource constraints are decided by nature. We show that plans taking into account all potential realizations of the constraint obtain significantly better utility than planning for the expectation, while causing fewer constraint violations. ...

Bounding the probability of resource constraint violations in multi-agent MDPs

Conference paper (2017) - Frits De Nijs, Erwin Walraven, Mathijs M. De Weerdt, Matthijs T.J. Spaan

Multi-agent planning problems with constraints on global resource consumption occur in several domains. Existing algorithms for solving Multi-agent Markov Decision Processes can compute policies that meet a resource constraint in expectation, but these policies provide no guarantees on the probability that a resource constraint violation will occur. We derive a method to bound constraint violation probabilities using Hoeffding's inequality. This method is applied to two existing approaches for computing policies satisfying constraints: the Constrained MDP framework and a Column Generation approach. We also introduce an algorithm to adaptively relax the bound up to a given maximum violation tolerance. Experiments on a hard toy problem show that the resulting policies outperform static optimal resource allocations to an arbitrary level. By testing the algorithms on more realistic planning domains from the literature, we demonstrate that the adaptive bound is able to efficiently trade off violation probability with expected value, outperforming state-of-the-art planners. ...

Decoupling a Resource Constraint through Fictitious Play in Multi-agent Sequential Decision Making

Conference paper (2016) - Frits de Nijs, Matthijs T. J. Spaan, M.M. de Weerdt

When multiple independent agents use a limited shared resource, they need to coordinate and thereby their planning problems become coupled. We present a resource assignment strategy that decouples agents using marginal utility cost, allowing them to plan individually. We show that agents converge to an expected cost curve by keeping a history of plans, inspired by fictitious play. This performs slightly better than a state-of-the-art best-response approach and is significantly more scalable than a preallocation Mixed-Integer Linear Programming formulation, providing a good trade-off between performance and quality. ...

Dynamic capacity control and balancing in the medium voltage grid (doctoral consortium)

Abstract (2015) - Frits De Nijs, Matthijs T J Spaan, Mathijs M. De Weerdt

Renewable power sources such as wind and solar are inflexible in their energy production, which requires demand to rapidly follow supply in order to maintain energy balance. Promising controllable demands are air-conditioners and heat pumps which use electric energy to maintain a temperature at a setpoint. Such Thermostatically Controlled Loads (TCLs) have been shown to be able to follow a power curve using reactive control. In this project we investigate the use of planning under uncertainty to pro-actively control an aggregation of TCLs to overcome temporary grid imbalance. We model the planning problem under consideration using the Multi-Agent Markov Decision Process (MMDP) framework. Since we are dealing with hundreds of agents, solving the resulting MMDPs directly is intractable. Instead, we propose to research ways to decompose the problem by decoupling the interactions. ...

Best-response planning of thermostatically controlled loads under power constraints

Conference paper (2015) - Frits De Nijs, Matthijs T J Spaan, Mathys M. De Weerdt

Renewable power sources such as wind and solar are inflexible in their energy production, which requires demand to rapidly follow supply in order to maintain energy balance. Promising controllable demands are air-conditioners and heat pumps which use electric energy to maintain a temperature at a setpoint. Such Thermostatically Controlled Loads (TCLs) have been shown to be able to follow a power curve using reactive control. In this paper we investigate the use of planning under uncertainty to pro-actively control an aggregation of TCLs to overcome temporary grid imbalance. We present a formal definition of the planning problem under consideration, which we model using the Multi-Agent Markov Decision Process (MMDP) framework. Since we are dealing with hundreds of agents, solving the resulting MMDPs directly is intractable. Instead, we propose to decompose the problem by decoupling the interactions through arbitrage. Decomposition of the problem means relaxing the joint power consumption constraint, which means that joining the plans together can cause overconsumption. Arbitrage acts as a conflict resolution mechanism during policy execution, using the future expected value of policies to determine which TCLs should receive the available energy. We experimentally compare several methods to plan with arbitrage, and conclude that a best response-like mechanism is a scalable approach that returns near-optimal solutions. ...

Efficient heuristics for power constrained planning of thermostatically controlled loads

Conference paper (2014) - Frits De Nijs, Mathijs M. De Weerdt, Matthijs T.J. Spaan