Non-stationarity in multiagent reinforcement learning in electricity market simulation

Journal Article (2024)
Author(s)

Charles Renshaw-Whitman (Student TU Delft)

Viktor Zobernig (Austrian Institute of Technology, TU Delft - Technology, Policy and Management)

Jochen L. Cremer (TU Delft - Electrical Engineering, Mathematics and Computer Science, Austrian Institute of Technology)

Laurens de Vries (TU Delft - Technology, Policy and Management)

Research Group
Energy and Industry
DOI (related publication)
https://doi.org/10.1016/j.epsr.2024.110712 (final published version)
Publication Year
2024
Language
English
Journal title
Electric Power Systems Research
Volume number
235
Article number
110712
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The design of electricity markets can be informed by simulating the behavior of market actors. Recent studies model human decision-makers in markets as agents that learn strategies maximizing their expected profits. This work investigates the problem of ‘non-stationarity’ in market simulations: a flaw in the learning algorithms used by such studies that causes agents to behave irrationally, which limits how well the studies apply to real-world strategic behavior. Isolating the source of the problem for a day-ahead electricity market, this paper proposes methods that ameliorate it in simple test cases and proves conditions under which ‘centralized-training, decentralized-execution’ value-learning methods converge to correct behavior in general. The paper then proposes a framework for ‘adversarial market design’ that includes the market designer as an agent. This allows market designs to be optimized subject to possibly strategic behavior by participating firms, in turn enabling the automated selection of the optimal market design from any given set of candidate designs.
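To make ‘non-stationarity’ concrete, here is a minimal, hypothetical Python sketch; it is not the model used in the paper. Two firms each offer their full capacity into a toy day-ahead auction that clears in merit order and pays a uniform price equal to the highest accepted offer. The names `Firm`, `clear_market`, `profit` and `expected_profit`, and all capacities, costs and prices, are illustrative assumptions. The sketch shows that the expected profit firm 0 earns for the *same* offer changes as the opponent's bidding policy shifts. While the opponent is still learning, firm 0 therefore faces a moving target.

```python
"""Toy illustration of non-stationarity in a two-firm day-ahead auction.

Hypothetical setup, not the paper's model: each firm offers its full
capacity at one price; the market clears in merit order and pays every
accepted firm a uniform price equal to the highest accepted offer.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Firm:
    capacity: float       # MW offered (illustrative units)
    marginal_cost: float  # EUR/MWh (illustrative units)


def clear_market(offers, firms, demand):
    """Return (clearing_price, dispatch) where dispatch[i] is firm i's accepted MW."""
    # Cheapest offers are accepted first; sorted() is stable, so ties go to the lower index.
    order = sorted(range(len(firms)), key=lambda i: offers[i])
    dispatch = [0.0] * len(firms)
    remaining = demand
    price = 0.0
    for i in order:
        if remaining <= 0:
            break
        accepted = min(firms[i].capacity, remaining)
        dispatch[i] = accepted
        remaining -= accepted
        price = offers[i]  # the last accepted (marginal) offer sets the uniform price
    return price, dispatch


def profit(i, offers, firms, demand):
    """Profit of firm i for one joint set of offers: (price - cost) * accepted MW."""
    price, dispatch = clear_market(offers, firms, demand)
    return (price - firms[i].marginal_cost) * dispatch[i]


def expected_profit(i, my_offer, opponent_policy, firms, demand):
    """Expected profit of firm i (0 or 1) for a fixed offer against a mixed opponent policy.

    opponent_policy maps the opponent's offer price to its probability.
    """
    j = 1 - i
    total = 0.0
    for opp_offer, prob in opponent_policy.items():
        offers = [0.0, 0.0]
        offers[i], offers[j] = my_offer, opp_offer
        total += prob * profit(i, offers, firms, demand)
    return total


if __name__ == "__main__":
    firms = [Firm(capacity=60, marginal_cost=10), Firm(capacity=60, marginal_cost=20)]
    demand = 100
    early = {25: 0.75, 50: 0.25}  # opponent mostly undercuts firm 0
    late = {25: 0.25, 50: 0.75}   # opponent has drifted toward high offers
    # Same action (offer 30) for firm 0, different expected reward:
    print(expected_profit(0, 30, early, firms, demand))  # prints 1200.0
    print(expected_profit(0, 30, late, firms, demand))   # prints 2000.0
```

In this toy, a value estimate that firm 0 learned for the offer of 30 against the early policy (1200) is stale once the opponent has shifted (2000). Firm 0's own choice did not change; only the other learner did. By contrast, `profit` for a fixed pair of offers never changes. This is why conditioning values on the joint action, as centralized critics in centralized-training, decentralized-execution methods do, removes this particular source of drift here.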