Uncovering Sequential Social Dilemmas in Multi-Agent Reinforcement Learning

Challenges and Strategies for Local Energy Communities

Master Thesis (2025)
Author(s)

M.T. Okoń (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)

Jochen Cremer – Mentor (TU Delft - Intelligent Electrical Power Grids)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
20-02-2025
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates the occurrence and mitigation of Sequential Social Dilemmas (SSDs) in Local Energy Communities (LECs) managed through Multi-Agent Reinforcement Learning (MARL). LECs have great potential as pivotal elements in the green energy transition, yet the inherent conflict between individual incentives and community-wide objectives creates SSD scenarios that challenge learning processes. To address these issues, we propose an agent-centric approach and develop a custom MARL environment where agents interact via a communal battery system and a local trading mechanism.

We systematically investigate the impact of resource constraints and social interactions on the agents' learning. In non-cooperative settings, limited resources impede policy optimization, while the introduction of a shared battery reveals SSD dynamics driven by both greed and fear factors. Our experiments show that rescaling the training data leads agents to adopt more cooperative behaviors, and that reward function modifications incentivizing community-friendly battery use cause a significant increase in social welfare. These mitigation techniques are further validated in a realistic LEC environment with multiple, heterogeneous households engaging in trading and storage actions.

The contributions of this thesis are threefold: (1) the proposal of a new agent-centric MARL environment for LECs, (2) the demonstration of SSDs impacting MARL performance in these decentralized energy systems, and (3) the introduction of concrete strategies for aligning individual and community incentives.

Files

Master_s_Thesis-28.pdf
(pdf | 21.1 MB)
License info not available