Uncovering Sequential Social Dilemmas in Multi-Agent Reinforcement Learning

Challenges and Strategies for Local Energy Communities

Master Thesis (2025)

Author(s)

M.T. Okoń (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

L. Cavalcante Siebert – Mentor (TU Delft - Interactive Intelligence)

Jochen Cremer – Mentor (TU Delft - Intelligent Electrical Power Grids)

J. Yang – Graduation committee member (TU Delft - Web Information Systems)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Local Energy Communities, Multi-Agent Reinforcement Learning, Sequential Social Dilemmas, Reinforcement Learning in Energy Systems

To reference this document use:

https://resolver.tudelft.nl/uuid:23f1c267-e5f8-4086-917c-66951416ec3d

Publication Year

2025

Language

English

Graduation Date

20-02-2025

Awarding Institution

Delft University of Technology

Programme

Computer Science

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis investigates the occurrence and mitigation of Sequential Social Dilemmas (SSDs) in Local Energy Communities (LECs) managed through Multi-Agent Reinforcement Learning (MARL). LECs could play a pivotal role in the green energy transition, yet the inherent conflict between individual incentives and community-wide objectives gives rise to SSDs that hamper learning. To address these issues, we propose an agent-centric approach and develop a custom MARL environment in which agents interact through a communal battery and a local trading mechanism.
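The interaction pattern described above can be illustrated with a minimal toy step function. This is a sketch under stated assumptions, not the thesis environment: the class `CommunalBattery`, the function `step`, the first-come-first-served battery ordering, the pro-rata local market clearing and the prices are all hypothetical choices made for illustration. Each agent submits a signed battery request (positive to charge, negative to discharge); whatever remains of its net load is then matched on a local market before falling back to the grid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CommunalBattery:
    """Hypothetical shared battery; all energy values in kWh."""
    capacity_kwh: float
    level_kwh: float = 0.0

    def charge(self, amount_kwh: float) -> float:
        """Store up to amount_kwh and return the energy actually accepted."""
        accepted = min(amount_kwh, self.capacity_kwh - self.level_kwh)
        self.level_kwh += accepted
        return accepted

    def discharge(self, amount_kwh: float) -> float:
        """Release up to amount_kwh and return the energy actually delivered."""
        delivered = min(amount_kwh, self.level_kwh)
        self.level_kwh -= delivered
        return delivered


def step(battery: CommunalBattery,
         net_loads: List[float],
         battery_actions: List[float],
         grid_buy: float = 0.30,     # hypothetical grid import price per kWh
         grid_sell: float = 0.05,    # hypothetical feed-in price per kWh
         local_price: float = 0.15   # hypothetical local trading price per kWh
         ) -> List[float]:
    """One hypothetical time step; returns each agent's reward (negative cost).

    net_loads[i]: demand minus generation of household i (kWh, > 0 = shortfall).
    battery_actions[i]: kWh requested from the shared battery (> 0 charge, < 0 discharge).
    """
    # 1) Battery requests are served first come, first served (agent order),
    #    so an early agent can drain the battery before later agents act.
    residuals = []
    for load, action in zip(net_loads, battery_actions):
        if action > 0:
            residuals.append(load + battery.charge(action))
        else:
            residuals.append(load - battery.discharge(-action))

    # 2) Local market: match total shortfall against total surplus pro rata.
    demand = sum(r for r in residuals if r > 0)
    supply = sum(-r for r in residuals if r < 0)
    matched = min(demand, supply)

    # 3) Any unmatched energy is bought from or sold to the grid.
    rewards = []
    for r in residuals:
        if r > 0:
            local = matched * r / demand
            cost = local * local_price + (r - local) * grid_buy
        elif r < 0:
            local = matched * -r / supply
            cost = -(local * local_price + (-r - local) * grid_sell)
        else:
            cost = 0.0
        rewards.append(-cost)
    return rewards
```

For example, with `battery = CommunalBattery(capacity_kwh=5.0)`, calling `step(battery, [-4.0, 3.0], [2.0, 0.0])` lets the first household store 2 kWh and sell its remaining 2 kWh locally to the second, giving rewards of approximately `[0.3, -0.6]` and leaving 2 kWh in the battery. Serving requests in agent order is one simple way the scarcity of a shared resource can surface in such a model.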

We systematically investigate how resource constraints and social interactions affect the agents' learning. In non-cooperative settings, limited resources impede policy optimization, while introducing a shared battery reveals SSD dynamics driven by both greed and fear. Our experiments show that rescaling the training data leads agents to adopt more cooperative behavior, and that modifying the reward function to incentivize community-friendly battery use significantly increases social welfare. These mitigation techniques are further validated in a realistic LEC environment with multiple heterogeneous households that both trade energy and use storage.
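The greed and fear factors mentioned above come from the standard matrix-game view of social dilemmas used in the SSD literature: with payoffs R (mutual cooperation), P (mutual defection), S (cooperating against a defector) and T (defecting against a cooperator), greed means T > R and fear means P > S. The sketch below classifies such a 2x2 summary. It assumes, hypothetically, that R, P, S and T would be estimated from rollouts of cooperative and defecting policies (for instance, community-friendly versus self-serving use of the shared battery); the example numbers are textbook values, not results from the thesis.

```python
from typing import NamedTuple


class Payoffs(NamedTuple):
    """Per-player payoffs of a symmetric 2x2 game (cooperate vs defect)."""
    R: float  # reward: both cooperate
    P: float  # punishment: both defect
    S: float  # sucker: cooperate while the other defects
    T: float  # temptation: defect while the other cooperates


def dilemma_incentives(p: Payoffs) -> dict:
    """Report the greed and fear incentives and whether the game is a social dilemma."""
    greed = p.T > p.R  # exploiting a cooperator pays more than cooperating
    fear = p.P > p.S   # defecting is safer when the other player defects
    social_dilemma = (
        p.R > p.P                  # mutual cooperation beats mutual defection
        and p.R > p.S              # ...and beats being exploited
        and 2 * p.R > p.T + p.S    # ...and beats an even split of exploiting and being exploited
        and (greed or fear)        # ...yet at least one individual incentive pulls away
    )
    return {"greed": greed, "fear": fear, "social_dilemma": social_dilemma}


if __name__ == "__main__":
    games = {
        "Prisoner's Dilemma": Payoffs(R=3, P=1, S=0, T=5),  # greed and fear
        "Chicken": Payoffs(R=3, P=0, S=1, T=4),             # greed only
        "Stag Hunt": Payoffs(R=4, P=2, S=0, T=3),           # fear only
    }
    for name, payoffs in games.items():
        print(name, dilemma_incentives(payoffs))
```

Separating the two conditions makes it possible to tell whether agents defect mainly to exploit others (greed) or to protect themselves from being exploited (fear), which call for different mitigations.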

The contributions of this thesis are threefold: (1) a new agent-centric MARL environment for LECs, (2) a demonstration that SSDs affect MARL performance in these decentralized energy systems, and (3) concrete strategies for aligning individual and community incentives.
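One generic way to align individual and community incentives, shown here only as an illustration and not as the reward modification used in the thesis, is prosocial reward mixing: each agent's reward is blended with the community average. The function name `prosocial_rewards` and the mixing weight `alpha` are hypothetical.

```python
from typing import List, Sequence


def prosocial_rewards(individual: Sequence[float], alpha: float) -> List[float]:
    """Blend each agent's own reward with the community-average reward.

    alpha = 0 keeps the original selfish rewards; alpha = 1 gives every agent
    the community mean, i.e. a fully shared objective.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not individual:
        raise ValueError("need at least one agent")
    mean = sum(individual) / len(individual)
    return [(1.0 - alpha) * r + alpha * mean for r in individual]


if __name__ == "__main__":
    own = [0.3, -0.6]  # illustrative rewards of a seller and a buyer in one step
    for alpha in (0.0, 0.5, 1.0):
        shaped = prosocial_rewards(own, alpha)
        # The sum (total welfare) is unchanged; only credit assignment moves.
        print(alpha, [round(x, 3) for x in shaped], round(sum(shaped), 3))
```

Because the mixed rewards always sum to the original total for any `alpha`, this scheme leaves social welfare as measured by the sum of rewards untouched and only redistributes credit, so an agent that, say, drains the shared battery also bears part of the cost it imposes on its neighbours.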

Files

Master_s_Thesis-28.pdf

(PDF, 21.1 MB)

License info not available