Safe Reinforcement Learning for V2G-Enabled Electric Vehicle Aggregators

Conference Paper (2026)

Author(s)

Ruben Eland (Student TU Delft)

S. Orfanoudakis (TU Delft - Electrical Engineering, Mathematics and Computer Science)

P.P. Vergara Barrios (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group

Intelligent Electrical Power Grids

EV Smart Charging, Safe Reinforcement Learning, EV Aggregators

DOI related publication

https://doi.org/10.1007/978-3-032-19102-1_5 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:a58d9f40-cddc-4dde-bb8b-e99ea1bfa8a8

More Info

Publication Year

2026

Language

English

Pages (from-to)

76-91

Publisher

Springer Science and Business Media Deutschland GmbH

ISBN (print)

978-3-032-19101-4

ISBN (electronic)

978-3-032-19102-1

Event

European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2025 (2025-09-15 - 2025-09-19), Porto, Portugal

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The increasing penetration of Electric Vehicles (EVs) and renewable energy sources is placing significant stress on existing power grid infrastructure. This work investigates the application of vehicle-to-grid (V2G)-enabled smart charging in workplace environments from the perspective of EV aggregators, using real-world charging data from Dutch business parking lots. To address the limitations of conventional deep Reinforcement Learning (RL) methods in enforcing operational constraints, we propose a Safe RL method using the Constrained Variational Policy Optimization (CVPO) algorithm, specifically designed to reduce constraint violations and enhance reliability. Empirical results show that CVPO outperforms classic RL baselines and rule-based policies, closely approximating the performance of an optimal offline benchmark while exhibiting strong generalization to unseen scenarios.

Files

978-3-032-19102-1_5.pdf

(pdf | 1.17 MB)

Taverne

File under embargo until 10-11-2026