DistFlow Safe Reinforcement Learning Algorithm for Voltage Magnitude Regulation in Distribution Networks

Journal Article (2025)
Author(s)

Shengren Hou (TU Delft - Intelligent Electrical Power Grids)

Aihui Fu (TU Delft - Intelligent Electrical Power Grids)

Edgar Mauricio Salazar Salazar (Eindhoven University of Technology)

Peter Palensky (TU Delft - Electrical Sustainable Energy)

Qixin Chen (Tsinghua University)

P.P. Vergara Barrios (TU Delft - Intelligent Electrical Power Grids)

Research Group
Intelligent Electrical Power Grids
DOI
https://doi.org/10.35833/MPCE.2024.000253
Publication Year
2025
Language
English
Issue number
1
Volume number
13
Pages (from-to)
300-311
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The integration of distributed energy resources (DERs) has escalated the challenge of voltage magnitude regulation in distribution networks. Model-based approaches, which rely on complex sequential mathematical formulations, cannot meet real-time requirements. Deep reinforcement learning (DRL) offers an alternative: agents are trained offline with distribution network simulators and then executed online at negligible computational cost. However, standard DRL algorithms cannot enforce voltage magnitude constraints during training and testing, potentially leading to serious operational violations. To tackle these challenges, we introduce a novel safety-guaranteed reinforcement learning algorithm, DistFlow safe reinforcement learning (DF-SRL), designed specifically for real-time voltage magnitude regulation in distribution networks. The DF-SRL algorithm uses a DistFlow linearization to construct an expert-knowledge-based safety layer, which is overlaid on top of the agent's policy and recalibrates unsafe actions to the safe domain through a quadratic programming formulation. Simulation results show that DF-SRL consistently satisfies voltage magnitude constraints during both training and real-time operation (test) phases while converging faster and achieving higher performance than (safe) DRL benchmark algorithms.
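The safety-layer idea described in the abstract, projecting a policy's raw action onto the set of actions that keep voltage magnitudes within limits, can be sketched as a small quadratic program. The snippet below is an illustrative toy, not the paper's implementation: the two-bus sensitivity matrix `S`, the base voltages `v0`, and the 0.95–1.05 p.u. band are made-up example values standing in for a DistFlow linearization, and SciPy's general-purpose SLSQP solver is used in place of a dedicated QP solver.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearized voltage model: v(a) = v0 + S @ a, where a is the
# agent's control action (e.g., DER reactive-power setpoints) and S is a
# voltage-sensitivity matrix obtained from a DistFlow linearization.
S = np.array([[0.05, 0.01],
              [0.01, 0.04]])
v0 = np.array([1.02, 0.93])   # bus 2 would violate the 0.95 p.u. lower limit

def safety_layer(a_raw, v0, S, v_min=0.95, v_max=1.05):
    """Project a_raw onto the safe action set:
    minimize ||a - a_raw||^2  s.t.  v_min <= v0 + S @ a <= v_max."""
    cons = [
        {"type": "ineq", "fun": lambda a: v0 + S @ a - v_min},   # lower bound
        {"type": "ineq", "fun": lambda a: v_max - (v0 + S @ a)}, # upper bound
    ]
    res = minimize(lambda a: np.sum((a - a_raw) ** 2),
                   x0=a_raw, constraints=cons, method="SLSQP")
    return res.x

a_raw = np.zeros(2)                 # unsafe action proposed by the policy
a_safe = safety_layer(a_raw, v0, S) # minimally recalibrated action
v_after = v0 + S @ a_safe           # all voltages now inside [0.95, 1.05]
```

Because the constraints are linear in `a` and the objective is a squared distance, the projection is a convex QP with a unique solution; the layer changes the action only as much as needed, so the agent's learned behavior is preserved whenever the raw action is already safe.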