The increasing integration of renewable energy sources into power systems, characterized by their variability and inherent lack of inertia, poses significant challenges for load frequency control, as large frequency fluctuations can cause equipment damage or even blackouts. Additionally, the large geographical size and complexity of today's power systems require a multi-agent control strategy that is scalable and computationally efficient enough for real-time control.
This thesis proposes two novel control structures that integrate decentralized Model Predictive Control (MPC) with residual reinforcement learning based on the Deep Deterministic Policy Gradient (DDPG) algorithm. In the first structure, each area is controlled by a decentralized MPC, and a centralized coordinating residual DDPG layer is added on top. In the second structure, the decentralized MPC layer is combined with a distributed coordinating residual DDPG layer. The second structure is more scalable, but limits every DDPG controller to partial system observability. To effectively test and evaluate the novel control strategies, the European Economic Area Electricity Network Benchmark (EEA-ENB) is used.
Both structures share four key ideas: 1) Due to the coupling of areas in the EEA-ENB, the resulting power system is unstable, which makes training the DDPG agent difficult. To overcome this, the decentralized MPC layer enforces baseline stability, enabling the DDPG agent to learn a residual input that improves coordination between areas. 2) The coordinating DDPG layer is trained offline, shifting the computational burden away from online control. 3) By providing a meaningful baseline, decentralized MPC removes the need for the DDPG agent to learn entirely from scratch, which increases sample efficiency. 4) The baseline allows the DDPG agent to learn in a smaller action space, which reduces exploration difficulties and improves control accuracy.
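The residual composition described by these key ideas can be sketched as follows. This is a minimal illustrative example, not the thesis implementation: `mpc_baseline` stands in for the decentralized MPC solution, `ddpg_residual` stands in for the trained DDPG actor, and the gain values and residual bound are hypothetical.

```python
import numpy as np

def mpc_baseline(freq_dev: float) -> float:
    """Stand-in for the decentralized MPC: a stabilizing feedback law
    (illustrative proportional gain, not the actual MPC)."""
    return -0.5 * freq_dev

def ddpg_residual(freq_dev: float, residual_bound: float = 0.1) -> float:
    """Stand-in for the trained DDPG actor. The output is confined to a
    small interval around zero, reflecting key idea 4: the agent only
    explores a bounded residual action space on top of the baseline."""
    raw = np.tanh(-2.0 * freq_dev)   # placeholder for the actor network
    return residual_bound * raw      # bounded residual correction

def control_input(freq_dev: float) -> float:
    # Residual reinforcement learning: total control input is the
    # stabilizing MPC baseline plus the learned coordinating correction.
    return mpc_baseline(freq_dev) + ddpg_residual(freq_dev)

print(control_input(0.2))  # baseline dominates; residual nudges it
```

Because the baseline already stabilizes the system, the agent never has to discover a stabilizing policy from scratch, which is what makes offline training tractable despite the unstable coupled dynamics.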
The proposed structures are compared against centralized MPC, distributed MPC based on the alternating direction method of multipliers, and decentralized MPC in four- and six-area case studies. Simulation results demonstrate that the coordinating residual input from the centralized or distributed DDPG layer significantly reduces the performance gap between decentralized MPC and the optimal centralized MPC solution. The centralized DDPG layer reduces the gap by 69.0% in the four-area case and 76.3% in the six-area case, while the distributed variant achieves reductions of 49.5% and 87.3%, respectively. Although the performance of the developed control structures does not fully match that of distributed MPC, their computational cost is at least 15 times lower. The distributed DDPG variant requires a longer offline training time than the centralized DDPG method, but improves scalability. To fully validate their performance and scalability, future work should implement the control structures on the entire EEA-ENB network.
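The reported percentages can plausibly be read as the fraction of the decentralized-to-centralized performance gap that the proposed controller closes. A minimal sketch of that metric, assuming a scalar closed-loop cost J for each controller (the function name and the numeric costs below are made up for illustration):

```python
def gap_reduction(j_decentralized: float,
                  j_proposed: float,
                  j_centralized: float) -> float:
    """Percentage of the performance gap between decentralized MPC and
    the optimal centralized MPC that is closed by the proposed
    controller (assumed form of the metric, lower cost J is better)."""
    gap_before = j_decentralized - j_centralized  # gap without the DDPG layer
    gap_after = j_proposed - j_centralized        # remaining gap with it
    return 100.0 * (1.0 - gap_after / gap_before)

# Made-up costs: decentralized MPC cost 10, centralized MPC cost 6,
# proposed controller cost 7.24 -> roughly 69% of the gap is closed.
print(round(gap_reduction(10.0, 7.24, 6.0), 1))
```

Under this reading, 100% would mean matching the centralized MPC solution exactly, and 0% would mean no improvement over plain decentralized MPC.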