A hybrid curriculum learning and tree search approach for network topology control

Journal Article (2025)
Author(s)

G.J. Meppelink (Student TU Delft)

A. Rajaei (TU Delft - Intelligent Electrical Power Grids)

J.L. Cremer (TU Delft - Intelligent Electrical Power Grids)

Research Group
Intelligent Electrical Power Grids
DOI related publication
https://doi.org/10.1016/j.epsr.2025.111455
Publication Year
2025
Language
English
Volume number
242
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Transmission network topology control offers system operators low-cost flexibility for mitigating grid congestion. However, finding the optimal sequence of topology actions remains challenging because of the large number of possible actions. Although reinforcement learning (RL) approaches have attracted interest for long-term planning in large combinatorial action spaces, they face challenges such as training instability, low sample efficiency, and unforeseen consequences of RL actions. To address these challenges, this paper proposes a hybrid approach combining curriculum-trained RL and Monte Carlo tree search (MCTS) to determine sequential topological actions for mitigating grid congestion. The curriculum stabilizes training by first pre-training a policy network through supervised imitation learning and then continuing with RL training. The policy network guides the MCTS to simulate promising future trajectories, which mitigates unforeseen consequences and identifies long-term strategies for improving grid security. Moreover, the MCTS-verified actions are used for RL training, which improves sample efficiency and reduces training time. A distance factor is added to the MCTS that improves convergence by prioritizing actions closer to the congestion. Numerical results on the IEEE 118-bus system show that the proposed hybrid approach increases the number of timesteps survived by 30% compared to a standard RL approach and by 5% compared to a brute-force baseline. Additionally, including the distance factor increases the number of timesteps survived by 15%. These results highlight the potential of the proposed method for relieving grid congestion with sequential topological actions in real-world applications.
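The abstract does not state how the distance factor enters the tree search, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes an AlphaZero-style PUCT selection rule in which the policy-network prior of each candidate topology action is rescaled by a hypothetical weight that decays with the action's distance (for example, in hops) to the congested line. The names `Node`, `distance_weight`, `puct_score`, `select_action`, the parameters `c_puct` and `alpha`, the `1 / (1 + alpha * d)` decay, and the action labels `sub_A` and `sub_B` are all illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Statistics for one candidate topology action (hypothetical structure)."""
    prior: float          # assumed policy-network probability for this action
    distance: int         # assumed hop distance from the action to the congested line
    visits: int = 0
    value_sum: float = 0.0

    def q(self) -> float:
        # Mean simulated return; 0.0 for actions not yet visited.
        return self.value_sum / self.visits if self.visits else 0.0


def distance_weight(distance: int, alpha: float = 0.5) -> float:
    # Hypothetical decay: actions closer to the congestion get a larger weight.
    return 1.0 / (1.0 + alpha * distance)


def puct_score(child: Node, parent_visits: int,
               c_puct: float = 1.5, alpha: float = 0.5) -> float:
    # PUCT score with the prior rescaled by the (assumed) distance weight.
    exploration = (c_puct * child.prior * distance_weight(child.distance, alpha)
                   * math.sqrt(parent_visits) / (1 + child.visits))
    return child.q() + exploration


def select_action(children: dict[str, Node],
                  c_puct: float = 1.5, alpha: float = 0.5) -> str:
    # Treat an unvisited parent as one visit so the prior term is not zeroed out.
    parent_visits = max(1, sum(c.visits for c in children.values()))
    return max(children,
               key=lambda a: puct_score(children[a], parent_visits, c_puct, alpha))


if __name__ == "__main__":
    children = {
        "sub_A": Node(prior=0.3, distance=0),  # lower prior, at the congestion
        "sub_B": Node(prior=0.4, distance=3),  # higher prior, three hops away
    }
    print(select_action(children))             # distance factor favours sub_A
    print(select_action(children, alpha=0.0))  # without it, sub_B wins on prior
```

In this toy example the distance weight lets a nearby action with a lower prior (score 0.45) be expanded before a distant action with a higher prior (score 0.24); setting `alpha=0.0` removes the factor and the higher-prior action is chosen instead. A multiplicative rescaling of the prior is just one plausible way to bias the search toward congestion, chosen here for simplicity.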