Model-Free Deep Reinforcement Learning Control for Grid-Connected Packed U-Cell Multilevel Inverters
A.N. Alquennah (Texas A&M University)
T. Zamzam (Texas A&M University, Texas A&M University at Qatar)
A. Kouzou (Texas A&M University, Texas A&M University at Qatar)
A. Kermansaravi (TU Delft - Intelligent Electrical Power Grids)
M. Trabelsi (Kuwait College of Science and Technology)
S. Bayhan (Hamad Bin Khalifa University)
H. Abu-Rub (Hamad Bin Khalifa University)
A. Ghrayeb (Hamad Bin Khalifa University)
H. Vahedi (Abdullah Al Salem University, TU Delft - DC systems, Energy conversion & Storage)
Abstract
This paper proposes a model-free deep reinforcement learning-based controller (RL-C) for a grid-connected five-level packed U-cell (PUC5) multilevel inverter (MLI). The controller is designed to deliver high-quality grid current while maintaining the PUC5 floating-capacitor voltage at its reference level. In addition, it supports both active and reactive power exchange, adapts to changes in the voltage and current references, and remains robust under grid-voltage variations. The RL agent learns optimal switching actions through direct interaction with the PUC5 system, eliminating the need for data collection or reliance on an existing control model. An actor-critic architecture is adopted, and the Proximal Policy Optimization (PPO) algorithm is used for offline training in MATLAB/Simulink, where the RL-C is then evaluated under diverse PUC5 configurations and operating conditions. The trained agent is implemented on an Opal-RT real-time system and validated experimentally on a laboratory-built PUC5 prototype. The performance of the proposed RL-C is compared with traditional approaches, including finite-control-set model predictive control, sliding-mode control, and PI control, as well as with other state-of-the-art RL algorithms, demonstrating superior generalization and training efficiency. Moreover, a sensitivity analysis quantifying the impact of reward design, state space, network size, and key hyperparameters on convergence and performance is carried out.
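The paper itself builds and trains the controller in MATLAB/Simulink. Purely as an orientation aid, the minimal PyTorch sketch below illustrates the two ingredients the abstract names: a discrete-action actor-critic network and the PPO clipped-surrogate loss. The eight-element action set reflects the eight switch-state combinations of a PUC5 (six switches in three complementary pairs); the four-element observation vector, hidden-layer width, and loss coefficients are illustrative assumptions, not the authors' published design.

    import torch
    import torch.nn as nn

    # PUC5 has 6 switches in 3 complementary pairs, giving 8 discrete
    # switching states; the agent selects one state per control step.
    N_ACTIONS = 8
    # Assumed observation: [grid-current error, capacitor-voltage error,
    # grid-voltage sample, current reference] -- an illustrative state
    # vector, not the paper's exact state space.
    N_OBS = 4

    class ActorCritic(nn.Module):
        """Shared-trunk actor-critic network for discrete switching actions."""
        def __init__(self, n_obs: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(n_obs, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.policy_head = nn.Linear(hidden, n_actions)  # action logits
            self.value_head = nn.Linear(hidden, 1)           # state value V(s)

        def forward(self, obs: torch.Tensor):
            z = self.trunk(obs)
            return self.policy_head(z), self.value_head(z).squeeze(-1)

    def ppo_loss(net, obs, actions, old_log_probs, advantages, returns,
                 clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
        """PPO clipped-surrogate loss for one minibatch of rollout data."""
        logits, values = net(obs)
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)       # pi_new / pi_old
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages,
                                 clipped * advantages).mean()
        value_loss = (returns - values).pow(2).mean()
        entropy = dist.entropy().mean()                     # exploration bonus
        return policy_loss + value_coef * value_loss - entropy_coef * entropy

    # Smoke test with random tensors standing in for rollouts collected
    # from a simulated PUC5 environment.
    net = ActorCritic(N_OBS, N_ACTIONS)
    obs = torch.randn(32, N_OBS)
    actions = torch.randint(0, N_ACTIONS, (32,))
    old_lp = torch.randn(32)
    adv = torch.randn(32)
    ret = torch.randn(32)
    ppo_loss(net, obs, actions, old_lp, adv, ret).backward()

In a setup like the paper's, the rollout data would come from closed-loop simulation of the grid-connected PUC5, with a reward penalizing grid-current tracking error and capacitor-voltage deviation; the clipped ratio is what keeps each PPO policy update close to the behavior policy that generated the data.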