Model-Free Deep Reinforcement Learning Control for Grid-Connected Packed U-Cell Multilevel Inverters
A.N. Alquennah (Texas A&M University)
T. Zamzam (Texas A&M University, Texas A&M University at Qatar)
A. Kouzou (Texas A&M University, Texas A&M University at Qatar)
A. Kermansaravi (TU Delft - Intelligent Electrical Power Grids)
M. Trabelsi (Kuwait College of Science and Technology)
S. Bayhan (Hamad Bin Khalifa University)
H. Abu-Rub (Hamad Bin Khalifa University)
A. Ghrayeb (Hamad Bin Khalifa University)
H. Vahedi (Abdullah Al Salem University, TU Delft - DC systems, Energy conversion & Storage)
Abstract
This paper proposes a model-free deep reinforcement learning-based controller (RL-C) for a grid-connected five-level packed U-cell (PUC5) multilevel inverter (MLI). The controller is designed to deliver high-quality grid current while maintaining the PUC5 floating-capacitor voltage at its reference level. In addition, it supports both active and reactive power exchange, adapts to changes in the voltage and current references, and remains robust under grid-voltage variations. The RL agent learns optimal switching actions through direct interaction with the PUC5 system, eliminating the need for data collection or reliance on an existing control model. An actor-critic architecture is adopted, and the Proximal Policy Optimization (PPO) algorithm is used for offline training in MATLAB/Simulink, where the RL-C is then evaluated under diverse PUC5 configurations and operating conditions. The trained agent is implemented on an Opal-RT real-time system and validated experimentally on a laboratory-built PUC5 prototype. The performance of the proposed RL-C is compared with traditional approaches, including finite-control-set model predictive control, sliding-mode control, and PI control, as well as with other state-of-the-art RL algorithms, demonstrating superior generalization and training efficiency. Moreover, a sensitivity analysis quantifying the impact of reward design, state space, network size, and key hyperparameters on convergence and performance is carried out.
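The paper itself builds and trains the controller in MATLAB/Simulink. Purely as an orientation aid, the minimal PyTorch sketch below illustrates the two ingredients the abstract names: a discrete-action actor-critic network and the PPO clipped-surrogate loss. The eight-element action set reflects the eight switch-state combinations of a PUC5 (six switches in three complementary pairs); the four-element observation vector, hidden-layer width, and loss coefficients are illustrative assumptions, not the authors' published design.

    import torch
    import torch.nn as nn

    # PUC5 has 6 switches in 3 complementary pairs, giving 8 discrete
    # switching states; the agent selects one state per control step.
    N_ACTIONS = 8
    # Assumed observation: [grid-current error, capacitor-voltage error,
    # grid-voltage sample, current reference] -- an illustrative state
    # vector, not the paper's exact state space.
    N_OBS = 4

    class ActorCritic(nn.Module):
        """Shared-trunk actor-critic network for discrete switching actions."""
        def __init__(self, n_obs: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.trunk = nn.Sequential(
                nn.Linear(n_obs, hidden), nn.Tanh(),
                nn.Linear(hidden, hidden), nn.Tanh(),
            )
            self.policy_head = nn.Linear(hidden, n_actions)  # action logits
            self.value_head = nn.Linear(hidden, 1)           # state value V(s)

        def forward(self, obs: torch.Tensor):
            z = self.trunk(obs)
            return self.policy_head(z), self.value_head(z).squeeze(-1)

    def ppo_loss(net, obs, actions, old_log_probs, advantages, returns,
                 clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
        """PPO clipped-surrogate loss for one minibatch of rollout data."""
        logits, values = net(obs)
        dist = torch.distributions.Categorical(logits=logits)
        log_probs = dist.log_prob(actions)
        ratio = torch.exp(log_probs - old_log_probs)       # pi_new / pi_old
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        policy_loss = -torch.min(ratio * advantages,
                                 clipped * advantages).mean()
        value_loss = (returns - values).pow(2).mean()
        entropy = dist.entropy().mean()                     # exploration bonus
        return policy_loss + value_coef * value_loss - entropy_coef * entropy

    # Smoke test with random tensors standing in for rollouts collected
    # from a simulated PUC5 environment.
    net = ActorCritic(N_OBS, N_ACTIONS)
    obs = torch.randn(32, N_OBS)
    actions = torch.randint(0, N_ACTIONS, (32,))
    old_lp = torch.randn(32)
    adv = torch.randn(32)
    ret = torch.randn(32)
    ppo_loss(net, obs, actions, old_lp, adv, ret).backward()

In a setup like the paper's, the rollout data would come from closed-loop simulation of the grid-connected PUC5, with a reward penalizing grid-current tracking error and capacitor-voltage deviation; the clipped ratio is what keeps each PPO policy update close to the behavior policy that generated the data.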