Reinforcement learning for control with probabilistic stability guarantee

A finite-sample approach

Journal Article (2026)
Author(s)

Minghao Han (Harbin Institute of Technology)

Lixian Zhang (Harbin Institute of Technology)

Chenliang Liu (Central South University, China)

Zhipeng Zhou (TU Delft - Mechanical Engineering)

Jun Wang (University College London)

Wei Pan (The University of Manchester)

Research Group
Robust Robot Systems
DOI related publication
https://doi.org/10.1016/j.automatica.2026.112964 Final published version
Publication Year
2026
Language
English
Journal title
Automatica
Volume number
188
Article number
112964
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a novel approach to reinforcement learning (RL) for control systems that provides probabilistic stability guarantees using finite data. Leveraging Lyapunov's method, we propose a probabilistic stability theorem that ensures mean square stability using only a finite number of sampled trajectories. The probability of stability increases with the number and length of trajectories, converging to certainty as data size grows. Additionally, we derive a policy gradient theorem for stabilizing policy learning and develop an RL algorithm, L-REINFORCE, that extends the classical REINFORCE algorithm to stabilization problems. The effectiveness of L-REINFORCE is demonstrated through simulations on a Cartpole task, where it outperforms the baseline in ensuring stability. This work bridges a critical gap between RL and control theory, enabling stability analysis and controller design in a model-free framework with finite data.
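To make the idea of judging stability from a finite number of sampled trajectories concrete, the following is a minimal sketch and not the paper's theorem or the L-REINFORCE algorithm. It assumes a hypothetical scalar linear system x_{t+1} = a*x_t + b*u_t with a linear feedback policy u_t = -gain*x_t, and it estimates the mean square state at each time step by averaging over sampled rollouts. The names `rollout` and `empirical_mean_square`, the system parameters, and the gains are all illustrative assumptions.

```python
import random


def rollout(gain, steps, rng, a=1.1, b=1.0):
    """Simulate x_{t+1} = a*x_t + b*u_t with u_t = -gain*x_t (toy system, assumed)."""
    x = rng.uniform(-1.0, 1.0)  # random initial state
    states = [x]
    for _ in range(steps):
        u = -gain * x
        x = a * x + b * u
        states.append(x)
    return states


def empirical_mean_square(gain, num_trajectories, steps, seed=0):
    """Average x_t**2 over a finite set of sampled trajectories, for each t."""
    rng = random.Random(seed)
    trajectories = [rollout(gain, steps, rng) for _ in range(num_trajectories)]
    return [
        sum(traj[t] ** 2 for traj in trajectories) / num_trajectories
        for t in range(steps + 1)
    ]


# Closed-loop factor is a - b*gain: 0.5 for gain=0.6 (decays), 1.1 for gain=0.0 (grows).
stable = empirical_mean_square(gain=0.6, num_trajectories=200, steps=30)
unstable = empirical_mean_square(gain=0.0, num_trajectories=200, steps=30)

print("stabilizing gain:   first", stable[0], "last", stable[-1])
print("no feedback (gain 0): first", unstable[0], "last", unstable[-1])
```

In this toy setting the empirical mean square state shrinks toward zero under the stabilizing gain and grows without feedback. The abstract's contribution, by contrast, is a probabilistic guarantee on what such finite-sample evidence implies, which this sketch does not attempt to reproduce.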

Files

Taverne

File under embargo until 04-10-2026