Reinforcement learning for control with probabilistic stability guarantee

A finite-sample approach

Journal Article (2026)
Author(s)

Minghao Han (Harbin Institute of Technology)

Lixian Zhang (Harbin Institute of Technology)

Chenliang Liu (Central South University China)

Zhipeng Zhou (TU Delft - Mechanical Engineering)

Jun Wang (University College London)

Wei Pan (The University of Manchester)

Research Group
Robust Robot Systems
DOI related publication
https://doi.org/10.1016/j.automatica.2026.112964 Final published version
Publication Year
2026
Language
English
Journal title
Automatica
Volume number
188
Article number
112964
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a novel approach to reinforcement learning (RL) for control systems that provides probabilistic stability guarantees using finite data. Leveraging Lyapunov's method, we propose a probabilistic stability theorem that ensures mean square stability using only a finite number of sampled trajectories. The probability of stability increases with the number and length of trajectories, converging to certainty as data size grows. Additionally, we derive a policy gradient theorem for stabilizing policy learning and develop an RL algorithm, L-REINFORCE, that extends the classical REINFORCE algorithm to stabilization problems. The effectiveness of L-REINFORCE is demonstrated through simulations on a Cartpole task, where it outperforms the baseline in ensuring stability. This work bridges a critical gap between RL and control theory, enabling stability analysis and controller design in a model-free framework with finite data.
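For orientation, the classical REINFORCE estimator that L-REINFORCE is said to extend can be sketched as below. This is only the textbook score-function gradient computed from a finite set of sampled trajectories; it is not the paper's algorithm and contains no Lyapunov or stability term. The scalar linear system, the Gaussian linear policy, and every name and parameter value (`rollout`, `reinforce_gradient`, `a`, `b`, `sigma`, `horizon`) are illustrative assumptions, not taken from the article.

```python
import random


def rollout(theta, sigma, horizon, rng, a=0.9, b=0.5, x0=1.0):
    """Simulate a hypothetical system x_{t+1} = a*x_t + b*u_t
    under the Gaussian policy u_t ~ N(theta * x_t, sigma^2).

    Returns the trajectory cost sum_t x_t^2 and the summed score
    sum_t d/dtheta log pi(u_t | x_t).
    """
    x, cost, score = x0, 0.0, 0.0
    for _ in range(horizon):
        mean = theta * x
        u = rng.gauss(mean, sigma)
        # Score of a Gaussian policy with mean theta * x.
        score += (u - mean) * x / sigma ** 2
        cost += x * x
        x = a * x + b * u
    return cost, score


def reinforce_gradient(theta, n_traj=200, horizon=20, sigma=0.1, seed=0):
    """Classical REINFORCE estimate of d/dtheta E[cost] from n_traj
    sampled trajectories, using the mean cost as a baseline."""
    rng = random.Random(seed)
    samples = [rollout(theta, sigma, horizon, rng) for _ in range(n_traj)]
    baseline = sum(c for c, _ in samples) / n_traj
    grad = sum((c - baseline) * s for c, s in samples) / n_traj
    return grad, baseline


if __name__ == "__main__":
    grad, mean_cost = reinforce_gradient(theta=0.0)
    print("estimated gradient of expected cost:", grad)
    print("mean trajectory cost:", mean_cost)
```

In this toy setting the closed-loop gain is `a + b*theta`, so a gradient-descent step on `theta` using this estimate lowers the expected quadratic cost. The estimate depends on both the number and the length of the sampled trajectories, which is the finite-data regime the abstract refers to.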

Files

Taverne

File under embargo until 04-10-2026