Reinforcement learning for control with probabilistic stability guarantee

A finite-sample approach

Journal Article (2026)

Author(s)

Minghao Han (Harbin Institute of Technology)

Lixian Zhang (Harbin Institute of Technology)

Chenliang Liu (Central South University, China)

Zhipeng Zhou (TU Delft - Mechanical Engineering)

Jun Wang (University College London)

Wei Pan (The University of Manchester)

Research Group

Robust Robot Systems

Keywords

Reinforcement learning, Nonlinear control, Lyapunov's method, Finite sample, Probabilistic bound

DOI related publication

https://doi.org/10.1016/j.automatica.2026.112964 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:490aa8e3-aefb-49e4-a86e-9a21af5b77d2

Publication Year

2026

Language

English

Journal title

Automatica

Volume number

188

Article number

112964

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This paper presents a novel approach to reinforcement learning (RL) for control systems that provides probabilistic stability guarantees using finite data. Leveraging Lyapunov's method, we propose a probabilistic stability theorem that ensures mean square stability using only a finite number of sampled trajectories. The probability of stability increases with the number and length of trajectories, converging to certainty as data size grows. Additionally, we derive a policy gradient theorem for stabilizing policy learning and develop an RL algorithm, L-REINFORCE, that extends the classical REINFORCE algorithm to stabilization problems. The effectiveness of L-REINFORCE is demonstrated through simulations on a Cartpole task, where it outperforms the baseline in ensuring stability. This work bridges a critical gap between RL and control theory, enabling stability analysis and controller design in a model-free framework with finite data.

Files

1-s2.0-S0005109826001482-main.... (pdf)

(pdf | 2.03 Mb)

Taverne

File under embargo until 04-10-2026