Z. Zhou

Please Note

This page displays the records of the person named above and is not linked to a unique person identifier. This record may need to be merged to a profile.

Conference paper (1)

Journal article (1)

2 records found

Reinforcement learning for control with probabilistic stability guarantee: A finite-sample approach

Journal article (2026) - Minghao Han, Lixian Zhang, Chenliang Liu, Zhipeng Zhou, Jun Wang, Wei Pan

This paper presents a novel approach to reinforcement learning (RL) for control systems that provides probabilistic stability guarantees using finite data. Leveraging Lyapunov's method, we propose a probabilistic stability theorem that ensures mean square stability using only a finite number of sampled trajectories. The probability of stability increases with the number and length of trajectories, converging to certainty as data size grows. Additionally, we derive a policy gradient theorem for stabilizing policy learning and develop an RL algorithm, L-REINFORCE, that extends the classical REINFORCE algorithm to stabilization problems. The effectiveness of L-REINFORCE is demonstrated through simulations on a Cartpole task, where it outperforms the baseline in ensuring stability. This work bridges a critical gap between RL and control theory, enabling stability analysis and controller design in a model-free framework with finite data. ...

Reinforcement Learning for Orientation Estimation Using Inertial Sensors with Performance Guarantee

Conference paper (2021) - Liang Hu, Yujie Tang, Zhipeng Zhou, Wei Pan

This paper presents a deep reinforcement learning (DRL) algorithm for orientation estimation using inertial sensors combined with a magnetometer. Lyapunov’s method in control theory is employed to prove the convergence of orientation estimation errors. The estimator gains and a Lyapunov function are parametrised by deep neural networks and learned from samples based on the theoretical results. The DRL estimator is compared with three well-known orientation estimation methods on both numerical simulations and a real dataset collected from commercially available sensors. The results show that the proposed algorithm is superior for arbitrary estimation initialisation and can adapt to drastic angular velocity profiles to which other algorithms are hardly applicable. To the best of our knowledge, this is the first DRL-based orientation estimation method with an estimation error boundedness guarantee. ...