UPPS-RL: Unified Predictive and Passive Safety in Quadrupedal Locomotion Control Using Reinforcement Learning

Master Thesis (2025)
Author(s)

P. Yang (TU Delft - Mechanical Engineering)

Contributor(s)

Cosimo Della Lieu – Mentor (TU Delft - Learning & Autonomous Control)

Jiatao Ding – Mentor

Arkady Zgonnikov – Graduation committee member (TU Delft - Human-Robot Interaction)

Vasso Reppa – Graduation committee member (TU Delft - Transport Engineering and Logistics)

Faculty
Mechanical Engineering
Publication Year
2025
Language
English
Graduation Date
14-10-2025
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Vehicle Engineering | Cognitive Robotics
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Safe quadrupedal locomotion control with reinforcement learning (RL) has attracted increasing attention in recent years. Existing approaches can be broadly categorized into recovery RL, distributional RL, and constrained RL. However, recovery RL cannot provide predictive safety guarantees; distributional RL lacks passive safety; and constrained RL, while capable of both forms of safety, often restricts exploration. To address these limitations, we propose UPPS-RL, a unified framework that integrates predictive and passive safety into quadrupedal locomotion control through three main components: a risk-aware task-level policy, a self-supervised risk network, and a risk-triggered recovery policy. Together these form a hierarchical control architecture that embeds unified safety without imposing explicit exploration constraints. Extensive simulations across composite scenarios, including step, pit, slope, and rough-plane terrains, demonstrate that UPPS-RL significantly suppresses catastrophic failures while maintaining a favorable trade-off between robustness and efficiency.
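
The hierarchical arrangement described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. It only shows one plausible way a risk estimate could switch control between a task-level policy and a recovery policy. The class name, the threshold value, the scalar risk in [0, 1], and the toy stand-in functions are all assumptions made for illustration; the actual policies and risk network in UPPS-RL are learned models whose interfaces are not given on this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]
Action = List[float]


@dataclass
class HierarchicalController:
    """Hypothetical risk-triggered switch between two policies."""
    task_policy: Callable[[Observation, float], Action]  # risk-aware: also sees the risk
    recovery_policy: Callable[[Observation], Action]
    risk_network: Callable[[Observation], float]          # assumed to return a value in [0, 1]
    risk_threshold: float = 0.5                           # assumed value, not from the thesis

    def act(self, obs: Observation) -> Tuple[Action, str, float]:
        risk = self.risk_network(obs)
        if risk >= self.risk_threshold:
            # High predicted risk: hand control to the recovery policy.
            return self.recovery_policy(obs), "recovery", risk
        # Otherwise the task policy acts, conditioned on the risk estimate.
        return self.task_policy(obs, risk), "task", risk


# Toy stand-ins (assumptions): obs[0] is treated as body roll in radians.
def toy_risk(obs: Observation) -> float:
    return min(1.0, abs(obs[0]) / 1.0)


def toy_task_policy(obs: Observation, risk: float) -> Action:
    # Command a lower forward speed as the risk estimate rises.
    return [1.0 - risk]


def toy_recovery_policy(obs: Observation) -> Action:
    # Stop moving forward while recovering.
    return [0.0]


controller = HierarchicalController(toy_task_policy, toy_recovery_policy, toy_risk)
```

With these stand-ins, `controller.act([0.1])` selects the task policy and `controller.act([0.8])` triggers the recovery policy. Because the switch acts only at execution time, it does not constrain the task policy's exploration during training, which is in the spirit of what the abstract describes.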

Files

Thesis_peiyuyang_10.9.pdf
(pdf | 17.2 Mb)
License info not available
Thesis_peiyuyang_10.9_1.pdf
(pdf | 17.2 Mb)
License info not available