Safe Curriculum Learning for Linear Systems with Parametric Unknowns in Primary Flight Control

Conference Paper (2022)

Author(s)

D.D.C. De Buysscher (Student TU Delft)

T.S.C. Pollack (TU Delft - Control & Simulation)

Erik Jan Van Kampen (TU Delft - Control & Simulation)

Research Group

Control & Simulation

DOI related publication

https://doi.org/10.2514/6.2022-0790

To reference this document use:

https://resolver.tudelft.nl/uuid:104c2976-037c-48b2-af3e-6fcd1ed78df7

Publication Year

2022

Language

English

ISBN (electronic)

978-1-62410-631-6

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Safe curriculum learning aims to improve both the safety and the efficiency of Reinforcement Learning (RL). Curricular RL approaches divide a task into stages of increasing complexity to make learning more efficient. This paper proposes a black-box safe curriculum learning architecture applicable to systems with parametric unknowns. For a given task and system, the agent domain requires knowledge only of the dimensions of the state and action spaces. By adding system identification capabilities to existing safe curriculum learning paradigms, the proposed architecture ensures safe learning of tracking tasks without requiring initial knowledge of the system dynamics. A model estimate is generated online to complement safety filters that rely on uncertain models for their safety guarantees. This research explicitly targets linearised systems with decoupled dynamics. The paradigm is first verified on a mass-spring-damper system and then applied to a quadrotor altitude and attitude tracking task. The RL agent safely learns an optimal policy that tracks an independent reference on each degree of freedom.
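To make the idea of generating a model estimate online more concrete, the sketch below identifies a discrete-time linear model x[k+1] = A x[k] + B u[k] of a mass-spring-damper from input and state samples, using only the state and action dimensions. It is an illustration under stated assumptions, not the estimator used in the paper: recursive least squares is a generic choice, the forward-Euler discretisation and the values of m, c, k and dt are invented for the example, and the function names simulate_msd and rls_identify are hypothetical.

```python
import numpy as np


def simulate_msd(steps=500, dt=0.01, m=1.0, c=0.5, k=2.0, seed=0):
    """Simulate a forward-Euler discretised mass-spring-damper.

    All parameter values are assumptions chosen for illustration.
    State is [position, velocity]; the input is a force.
    """
    A = np.eye(2) + dt * np.array([[0.0, 1.0], [-k / m, -c / m]])
    B = dt * np.array([[0.0], [1.0 / m]])
    rng = np.random.default_rng(seed)
    x = np.array([1.0, 0.0])  # non-zero start so the position is excited
    X, U, Xn = [], [], []
    for _ in range(steps):
        u = rng.normal(size=1)  # random force as an exciting input
        x_next = A @ x + B @ u
        X.append(x)
        U.append(u)
        Xn.append(x_next)
        x = x_next
    return A, B, np.array(X), np.array(U), np.array(Xn)


def rls_identify(X, U, Xn, p0=1e6):
    """Estimate A and B sample by sample with recursive least squares.

    Only the state and input dimensions are needed up front; the
    parameter estimate starts at zero with a large covariance p0.
    """
    n, m = X.shape[1], U.shape[1]
    theta = np.zeros((n + m, n))  # theta.T holds [A B]
    P = p0 * np.eye(n + m)        # parameter covariance
    for x, u, x_next in zip(X, U, Xn):
        phi = np.concatenate([x, u])             # regressor [x; u]
        P_phi = P @ phi
        gain = P_phi / (1.0 + phi @ P_phi)       # RLS gain vector
        error = x_next - theta.T @ phi           # one-step prediction error
        theta += np.outer(gain, error)           # parameter update
        P -= np.outer(gain, P_phi)               # covariance update
    return theta.T[:, :n], theta.T[:, n:]
```

With noise-free data such as that produced by simulate_msd, calling rls_identify(X, U, Xn) should return estimates that closely match the simulated A and B. A safety filter that depends on a model could then, in principle, be supplied with these estimates instead of prior knowledge of the dynamics.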

Files

6.2022_0790.pdf

(pdf | 2.75 Mb)

License info not available