Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

Master Thesis (2023)

Author(s)

A. ATHREY (TU Delft - Mechanical Engineering)

Contributor(s)

B. De Schutter – Mentor (TU Delft - Delft Center for Systems and Control)

S. Shi – Coach (TU Delft - Team Bart De Schutter)

M. Khosravi – Graduation committee member (TU Delft - Team Khosravi)

Othmane Mazhar – Graduation committee member (KTH Royal Institute of Technology)

Faculty

Mechanical Engineering

Keywords

Reinforcement Learning (RL), Linear Quadratic Gaussian, Naive exploration

To reference this document use:

https://resolver.tudelft.nl/uuid:6c47623c-e3bf-48b1-b16b-a06155740df9

Publication Year

2023

Language

English

Graduation Date

30-10-2023

Awarding Institution

Delft University of Technology

Programme

Mechanical Engineering | Systems and Control

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis addresses Learning-Based Control (LBC) of unknown, partially observable systems in the Linear Quadratic (LQ) paradigm. In learning-based LQ control, the control action affects not only the control performance but also the rate at which the system is learnt. This creates a conflict between learning and control (exploration and exploitation) that is particularly challenging to address. The thesis aims to develop a novel LBC algorithm for unknown, partially observable systems in the Linear Quadratic Gaussian (LQG) setting that is computationally efficient and guarantees an optimal exploration-exploitation trade-off. The trade-off is quantified by regret: the cumulative performance gap between the LBC policy and the ideal controller that has full knowledge of the true system dynamics. The proposed algorithm has a two-phase structure. In the first phase, Gaussian input signals are injected to obtain an initial system model. In the second phase, the proposed LBC strategy is deployed episodically: the model is updated for each episode, and the resulting updated LQG controller is applied with additive Gaussian signals for exploration. The thesis also establishes strong theoretical guarantees on optimal regret growth.
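
The regret described in the abstract can be illustrated with a minimal sketch. It assumes the benchmark is a constant optimal average cost per step, which is a common convention in LQG regret analysis but is not confirmed by this page; the function name `cumulative_regret` and its parameters are hypothetical and do not come from the thesis.

```python
from itertools import accumulate


def cumulative_regret(stage_costs, optimal_average_cost):
    """Return the running regret after each time step.

    stage_costs: per-step costs incurred by the learning-based policy.
    optimal_average_cost: per-step cost of the ideal controller that knows
        the true system (assumed constant here; an illustrative assumption).
    """
    running_totals = accumulate(stage_costs)
    # Regret at step t = cost accumulated so far minus t times the benchmark.
    return [
        total - t * optimal_average_cost
        for t, total in enumerate(running_totals, start=1)
    ]


if __name__ == "__main__":
    # Made-up costs, purely to show the call; not results from the thesis.
    print(cumulative_regret([3.0, 2.0, 2.0], 1.5))
```

In this toy reading, a policy explores and exploits well when the returned sequence grows slowly with the number of steps.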

Files

Msc_thesis_report_Archith_Athr... (pdf)

(pdf | 4.75 Mb)

License info not available