Regret Analysis of Learning-Based Linear Quadratic Gaussian Control with Additive Exploration

Master Thesis (2023)
Author(s)

A. ATHREY (TU Delft - Mechanical Engineering)

Contributor(s)

B.H.K. De Schutter – Mentor (TU Delft - Delft Center for Systems and Control)

S. Shi – Coach (TU Delft - Team Bart De Schutter)

Mohammad Khosravi – Graduation committee member (TU Delft - Team Khosravi)

Othmane Mazhar – Graduation committee member (KTH Royal Institute of Technology)

Faculty
Mechanical Engineering
Copyright
© 2023 ARCHITH ATHREY
More Info
Publication Year
2023
Language
English
Graduation Date
30-10-2023
Awarding Institution
Delft University of Technology
Programme
Mechanical Engineering | Systems and Control
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This thesis addresses Learning-Based Control (LBC) of unknown, partially observable systems in the Linear Quadratic (LQ) paradigm. In learning-based LQ control, the control action influences not only the control performance but also the rate at which the system is learnt. This creates a conflict between learning and control (exploration and exploitation) that is particularly challenging to address. The thesis aims to develop a novel, computationally efficient LBC algorithm for unknown, partially observable systems in the Linear Quadratic Gaussian (LQG) setting that guarantees an optimal exploration-exploitation trade-off, as measured by regret. Regret is the cumulative performance gap between the LBC policy and the ideal controller that has full knowledge of the true system dynamics. The contributions of this thesis center on a novel LBC algorithm deployed in two phases. The first phase injects Gaussian input signals to obtain an initial system model. The second phase deploys the proposed LBC strategy episodically: the model is updated for each episode, and the resulting LQG controller is applied with additive Gaussian signals for exploration. In addition, the thesis establishes strong theoretical guarantees on optimal regret growth.
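
For orientation, the regret and the additive-exploration input described above can be written out as follows. This is a sketch in assumed notation rather than the thesis's own: y_t and u_t denote the measured output and control input, Q and R are the quadratic cost weights, J_* is the optimal long-run average cost achieved by the LQG controller with full knowledge of the true dynamics, and sigma_k is a hypothetical exploration standard deviation for episode k. How sigma_k is chosen is not specified in the abstract.

```latex
% Assumed notation, not taken from the thesis:
%   y_t, u_t : output and input at time t
%   Q, R     : positive (semi)definite cost weights
%   J_\star  : optimal average cost under known dynamics
\[
  \mathrm{Regret}(T)
  \;=\; \sum_{t=1}^{T} \bigl( y_t^{\top} Q\, y_t + u_t^{\top} R\, u_t \bigr) \;-\; T\, J_\star
\]

% Second-phase input in episode k: output of the LQG controller designed
% from the current model estimate, plus Gaussian exploration noise.
% \sigma_k is a hypothetical per-episode noise level.
\[
  u_t \;=\; \hat{u}_t^{(k)} + \eta_t,
  \qquad \eta_t \sim \mathcal{N}\bigl(0,\ \sigma_k^{2} I\bigr)
\]
```

Under this reading, a larger sigma_k speeds up learning of the model but adds cost directly to the regret sum, which is the exploration-exploitation trade-off the abstract refers to.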
