From Supervised to Reinforcement Learning: an Inverse Optimization Approach
I. Dimanidis (TU Delft - Mechanical Engineering)
P. Mohajerin Esfahani – Mentor (TU Delft - Team Bart De Schutter)
M. Mazo Espinosa – Graduation committee member (TU Delft - Team Manuel Mazo Jr)
B. Atasoy – Graduation committee member (TU Delft - Transport Engineering and Logistics)
Abstract
We propose a novel method that combines elements of supervised learning and Q-learning for the control of dynamical systems subject to unknown disturbances. Using the Inverse Optimization framework and in-hindsight disturbance information, we derive a causal parametric optimization policy that approximates a non-causal MPC expert. Furthermore, we propose a new min-max MPC scheme that robustifies the controller against disturbances in a ball around a nominal disturbance trajectory. This scheme admits an exact convex reformulation via the S-Lemma and is also approximated using Inverse Optimization. Finally, simulation studies illustrate and validate our approach.
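The core idea of approximating a non-causal, in-hindsight expert by a causal policy can be illustrated with a toy sketch. Everything below is our own illustrative setup, not the thesis' actual method: a scalar linear system, a hindsight "expert" that cancels the disturbance it is about to see, and a plain least-squares fit standing in for the Inverse Optimization step.

```python
import numpy as np

# Hypothetical toy setup (assumed for illustration): a scalar system
# x+ = a*x + b*u + w. The "expert" sees the disturbance w in hindsight
# and cancels it; the causal policy u = theta*x sees only the state.
rng = np.random.default_rng(0)
a, b = 0.9, 1.0
N = 500

x = rng.standard_normal(N)         # sampled states
w = 0.1 * rng.standard_normal(N)   # disturbances, visible to the expert only

# Non-causal expert: drives the next state x+ to 0 exactly.
u_expert = -(a * x + w) / b

# Supervised step: fit the causal linear policy to the expert's
# demonstrations by least squares (a crude stand-in for the
# inverse-optimization fit of a parametric optimization policy).
theta = float(x @ u_expert / (x @ x))
print(theta)
```

Since the states and disturbances are independent, the disturbance term averages out and the recovered gain lands near the feedback part of the expert, theta ≈ -a/b = -0.9; the causal policy can match the expert's feedback but not its disturbance cancellation, which is exactly the gap the in-hindsight information is meant to close.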