Offline Reinforcement Learning via Inverse Optimization

Journal Article (2026)

Author(s)

Ioannis Dimanidis (Student TU Delft)

Tolga Ok (TU Delft - Mechanical Engineering)

Peyman Mohajerin Esfahani (TU Delft - Mechanical Engineering, University of Toronto)

Research Group

Team Peyman Mohajerin Esfahani

To reference this document use

https://resolver.tudelft.nl/uuid:9bb0b1d3-8e40-4189-80e0-da94b7347f95

Publication Year

2026

Language

English

Journal title

Transactions on Machine Learning Research

Issue number

03

Volume number

2026

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces that leverages the convex “sub-optimality loss” from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust, non-causal Model Predictive Control (MPC) expert that steers a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained with the proposed convex loss function, is amply expressive and reliably recovers teacher behavior in MuJoCo benchmarks. The method achieves competitive results compared to widely used baselines in sample-constrained settings, despite using orders of magnitude fewer parameters. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
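As a rough illustration of the sub-optimality loss mentioned in the abstract, the sketch below uses the generic form from the IO literature: the cost of the expert's action under a parametric cost F_theta minus the minimum of that cost over all actions. The scalar quadratic cost, the parameter names `q` and `m`, and the helper functions are assumptions made for this example only; the abstract does not specify the paper's hypothesis class, constraints, or training procedure.

```python
# Minimal sketch of the IO sub-optimality loss on a toy scalar problem.
# Assumed (hypothetical) cost: F_theta(s, a) = q * a^2 + 2 * m * a * s, with q > 0.
# This is not the paper's model; it only illustrates the shape of the loss.

def cost(theta, s, a):
    """Parametric cost F_theta(s, a); linear in theta = (q, m)."""
    q, m = theta
    return q * a * a + 2.0 * m * a * s

def greedy_action(theta, s):
    """Unconstrained minimizer of the cost over a (requires q > 0)."""
    q, m = theta
    return -m * s / q

def suboptimality_loss(theta, s, a_expert):
    """F_theta(s, a_expert) - min_a F_theta(s, a); always >= 0."""
    a_star = greedy_action(theta, s)
    return cost(theta, s, a_expert) - cost(theta, s, a_star)

def mean_loss(theta, dataset):
    """Average sub-optimality loss over (state, expert_action) pairs."""
    return sum(suboptimality_loss(theta, s, a) for s, a in dataset) / len(dataset)

# Toy offline dataset generated by the linear policy a = -0.5 * s.
dataset = [(1.0, -0.5), (2.0, -1.0), (-4.0, 2.0)]
consistent_theta = (2.0, 1.0)   # induces a = -(m / q) * s = -0.5 * s
mismatched_theta = (2.0, -1.0)  # induces a = +0.5 * s
```

In this toy setting the loss vanishes exactly when the learned cost explains the expert's actions, and the learned policy is simply the minimizer of the learned cost. Because the cost is linear in the parameters, the loss is a linear term plus a pointwise maximum of linear terms, which is why it is convex in the parameters.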

Files

6206_Offline_Reinforcement_Lea... (pdf, 1.09 MB)