Offline Reinforcement Learning via Inverse Optimization

Journal Article (2026)
Author(s)

Ioannis Dimanidis (Student TU Delft)

Tolga Ok (TU Delft - Team Peyman Mohajerin Esfahani)

Peyman Mohajerin Esfahani (TU Delft - Team Peyman Mohajerin Esfahani, University of Toronto)

Publication Year: 2026
Language: English
Journal title: Transactions on Machine Learning Research
Issue number: 03
Volume number: 2026
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Inspired by the recent successes of Inverse Optimization (IO) across various application domains, we propose a novel offline Reinforcement Learning (ORL) algorithm for continuous state and action spaces, leveraging the convex loss function called “sub-optimality loss” from the IO literature. To mitigate the distribution shift commonly observed in ORL problems, we further employ a robust and non-causal Model Predictive Control (MPC) expert steering a nominal model of the dynamics using in-hindsight information stemming from the model mismatch. Unlike the existing literature, our robust MPC expert enjoys an exact and tractable convex reformulation. In the second part of this study, we show that the IO hypothesis class, trained by the proposed convex loss function, enjoys ample expressiveness and reliably recovers teacher behavior in MuJoCo benchmarks. The method achieves competitive results compared to widely used baselines in sample-constrained settings, despite using orders of magnitude fewer parameters. To facilitate the reproducibility of our results, we provide an open-source package implementing the proposed algorithms and the experiments.
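For readers unfamiliar with the sub-optimality loss mentioned in the abstract, the following is a minimal illustrative sketch, not the authors' implementation. In the IO literature the loss of a candidate objective F_theta on an expert pair (s, a) is commonly written as F_theta(s, a) - min_a' F_theta(s, a'). The quadratic form of F_theta, the unconstrained action set, and all names (objective, greedy_action, suboptimality_loss, Q, M) are assumptions made here for illustration only.

```python
import numpy as np

def objective(Q, M, s, a):
    """Hypothetical quadratic objective F_theta(s, a) = a'Qa + 2 s'Ma, with theta = (Q, M)."""
    return float(a @ Q @ a + 2.0 * s @ M @ a)

def greedy_action(Q, M, s):
    """Unconstrained minimizer of F_theta(s, .), assuming Q is positive definite."""
    return -np.linalg.solve(Q, M.T @ s)

def suboptimality_loss(Q, M, s, a_expert):
    """F_theta(s, a_expert) - min_a F_theta(s, a); nonnegative by construction."""
    a_star = greedy_action(Q, M, s)
    return objective(Q, M, s, a_expert) - objective(Q, M, s, a_star)

# Toy data (assumed): 3-dimensional state, 2-dimensional action.
Q = np.array([[2.0, 0.0], [0.0, 1.0]])                 # action cost, positive definite
M = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])    # state-action coupling
s = np.array([1.0, -1.0, 2.0])
a_expert = np.array([0.3, -0.2])

loss = suboptimality_loss(Q, M, s, a_expert)
loss_at_optimum = suboptimality_loss(Q, M, s, greedy_action(Q, M, s))
```

Because this F_theta is linear in (Q, M), its minimum over actions is concave in the parameters, so the loss is convex in them. The loss vanishes exactly when the expert action minimizes the candidate objective, which is what training on expert data tries to achieve.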