Data-driven Inverse Optimization (IO) is a form of supervised learning in which the output data is assumed to be generated by an optimization problem that depends on the input data. IO uses this data to approximate the optimization problem as closely as possible. When one wants to emulate an expert operating in a dynamic environment, the dataset obtained by measuring the expert is often confined to a small, optimal part of the environment's total state space. When a model trained on this data finds itself in a different part of the state space, it can behave erratically.
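A minimal mathematical sketch of this setup (the notation here is our own and is not taken from the thesis): given observed input-decision pairs $(s_i, x_i)$, IO assumes that each decision solves a parametric optimization problem,
\[
  x_i \approx \operatorname*{arg\,min}_{x \in \mathbb{X}(s_i)} f_\theta(s_i, x), \qquad i = 1, \dots, N,
\]
and learning then amounts to choosing parameters $\theta$ under which the observed decisions are as close to optimal as possible.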
In this thesis, we combine Inverse Optimization with the active learning method Dataset Aggregation (DAgger) to test whether this improves model performance in dynamic settings. DAgger is an iterative process in which the learner steers the system, generating new input data for which the expert provides the best actions. This new data is then used to train the next model.
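As an illustration of this loop, consider the following minimal sketch; the names env, expert_action, and train_io_model are hypothetical placeholders, not the thesis's implementation.

    def dagger(env, expert_action, train_io_model, n_iters, horizon):
        """Sketch of the DAgger loop: the learner steers, the expert labels."""
        dataset = []   # aggregated (state, expert action) pairs
        model = None   # no model yet; the first rollout falls back on the expert

        for _ in range(n_iters):
            state = env.reset()
            for _ in range(horizon):
                # Record the expert's best action for every visited state.
                dataset.append((state, expert_action(state)))
                # But follow the learner's action, so the rollout explores
                # the states the learner would actually reach.
                action = model.predict(state) if model is not None else expert_action(state)
                state = env.step(action)
            # Retrain on the aggregated dataset.
            model = train_io_model(dataset)

        return model

The key design point is that the learner, not the expert, chooses which states are visited, so the aggregated dataset covers the part of the state space the learned model will actually encounter.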
Furthermore, we propose a new algorithm, fast-DAgger, which is designed to converge faster than standard DAgger, possibly at the cost of some performance in the final model.
IO models trained with the DAgger and fast-DAgger algorithms are tested and compared to IO models trained on static datasets. This is done for two case studies: the Dynamic Vehicle Routing Problem as proposed by the EURO Meets NeurIPS 2022 Vehicle Routing Competition, and the game of Tetris.
Results show the potential of combining IO with DAgger. However, DAgger does not always outperform training on a static dataset: it helps only when the static training data covers a limited part of the total state space and does not generalize well beyond it. In our case studies, the fast-DAgger algorithm did not show a significant speed-up over standard DAgger, although this depends strongly on the specifics of the model and the hyperparameters of the DAgger algorithm.