First- and Second-Order Bounds for Adversarial Linear Contextual Bandits

Conference Paper (2023)
Author(s)

Julia Olkhovskaya (TU Delft - Sequential Decision Making)

Jack Mayo (Korteweg-de Vries Institute for Mathematics)

Tim van Erven (Korteweg-de Vries Institute for Mathematics)

Gergely Neu (Pompeu Fabra University)

Chen-Yu Wei (Massachusetts Institute of Technology)

Publication Year
2023
Language
English
Research Group
Sequential Decision Making
Volume number
36
ISBN (print)
9781713899921
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

We consider the adversarial linear contextual bandit setting, which allows the loss functions associated with each of K arms to change over time without restriction. Assuming the d-dimensional contexts are drawn from a fixed known distribution, the worst-case expected regret over the course of T rounds is known to scale as Õ(√(KdT)). Under the additional assumption that the density of the contexts is log-concave, we obtain a second-order bound of order Õ(K√(dV_T)) in terms of the cumulative second moment of the learner's losses V_T, and a closely related first-order bound of order Õ(K√(dL_T)) in terms of the cumulative loss of the best policy L_T. Since V_T or L_T may be significantly smaller than T, these bounds improve over the worst-case regret whenever the environment is relatively benign. Our results are obtained using a truncated version of the continuous exponential weights algorithm over the probability simplex, which we analyse by exploiting a novel connection to the linear bandit setting without contexts.
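To make the quantities in the abstract concrete, the LaTeX sketch below spells out one plausible formalization of the regret and of V_T and L_T, assuming losses normalized to [0,1]; the precise placement of expectations and the notation are assumptions based on the abstract's description, not the paper's own definitions.

% Sketch of the quantities in the abstract (notation assumed, not taken from the paper).
% In round t the learner observes a context x_t drawn i.i.d. from a known distribution on R^d,
% picks an arm a_t in {1,...,K}, and suffers the adversarially chosen loss \ell_t(a_t, x_t) in [0,1].
\[
  R_T \;=\; \mathbb{E}\Bigl[\sum_{t=1}^{T} \ell_t(a_t, x_t)\Bigr]
        \;-\; \min_{\pi}\, \mathbb{E}\Bigl[\sum_{t=1}^{T} \ell_t(\pi(x_t), x_t)\Bigr]
        \;=\; \widetilde{O}\bigl(\sqrt{KdT}\bigr) \quad \text{in the worst case.}
\]
% Second moment of the learner's losses and cumulative loss of the best fixed policy:
\[
  V_T \;=\; \sum_{t=1}^{T} \mathbb{E}\bigl[\ell_t(a_t, x_t)^2\bigr],
  \qquad
  L_T \;=\; \min_{\pi}\, \mathbb{E}\Bigl[\sum_{t=1}^{T} \ell_t(\pi(x_t), x_t)\Bigr].
\]
% The second- and first-order bounds stated in the abstract then read
\[
  R_T \;=\; \widetilde{O}\bigl(K\sqrt{d\,V_T}\bigr)
  \qquad \text{and} \qquad
  R_T \;=\; \widetilde{O}\bigl(K\sqrt{d\,L_T}\bigr).
\]

With bounded losses, V_T and L_T are at most T, and they can be far smaller in benign environments, which is when these bounds improve on the worst-case Õ(√(KdT)) rate.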
