Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Abstract

The multi-objective stochastic linear bandit (MOSLB) plays a critical role in sequential decision-making; however, most existing methods focus on Pareto dominance among objectives without considering any priority among them. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers' preferences. We adopt the Grossone approach to handle these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learner's performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, we adapt the upper confidence bound (UCB) policy and a prior-free lexicographic filter to approximate the optimal arms at each round. Moreover, the algorithms follow a two-stage framework that balances exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.
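To make the idea of a Pareto-lexicographic filter concrete, the sketch below illustrates one plausible reading of such a filter (not the paper's exact algorithm): objectives are grouped into priority levels, and at each level only the arms whose UCB vectors are not Pareto-dominated on that level's objectives survive. All names (`ucb`, `priority_levels`, `pareto_lexicographic_filter`) are hypothetical and introduced here only for illustration.

```python
import numpy as np


def dominates(u, v):
    """True if vector u Pareto-dominates v (>= in every entry, > in some entry)."""
    return np.all(u >= v) and np.any(u > v)


def pareto_lexicographic_filter(ucb, priority_levels):
    """Filter arms level by level under a mixed Pareto-lexicographic order.

    ucb: (n_arms, n_objectives) array of upper confidence bound estimates.
    priority_levels: list of objective-index lists, from most to least important.
    Returns the indices of arms that survive every level's Pareto filter.
    """
    candidates = list(range(ucb.shape[0]))
    for level in priority_levels:
        sub = ucb[:, level]
        # keep an arm only if no other surviving arm dominates it on this level
        candidates = [
            i for i in candidates
            if not any(dominates(sub[j], sub[i]) for j in candidates if j != i)
        ]
    return candidates


# Toy usage: 4 arms, 3 objectives; objectives 0 and 1 form the top-priority level.
rng = np.random.default_rng(0)
ucb_estimates = rng.random((4, 3))
print(pareto_lexicographic_filter(ucb_estimates, priority_levels=[[0, 1], [2]]))
```

In a bandit loop, one would recompute the UCB estimates after each pull and restrict the next selection to the surviving candidate set; how the survivors are then pulled (e.g., across the two exploration/exploitation stages mentioned above) is left unspecified in this sketch.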