Subspace Learning with Gaussian Processes for Sparse Contextual Bandits

Bachelor Thesis (2025)
Author(s)

Y. Chizi (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julia Olkhovskaya – Mentor (TU Delft - Sequential Decision Making)

L. Cavalcante Siebert – Graduation committee member (TU Delft - Interactive Intelligence)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
26-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The multi-armed bandit problem is a sequential learning scenario in which a learning algorithm seeks to obtain rewards by selecting an arm, or action, in each round, given limited initial knowledge. Contextual bandits additionally present a context every round that informs the algorithm and guides decision-making. While bandit algorithms have been successfully applied in practice, research continues to explore efficient algorithms for high-dimensional bandits with nonparametric, sparsely varying reward functions. One such algorithm is the two-phase SI-BO algorithm, which combines an initial subspace learning phase, identifying the effective context subspace on which the reward function varies, with a subsequent Bayesian optimization phase that applies the Gaussian Process-based GP-UCB algorithm to the learned subspace. While SI-BO offers theoretical regret guarantees with only weak sub-exponential dependence on the ambient dimension, it is hindered by the high computational cost of Gaussian Process regression. Building on the algorithmic framework introduced in SI-BO, this paper investigates the empirical regret performance of Gaussian Process-based learning algorithms that incorporate subspace learning. To that end, we introduce a novel algorithm, SI-BKB, which combines the subspace learning of SI-BO with the BKB sketching algorithm, reducing computational complexity while maintaining theoretical guarantees. Using synthetic data, this paper presents a systematic empirical study on linear and nonlinear bandit environments with varying levels of sparsity. The results demonstrate that SI-BKB achieves regret performance comparable to SI-BO. Additionally, they indicate that misalignment of the learned subspace leads to suboptimal regret during the optimization phase. Moreover, we demonstrate that high sparsity, through subspace misalignment, can improve regret performance.
The repository is available at https://github.com/Cheese-1/SparseSequentialLearning.
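The optimization phase described above applies a GP-UCB rule on the learned subspace: contexts are projected through the learned subspace matrix, a Gaussian Process posterior is fit to past rewards, and the arm maximizing the mean-plus-uncertainty index is pulled. The following is a minimal illustrative NumPy sketch of that acquisition step, not the thesis implementation; the projection matrix `A`, the RBF kernel choice, and all function names and parameters here are assumptions for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential kernel between the rows of X and Y.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_ucb_select(arms, A, X_hist, y_hist, beta=2.0, noise=0.1):
    """Pick the arm maximizing the GP-UCB index on the projected subspace.

    arms:   (n_arms, D) candidate contexts in the ambient space
    A:      (d, D) learned projection onto the effective subspace, d << D
    X_hist: (t, D) previously pulled contexts; y_hist: (t,) observed rewards
    """
    Z = arms @ A.T                       # project candidates to the subspace
    if len(y_hist) == 0:
        return 0                         # no data yet: pull an arbitrary arm
    Zh = X_hist @ A.T                    # project the history
    K = rbf_kernel(Zh, Zh) + noise**2 * np.eye(len(Zh))
    Ks = rbf_kernel(Z, Zh)
    alpha = np.linalg.solve(K, y_hist)
    mu = Ks @ alpha                      # GP posterior mean per candidate
    v = np.linalg.solve(K, Ks.T)
    var = 1.0 - np.einsum('ij,ji->i', Ks, v)   # posterior variance; k(z, z) = 1
    ucb = mu + beta * np.sqrt(np.maximum(var, 0.0))
    return int(np.argmax(ucb))
```

Because the kernel is evaluated on d-dimensional projections rather than the D-dimensional ambient contexts, the GP regression depends on the ambient dimension only through the one-time projection, which is the source of the weak dimension dependence the abstract refers to.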

Files

RP_Final_Paper-7.pdf
(pdf | 2.13 Mb)
License info not available