The multi-armed bandit problem is a sequential learning scenario in which a learning algorithm seeks to accumulate reward by selecting an arm, or action, in each round, given limited initial knowledge. Contextual bandits additionally present a context in every round that informs the bandit algorithm and guides decision-making. While bandit algorithms have been successfully applied in practice, research continues to seek efficient algorithms for high-dimensional contextual bandits with nonparametric, sparsely varying reward functions. One such algorithm is the two-phase SI-BO algorithm, which first runs a subspace learning phase to identify the effective low-dimensional subspace on which the reward function varies, and then runs a Bayesian optimization phase that applies the Gaussian process-based GP-UCB algorithm on the learned subspace. Although SI-BO achieves a regret bound with only weak sub-exponential dependence on the ambient dimension, it is hindered by the high computational cost of Gaussian process regression. Building on the algorithmic framework of SI-BO, this paper investigates the empirical regret performance of Gaussian process-based learning algorithms that incorporate subspace learning. To that end, we introduce a novel algorithm, SI-BKB, which combines the subspace learning of SI-BO with the BKB sketching algorithm, reducing computational complexity while maintaining theoretical guarantees. Using synthetic data, we present a systematic empirical study on linear and nonlinear bandit environments with varying levels of sparsity. The results show that SI-BKB attains regret comparable to SI-BO. They further indicate that misalignment of the learned subspace leads to suboptimal regret during the optimization phase, and that under high sparsity, subspace misalignment can even improve regret.
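To make the second phase concrete, the following is a minimal sketch of GP-UCB restricted to a learned subspace, in the spirit of the optimization phase described above. It is not the paper's implementation: the orthonormal basis \texttt{Q} is assumed to be given (in SI-BO it would come from the subspace learning phase), candidates are a fixed random set, and the kernel, length scale, and confidence parameter \texttt{beta} are illustrative choices.

```python
import numpy as np

def rbf(A, B, ls=0.5):
    """Squared-exponential kernel between row-stacked point sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

def gp_ucb_on_subspace(f, Q, low, high, T=30, beta=2.0, noise=1e-3, seed=0):
    """Run GP-UCB over candidates lying in the span of the basis Q.

    f     : reward function on the ambient space (queried with noise)
    Q     : (D, k) orthonormal basis of the learned effective subspace
            (hypothetical input; SI-BO would estimate it in its first phase)
    low/high : bounds of the low-dimensional candidate box
    Returns the queried ambient points and their noisy rewards.
    """
    rng = np.random.default_rng(seed)
    D, k = Q.shape
    # Sample low-dimensional candidates and lift them to the ambient space.
    Z = rng.uniform(low, high, size=(200, k))
    X = Z @ Q.T
    xs, ys = [], []
    for t in range(T):
        if not xs:
            i = rng.integers(len(X))  # first query: arbitrary candidate
        else:
            Xo, yo = np.array(xs), np.array(ys)
            K = rbf(Xo, Xo) + noise * np.eye(len(Xo))
            Ks = rbf(X, Xo)
            Kinv = np.linalg.inv(K)
            mu = Ks @ Kinv @ yo                      # posterior mean
            var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)
            var = np.clip(var, 1e-12, None)          # posterior variance
            i = int(np.argmax(mu + np.sqrt(beta * var)))  # UCB rule
        xs.append(X[i])
        ys.append(f(X[i]) + noise * rng.standard_normal())
    return np.array(xs), np.array(ys)
```

On a sparse reward that depends only on the first two of ten ambient coordinates, passing \texttt{Q = np.eye(10)[:, :2]} lets the optimization run in two dimensions; a misaligned \texttt{Q} would correspond to the subspace-misalignment regime discussed in the results.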
The repository is available at \url{https://github.com/Cheese-1/SparseSequentialLearning}.