Exploring Bandit Algorithms in Sparse Environments

Does increasing the level of sparsity enhance the advantage of sparsity-adapted Multi-Armed Bandit algorithms?


Abstract

In sequential decision-making, the Multi-Armed Bandit (MAB) problem models the dilemma of exploration versus exploitation. The problem is commonly situated in an unknown environment in which a player iteratively selects one action from a set of predetermined choices. The player's choices can be evaluated by comparing the reward observed after each decision to the highest reward obtainable from the set of possible actions. The goal is to minimize regret, the cumulative loss incurred by not choosing the optimal action at every step. Intuitively, one might explore all options first and then commit to the one that yielded the highest observed rewards. However, this strategy is hard to realize when the specifications of the environment are unknown or change over time. Classifications of environments exist and can guide the choice of approach for modelling an MAB problem. The literature offers a variety of algorithms for different MAB problems but lacks comparisons of these algorithms across the domains they aim to solve. To fill this gap, we focus on several MAB algorithms (policies) and compare their effectiveness and optimality in different environments. This research concentrates on a class of environments with a sparse reward function, where the reward depends on action-specific contextual information and a sparse vector that dictates which features contribute. Four MAB algorithms were selected, each developed to solve the problem in a different environment, and compared in a series of experiments exploring how sparsity affects their performance. The experiments show that the advantage of sparsity-adapted algorithms depends on both the level of sparsity and the environment setting, but that increasing sparsity helps them achieve better results than traditional methods.
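
For intuition, the sparse reward setting described above can be sketched as a linear bandit: each arm's expected reward is the inner product of its context vector with an unknown parameter vector that has only a few nonzero entries, and regret accumulates whenever a suboptimal arm is played. The snippet below is a minimal illustrative simulation of that model with a simple epsilon-greedy ridge-regression baseline standing in for a "traditional" (non-sparsity-adapted) policy. All dimensions, parameter values, and variable names are assumptions made for this sketch, not the experimental configuration used in the research.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem sizes for illustration only.
d, s, K, T = 50, 5, 10, 2000   # features, nonzero features, arms, rounds

# Sparse unknown parameter: only s of d coordinates affect rewards.
theta_star = np.zeros(d)
theta_star[rng.choice(d, size=s, replace=False)] = rng.normal(size=s)

A = np.eye(d)        # regularised Gram matrix (ridge prior)
b = np.zeros(d)      # accumulated reward-weighted contexts
eps = 0.05           # exploration probability
cum_regret = 0.0

for t in range(T):
    contexts = rng.normal(size=(K, d))   # one context vector per arm
    means = contexts @ theta_star        # true expected reward per arm

    theta_hat = np.linalg.solve(A, b)    # ridge estimate of theta_star
    if rng.random() < eps:
        arm = int(rng.integers(K))                   # explore uniformly
    else:
        arm = int(np.argmax(contexts @ theta_hat))   # exploit estimate

    x = contexts[arm]
    reward = means[arm] + rng.normal(scale=0.1)      # noisy observation
    A += np.outer(x, x)                              # update statistics
    b += reward * x
    cum_regret += means.max() - means[arm]           # regret vs. best arm

print(f"epsilon-greedy cumulative regret over {T} rounds: {cum_regret:.1f}")
```

Because the baseline estimates all d coordinates of theta_star, its regret grows with the ambient dimension; sparsity-adapted policies aim to exploit the fact that only s of those coordinates matter.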
