Exploring Bandit Algorithms in Sparse Environments

Does increasing the level of sparsity enhance the advantage of sparsity-adapted Multi-Armed Bandit algorithms?


Abstract

In sequential decision-making, the Multi-Armed Bandit (MAB) problem models the dilemma of exploration versus exploitation. The problem is commonly situated in an unknown environment in which a player iteratively selects one action from a set of predetermined choices. The player's choices can be evaluated by comparing the reward observed after each decision to the highest reward obtainable from the set of possible actions. The goal is to minimize regret, the cumulative loss incurred by not choosing the optimal action at every step. Intuitively, one might explore all options first and then commit to the one that yielded the highest observed rewards. However, this strategy is hard to realize when the specifications of the environment are unknown or change over time. Classifications of environments exist and can guide the choice of approach for modelling an MAB problem. The literature offers a variety of algorithms for different MAB problems but lacks comparisons of these algorithms across the domains they aim to solve. To fill this gap, we focus on several MAB algorithms (policies) and compare their effectiveness and optimality in different environments. This research concentrates on a class of environments with a sparse reward function, where the reward depends on action-specific contextual information and a sparse vector that dictates which features contribute. Four MAB algorithms were selected, each developed to solve the problem in a different environment, and compared in a series of experiments exploring how sparsity affects their performance. The experiments show that the advantage of sparsity-adapted algorithms depends on both the level of sparsity and the environment setting, but that increasing sparsity helps them achieve better results than traditional methods.
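
For intuition, the sparse reward setting described above can be sketched as a linear bandit: each arm's expected reward is the inner product of its context vector with an unknown parameter vector that has only a few nonzero entries, and regret accumulates whenever a suboptimal arm is played. The snippet below is a minimal illustrative simulation of that model with a simple epsilon-greedy ridge-regression baseline standing in for a "traditional" (non-sparsity-adapted) policy. All dimensions, parameter values, and variable names are assumptions made for this sketch, not the experimental configuration used in the research.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem sizes for illustration only.
d, s, K, T = 50, 5, 10, 2000   # features, nonzero features, arms, rounds

# Sparse unknown parameter: only s of d coordinates affect rewards.
theta_star = np.zeros(d)
theta_star[rng.choice(d, size=s, replace=False)] = rng.normal(size=s)

A = np.eye(d)        # regularised Gram matrix (ridge prior)
b = np.zeros(d)      # accumulated reward-weighted contexts
eps = 0.05           # exploration probability
cum_regret = 0.0

for t in range(T):
    contexts = rng.normal(size=(K, d))   # one context vector per arm
    means = contexts @ theta_star        # true expected reward per arm

    theta_hat = np.linalg.solve(A, b)    # ridge estimate of theta_star
    if rng.random() < eps:
        arm = int(rng.integers(K))                   # explore uniformly
    else:
        arm = int(np.argmax(contexts @ theta_hat))   # exploit estimate

    x = contexts[arm]
    reward = means[arm] + rng.normal(scale=0.1)      # noisy observation
    A += np.outer(x, x)                              # update statistics
    b += reward * x
    cum_regret += means.max() - means[arm]           # regret vs. best arm

print(f"epsilon-greedy cumulative regret over {T} rounds: {cum_regret:.1f}")
```

Because the baseline estimates all d coordinates of theta_star, its regret grows with the ambient dimension; sparsity-adapted policies aim to exploit the fact that only s of those coordinates matter.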
