How kernelized Multi-Armed Bandit algorithms compare to other algorithms with fixed kernelized reward and noisy observations

Bachelor Thesis (2024)
Author(s)

M.K. Herrebout (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Julia Olkhovskaya – Mentor (TU Delft - Sequential Decision Making)

R Venkatesha Prasad – Graduation committee member (TU Delft - Networked Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2024
Language
English
Graduation Date
25-06-2024
Awarding Institution
Delft University of Technology
Project
['CSE3000 Research Project', 'Exploring Bandit Algorithms in User-Interactive Systems']
Programme
['Computer Science and Engineering']
Faculty
Electrical Engineering, Mathematics and Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The aim of this paper is to challenge and compare several Multi-Armed Bandit algorithms in an en- vironment with fixed kernelized reward and noisy observations. Bandit algorithms are a class of decision-making problems with the goal of opti- mizing the trade-off between exploration and ex- ploitation of all choices. Each decision yields some reward, and the goal is to minimize the regret that follows from a combination of decisions, that is to say to minimize the difference between the set of decisions made, and the set of optimal decisions. In particular, these algorithms deal with the trade- off between choosing the best-known option and exploring new, possibly better options. These al- gorithms are widely used in reinforcement learn- ing, optimization and economics, where decisions need to be made without all the information and with some uncertainty. Each environment is dif- ferent however, and some algorithms are better in some environments than others.

Files

RP_Paper_Marijn.pdf
(pdf | 3.09 Mb)
License info not available