How kernelized Multi-Armed Bandit algorithms compare to other algorithms with fixed kernelized reward and noisy observations
M.K. Herrebout (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Julia Olkhovskaya – Mentor (TU Delft - Sequential Decision Making)
R Venkatesha Prasad – Graduation committee member (TU Delft - Networked Systems)
Abstract
The aim of this paper is to compare several Multi-Armed Bandit algorithms in an environment with a fixed kernelized reward function and noisy observations. Bandit algorithms address a class of sequential decision-making problems in which the goal is to balance exploration and exploitation of the available choices. Each decision yields some reward, and the objective is to minimize the regret incurred by a sequence of decisions: the difference between the reward of the decisions actually made and the reward of the optimal decisions. In particular, these algorithms deal with the trade-off between choosing the best-known option and exploring new, possibly better options. They are widely used in reinforcement learning, optimization, and economics, where decisions must be made under incomplete information and uncertainty. Environments differ, however, and some algorithms perform better in certain environments than others.
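To make the regret notion above concrete, the following is a minimal sketch of a bandit loop with noisy reward observations. It uses an epsilon-greedy player as a simple illustrative baseline (not one of the algorithms evaluated in the paper); the arm means, noise level, and function names are all assumptions chosen for the example.

```python
import random

def cumulative_regret(true_means, choices):
    """Regret of a sequence of decisions: the expected reward of always
    playing the best arm minus the expected reward of the arms chosen."""
    best = max(true_means)
    return sum(best - true_means[arm] for arm in choices)

def epsilon_greedy(true_means, horizon, epsilon=0.1, noise_std=0.1, seed=0):
    """Illustrative epsilon-greedy baseline: with probability epsilon
    explore a random arm, otherwise exploit the best empirical mean.
    Rewards are observed with additive Gaussian noise (an assumption)."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    choices = []
    for t in range(horizon):
        if t < k:                      # play each arm once to initialize
            arm = t
        elif rng.random() < epsilon:   # explore
            arm = rng.randrange(k)
        else:                          # exploit best empirical mean
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = true_means[arm] + rng.gauss(0.0, noise_std)  # noisy observation
        counts[arm] += 1
        sums[arm] += reward
        choices.append(arm)
    return choices

choices = epsilon_greedy([0.2, 0.5, 0.8], horizon=500)
regret = cumulative_regret([0.2, 0.5, 0.8], choices)
```

A good algorithm keeps the cumulative regret growing sublinearly in the horizon; comparing this growth rate across algorithms is exactly the kind of evaluation the paper performs.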