I.M. Olkhovskaia

Sparse Sequential Learning

Exploring Stochastic Contextual Linear Bandit and Feature Selection Combinations for Fixed Reduced Dimensions

Stochastic contextual linear bandits are widely used for sequential decision‐making across many domains. However, in high‐dimensional sparse settings, most candidate features are irrelevant to predicting outcomes, and collecting such data is costly. This study examines various SC ...
The multi-armed bandit problem is a sequential learning scenario in which a learning algorithm seeks to obtain rewards by selecting an arm, or action, in each round, given limited initial knowledge. Contextual bandits present an additional context every round that informs the ban ...
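
As a rough sketch of the contextual bandit round structure described in these abstracts, the snippet below implements a minimal disjoint-model LinUCB loop. The environment (the hidden reward parameters, the context distribution) and all constants are invented for illustration and are not taken from any of the listed theses.

```python
import numpy as np

def linucb(n_rounds=1000, n_arms=5, dim=8, alpha=1.0, seed=0):
    """Minimal disjoint-model LinUCB: one ridge-regression estimate per arm."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, dim))          # hidden reward parameters (toy)
    A = np.stack([np.eye(dim) for _ in range(n_arms)])   # per-arm Gram matrices
    b = np.zeros((n_arms, dim))                          # per-arm reward-weighted contexts
    total_reward = 0.0
    for t in range(n_rounds):
        x = rng.normal(size=dim)                          # context observed this round
        x /= np.linalg.norm(x)
        # Upper confidence bound for every arm: estimate + exploration bonus.
        ucb = np.empty(n_arms)
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta_hat = A_inv @ b[a]
            ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
        arm = int(np.argmax(ucb))
        reward = true_theta[arm] @ x + 0.1 * rng.normal() # noisy linear reward
        # Update only the chosen arm's statistics.
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    print(linucb())
```
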

Adaptive Feature Selection For Sparse Linear Bandits

Experimental study on strategies for Online Feature Selection in High-Dimensional Bandit Settings

The Multi-armed Bandit (MAB) is a classic problem in reinforcement learning that exemplifies the exploration-exploitation dilemma: deciding when to gather more information and when to act on current knowledge. In its sparse variant, the feature vectors often contain many irrelev ...
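
One concrete instance of the online feature selection idea sketched in this abstract is to screen coordinates with an L1-penalized regression on the data gathered so far and then run the bandit on the surviving features. The snippet below only illustrates that screening step with scikit-learn's Lasso on made-up data; the penalty and threshold are arbitrary and not taken from the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_features(contexts, rewards, alpha=0.05):
    """Keep only coordinates whose Lasso coefficient is non-zero."""
    model = Lasso(alpha=alpha).fit(contexts, rewards)
    return np.flatnonzero(np.abs(model.coef_) > 1e-8)

# Toy high-dimensional data: only the first 3 of 100 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
theta = np.zeros(100)
theta[:3] = [1.0, -2.0, 0.5]
y = X @ theta + 0.1 * rng.normal(size=200)

kept = select_features(X, y)
print("selected features:", kept)   # typically a small superset of {0, 1, 2}
# A bandit algorithm would then operate on the reduced context X[:, kept].
```
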

Exploring Bandit Algorithms in Sparse Environments

Does increasing the level of sparsity enhance the advantage of sparsity-adapted Multi-Armed Bandit algorithms?

In sequential decision-making, the Multi-armed Bandit (MAB) problem models the dilemma of exploration versus exploitation. The problem is commonly situated in an unknown environment where a player iteratively selects one action from a set of predetermined choices. The player's choices can be ...
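
The exploration-versus-exploitation dilemma these abstracts refer to is easiest to see in an epsilon-greedy player, which explores a random arm with a small probability and otherwise exploits the best empirical mean. The arm means and epsilon below are toy values chosen for illustration.

```python
import numpy as np

def epsilon_greedy(arm_means, n_rounds=5000, epsilon=0.1, seed=0):
    """With prob. epsilon explore a random arm, otherwise exploit the best empirical mean."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(arm_means))
    values = np.zeros(len(arm_means))
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(len(arm_means)))     # explore
        else:
            arm = int(np.argmax(values))                # exploit
        reward = rng.normal(arm_means[arm], 1.0)        # noisy reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean update
        total += reward
    return total

print(epsilon_greedy([0.1, 0.5, 0.9]))
```
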

Comparing bandit algorithms in static and changing environments

An experimental study on the regret performance of bandit algorithms in various environments

The aim of this paper is to present experimental data on the regret-based performance of various solver algorithms within the class of decision problems called Multi-Armed Bandits. This can help practitioners more efficiently choose the algorithm best suited to an application and to reduce the ...
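
The regret measure used in such comparisons is the cumulative gap between the best fixed arm and the arms actually played. A minimal helper, with invented arm means and a made-up play sequence, might look like this.

```python
import numpy as np

def cumulative_regret(arm_means, chosen_arms):
    """Regret after t rounds = t * best mean minus the sum of the means of the arms played."""
    best = max(arm_means)
    gaps = np.array([best - arm_means[a] for a in chosen_arms])
    return np.cumsum(gaps)

# Example: three arms; the player settles on arm 2 after some exploration.
arm_means = [0.1, 0.5, 0.9]
plays = [0, 1, 2, 2, 1, 2, 2, 2, 2, 2]
print(cumulative_regret(arm_means, plays))
```
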

Exploring Bandit Algorithms in User-Interactive Systems

Influence of Delay on Contextual Multi-Armed Bandits

Delay is a frequently encountered phenomenon in Multi-armed bandit problems that affects the accuracy of choosing the optimal arm. One example of this phenomenon is online shopping, where there is a delay between a user being recommended a product and placing the order. This stud ...
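
A simple way to model the delay described here is to queue each reward and only fold it into the estimates a fixed number of rounds later. The sketch below layers such a queue on top of epsilon-greedy; the delay length, arm means, and noise level are all assumed for illustration.

```python
import numpy as np
from collections import deque

def delayed_epsilon_greedy(arm_means, delay=20, n_rounds=5000, epsilon=0.1, seed=0):
    """Rewards arrive `delay` rounds after the pull, so the estimates lag behind the choices."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(arm_means))
    values = np.zeros(len(arm_means))
    pending = deque()                                   # (arrival_round, arm, reward)
    for t in range(n_rounds):
        # Apply any feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
        if rng.random() < epsilon or counts.sum() == 0:
            arm = int(rng.integers(len(arm_means)))     # explore (or no feedback yet)
        else:
            arm = int(np.argmax(values))                # exploit current estimates
        pending.append((t + delay, arm, rng.normal(arm_means[arm], 1.0)))
    return values

print(delayed_epsilon_greedy([0.1, 0.5, 0.9]))
```
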
The Multi-Mode Resource-Constrained Scheduling Problem is an NP-hard optimization problem. It arises in various industries such as construction engineering, transportation, and software development. This paper explores the integration of an adaptation of the Longest Processing Tim ...
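
For reference, the classic single-mode Longest Processing Time rule simply sorts jobs by decreasing duration and assigns each to the currently least-loaded machine. The sketch below shows that baseline rule on invented job data; it is not the multi-mode adaptation the paper develops.

```python
import heapq

def lpt_schedule(durations, n_machines):
    """Classic LPT list scheduling: longest jobs first, each to the least-loaded machine."""
    loads = [(0, m) for m in range(n_machines)]          # (current load, machine id)
    heapq.heapify(loads)
    assignment = {}
    for job, d in sorted(enumerate(durations), key=lambda jd: -jd[1]):
        load, machine = heapq.heappop(loads)
        assignment[job] = machine
        heapq.heappush(loads, (load + d, machine))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

jobs = [7, 5, 4, 4, 3, 2]                    # toy processing times
print(lpt_schedule(jobs, n_machines=2))      # makespan 13 for this instance
```
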
This thesis investigates the performance of various bandit algorithms in non-stationary contextual environments, where reward functions change unpredictably over time. Traditional bandit algorithms, designed for stationary settings, often fail in dynamic real-world scenarios. Thi ...
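
One standard remedy for such drifting rewards is to base the estimates on a sliding window of recent observations only, so that stale data is forgotten. The sketch below implements a sliding-window UCB variant on a toy environment whose best arm switches halfway through; the window length, horizon, and reward model are assumptions, not the thesis setup.

```python
import numpy as np
from collections import deque

def sliding_window_ucb(reward_fn, n_arms, n_rounds=2000, window=200, c=1.0, seed=0):
    """UCB computed only over the last `window` (arm, reward) observations."""
    rng = np.random.default_rng(seed)
    history = deque(maxlen=window)                 # oldest observations drop out automatically
    choices = []
    for t in range(n_rounds):
        counts = np.zeros(n_arms)
        sums = np.zeros(n_arms)
        for arm, reward in history:
            counts[arm] += 1
            sums[arm] += reward
        if np.any(counts == 0):
            arm = int(np.argmin(counts))           # make sure every arm appears in the window
        else:
            means = sums / counts
            bonus = c * np.sqrt(np.log(len(history)) / counts)
            arm = int(np.argmax(means + bonus))
        reward = reward_fn(arm, t) + 0.1 * rng.normal()
        history.append((arm, reward))
        choices.append(arm)
    return choices

# Toy non-stationary environment: the best arm switches at round 1000.
def reward_fn(arm, t):
    return [0.9, 0.1][arm] if t < 1000 else [0.1, 0.9][arm]

choices = sliding_window_ucb(reward_fn, n_arms=2)
print("late-phase pulls of arm 1:", sum(a == 1 for a in choices[1500:]))
```
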
The aim of this paper is to challenge and compare several Multi-Armed Bandit algorithms in an environment with fixed kernelized reward and noisy observations. Bandit algorithms address a class of decision-making problems with the goal of optimizing the trade-off between explorati ...
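
For a fixed kernelized reward observed under noise, a common baseline is GP-UCB: fit a Gaussian process to the observations collected so far and query the point maximizing posterior mean plus a scaled posterior standard deviation. The sketch below uses scikit-learn's GaussianProcessRegressor on a made-up one-dimensional reward; the kernel, beta, and candidate grid are all assumptions, not the setup of the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb(reward_fn, candidates, n_rounds=30, beta=2.0, noise=0.1, seed=0):
    """GP-UCB: choose the candidate maximizing posterior mean + beta * posterior std."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(n_rounds):
        if not X:
            x = candidates[int(rng.integers(len(candidates)))]   # first pull is random
        else:
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=noise**2)
            gp.fit(np.array(X).reshape(-1, 1), y)
            mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
            x = candidates[int(np.argmax(mu + beta * sigma))]
        X.append(x)
        y.append(reward_fn(x) + noise * rng.normal())             # noisy observation
    return X, y

candidates = np.linspace(0.0, 1.0, 50)
xs, ys = gp_ucb(lambda x: np.sin(6 * x), candidates)             # toy smooth reward
print("best observed point:", xs[int(np.argmax(ys))])
```
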