I.M. Olkhovskaia

Sparse Sequential Learning

Exploring Stochastic Contextual Linear Bandit and Feature Selection Combinations for Fixed Reduced Dimensions

Stochastic contextual linear bandits are widely used for sequential decision‐making across many domains. However, in high‐dimensional sparse settings, most candidate features are irrelevant to predicting outcomes, and collecting such data is costly. This study examines various SC ...
The multi-armed bandit problem is a sequential learning scenario in which a learning algorithm seeks to obtain rewards by selecting an arm, or action, in each round, given limited initial knowledge. Contextual bandits present an additional context every round that informs the ban ...
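
As a rough sketch of the contextual bandit round structure described in these abstracts, the snippet below implements a minimal disjoint-model LinUCB loop. The environment (the hidden reward parameters, the context distribution) and all constants are invented for illustration and are not taken from any of the listed theses.

```python
import numpy as np

def linucb(n_rounds=1000, n_arms=5, dim=8, alpha=1.0, seed=0):
    """Minimal disjoint-model LinUCB: one ridge-regression estimate per arm."""
    rng = np.random.default_rng(seed)
    true_theta = rng.normal(size=(n_arms, dim))          # hidden reward parameters (toy)
    A = np.stack([np.eye(dim) for _ in range(n_arms)])   # per-arm Gram matrices
    b = np.zeros((n_arms, dim))                          # per-arm reward-weighted contexts
    total_reward = 0.0
    for t in range(n_rounds):
        x = rng.normal(size=dim)                          # context observed this round
        x /= np.linalg.norm(x)
        # Upper confidence bound for every arm: estimate + exploration bonus.
        ucb = np.empty(n_arms)
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta_hat = A_inv @ b[a]
            ucb[a] = theta_hat @ x + alpha * np.sqrt(x @ A_inv @ x)
        arm = int(np.argmax(ucb))
        reward = true_theta[arm] @ x + 0.1 * rng.normal() # noisy linear reward
        # Update only the chosen arm's statistics.
        A[arm] += np.outer(x, x)
        b[arm] += reward * x
        total_reward += reward
    return total_reward

if __name__ == "__main__":
    print(linucb())
```
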

Adaptive Feature Selection For Sparse Linear Bandits

Experimental study on strategies for Online Feature Selection in High-Dimensional Bandit Settings

The Multi-armed Bandit (MAB) is a classic problem in reinforcement learning that exemplifies the exploration-exploitation dilemma: deciding when to gather more information and when to act on current knowledge. In its sparse variant, the feature vectors often contain many irrelev ...
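
One concrete instance of the online feature selection idea sketched in this abstract is to screen coordinates with an L1-penalized regression on the data gathered so far and then run the bandit on the surviving features. The snippet below only illustrates that screening step with scikit-learn's Lasso on made-up data; the penalty and threshold are arbitrary and not taken from the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso

def select_features(contexts, rewards, alpha=0.05):
    """Keep only coordinates whose Lasso coefficient is non-zero."""
    model = Lasso(alpha=alpha).fit(contexts, rewards)
    return np.flatnonzero(np.abs(model.coef_) > 1e-8)

# Toy high-dimensional data: only the first 3 of 100 features matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 100))
theta = np.zeros(100)
theta[:3] = [1.0, -2.0, 0.5]
y = X @ theta + 0.1 * rng.normal(size=200)

kept = select_features(X, y)
print("selected features:", kept)   # typically a small superset of {0, 1, 2}
# A bandit algorithm would then operate on the reduced context X[:, kept].
```
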

Exploring Bandit Algorithms in Sparse Environments

Does increasing the level of sparsity enhance the advantage of sparsity-adapted Multi-Armed Bandit algorithms?

In sequential decision-making, the Multi-armed Bandit (MAB) problem models the dilemma of exploration versus exploitation. The problem is commonly situated in an unknown environment where a player iteratively selects one action from a set of predetermined choices. The player's choices can be ...
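
The exploration-versus-exploitation dilemma these abstracts refer to is easiest to see in an epsilon-greedy player, which explores a random arm with a small probability and otherwise exploits the best empirical mean. The arm means and epsilon below are toy values chosen for illustration.

```python
import numpy as np

def epsilon_greedy(arm_means, n_rounds=5000, epsilon=0.1, seed=0):
    """With prob. epsilon explore a random arm, otherwise exploit the best empirical mean."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(arm_means))
    values = np.zeros(len(arm_means))
    total = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(len(arm_means)))     # explore
        else:
            arm = int(np.argmax(values))                # exploit
        reward = rng.normal(arm_means[arm], 1.0)        # noisy reward
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean update
        total += reward
    return total

print(epsilon_greedy([0.1, 0.5, 0.9]))
```
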

Comparing bandit algorithms in static and changing environments

An experimental study on the regret performance of bandit algorithms in various environments

The aim of this paper is to present experimental data on the regret-based performance of various solver algorithms within the class of decision problems called Multi-Armed Bandits. This can help practitioners more efficiently choose the algorithm best suited to an application and to reduce the ...
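
The regret measure used in such comparisons is the cumulative gap between the best fixed arm and the arms actually played. A minimal helper, with invented arm means and a made-up play sequence, might look like this.

```python
import numpy as np

def cumulative_regret(arm_means, chosen_arms):
    """Regret after t rounds = t * best mean minus the sum of the means of the arms played."""
    best = max(arm_means)
    gaps = np.array([best - arm_means[a] for a in chosen_arms])
    return np.cumsum(gaps)

# Example: three arms; the player settles on arm 2 after some exploration.
arm_means = [0.1, 0.5, 0.9]
plays = [0, 1, 2, 2, 1, 2, 2, 2, 2, 2]
print(cumulative_regret(arm_means, plays))
```
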

Exploring Bandit Algorithms in User-Interactive Systems

Influence of Delay on Contextual Multi-Armed Bandits

Delay is a frequently encountered phenomenon in Multi-armed bandit problems that affects the accuracy of choosing the optimal arm. One example of this phenomenon is online shopping, where there is a delay between a user being recommended a product and placing the order. This stud ...
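
A simple way to model the delay described here is to queue each reward and only fold it into the estimates a fixed number of rounds later. The sketch below layers such a queue on top of epsilon-greedy; the delay length, arm means, and noise level are all assumed for illustration.

```python
import numpy as np
from collections import deque

def delayed_epsilon_greedy(arm_means, delay=20, n_rounds=5000, epsilon=0.1, seed=0):
    """Rewards arrive `delay` rounds after the pull, so the estimates lag behind the choices."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(arm_means))
    values = np.zeros(len(arm_means))
    pending = deque()                                   # (arrival_round, arm, reward)
    for t in range(n_rounds):
        # Apply any feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = pending.popleft()
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
        if rng.random() < epsilon or counts.sum() == 0:
            arm = int(rng.integers(len(arm_means)))     # explore (or no feedback yet)
        else:
            arm = int(np.argmax(values))                # exploit current estimates
        pending.append((t + delay, arm, rng.normal(arm_means[arm], 1.0)))
    return values

print(delayed_epsilon_greedy([0.1, 0.5, 0.9]))
```
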
The Multi-Mode Resource-Constrained Scheduling Problem is an NP-hard optimization problem. It arises in various industries such as construction engineering, transportation, and software development. This paper explores the integration of an adaptation of the Longest Processing Tim ...
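
For reference, the classic single-mode Longest Processing Time rule simply sorts jobs by decreasing duration and assigns each to the currently least-loaded machine. The sketch below shows that baseline rule on invented job data; it is not the multi-mode adaptation the paper develops.

```python
import heapq

def lpt_schedule(durations, n_machines):
    """Classic LPT list scheduling: longest jobs first, each to the least-loaded machine."""
    loads = [(0, m) for m in range(n_machines)]          # (current load, machine id)
    heapq.heapify(loads)
    assignment = {}
    for job, d in sorted(enumerate(durations), key=lambda jd: -jd[1]):
        load, machine = heapq.heappop(loads)
        assignment[job] = machine
        heapq.heappush(loads, (load + d, machine))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

jobs = [7, 5, 4, 4, 3, 2]                    # toy processing times
print(lpt_schedule(jobs, n_machines=2))      # makespan 13 for this instance
```
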
This thesis investigates the performance of various bandit algorithms in non-stationary contextual environments, where reward functions change unpredictably over time. Traditional bandit algorithms, designed for stationary settings, often fail in dynamic real-world scenarios. Thi ...
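
One standard remedy for such drifting rewards is to base the estimates on a sliding window of recent observations only, so that stale data is forgotten. The sketch below implements a sliding-window UCB variant on a toy environment whose best arm switches halfway through; the window length, horizon, and reward model are assumptions, not the thesis setup.

```python
import numpy as np
from collections import deque

def sliding_window_ucb(reward_fn, n_arms, n_rounds=2000, window=200, c=1.0, seed=0):
    """UCB computed only over the last `window` (arm, reward) observations."""
    rng = np.random.default_rng(seed)
    history = deque(maxlen=window)                 # oldest observations drop out automatically
    choices = []
    for t in range(n_rounds):
        counts = np.zeros(n_arms)
        sums = np.zeros(n_arms)
        for arm, reward in history:
            counts[arm] += 1
            sums[arm] += reward
        if np.any(counts == 0):
            arm = int(np.argmin(counts))           # make sure every arm appears in the window
        else:
            means = sums / counts
            bonus = c * np.sqrt(np.log(len(history)) / counts)
            arm = int(np.argmax(means + bonus))
        reward = reward_fn(arm, t) + 0.1 * rng.normal()
        history.append((arm, reward))
        choices.append(arm)
    return choices

# Toy non-stationary environment: the best arm switches at round 1000.
def reward_fn(arm, t):
    return [0.9, 0.1][arm] if t < 1000 else [0.1, 0.9][arm]

choices = sliding_window_ucb(reward_fn, n_arms=2)
print("late-phase pulls of arm 1:", sum(a == 1 for a in choices[1500:]))
```
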
The aim of this paper is to challenge and compare several Multi-Armed Bandit algorithms in an environment with fixed kernelized reward and noisy observations. Bandit algorithms address a class of decision-making problems with the goal of optimizing the trade-off between explorati ...
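
For a fixed kernelized reward observed under noise, a common baseline is GP-UCB: fit a Gaussian process to the observations collected so far and query the point maximizing posterior mean plus a scaled posterior standard deviation. The sketch below uses scikit-learn's GaussianProcessRegressor on a made-up one-dimensional reward; the kernel, beta, and candidate grid are all assumptions, not the setup of the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_ucb(reward_fn, candidates, n_rounds=30, beta=2.0, noise=0.1, seed=0):
    """GP-UCB: choose the candidate maximizing posterior mean + beta * posterior std."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for t in range(n_rounds):
        if not X:
            x = candidates[int(rng.integers(len(candidates)))]   # first pull is random
        else:
            gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=noise**2)
            gp.fit(np.array(X).reshape(-1, 1), y)
            mu, sigma = gp.predict(candidates.reshape(-1, 1), return_std=True)
            x = candidates[int(np.argmax(mu + beta * sigma))]
        X.append(x)
        y.append(reward_fn(x) + noise * rng.normal())             # noisy observation
    return X, y

candidates = np.linspace(0.0, 1.0, 50)
xs, ys = gp_ucb(lambda x: np.sin(6 * x), candidates)             # toy smooth reward
print("best observed point:", xs[int(np.argmax(ys))])
```
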