Effects of exploration-exploitation strategies in dynamic Forex markets
The use of Reinforcement Learning in Algorithmic Trading
M.R. Serban (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. A.S. Kolarijani – Mentor (TU Delft - Team Amin Sharifi Kolarijani)
A. Papapantoleon – Mentor (TU Delft - Applied Probability)
Neil Yorke-Smith – Mentor (TU Delft - Algorithmics)
Julia Olkhovskaya – Graduation committee member (TU Delft - Sequential Decision Making)
Abstract
This paper examines how different exploration strategies affect the learning behavior and trading performance of reinforcement learning (RL) agents in a custom foreign exchange (forex) environment. By holding all other components constant, including the model architecture, features, and reward function, the study isolates the role of exploration in deep Q-learning. Three strategies were compared: Epsilon-Greedy, Boltzmann, and Max-Boltzmann. The hybrid Max-Boltzmann approach delivered the most stable and profitable outcomes, suggesting that weighted, value-aware exploration can be beneficial in high-risk domains. The results also highlight the impact of non-Markovian structure in financial environments and the limitations of equity-based rewards. Beyond these empirical results, this work contributes a modular, reproducible RL framework for trading and opens new questions about the suitability of exploration techniques in environments with asymmetric risk and irreversible actions.
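To make the three action-selection strategies named in the abstract concrete, the sketch below shows a minimal, generic formulation of Epsilon-Greedy, Boltzmann, and Max-Boltzmann selection over a vector of Q-values. It is an illustrative assumption, not the paper's actual implementation: the function names, parameter values, and the three-action trading example (sell, hold, buy) are hypothetical.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def boltzmann(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    logits = np.asarray(q_values, dtype=np.float64) / temperature
    logits -= logits.max()                      # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(q_values), p=probs))

def max_boltzmann(q_values, epsilon, temperature, rng):
    """Hybrid: act greedily, but with probability epsilon explore via the Boltzmann distribution."""
    if rng.random() < epsilon:
        return boltzmann(q_values, temperature, rng)
    return int(np.argmax(q_values))

# Hypothetical example: Q-values for a {sell, hold, buy} action space in one state
rng = np.random.default_rng(0)
q = [0.10, 0.02, 0.35]
print(epsilon_greedy(q, epsilon=0.1, rng=rng))
print(boltzmann(q, temperature=0.5, rng=rng))
print(max_boltzmann(q, epsilon=0.1, temperature=0.5, rng=rng))
```

The hybrid strategy differs from Epsilon-Greedy only in how it explores: instead of choosing uniformly at random, it weights exploratory actions by their estimated value, which is the "weighted, value-aware exploration" referred to above.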