This paper examines how different exploration strategies affect the learning behavior and trading performance of reinforcement learning (RL) agents in a custom foreign exchange (forex) environment. By holding all other components constant (model architecture, features, and reward function), the study isolates the role of exploration in deep Q-learning. Three strategies were compared: Epsilon-Greedy, Boltzmann, and Max-Boltzmann. The hybrid Max-Boltzmann approach delivered the most stable and profitable outcomes, suggesting that weighted, value-aware exploration can be beneficial in high-risk domains. The results also highlight the impact of non-Markovian structure in financial environments and the limitations of equity-based rewards. Beyond the empirical results, this work contributes a modular, reproducible RL framework for trading and opens new questions about the suitability of exploration techniques in environments with asymmetric risk and irreversible actions.
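For concreteness, the following is a minimal sketch of the Max-Boltzmann selection rule referred to above: greedy exploitation with probability 1 - epsilon, and Boltzmann (softmax) sampling over Q-values, rather than uniform sampling, on exploration steps. The function name, the temperature parameter tau, and the use of NumPy are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def max_boltzmann_action(q_values: np.ndarray, epsilon: float, tau: float,
                         rng: np.random.Generator) -> int:
    """Select an action via Max-Boltzmann exploration (illustrative sketch)."""
    if rng.random() > epsilon:
        # Exploit: pick the action with the highest estimated Q-value.
        return int(np.argmax(q_values))
    # Explore: sample from a temperature-scaled softmax over Q-values,
    # so higher-valued actions are still preferred during exploration.
    logits = q_values / tau
    logits -= logits.max()  # numerical stabilization of the softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(q_values), p=probs))
```

Relative to plain Epsilon-Greedy, the only change is on the exploration branch, which weights candidate actions by their current value estimates instead of treating them uniformly; this is the "weighted, value-aware exploration" discussed in the results.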