Reinforcement Learning for Regime-Aware Pairs Trading

Regime-Switching Reinforcement Learning for Portfolio Allocation in Pairs Trading

Bachelor Thesis (2026)

Author(s)

T.B. Ilieva (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F. Yu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Yorke-Smith – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

F.A. Oliehoek – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use

https://resolver.tudelft.nl/uuid:4f4f2e2e-ca16-4fd1-91a6-b17a9580ed73

More Info

Publication Year

2026

Language

English

Graduation Date

26-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Pairs trading is a well-studied strategy in statistical arbitrage. The strategy selects pairs of assets whose historical prices move together and profits from temporary divergences in their price relationship, on the assumption that the relationship reverts to its long-term equilibrium. However, the dynamics of this relationship can change over time: the spread, which measures the deviation between the prices of the paired assets, can exhibit different levels of volatility and mean reversion under different market conditions. In this paper, we propose a regime-aware reinforcement learning framework for portfolio optimization in pairs trading. We model the spread between assets and characterize its behavior using statistical features that capture its position relative to its historical equilibrium, its volatility, and the strength of its mean reversion. These features are used within a Hidden Markov Model to infer latent market regimes, which represent distinct states of spread dynamics over time. The inferred regimes are incorporated into the state representation of a reinforcement learning agent, which learns to dynamically allocate capital across pairs. We evaluate the proposed approach against a regime-agnostic reinforcement learning benchmark and a classical z-score threshold strategy. In a controlled simulation study, the regime-aware agent achieves a mean Sharpe ratio of 1.354 versus 0.738 for the baseline on V/MA (ΔSharpe = +0.616) and 1.183 versus 0.564 on V/JKHY (ΔSharpe = +0.619), with results consistent across 10 training seeds. On real out-of-sample data from 2023 to 2026, the regime-aware agent achieves Sharpe ratios of 0.567 and 0.609 on V/MA and V/JKHY respectively, outperforming the baseline in both cases.
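
To make the classical z-score threshold strategy and the Sharpe ratio mentioned in the abstract concrete, the following is a minimal sketch in Python. It is not the thesis implementation: the abstract does not state how the spread is constructed, which window or thresholds are used, or how returns are annualized. The rolling window, the entry and exit thresholds (2.0 and 0.5), the 252-period annualization, and all function names are assumptions chosen for illustration. The Hidden Markov Model and the reinforcement learning agent are not covered.

```python
import math
import statistics


def rolling_zscore(spread, window):
    """Z-score of each spread value relative to the preceding `window` values.

    Entries without a full history are None. The window length is an
    assumption; the abstract does not specify one.
    """
    z = [None] * len(spread)
    for t in range(window, len(spread)):
        hist = spread[t - window:t]
        mu = statistics.fmean(hist)
        sd = statistics.pstdev(hist)
        z[t] = (spread[t] - mu) / sd if sd > 0 else 0.0
    return z


def threshold_positions(z, entry=2.0, exit_=0.5):
    """Classical threshold rule (thresholds are illustrative assumptions).

    Short the spread (-1) when z > entry, go long (+1) when z < -entry,
    and close the position (0) once |z| falls below exit_.
    """
    pos, out = 0, []
    for zt in z:
        if zt is None:
            pos = 0
        elif pos == 0:
            if zt > entry:
                pos = -1
            elif zt < -entry:
                pos = 1
        elif abs(zt) < exit_:
            pos = 0
        out.append(pos)
    return out


def strategy_returns(spread, positions):
    """Per-period P&L: the position held at t-1 times the spread change to t."""
    return [positions[t - 1] * (spread[t] - spread[t - 1])
            for t in range(1, len(spread))]


def sharpe_ratio(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate (an assumption)."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return 0.0
    return statistics.fmean(returns) / sd * math.sqrt(periods_per_year)
```

Given a spread series, `threshold_positions(rolling_zscore(spread, window))` yields the position path, and `sharpe_ratio(strategy_returns(spread, positions))` gives a figure comparable in kind, though not in value, to the Sharpe ratios reported above. The z-score is computed from past values only, so the position at each step does not use information from the future.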

Files

Research_paper_CSE_Bachelor_pr... (pdf)

(pdf | 2.29 MB)

License info not available