Cointegration Aware Pairs Trading with Reinforcement Learning Based Optimal Stopping

Bachelor Thesis (2026)
Author(s)

T. Pagu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F. Yu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

F.A. Oliehoek – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Yorke-Smith – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2026
Language
English
Graduation Date
23-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Pairs trading is an algorithmic trading strategy that exploits temporary divergences between assets whose prices tend to move together, a relationship we describe as cointegration. As a special case of statistical arbitrage, it has long been studied by both practitioners and academics. We hypothesize that existing pairs trading strategies commonly fail when the cointegration relationship does not hold constantly over long periods, which is often the case in practice. We show that, given future knowledge of the cointegration relation, a strategy can yield dramatically better returns, up to 16% annualized during cointegration periods. This finding motivates a data-driven approach to estimating the cointegration regime. To this end, we propose a GRU model that tracks the cointegration regime better than chance, though its reliability varies substantially by pair. We trained an RL model on synthetic data to exploit cointegration periods and experimented with restricting trading to periods of predicted cointegration. We tested this on three commonly used pairs and found that it outperformed the risk-free rate, with Sharpe ratios of 0.30–0.65. Our work shows the potential of cointegration-aware approaches through an oracle analysis, proposes a way to approximate the oracle in a realistic strategy, and identifies current limitations of the model.
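
For readers unfamiliar with the metric, the sketch below shows one common way an annualized Sharpe ratio can be computed from per-period strategy returns. It is an illustration only and not the thesis's evaluation code: the function name `annualized_sharpe`, the 252-trading-day convention, the use of the sample standard deviation, and the example returns are all assumptions made here.

```python
import math
import statistics


def annualized_sharpe(returns, risk_free_annual=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of per-period simple returns.

    Assumptions (not taken from the thesis): returns are per-period simple
    returns, the risk-free rate is spread evenly across periods, and
    volatility is the sample standard deviation of excess returns.
    """
    if len(returns) < 2:
        raise ValueError("need at least two returns")
    # Convert the annual risk-free rate into a per-period rate.
    rf_per_period = risk_free_annual / periods_per_year
    excess = [r - rf_per_period for r in returns]
    vol = statistics.stdev(excess)
    if vol == 0:
        raise ValueError("excess returns have zero volatility")
    # Scale the per-period ratio by sqrt(periods) to annualize it.
    return statistics.mean(excess) / vol * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only.
example_returns = [0.01, -0.01, 0.02, 0.0]
sharpe_no_rf = annualized_sharpe(example_returns)
sharpe_with_rf = annualized_sharpe(example_returns, risk_free_annual=0.04)
```

Under this definition, a positive Sharpe ratio means the strategy's average return exceeded the risk-free rate over the sample, and a higher assumed risk-free rate lowers the ratio for the same return series.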
