Cointegration Aware Pairs Trading with Reinforcement Learning Based Optimal Stopping

Bachelor Thesis (2026)

Author(s)

T. Pagu (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F. Yu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

F.A. Oliehoek – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Yorke-Smith – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Reinforcement Learning (RL), DQN, Algorithmic trading, Pairs trading

To reference this document use

https://resolver.tudelft.nl/uuid:7b0827ab-bdfe-4498-8592-3a6e2ff49199

More Info

Publication Year

2026

Language

English

Graduation Date

23-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Pairs trading is an algorithmic trading strategy that exploits temporary divergences between assets whose prices tend to move together, a relationship we describe as cointegration. As a special case of statistical arbitrage, it has long been studied by both practitioners and academics. We hypothesize that existing pairs trading strategies commonly fail when the cointegration relationship does not hold constantly over long periods, which is often the case in practice. We show that, given future knowledge of the cointegration relation, a strategy can yield dramatically better returns, up to 16% annualized during cointegration periods. This finding motivates a data-driven approach to estimating the cointegration regime. To this end, we propose a GRU model that tracks the cointegration regime better than chance, though its reliability varies substantially by pair. We trained an RL model on synthetic data to exploit cointegration periods, and experimented with restricting trading to periods of predicted cointegration. We tested this approach on three commonly used pairs and found that it outperformed the risk-free rate, with Sharpe ratios of 0.30–0.65. Our work demonstrates the potential of cointegration-aware approaches through an oracle analysis, proposes a way to approximate the oracle in a realistic strategy, and identifies current limitations of the model.
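For readers unfamiliar with the terms used in the abstract, the following is a minimal illustrative sketch, not the method of the thesis. It assumes a synthetic pair whose spread follows a stationary AR(1) process, estimates the hedge ratio by ordinary least squares, and defines an annualized Sharpe ratio assuming 252 trading periods per year. All function names and parameter values (`simulate_pair`, `ols_hedge_ratio`, `annualized_sharpe`, `beta`, `phi`) are hypothetical and chosen only for illustration.

```python
import math
import random
import statistics


def simulate_pair(n=2000, beta=1.5, phi=0.9, seed=0):
    """Simulate two price series whose spread y - beta*x is a stationary AR(1).

    Assumption for illustration only: x is a Gaussian random walk and the
    spread mean-reverts with coefficient phi (|phi| < 1).
    """
    rng = random.Random(seed)
    x, spread = 100.0, 0.0
    xs, ys = [], []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)                     # non-stationary leg
        spread = phi * spread + rng.gauss(0.0, 1.0)  # mean-reverting spread
        xs.append(x)
        ys.append(beta * x + spread)
    return xs, ys


def ols_hedge_ratio(xs, ys):
    """Least-squares slope of y regressed on x (with an intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var


def annualized_sharpe(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns.

    risk_free_rate is an annual rate, spread evenly over the periods.
    """
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    return (math.sqrt(periods_per_year)
            * statistics.fmean(excess) / statistics.stdev(excess))


xs, ys = simulate_pair()
beta_hat = ols_hedge_ratio(xs, ys)
# The estimated spread is the quantity a pairs trader watches for divergences.
spread = [y - beta_hat * x for x, y in zip(xs, ys)]
```

In this toy setting the individual prices wander without bound while the spread stays in a narrow band, which is the property a pairs trading strategy relies on. If the relationship breaks down, the spread no longer reverts, which is the failure case the abstract describes.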

Files

Optimal_Stopping_for_Pairs_Tra... (pdf)

(pdf | 2.46 Mb)

License info not available