Multiple Pairs Trading for Portfolio Optimization with Reinforcement Learning

Bachelor Thesis (2026)

Author(s)

R.K. Georgiev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

F. Yu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

F.A. Oliehoek – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

N. Yorke-Smith – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Keywords

Reinforcement learning, Portfolio optimization, PPO, Pairs trading, Cointegration, Multiple pairs

To reference this document use

https://resolver.tudelft.nl/uuid:48f3ae7d-06a9-4ec5-a2c0-004e8894ed23

Publication Year

2026

Language

English

Graduation Date

24-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Pairs trading has grown increasingly popular over the past several decades, and its application has extended into the domain of portfolio optimization. Reinforcement learning (RL) strategies, particularly Proximal Policy Optimization (PPO), have been used to address this problem. However, while substantial research exists for the single-pair case, a systematic investigation of RL models for portfolio optimization across multiple pairs simultaneously has been lacking. To address this gap, we develop and compare two PPO models that trade on several cointegrated pairs identified within the energy sector of the S&P 500. The two models differ in their information set: one is given explicit knowledge of the asset pairs it trades, while the other operates without this information, learning to allocate capital from price and portfolio data alone. We find that the pair-aware model achieves an annual return of 20.1% and a Sharpe ratio of 0.877, and maintains consistent performance across varying numbers of traded pairs, though no clear relationship emerges between the number of pairs traded and performance. These results suggest that the multi-pair approach to portfolio optimization is promising and highlight the need for further investigation.
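For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how an annualized return and a Sharpe ratio are commonly computed from a series of daily portfolio returns. It is not taken from the thesis: the function names, the 252-trading-day convention, and the zero default risk-free rate are all assumptions made for illustration, and the thesis may use different conventions.

```python
import math
import statistics

# Assumption: daily data and 252 trading days per year (a common convention,
# not confirmed by the abstract).
TRADING_DAYS = 252


def annualized_return(daily_returns):
    """Geometric annualized return from a list of simple daily returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, risk_free_annual=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample std. dev.

    Needs at least two observations; returns NaN when the returns do not vary.
    """
    daily_rf = risk_free_annual / TRADING_DAYS  # simple per-day approximation
    excess = [r - daily_rf for r in daily_returns]
    spread = statistics.stdev(excess)  # sample standard deviation (n - 1)
    if spread == 0.0:
        return float("nan")
    return math.sqrt(TRADING_DAYS) * statistics.fmean(excess) / spread


if __name__ == "__main__":
    # Hypothetical returns for illustration only, not data from the thesis.
    example = [0.004, -0.002, 0.001, 0.003, -0.001]
    print(annualized_return(example), sharpe_ratio(example))
```

Under these assumptions, a Sharpe ratio below 1, such as the 0.877 reported above, means the annualized excess return is smaller than the annualized volatility of the strategy.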

Files

Multiple_Pairs_Trading_for_Por... (pdf)

(pdf | 0.725 Mb)

License info not available