Multiple Pairs Trading for Portfolio Optimization with Reinforcement Learning
R.K. Georgiev (TU Delft - Electrical Engineering, Mathematics and Computer Science)
F. Yu – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
F.A. Oliehoek – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)
N. Yorke-Smith – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Pairs trading has grown increasingly popular over the past several decades, and its application has extended into the domain of portfolio optimization. Reinforcement learning (RL) strategies, particularly Proximal Policy Optimization (PPO), have been used to address this problem. However, while substantial research exists for the single-pair case, a systematic investigation of RL models for portfolio optimization across multiple pairs simultaneously has been lacking. To address this gap, we develop and compare two PPO models that trade on several cointegrated pairs identified within the energy sector of the S&P 500. The two models differ in their information set: one is given explicit knowledge of the asset pairs it trades, while the other operates without this information, learning to allocate capital from price and portfolio data alone. We find that the pair-aware model achieves an annual return of 20.1% and a Sharpe ratio of 0.877, and maintains consistent performance across varying numbers of traded pairs, though no clear relationship emerges between the number of pairs traded and performance. These results suggest that the multi-pair approach to portfolio optimization is promising and highlight the need for further investigation.
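For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how an annual return and a Sharpe ratio can be computed from a series of daily portfolio returns. The abstract does not state the exact conventions used, so the 252-trading-day year, the zero default risk-free rate, and the function names `annual_return` and `sharpe_ratio` are assumptions made for illustration only.

```python
import math
import statistics

TRADING_DAYS = 252  # assumption: trading days per year, not stated in the abstract


def annual_return(daily_returns):
    """Geometric annualized return from a list of simple daily returns."""
    # Compound the daily returns into total growth over the sample.
    growth = math.prod(1.0 + r for r in daily_returns)
    # Rescale the sample-period growth to a one-year horizon.
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, daily_risk_free=0.0):
    """Annualized Sharpe ratio: mean excess return over its sample std."""
    # Assumption: a constant daily risk-free rate (zero by default).
    excess = [r - daily_risk_free for r in daily_returns]
    # statistics.stdev is the sample standard deviation (n - 1 denominator)
    # and needs at least two observations with non-zero spread.
    daily_sharpe = statistics.mean(excess) / statistics.stdev(excess)
    return daily_sharpe * math.sqrt(TRADING_DAYS)
```

Under these conventions, a reported Sharpe ratio of 0.877 would mean the strategy earned roughly 0.877 units of annualized excess return per unit of annualized volatility. Other conventions (a different year length or a non-zero risk-free rate) would give different numbers.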