Title: On Non-Stationarity in Reinforced Deep Markov Models with Applications in Portfolio Optimization
Author: Chin-A-Pauw, Laurens (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Yu, F. (mentor); Papapantoleon, A. (graduation committee); Derumigny, Alexis (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Applied Mathematics | Stochastics
Date: 2024-04-10

Abstract:
In this thesis, we aim to improve the application of deep reinforcement learning to portfolio optimization. Reinforcement learning has in recent years been applied to a wide range of problems, from games to control systems in the physical world, and also to finance. While reinforcement learning has shown success in simulated environments (e.g. matching or exceeding human performance in games), its adoption in practical, non-simulated applications has lagged. Dulac-Arnold et al. [2019] suggest this is caused by a discrepancy between the experimental set-up in research and the conditions in practice. Specifically, they present a list of challenges that make the application of reinforcement learning in real-world settings more difficult. One of these challenges is non-stationary environments, which are common in finance. Non-stationarity is a challenge because, given an observed state, the optimal action may change over time. The goal of this thesis is therefore to overcome the challenge of non-stationarity in the application of reinforcement learning to portfolio optimization. In this thesis, we use reinforced deep Markov models (RDMM), introduced by Ferreira [2020] (applied to an optimal execution problem and later used by Cartea et al.
[2021] for statistical arbitrage on simulated price movements of an FX triplet), for their data efficiency and ability to handle complex environments. RDMMs involve a partially observable Markov decision process (POMDP), which is also the setting used by Xie et al. [2021] to model non-stationarity in reinforcement learning. We extend RDMMs to incorporate non-stationarity, using the framework suggested by Xie et al. [2021], and apply them to portfolio optimization. Our implementation is sample efficient, which allows for quick learning; in doing so we address another challenge of reinforcement learning, namely sample-inefficiency [Dulac-Arnold et al., 2019]. Moreover, our implementation can handle continuous state and action spaces.

We compare the performance of our algorithms to classical portfolio optimization techniques such as Mean-Variance (MV) and Equal Risk Contribution (ERC), and to popular reinforcement learning techniques such as Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC). We observe that our implementation has higher sample-efficiency compared to DDPG and SAC, and higher cumulative returns on the test set compared to MV, ERC, DDPG, and SAC.

Subject: Reinforced Deep Markov Models; Model-Based Reinforcement Learning; Non-Stationarity; Portfolio Optimization; Partially Observable Markov Decision Processes
To reference this document use: http://resolver.tudelft.nl/uuid:4423b7e9-caff-46b3-9185-339d65a5b8c1
Part of collection: Student theses
Document type: master thesis
Rights: © 2024 Laurens Chin-A-Pauw
Files: MSc_Thesis_L_ChinAPauw.pdf (PDF, 1.96 MB)
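As illustrative background for the classical Mean-Variance baseline named in the abstract, the unconstrained MV portfolio sets weights proportional to the inverse covariance matrix applied to expected returns. The following is a minimal two-asset sketch with toy numbers, not code from the thesis; all values and the function name are illustrative assumptions.

```python
# Minimal sketch of an unconstrained mean-variance (MV) portfolio for two
# assets: weights proportional to inverse(covariance) @ expected_returns,
# normalised to sum to one. Toy inputs; not the thesis implementation.

def mean_variance_weights(mu, sigma):
    """MV weights for two assets given expected returns `mu` (length-2 list)
    and a 2x2 covariance matrix `sigma` (list of two length-2 lists)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Invert the 2x2 covariance matrix by hand.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Raw (unnormalised) weights: inv(sigma) @ mu.
    raw = [inv[i][0] * mu[0] + inv[i][1] * mu[1] for i in range(2)]
    total = sum(raw)
    # Normalise so the weights sum to one (fully invested portfolio).
    return [w / total for w in raw]

mu = [0.08, 0.05]          # toy expected annual returns
sigma = [[0.04, 0.01],     # toy covariance matrix of returns
         [0.01, 0.02]]
weights = mean_variance_weights(mu, sigma)
print(weights)  # roughly [0.478, 0.522]
```

The normalisation step enforces only the budget constraint; the thesis baselines may additionally impose constraints (e.g. no short-selling) that this sketch omits.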