Title: On Non-Stationarity in Reinforced Deep Markov Models with Applications in Portfolio Optimization
Author: Chin-A-Pauw, Laurens (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Yu, F. (mentor); Papapantoleon, A. (graduation committee); Derumigny, Alexis (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Applied Mathematics | Stochastics
Date: 2024-04-10

Abstract:
In this thesis, we aim to improve the application of deep reinforcement learning to portfolio optimization. Reinforcement learning has in recent years been applied to a wide range of problems, from games to control systems in the physical world, and also to finance. While reinforcement learning has shown success in simulated environments (e.g. matching or exceeding human performance in games), its adoption in practical, non-simulated applications has lagged. Dulac-Arnold et al. [2019] suggest this is caused by a discrepancy between the experimental set-up in research and the conditions in practice. Specifically, they present a list of challenges that make the application of reinforcement learning in real-world settings more difficult. One of these challenges is non-stationary environments, which are common in finance. Non-stationarity is a challenge because, given an observed state, the optimal action may change over time. The goal of this thesis is therefore to overcome the challenge of non-stationarity in the application of reinforcement learning to portfolio optimization. In this thesis, we use reinforced deep Markov models (RDMM), introduced by Ferreira [2020] (applied to an optimal execution problem and later used by Cartea et al.
[2021] for statistical arbitrage on simulated price movements of an FX triplet), for their data efficiency and ability to handle complex environments. RDMMs involve a partially observable Markov decision process (POMDP), which is also the setting used by Xie et al. [2021] to model non-stationarity in reinforcement learning. We extend RDMMs to incorporate non-stationarity, using the framework suggested by Xie et al. [2021], and apply them to portfolio optimization. Our implementation is sample efficient, which allows for quick learning; in doing so we address another challenge of reinforcement learning, namely sample-inefficiency [Dulac-Arnold et al., 2019]. Moreover, our implementation can handle continuous state and action spaces.

We compare the performance of our algorithms to classical portfolio optimization techniques such as Mean-Variance (MV) and Equal Risk Contribution (ERC), and to popular reinforcement learning techniques such as Deep Deterministic Policy Gradient (DDPG) and Soft Actor-Critic (SAC). We observe that our implementation has higher sample-efficiency compared to DDPG and SAC, and higher cumulative returns on the test set compared to MV, ERC, DDPG, and SAC.

Subject: Reinforced Deep Markov Models; Model-Based Reinforcement Learning; Non-Stationarity; Portfolio Optimization; Partially Observable Markov Decision Processes
To reference this document use: http://resolver.tudelft.nl/uuid:4423b7e9-caff-46b3-9185-339d65a5b8c1
Part of collection: Student theses
Document type: master thesis
Rights: © 2024 Laurens Chin-A-Pauw
Files: MSc_Thesis_L_ChinAPauw.pdf (PDF, 1.96 MB)
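As illustrative background for the classical Mean-Variance baseline named in the abstract, the unconstrained MV portfolio sets weights proportional to the inverse covariance matrix applied to expected returns. The following is a minimal two-asset sketch with toy numbers, not code from the thesis; all values and the function name are illustrative assumptions.

```python
# Minimal sketch of an unconstrained mean-variance (MV) portfolio for two
# assets: weights proportional to inverse(covariance) @ expected_returns,
# normalised to sum to one. Toy inputs; not the thesis implementation.

def mean_variance_weights(mu, sigma):
    """MV weights for two assets given expected returns `mu` (length-2 list)
    and a 2x2 covariance matrix `sigma` (list of two length-2 lists)."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Invert the 2x2 covariance matrix by hand.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # Raw (unnormalised) weights: inv(sigma) @ mu.
    raw = [inv[i][0] * mu[0] + inv[i][1] * mu[1] for i in range(2)]
    total = sum(raw)
    # Normalise so the weights sum to one (fully invested portfolio).
    return [w / total for w in raw]

mu = [0.08, 0.05]          # toy expected annual returns
sigma = [[0.04, 0.01],     # toy covariance matrix of returns
         [0.01, 0.02]]
weights = mean_variance_weights(mu, sigma)
print(weights)  # roughly [0.478, 0.522]
```

The normalisation step enforces only the budget constraint; the thesis baselines may additionally impose constraints (e.g. no short-selling) that this sketch omits.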