Comparing Model-Free Deep Reinforcement Learning Algorithms on Stock Market
M.K. Meral (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G. Neustroev – Mentor (TU Delft - Algorithmics)
M.M. de Weerdt – Graduation committee member (TU Delft - Algorithmics)
M.A. Zuñiga Zamalloa – Coach (TU Delft - Embedded Systems)
Abstract
Automated asset trading is a crucial tool for financial entities such as investment firms and hedge funds, allowing them to allocate capital so as to maximize their rate of return. The scientific literature proposes multiple models for this problem; however, these models either lack the complexity to capture market dynamics or do not scale to the market as a whole. Deep reinforcement learning, on the other hand, is a framework that can address both limitations. In this study we aim to understand the performance of model-free deep reinforcement learning algorithms in terms of training speed, financial performance, and generalizability by training and comparing them on a smaller, representative market. Proximal Policy Optimization (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3) were used as representatives of policy-optimization and Q-learning algorithms, respectively. Our study found that while policy-optimization algorithms train faster, owing to the smaller amount of training data they use at each timestep, Q-learning algorithms offer better overall stability and generalizability. With respect to financial performance on the training stocks, this study did not find a statistically significant difference between the algorithms.
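As an illustration of the kind of financial-performance comparison the abstract describes, the sketch below computes two metrics commonly used to score trading agents: the cumulative rate of return and a per-step Sharpe ratio. This is a hedged, self-contained example, not code from the study itself; the function names and the sample portfolio values are illustrative assumptions.

```python
# Illustrative sketch (not from the study): two common metrics for
# scoring a trading agent's portfolio-value trajectory.
import math


def rate_of_return(portfolio_values):
    """Total return over the episode, e.g. 0.10 means +10%."""
    return portfolio_values[-1] / portfolio_values[0] - 1.0


def sharpe_ratio(portfolio_values, risk_free=0.0):
    """Mean excess per-step return divided by its sample std. dev."""
    # Per-step simple returns between consecutive portfolio values.
    rets = [b / a - 1.0 for a, b in zip(portfolio_values, portfolio_values[1:])]
    mean = sum(rets) / len(rets)
    # Sample variance of the per-step returns.
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    if var == 0:
        return float("inf")
    return (mean - risk_free) / math.sqrt(var)


# Hypothetical portfolio values over five timesteps.
values = [100.0, 102.0, 101.0, 105.0, 108.0]
print(round(rate_of_return(values), 4))  # 0.08
print(round(sharpe_ratio(values), 2))
```

A statistically careful comparison, as in the study, would compute such metrics over many evaluation episodes per algorithm and then test whether the difference in their distributions is significant.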