Comparing Model-Free Deep Reinforcement Learning Algorithms on Stock Market
M.K. Meral (TU Delft - Electrical Engineering, Mathematics and Computer Science)
G. Neustroev – Mentor (TU Delft - Algorithmics)
M.M. de Weerdt – Graduation committee member (TU Delft - Algorithmics)
M.A. Zuñiga Zamalloa – Coach (TU Delft - Embedded Systems)
Abstract
Automated asset trading is a crucial tool for financial entities such as investment firms and hedge funds, allowing them to allocate capital so as to maximize their rate of return. The scientific literature proposes multiple models for this problem; however, these models either lack the complexity to capture market dynamics or do not scale to the market as a whole. Deep reinforcement learning, on the other hand, is a framework that can address both limitations. In this study we aim to understand the performance of model-free deep reinforcement learning algorithms in terms of training speed, financial performance, and generalizability by training and comparing them on a smaller, representative market. Proximal Policy Optimization (PPO) and Twin Delayed Deep Deterministic Policy Gradient (TD3) were used as representatives of policy-optimization and Q-learning algorithms, respectively. Our study found that while policy-optimization algorithms train faster, owing to the smaller amount of training data they use at each timestep, Q-learning algorithms offer better overall stability and generalizability. With respect to financial performance on the training stocks, this study did not find a statistically significant difference between the algorithms.
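As an illustration of the kind of financial-performance comparison the abstract describes, the sketch below computes two metrics commonly used to score trading agents: the cumulative rate of return and a per-step Sharpe ratio. This is a hedged, self-contained example, not code from the study itself; the function names and the sample portfolio values are illustrative assumptions.

```python
# Illustrative sketch (not from the study): two common metrics for
# scoring a trading agent's portfolio-value trajectory.
import math


def rate_of_return(portfolio_values):
    """Total return over the episode, e.g. 0.10 means +10%."""
    return portfolio_values[-1] / portfolio_values[0] - 1.0


def sharpe_ratio(portfolio_values, risk_free=0.0):
    """Mean excess per-step return divided by its sample std. dev."""
    # Per-step simple returns between consecutive portfolio values.
    rets = [b / a - 1.0 for a, b in zip(portfolio_values, portfolio_values[1:])]
    mean = sum(rets) / len(rets)
    # Sample variance of the per-step returns.
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    if var == 0:
        return float("inf")
    return (mean - risk_free) / math.sqrt(var)


# Hypothetical portfolio values over five timesteps.
values = [100.0, 102.0, 101.0, 105.0, 108.0]
print(round(rate_of_return(values), 4))  # 0.08
print(round(sharpe_ratio(values), 2))
```

A statistically careful comparison, as in the study, would compute such metrics over many evaluation episodes per algorithm and then test whether the difference in their distributions is significant.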