Q-value reuse between state abstractions for traffic light control
E.F.M. Kuhn (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Jinke He – Mentor (TU Delft - Interactive Intelligence)
R.A.N. Starre – Mentor (TU Delft - Interactive Intelligence)
F.A. Oliehoek – Graduation committee member (TU Delft - Interactive Intelligence)
Abstract
Previous research in reinforcement learning for traffic light control has used various state abstractions: some use feature vectors, while others use matrices of car positions. This paper first compares a simple feature vector, consisting only of the queue size per incoming lane, to a matrix of car positions. It then investigates whether knowledge can be transferred from a simple agent that uses the feature vector abstraction to a more complex agent that uses the position matrix abstraction. We find that training cannot be sped up by first training an agent with the feature vector abstraction and then reusing its Q-function to train an agent with the position matrix abstraction. The simple agent does not take considerably fewer samples to converge, and the total time needed to first train the simple agent and then transfer exceeds the time needed to train the complex agent from scratch.
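To illustrate the kind of Q-value reuse the abstract describes, the sketch below maps a position-matrix state down to a queue-size feature vector and uses the pretrained simple agent's Q-value as an initial estimate for the complex agent. This is a minimal, hypothetical illustration, not the thesis's implementation: the `queue_sizes` mapping, the tabular `simple_q` lookup, and the stop-line threshold are all assumptions made for the example.

```python
import numpy as np

def queue_sizes(position_matrix, stop_line_zone=5):
    """Abstract a position-matrix state into the simple feature vector:
    one queue size per incoming lane. Here a lane's queue is counted as the
    cars occupying cells near the stop line (hypothetical definition)."""
    # position_matrix: shape (n_lanes, n_cells), 1 where a cell holds a car
    return position_matrix[:, :stop_line_zone].sum(axis=1)

def transferred_q(simple_q, position_matrix, action):
    """Q-value reuse: evaluate the pretrained simple agent's (tabular)
    Q-function on the abstracted state and return it as an initial
    estimate for the complex agent's Q-value."""
    features = tuple(queue_sizes(position_matrix))
    return simple_q.get((features, action), 0.0)

# Usage sketch: before the complex agent starts learning, its Q-estimate for
# a (state, action) pair could be seeded with the transferred value:
# q_complex[(state_key, action)] = transferred_q(simple_q, position_matrix, action)
```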