Knowledge gradient exploration in online kernel-based LSPI

Yahyaa, S.; Manderick, B.

Title

Knowledge gradient exploration in online kernel-based LSPI

Author

Yahyaa, S.
Manderick, B.

Date

2013-11-08

Abstract

We introduce online kernel-based least squares policy iteration (LSPI), which combines features of online LSPI and offline kernel-based LSPI. The knowledge gradient is used as the exploration policy in both online LSPI and online kernel-based LSPI in order to compare their performance on two discrete Markov decision problems. Automatic feature selection in online kernel-based LSPI, which results from kernel sparsification based on approximate linear dependency, improves performance compared to online LSPI.
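
As a rough illustration of what a knowledge gradient exploration policy computes, the sketch below uses the standard knowledge gradient for independent Gaussian beliefs over a small set of actions. It is not taken from the paper: the function names, the known observation noise `noise_std`, and the `horizon_weight` factor that trades off exploitation against exploration are all assumptions made for this example, and the paper's adaptation to LSPI may differ.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def knowledge_gradient(means, stds, noise_std):
    """Knowledge gradient value of each action (hypothetical helper).

    Assumes independent Gaussian beliefs N(means[i], stds[i]^2), Gaussian
    observation noise with known standard deviation, and at least two actions.
    """
    values = []
    for i, (mu, sd) in enumerate(zip(means, stds)):
        best_other = max(m for j, m in enumerate(means) if j != i)
        total_var = sd * sd + noise_std * noise_std
        # Standard deviation of the change in the posterior mean
        # caused by one more observation of action i.
        sigma_tilde = sd * sd / math.sqrt(total_var) if total_var > 0.0 else 0.0
        if sigma_tilde == 0.0:
            # Nothing left to learn about this action.
            values.append(0.0)
            continue
        zeta = -abs(mu - best_other) / sigma_tilde
        values.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return values


def select_action(means, stds, noise_std, horizon_weight=1.0):
    """Pick the action maximising mean + horizon_weight * knowledge gradient."""
    kg = knowledge_gradient(means, stds, noise_std)
    scores = [m + horizon_weight * v for m, v in zip(means, kg)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal estimated means, the policy prefers the action whose value is still uncertain, for example `select_action([1.0, 1.0], [1.0, 0.0], 1.0)` picks action 0. In a reinforcement learning setting the means and standard deviations would be beliefs about state-action values in the current state.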

To reference this document use:

http://resolver.tudelft.nl/uuid:02f49672-936c-430f-a30e-243388aeabe4

Part of collection

Conference proceedings

Document type

conference paper
