Quantum gradient estimation and its application to quantum reinforcement learning


Abstract

In 2005, Jordan showed how to estimate the gradient of a real-valued function with a high-dimensional domain on a quantum computer. Subsequently, in 2017, Gilyén et al. showed how to do this with a different input model. They also proved optimality of their algorithm for $\ell^\infty$-approximations of functions satisfying certain smoothness conditions.
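As context, the core idea behind Jordan's algorithm (and the phase-oracle variant of Gilyén et al.) can be sketched in one display; the grid $G$, the scale $N$ and the normalization below are schematic, and the text makes all of them precise. Writing $f(x) \approx f(0) + \langle \nabla f(0), x \rangle$ on a small product grid $G = G_1 \times \dots \times G_d$ around $0$, a single phase query produces an approximate product state:

\[
\frac{1}{\sqrt{|G|}} \sum_{x \in G} e^{2\pi i N f(x)} \lvert x \rangle \;\approx\; e^{2\pi i N f(0)} \bigotimes_{j=1}^{d} \left( \frac{1}{\sqrt{|G_j|}} \sum_{x_j \in G_j} e^{2\pi i N x_j \,\partial_j f(0)} \lvert x_j \rangle \right),
\]

after which an inverse quantum Fourier transform on each coordinate register reveals an approximation of $\partial_j f(0)$, and hence of $\nabla f(0)$, up to rescaling by $N$.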

In this text, we expand on the ideas of Gilyén et al. and extend their algorithm so that functions with fewer regularity constraints can be used as input. Moreover, we show that their algorithm is essentially optimal in its query complexity to the phase oracle, even for classes of functions that satisfy more stringent smoothness conditions. Finally, we prove that their algorithm is also optimal for approximating gradients with respect to general $\ell^p$-norms, where $p \in [1,\infty]$.
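For concreteness, for a vector $x \in \mathbb{R}^d$ the norms referred to above are

\[
\lVert x \rVert_p = \Big( \sum_{j=1}^{d} |x_j|^p \Big)^{1/p} \quad \text{for } 1 \le p < \infty, \qquad \lVert x \rVert_\infty = \max_{1 \le j \le d} |x_j|,
\]

and an $\varepsilon$-approximation of the gradient in $\ell^p$-norm is understood as a vector $g$ with $\lVert g - \nabla f \rVert_p \le \varepsilon$.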

Furthermore, we investigate how Gilyén et al.'s algorithm can be used to perform reinforcement learning on a quantum computer. We elaborate on Montanaro's ideas for quantum Monte-Carlo simulation and show how they can be used to implement quantum value estimation of Markov reward processes. We also show that this algorithm is essentially optimal in its query complexity to all of its oracles. Next, we show how to construct a quantum policy evaluation algorithm, and how these algorithms can be used as subroutines in Gilyén et al.'s quantum gradient estimation algorithm to perform quantum policy optimization.
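To make the object of value estimation concrete, the sketch below gives the classical Monte-Carlo baseline for estimating the value $V(s_0) = \mathbb{E}\big[\sum_{t \ge 0} \gamma^t r(S_t) \mid S_0 = s_0\big]$ of a finite Markov reward process; it is the accuracy-dependence of this kind of estimator that quantum value estimation à la Montanaro improves quadratically. All names here ($P$, $r$, gamma, mc_value_estimate) are illustrative and not taken from the thesis.

```python
import numpy as np

def mc_value_estimate(P, r, gamma, s0, num_samples=10_000, horizon=200, seed=None):
    """Classical Monte-Carlo estimate of V(s0) = E[sum_t gamma^t r(S_t) | S_0 = s0]
    for a finite Markov reward process with transition matrix P and reward vector r.

    This is the classical baseline; quantum value estimation (via Montanaro's
    quantum Monte-Carlo) achieves a quadratically better dependence on the accuracy.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    total = 0.0
    for _ in range(num_samples):
        s, discount, ret = s0, 1.0, 0.0
        for _ in range(horizon):  # truncate the infinite sum; error ~ gamma**horizon
            ret += discount * r[s]
            discount *= gamma
            s = rng.choice(n, p=P[s])  # sample the next state from row s of P
        total += ret
    return total / num_samples

# Example: a 2-state Markov reward process (illustrative numbers).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
r = np.array([1.0, 0.0])
print(mc_value_estimate(P, r, gamma=0.9, s0=0, num_samples=2000))
```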

The most important open question remains whether it is possible to improve the query complexity of the extension of Gilyén et al.'s algorithm when function classes containing functions of Gevrey-type 1 are used as input: at the moment, for this specific parameter setting, the algorithm performs no better than a very simple classical gradient estimation procedure. An improvement of this result would automatically improve the quantum policy optimization routine as well.
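For reference, one common convention for the Gevrey classes mentioned above is the following (normalizations differ across the literature; the text fixes its own):

\[
f \in G^{\sigma} \quad \Longleftrightarrow \quad \big|\partial_{\alpha} f(x)\big| \le c\, M^{|\alpha|}\, \big(|\alpha|!\big)^{\sigma} \quad \text{for all multi-indices } \alpha \text{ and all } x,
\]

for some constants $c, M > 0$. The case $\sigma = 1$ roughly corresponds to the analytic regime, and it is for this parameter setting that the extended algorithm currently matches, but does not beat, the simple classical procedure.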