Improving Offline Value-Function Approximations for POMDPs by Reducing Discount Factors

Conference Paper (2018)
Author(s)

Yi Chun Chen (Anderson School of Management)

Mykel J. Kochenderfer (Stanford University)

Matthijs T.J. Spaan (TU Delft - Algorithmics)

Research Group
Algorithmics
DOI
https://doi.org/10.1109/IROS.2018.8594418
Publication Year
2018
Language
English
Article number
8594418
Pages (from-to)
3531-3536
ISBN (electronic)
978-1-5386-8094-0
Event
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2018 (2018-10-01 - 2018-10-05), Madrid, Spain

Abstract

A common solution criterion for partially observable Markov decision processes (POMDPs) is to maximize the expected sum of exponentially discounted rewards, for which a variety of approximate methods have been proposed. Those that plan in the belief space typically provide tighter performance guarantees, but those that plan over the state space (e.g., QMDP and FIB) often require much less memory and computation. This paper presents an encouraging result: reducing the discount factor while planning in the state space can significantly improve performance when the resulting policy is evaluated on the original problem. This phenomenon is confirmed both by a theoretical analysis and by a series of empirical studies on benchmark problems. As predicted by the theory and confirmed empirically, the phenomenon is most prominent when the observation model is noisy or rewards are sparse.
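As a concrete illustration of the mechanism the abstract describes, the sketch below runs QMDP-style value iteration on the underlying MDP with a reduced planning discount, then selects actions by belief-weighted Q-values. This is a generic QMDP sketch, not code from the paper; the function names, the toy transition and reward matrices, and the particular discount values are illustrative assumptions.

```python
import numpy as np

def qmdp_q_values(T, R, gamma, tol=1e-8):
    """Value-iterate the fully observable MDP underlying a POMDP.

    T: transitions, shape (A, S, S), with T[a, s, s2] = P(s2 | s, a)
    R: rewards, shape (S, A)
    gamma: discount used for *planning*; the paper's proposal is to
           set this below the discount used when evaluating the policy
    """
    S, A = R.shape
    Q = np.zeros((S, A))
    while True:
        V = Q.max(axis=1)  # greedy value of each state
        Q_new = R + gamma * np.stack([T[a] @ V for a in range(A)], axis=1)
        if np.max(np.abs(Q_new - Q)) < tol:
            return Q_new
        Q = Q_new

def qmdp_action(Q, belief):
    """QMDP action selection: argmax_a sum_s b(s) Q(s, a)."""
    return int(np.argmax(belief @ Q))

# Toy 2-state, 2-action problem (all numbers are illustrative only).
T = np.array([[[0.9, 0.1], [0.1, 0.9]],   # action 0
              [[0.5, 0.5], [0.5, 0.5]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 1.0]])

# Plan with a reduced discount (e.g., 0.8), even if the policy is
# later evaluated under the problem's original discount (e.g., 0.95).
Q_plan = qmdp_q_values(T, R, gamma=0.8)
belief = np.array([0.6, 0.4])
print(qmdp_action(Q_plan, belief))
```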

Metadata-only record. There are no files for this record.