Long-term values in Markov decision processes, (Co)algebraically

Conference Paper (2018)

Author(s)

Frank Feys (TU Delft - Energy and Industry)

H.H. Hansen (TU Delft - Energy and Industry)

Lawrence S. Moss (Indiana University)

Research Group

Energy and Industry

DOI related publication

https://doi.org/10.1007/978-3-030-00389-0_6

Algebra, Markov decision process, Coalgebra, Metric space, Corecursive algebra, Discounted sum, Fixpoint, Long-term value

To reference this document use:

https://resolver.tudelft.nl/uuid:3c60f476-84bd-47f0-8bad-a048d74f37c9

Publication Year

2018

Language

English

Volume number

11202 LNCS

Pages (from-to)

78-99

ISBN (print)

9783030003883

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.
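
The abstract's two central notions, the long-term value of a policy under discounted sums and policy improvement, can be illustrated with a minimal Python sketch. This is not the paper's coalgebraic construction: it is ordinary fixpoint iteration, which converges by Banach's Fixpoint Theorem because the Bellman operator of a policy is a contraction when the discount factor is below 1. The two-state MDP, the state-based rewards, the discount factor 0.5, and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical two-state MDP; all numbers are made up for illustration.
GAMMA = 0.5  # discount factor, assumed to lie in [0, 1)

REWARD = {"s0": 0.0, "s1": 1.0}  # assumed: rewards are attached to states

# TRANSITION[state][action] is a probability distribution over next states.
TRANSITION = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 1.0}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s0": 1.0}},
}


def bellman_step(policy, value):
    """Apply the Bellman operator of `policy` once to the map `value`."""
    return {
        s: REWARD[s]
        + GAMMA * sum(p * value[t] for t, p in TRANSITION[s][policy[s]].items())
        for s in REWARD
    }


def long_term_value(policy, tol=1e-12):
    """Approximate the unique fixpoint of the policy's Bellman operator."""
    value = {s: 0.0 for s in REWARD}
    while True:
        new = bellman_step(policy, value)
        # Stop once successive iterates are closer than `tol` in sup norm.
        if max(abs(new[s] - value[s]) for s in REWARD) < tol:
            return new
        value = new


def improve(policy):
    """Pick, in each state, an action maximizing the expected current value."""
    value = long_term_value(policy)

    def expected(s, a):
        return sum(p * value[t] for t, p in TRANSITION[s][a].items())

    return {s: max(TRANSITION[s], key=lambda a: expected(s, a)) for s in REWARD}


stay = {"s0": "stay", "s1": "stay"}
better = improve(stay)
print(long_term_value(stay), better, long_term_value(better))
```

In this toy example the improved policy moves from s0 to s1 and stays there, and its long-term value is at least that of the original policy in every state, which is the property that policy improvement guarantees.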

No files available

Metadata only record. There are no files for this record.