Long-term values in markov decision processes, (Co)algebraically

Conference Paper (2018)
Author(s)

Frank M.V. Feys (TU Delft - Technology, Policy and Management)

Helle Hvid Hansen (TU Delft - Technology, Policy and Management)

Lawrence S. Moss (Indiana University)

Research Group
Energy and Industry
DOI related publication
https://doi.org/10.1007/978-3-030-00389-0_6 Final published version
More Info
expand_more
Publication Year
2018
Language
English
Research Group
Energy and Industry
Volume number
11202 LNCS
Pages (from-to)
78-99
Publisher
Springer
ISBN (print)
9783030003883
Event
14th International Workshop on Coalgebraic Methods in Computer Science, CMCS 2018 Colocated with ETAPS 2018 (2018-04-14 - 2018-04-15), Thessaloniki, Greece
Downloads counter
115

Abstract

This paper studies Markov decision processes (MDPs) from the categorical perspective of coalgebra and algebra. Probabilistic systems, similar to MDPs but without rewards, have been extensively studied, also coalgebraically, from the perspective of program semantics. In this paper, we focus on the role of MDPs as models in optimal planning, where the reward structure is central. The main contributions of this paper are (i) to give a coinductive explanation of policy improvement using a new proof principle, based on Banach’s Fixpoint Theorem, that we call contraction coinduction, and (ii) to show that the long-term value function of a policy with respect to discounted sums can be obtained via a generalized notion of corecursive algebra, which is designed to take boundedness into account. We also explore boundedness features of the Kantorovich lifting of the distribution monad to metric spaces.