Interval Markov Decision Processes with Continuous Action-Spaces

Conference Paper (2023)
Author(s)

Giannis Delimpaltadakis (Eindhoven University of Technology)

Morteza Lahijanian (University of Colorado)

Manuel Mazo Espinosa (TU Delft - Team Manuel Mazo Jr)

L. Laurenti (TU Delft - Team Luca Laurenti)

Research Group
Team Manuel Mazo Jr
Copyright
© 2023 Giannis Delimpaltadakis, Morteza Lahijanian, M. Mazo, L. Laurenti
DOI (related publication)
https://doi.org/10.1145/3575870.3587117
Publication Year
2023
Language
English
ISBN (electronic)
979-8-4007-0033-0
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov models in which the transition probabilities lie within intervals. Recently, there has been a surge of research on employing IMDPs as abstractions of stochastic systems for control synthesis. However, due to the absence of synthesis algorithms for IMDPs with continuous action-spaces, the action-space is assumed discrete a priori, which is a restrictive assumption for many applications. Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the bounds on the transition probabilities are functions of the action variables, and study value iteration for maximizing expected cumulative rewards. Specifically, we decompose the max-min problem associated with value iteration into |Q| max problems, where |Q| is the number of states of the caIMDP. Then, exploiting the simple form of these max problems, we identify cases where value iteration over caIMDPs can be solved efficiently (e.g., with linear or convex programming). We also obtain other interesting insights: e.g., in certain cases where the action set A is a polytope, synthesis over a discrete-action IMDP whose actions are the vertices of A is sufficient for optimality. We demonstrate our results on a numerical example. Finally, we include a short discussion on employing caIMDPs as abstractions for control synthesis.
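
As context for the abstract, robust value iteration over an IMDP computes, at each step k, V_{k+1}(q) = max_{a in A} min_{p in Gamma(q,a)} [ r(q,a) + sum_{q'} p(q') V_k(q') ], where Gamma(q,a) is the set of transition distributions respecting the interval bounds. The sketch below is a minimal illustration of the discrete-action special case, not the paper's continuous-action algorithm: the inner (pessimistic) minimization is solved with the standard value-ordering procedure, the outer maximization ranges over a finite action set standing in for a discretization of the continuous action-space, and all names (p_lo, p_hi, rewards, the discount gamma) are hypothetical placeholders.

    import numpy as np

    def pessimistic_expectation(p_lo, p_hi, values):
        # Worst-case expected value over all distributions p with
        # p_lo <= p <= p_hi and sum(p) == 1: push as much probability
        # mass as the intervals allow onto low-value successor states.
        p = p_lo.copy()
        budget = 1.0 - p.sum()
        for j in np.argsort(values):          # successors, lowest value first
            delta = min(p_hi[j] - p_lo[j], budget)
            p[j] += delta
            budget -= delta
            if budget <= 0.0:
                break
        return p @ values

    def robust_value_iteration(p_lo, p_hi, rewards, gamma=0.95, iters=500):
        # p_lo, p_hi: (n_states, n_actions, n_states) interval bounds,
        # rewards:    (n_states, n_actions) immediate rewards.
        # Returns the robust value function and a greedy policy over the
        # finite action set.
        n_states, n_actions, _ = p_lo.shape
        V = np.zeros(n_states)
        for _ in range(iters):
            Q = np.empty((n_states, n_actions))
            for q in range(n_states):
                for a in range(n_actions):
                    Q[q, a] = rewards[q, a] + gamma * pessimistic_expectation(
                        p_lo[q, a], p_hi[q, a], V)
            V = Q.max(axis=1)                 # outer max over actions
        return V, Q.argmax(axis=1)

The paper's contribution is precisely to avoid the action-grid approximation above: by treating the interval bounds as functions of a continuous action variable, the per-state max problems can, in the cases it identifies, be solved exactly with linear or convex programming.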