Interpretation and analysis of deep reinforcement learning driven inspection and maintenance policies for engineering systems

Conference Paper (2023)
Author(s)

Pablo G. Morato (Technical University of Denmark (DTU))

Konstantinos G. Papakonstantinou (The Pennsylvania State University)

Charalampos Andriotis (TU Delft - Architectural Technology)

Nandar Hlaing (Université de Liège)

Athanasios Kolios (Technical University of Denmark (DTU))

Research Group
Architectural Technology
Publication Year
2023
Language
English
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

The application of Deep Reinforcement Learning (DRL) to the management of engineering systems has shown very promising results in terms of optimality and scalability. For practical implementation, these policies must also be interpretable by decision-makers, who are so far mostly familiar with traditional approaches. In this work, we address this topic by providing a comprehensive overview of POMDP- and DRL-based management policies, along with simulation-based implementation details, to facilitate their interpretation. By mapping a sufficient statistic, namely a belief state, to the current optimal action, POMDP-DRL strategies are able to automatically adapt over time, accounting for the sought long-term objectives and the prior history. Through simulated policy realizations, POMDP-DRL-based strategies identified for representative inspection and maintenance planning settings are thoroughly analyzed. The results reveal that if the decision-maker opts for an alternative, even suboptimal, action other than the one suggested by the DRL-based policy, the belief state will be accordingly updated and can still be used as input for the remainder of the planning horizon, without any need for model retraining.
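
To illustrate the belief-update mechanism described in the abstract, the following is a minimal sketch, not the authors' implementation: a discrete-state Bayesian belief update showing how the belief can be propagated for any chosen action, including one that deviates from the DRL policy's suggestion, and then reused as the policy input at the next decision step. The transition matrix, observation matrix, state labels, and function names below are hypothetical placeholders.

```python
import numpy as np

def update_belief(belief, T_a, O_a, obs):
    """Bayesian belief update: b'(s') proportional to O_a[s', obs] * sum_s T_a[s, s'] * b(s)."""
    predicted = belief @ T_a              # prediction step under the chosen action a
    posterior = O_a[:, obs] * predicted   # correction step with the observation likelihood
    return posterior / posterior.sum()    # normalize to a valid probability distribution

# Hypothetical 3-state deterioration model: intact, damaged, failed
T_do_nothing = np.array([[0.9, 0.1, 0.0],
                         [0.0, 0.8, 0.2],
                         [0.0, 0.0, 1.0]])

# Hypothetical inspection observation likelihoods: rows = true states, cols = outcomes
O_inspect = np.array([[0.90, 0.10],
                      [0.20, 0.80],
                      [0.05, 0.95]])

belief = np.array([1.0, 0.0, 0.0])  # start from an intact component
# Even if "do nothing" was not the action suggested by the policy, the belief is
# updated consistently and remains a valid input for the rest of the horizon.
belief = update_belief(belief, T_do_nothing, O_inspect, obs=1)
print(belief)
```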