Optimal maintenance of deteriorating systems integrating deep reinforcement learning and Bayesian inference

Abstract

The maintenance of engineering systems exposed to corrosive environments, e.g. coastal and marine environments or highly acidic environments, is an issue of utmost significance. The most beneficial sequence of maintenance decisions, i.e. the one that corresponds to the minimum maintenance cost, can be sought as the solution to an optimization problem. Owing to the high complexity of this sequential decision optimization problem, traditional methods such as threshold-based approaches often fail to arrive at an optimal strategy, while the commonly used offline knowledge about the environment cannot efficiently capture the stochastic way in which an engineering system deteriorates. Over the last few years, Deep Reinforcement Learning (DRL) has proven to be a promising tool for tackling such problems, although it is often limited by the curse of dimensionality and by the complications of large state and action spaces, which lead to simplifications such as their discretization. Bayesian principles and model updating are among the most widely used tools for accurately modeling systems with high uncertainty: by incorporating data acquired through monitoring devices, they improve the knowledge about the stochastic system.
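To make the idea of Bayesian model updating concrete, the sketch below updates a single uncertain deterioration parameter from monitoring data. It assumes a hypothetical linear corrosion-loss model, loss = k · t plus Gaussian measurement noise with known standard deviation, and approximates the posterior of the rate k on a discrete grid. The model, the function name `grid_posterior`, and all numbers are illustrative assumptions; the framework in this work performs the update by sampling (NUTS), and a brute-force grid is used here only because it is transparent for one parameter.

```python
import math

def grid_posterior(times, losses, sigma, k_grid, prior):
    """Grid approximation of the posterior over a corrosion rate k.

    Hypothetical model: losses[i] = k * times[i] + eps_i, eps_i ~ N(0, sigma^2).
    Returns posterior probabilities aligned with k_grid.
    """
    log_post = []
    for k, p in zip(k_grid, prior):
        if p <= 0.0:
            log_post.append(float("-inf"))  # zero prior mass stays zero
            continue
        # Gaussian log-likelihood of all measurements (additive constants dropped)
        ll = sum(-0.5 * ((d - k * t) / sigma) ** 2 for t, d in zip(times, losses))
        log_post.append(math.log(p) + ll)
    m = max(log_post)
    weights = [math.exp(lp - m) for lp in log_post]  # subtract max for numerical stability
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical monitoring data: years since installation and measured loss in mm
times = [1.0, 2.0, 3.0, 4.0, 5.0]
losses = [0.11, 0.19, 0.32, 0.41, 0.48]
k_grid = [i * 0.001 for i in range(501)]   # candidate rates, 0 to 0.5 mm/year
prior = [1.0 / len(k_grid)] * len(k_grid)  # uniform prior over the grid
post = grid_posterior(times, losses, 0.05, k_grid, prior)
post_mean = sum(k * w for k, w in zip(k_grid, post))
print(f"posterior mean corrosion rate: {post_mean:.3f} mm/year")
```

With a uniform prior, the posterior mean lands close to the least-squares slope of the hypothetical data (about 0.1 mm/year), and the posterior is far narrower than the prior; each new inspection would be folded in the same way, with the previous posterior as the next prior.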

This research proposes an integrated framework that combines the aforementioned core concepts of DRL and Bayesian Model Updating (BMU) to determine an optimal sequence of maintenance decisions over the lifespan of deteriorating engineering systems. More specifically, it investigates three DRL algorithms, namely Double Deep Q-Network (DDQN), Advantage Actor Critic (A2C), and Proximal Policy Optimization (PPO), while the uncertain parameters are updated through sampling with the No-U-Turn Sampler (NUTS). All these tools are first applied to elementary problems for the sake of verification and validation, and the research culminates in the application of the framework to a more realistic and complex multi-component structure. The obtained results are compared with benchmark performances to showcase both the efficiency and the weaknesses of the framework.
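Of the three DRL algorithms, DDQN has the most compact core update. The sketch below computes Double DQN bootstrap targets for a batch of transitions: the online network selects the greedy next action and the target network evaluates it, which curbs the overestimation bias of standard DQN. It assumes the next-state Q-values have already been produced by two networks (not shown), treats rewards as negative maintenance costs, and uses a hypothetical three-action set (0 = do nothing, 1 = repair, 2 = replace); the function name and all numbers are illustrative only.

```python
def double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.99):
    """Double DQN bootstrap targets for a batch of transitions.

    rewards: per-transition reward (here, hypothetically, minus the maintenance cost)
    next_q_online / next_q_target: Q-values of the next state from each network
    dones: True where the next state is terminal (e.g. end of the service life)
    """
    targets = []
    for r, q_on, q_tg, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)  # no bootstrapping past a terminal state
            continue
        # Online network selects the action, target network evaluates it
        a_star = max(range(len(q_on)), key=lambda a: q_on[a])
        targets.append(r + gamma * q_tg[a_star])
    return targets

# Hypothetical batch of two transitions with actions (do nothing, repair, replace)
rewards = [-1.0, -5.0]
next_q_online = [[-10.0, -8.0, -12.0], [0.0, 0.0, 0.0]]
next_q_target = [[-11.0, -9.5, -9.0], [0.0, 0.0, 0.0]]
dones = [False, True]
print(double_dqn_targets(rewards, next_q_online, next_q_target, dones, gamma=0.9))
```

For the first transition the online network prefers action 1 (repair), so the target is -1 + 0.9 × (-9.5) = -9.55, whereas standard DQN would bootstrap from the target network's own maximum (-9.0), a more optimistic value. The second transition is terminal, so its target is simply its reward.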