K. G. Papakonstantinou | TU Delft Repository

Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning

Journal article (2023) - P. G. Morato, C. P. Andriotis, K. G. Papakonstantinou, P. Rigo

In the context of modern engineering, environmental, and societal concerns, there is an increasing demand for methods able to identify rational management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level, often assuming statistical, structural, or cost independence among components, due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks, decoupling the originally joint system state space to component networks conditional on shared random variables. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. 
The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.
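The decoupling idea mentioned in the abstract (components becoming independent once a shared random variable is fixed) can be illustrated with a minimal static sketch. It is not the paper's dynamic Bayesian network: it assumes an equicorrelated standard Gaussian model in which component i fails when sqrt(rho) * Z + sqrt(1 - rho) * E_i falls below -beta, and the names `beta`, `rho`, and both function names are hypothetical choices made for this illustration.

```python
from math import comb, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def k_out_of_n_failure(p_fail: float, n: int, k: int) -> float:
    """Failure probability of a k-out-of-n system with independent components.

    The system works while at least k of its n components work, so it fails
    when more than n - k components have failed.
    """
    return sum(
        comb(n, j) * p_fail**j * (1.0 - p_fail) ** (n - j)
        for j in range(n - k + 1, n + 1)
    )


def system_failure_prob(beta: float, rho: float, n: int = 10, k: int = 9,
                        grid: int = 4000, z_max: float = 8.0) -> float:
    """System failure probability under an assumed equicorrelated Gaussian model.

    Conditional on the shared variable Z = z the components are independent,
    so the k-out-of-n formula applies; Z is then integrated out numerically
    with a midpoint rule on [-z_max, z_max].
    """
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    dz = 2.0 * z_max / grid
    total = 0.0
    for i in range(grid):
        z = -z_max + (i + 0.5) * dz
        # Component failure probability given the shared variable
        p_cond = STD_NORMAL.cdf((-beta - sqrt(rho) * z) / sqrt(1.0 - rho))
        total += k_out_of_n_failure(p_cond, n, k) * STD_NORMAL.pdf(z) * dz
    return total
```

Calling `system_failure_prob(2.0, 0.0)` reduces to the independent case, while a positive `rho` changes the system-level result even though each component's marginal failure probability stays the same. This is the kind of system effect that a component-level analysis would miss.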

Deep reinforcement learning-based life-cycle management of deteriorating transportation systems

Conference paper (2022) - M. Saifullah, C.P. Andriotis, K.G. Papakonstantinou, S.M. Stoffels

Efficient life-cycle bridge asset management delineates a planning optimization problem of paramount importance for the operational reliability of transportation infrastructure. It necessitates adept inspection and maintenance policies able to reduce risks and costs while incorporating long-term stochastic deterioration models, inference under uncertain structural health data, and various probabilistic and deterministic constraints. Structural integrity management policies for individual bridges, which are mere constituents of broader complex networks, cannot be devised in isolation of the policies of other system components, such as other bridges and pavement sections, and without considering system functions and traffic considerations. Such network effects render the optimization problem even harder to solve. Currently, age- or condition-based maintenance techniques, as well as risk-based or periodic inspection plans, have been used to address this class of challenging optimization problems. However, the efficacy of these techniques is often limited by optimality-, scalability-, and uncertainty-induced complexities. In practice, infrastructure management agencies often treat interconnected systems using disjoint plans for different component types, which in general do not ensure system-level optimality. To tackle the above, the optimization problem is herein cast within constrained Partially Observable Markov Decision Processes (POMDPs), which provide a comprehensive mathematical framework for stochastic sequential decision settings under observation/monitoring data uncertainty and limited resources. For the problem solution, the DDMAC algorithm (Deep Decentralized Multi-agent Actor-Critic) is successfully used, a deep reinforcement learning algorithm well-suited for management of large multi-state multi-component systems, as illustrated in an example application of an existing transportation network in Virginia, USA. 
The studied network comprises several bridge and pavement components exhibiting nonstationary deterioration; various agency-imposed constraints, traffic delays, and risk factors are also considered. Comparisons against conventional management policies show that the DDMAC solution significantly outperforms its counterparts.

Appraisal and mathematical properties of fragility analysis methods

Conference paper (2022) - S. Yi, K. G. Papakonstantinou, C.P. Andriotis, J. Song

Fragility analysis aims to compute the probabilities of a system exceeding certain damage conditions given different levels of hazard intensity. Fragility analysis is therefore a key process of performance-based earthquake engineering, with a number of approaches developed and widely recognized, including Incremental Dynamic Analysis (IDA), Multiple Stripe Analysis (MSA), and cloud analysis. Additionally, extended fragility analysis has recently been shown to possess important attributes of mathematical consistency and extensibility. This work provides a critical review of the different fragility methods by explaining the underlying probabilistic models and assumptions, as well as their connections to the extended fragility method. It is proven that IDA-based fragility curves provide an upper bound of the actual fragility, and cloud analysis manifests suboptimality issues arising from its underlying assumptions. MSA is identified to be a probit-linked Bernoulli regression model, similar to the one proposed by Shinozuka and coworkers. The latter, in turn, is shown to be a limiting subcase of the generalized linear model framework introduced within the extended fragility analysis. The paper first presents a simple case of one intensity measure and two damage condition states, and the discussion is subsequently extended to more general cases of multiple intensity measures and damage states. The discussed attributes are demonstrated in several numerical applications. Overall, this work aims to provide new insights on fragility methods, enabling efficient, accurate, and consistent estimations of structural performance, as well as promoting new research directions in earthquake engineering and other related fields. ...
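As a rough illustration of what a probit-linked Bernoulli regression looks like for one intensity measure and two damage states, the sketch below evaluates a fragility curve and the corresponding log-likelihood. The lognormal parameterization (a `median` and a `dispersion` acting on the log of the intensity measure) is an assumption made here for concreteness, and the function names are hypothetical; the paper's own formulation may differ.

```python
from math import log
from statistics import NormalDist


def fragility(im: float, median: float, dispersion: float) -> float:
    """P(damage state exceeded | IM = im) with a probit link on ln(im)."""
    return NormalDist().cdf((log(im) - log(median)) / dispersion)


def bernoulli_log_likelihood(ims, exceeded, median, dispersion) -> float:
    """Log-likelihood of binary exceedance outcomes under the fragility model."""
    total = 0.0
    for im, y in zip(ims, exceeded):
        p = fragility(im, median, dispersion)
        # Clip to avoid log(0) for extreme intensity values
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total += log(p) if y else log(1.0 - p)
    return total
```

Maximizing `bernoulli_log_likelihood` over `median` and `dispersion` (with any numerical optimizer) would give the fitted curve for a set of observed exceedances.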

Large-Scale Wildfire Mitigation Through Deep Reinforcement Learning

Journal article (2022) - Abdulelah Altamimi, Constantino Lagoa, José G. Borges, Marc E. McDill, C. P. Andriotis, K. G. Papakonstantinou

Forest management can be seen as a sequential decision-making problem to determine an optimal scheduling policy, e.g., harvest, thinning, or do-nothing, that can mitigate the risks of wildfire. Markov Decision Processes (MDPs) offer an efficient mathematical framework for optimizing forest management policies. However, computing optimal MDP solutions is computationally challenging for large-scale forests due to the curse of dimensionality, as the total number of forest states grows exponentially with the number of stands into which it is discretized. In this work, we propose a Deep Reinforcement Learning (DRL) approach to improve forest management plans that track the forest dynamics in a large area. The approach emphasizes prevention and mitigation of wildfire risks by determining highly efficient management policies. A large-scale forest model is designed using a spatial MDP that divides the square-matrix forest into equal stands. The model considers the probability of wildfire dependent on the forest timber volume, the flammability, and the directional distribution of the wind using data that reflect the inventory of a typical eucalypt (Eucalyptus globulus Labill) plantation in Portugal. In this spatial MDP, the agent (decision-maker) takes an action at one stand at each step. We use an off-policy actor-critic with experience replay reinforcement learning approach to approximate the MDP optimal policy. In three different case studies, the approach shows good scalability for providing large-scale forest management plans. The expected return value and the computed DRL policy are found to be identical to the exact optimum MDP solution, when this exact solution is available, i.e., for low dimensional models. DRL is also found to outperform genetic algorithm (GA) solutions, which were used as benchmarks for large-scale model policies. ...

Managing offshore wind turbines through Markov decision processes and dynamic Bayesian networks

Conference paper (2022) - P. G. Morato, K. G. Papakonstantinou, C.P. Andriotis, Philippe Rigo

Efficient planning of inspection and maintenance (I&M) actions in civil and maritime environments is of paramount importance to balance management costs against failure risk caused by deteriorating mechanisms. Determining I&M policies for such cases constitutes a complex sequential decision-making optimization problem under uncertainty. Addressing this complexity, Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for stochastic optimal control, in which the optimal actions are prescribed as a function of the entire, dynamically updated, state probability distribution. As shown in this paper, by integrating Dynamic Bayesian Networks (DBNs) with POMDPs, advanced algorithmic schemes of probabilistic inference and decision optimization under uncertainty can be uniquely combined into an efficient planning platform. To demonstrate the capabilities of the proposed approach, POMDP and heuristic-based I&M policies are compared, with emphasis on an offshore wind substructure subject to fatigue deterioration. Results verify that POMDP solutions offer substantially reduced costs compared to their counterparts, even in traditional problem settings. ...
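The "dynamically updated state probability distribution" that a POMDP policy acts on is the belief, which is refreshed after every transition and observation by a Bayes update. A minimal sketch for a discrete state space is given below; the function name and the list-based representation are illustrative assumptions and are not taken from the paper.

```python
def belief_update(belief, transition, obs_likelihood):
    """One Bayesian belief update for a discrete POMDP.

    belief[s]          : current probability of state s
    transition[s][s2]  : P(next state s2 | state s) under the chosen action
    obs_likelihood[s2] : P(received observation | next state s2)
    """
    n = len(belief)
    # Prediction step: propagate the belief through the deterioration model
    predicted = [
        sum(belief[s] * transition[s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight by the observation likelihood and normalize
    unnormalized = [obs_likelihood[s2] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]
```

For example, with three hypothetical states (intact, damaged, failed), `belief_update([1.0, 0.0, 0.0], [[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.0, 0.0, 1.0]], [0.1, 0.8, 1.0])` shifts probability mass toward the damaged state after a damage-indicating inspection outcome. The numbers are made up for illustration only.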

Deep reinforcement learning driven inspection and maintenance planning under incomplete information and constraints

Journal article (2021) - C. P. Andriotis, K. G. Papakonstantinou

Determination of inspection and maintenance policies for minimizing long-term risks and costs in deteriorating engineering environments constitutes a complex optimization problem. Major computational challenges include the (i) curse of dimensionality, due to exponential scaling of state/action set cardinalities with the number of components; (ii) curse of history, related to exponentially growing decision-trees with the number of decision-steps; (iii) presence of state uncertainties, induced by inherent environment stochasticity and variability of inspection/monitoring measurements; (iv) presence of constraints, pertaining to stochastic long-term limitations, due to resource scarcity and other infeasible/undesirable system responses. In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic dynamic programming with Bayesian inference principles. Multi-agent DRL addresses (i), through deep function parametrizations and decentralized control assumptions. Challenge (iv) is herein handled through proper state augmentation and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints and budget limitations. The underlying algorithmic steps are provided, and the proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions, in cases where decisions must be made in the most resource- and risk-aware manner. ...
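The Lagrangian relaxation mentioned for challenge (iv) can be pictured, in its most generic textbook form, as adding a multiplier-weighted constraint cost to the objective and adjusting the multiplier by projected gradient ascent. The sketch below shows only that generic mechanism; the function names, the step size, and the update rule are assumptions for illustration and are not claimed to match the paper's algorithm.

```python
def lagrangian_step_cost(cost: float, constraint_cost: float, lam: float) -> float:
    """Per-step cost seen by the learner after relaxing the constraint."""
    return cost + lam * constraint_cost


def update_multiplier(lam: float, avg_constraint_cost: float,
                      budget: float, step_size: float) -> float:
    """Projected gradient ascent on the Lagrange multiplier.

    The multiplier grows when the average constraint cost exceeds the budget,
    shrinks otherwise, and is clipped at zero.
    """
    return max(0.0, lam + step_size * (avg_constraint_cost - budget))
```

In a training loop, the policy would be optimized against `lagrangian_step_cost` while `update_multiplier` is called periodically with the constraint cost averaged over recent episodes.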