QRES-MARL
A Resilience-Based Multi-Agent Reinforcement Learning Framework for Post-Earthquake Recovery of Interdependent Infrastructures
A. Mavrotas (TU Delft - Architecture and the Built Environment)
C. Andriotis - Mentor (TU Delft - Structures & Materials)
Simona Bianchi - Mentor (TU Delft - Structures & Materials)
Abstract
This thesis investigates Multi-Agent Reinforcement Learning (MARL) as a decision tool for post-earthquake repair scheduling of interdependent infrastructure. MARL is a multi-agent ML paradigm that combines traditional ML research with game-theoretical approaches. Given the relative increase in natural disaster frequency and the lack of available post-disaster decision tools, such tools are crucial for increasing the climate resilience of cities. Because earthquake events and their subsequent losses are stochastic, MARL can help navigate this uncertainty and find preferable joint policies.

The methodology involves multi-scenario seismic hazard assessment, stochastic fragility modelling, and prediction of several direct and indirect losses, which are aggregated into a holistic community resilience metric. This metric is then used to compute the instantaneous and cumulative resilience loss values over the recovery period. The approach is tested on two custom-built test-beds of 4 and 30 components, and MARL is compared against baseline solvers, including random and importance-based policies. The algorithms tested are Value Decomposition Network with Parameter Sharing (VDN-PS), Q-Learning with Mixer Network and Parameter Sharing (QMIX-PS), and Deep Centralised Multi-Agent Actor Critic (DCMAC). VDN and QMIX are shown to perform similarly to each other and sub-optimally relative to DCMAC. DCMAC is shown to match importance-based policies when considering full recovery, but it convincingly outperforms all other deep reinforcement learning (DRL) methods and importance-based policies when considering partial recovery. This shows that DCMAC, and DRL more generally, is effective at swift early recovery by prioritising components that contribute most to community functionality.
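To make the notions of instantaneous and cumulative resilience loss concrete, the following is a minimal sketch and not the thesis's actual metric. It assumes a toy system of 4 components with hypothetical importance weights, one repair per time step, and community functionality defined as the weighted share of repaired components. The function names `recovery_curve` and `cumulative_loss`, the weights, and the `horizon` parameter are all illustrative assumptions.

```python
import random

def recovery_curve(order, weights):
    """Functionality Q(t) in [0, 1] after each repair step.

    Assumes one component is repaired per step and that functionality is
    the weighted share of repaired components (a simplifying assumption).
    """
    total = sum(weights)
    q = 0.0
    curve = [q]  # Q(0): nothing repaired yet
    for idx in order:
        q += weights[idx] / total
        curve.append(q)
    return curve

def cumulative_loss(curve, horizon=None):
    """Sum of instantaneous losses 1 - Q(t), optionally up to a partial horizon."""
    steps = curve if horizon is None else curve[: horizon + 1]
    return sum(1.0 - q for q in steps)

# Hypothetical importance weights for a 4-component test-bed.
weights = [4.0, 1.0, 3.0, 2.0]

# Importance-based baseline: repair the heaviest components first.
importance_order = sorted(range(len(weights)), key=lambda i: -weights[i])

# Random baseline with a fixed seed for reproducibility.
rng = random.Random(0)
random_order = list(range(len(weights)))
rng.shuffle(random_order)

loss_importance = cumulative_loss(recovery_curve(importance_order, weights))
loss_random = cumulative_loss(recovery_curve(random_order, weights))
partial_importance = cumulative_loss(
    recovery_curve(importance_order, weights), horizon=2
)

print(f"importance-based: {loss_importance:.2f}, random: {loss_random:.2f}")
```

In this toy setting, repairing components in descending order of weight never yields a higher cumulative loss than any other order. The `horizon` argument mirrors the distinction between full and partial recovery by truncating the sum to the early steps. The thesis's actual metric aggregates several direct and indirect losses and accounts for interdependencies, which this sketch does not model.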