Title: Large-Scale Wildfire Mitigation Through Deep Reinforcement Learning

Authors: Altamimi, Abdulelah (The Pennsylvania State University); Lagoa, Constantino (The Pennsylvania State University); Borges, José G. (University of Lisbon); McDill, Marc E. (The Pennsylvania State University); Andriotis, C. (TU Delft Structural Design & Mechanics); Papakonstantinou, K. G. (The Pennsylvania State University)

Date: 2022

Abstract: Forest management can be seen as a sequential decision-making problem: determine an optimal scheduling policy, e.g., harvest, thinning, or do-nothing, that mitigates the risk of wildfire. Markov Decision Processes (MDPs) offer an efficient mathematical framework for optimizing forest management policies. However, computing optimal MDP solutions is computationally challenging for large-scale forests due to the curse of dimensionality, as the total number of forest states grows exponentially with the number of stands into which the forest is discretized. In this work, we propose a Deep Reinforcement Learning (DRL) approach to improve forest management plans that track the forest dynamics over a large area. The approach emphasizes prevention and mitigation of wildfire risks by determining highly efficient management policies. A large-scale forest model is designed using a spatial MDP that divides the square-matrix forest into equal stands. The model makes the probability of wildfire depend on the forest timber volume, the flammability, and the directional distribution of the wind, using data that reflect the inventory of a typical eucalypt (Eucalyptus globulus Labill) plantation in Portugal. In this spatial MDP, the agent (decision-maker) takes an action at one stand at each step. We use an off-policy actor-critic reinforcement learning approach with experience replay to approximate the optimal MDP policy.
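To make the spatial MDP concrete, the following is a minimal toy sketch of such an environment: a square grid of stands, one action per stand per step, and a stochastic wildfire whose ignition probability grows with timber volume and which spreads preferentially downwind. All names, constants, and dynamics here are illustrative assumptions, not the paper's calibrated eucalypt model.

```python
import numpy as np

# Hypothetical action set and parameters; the paper's calibrated values
# (volume growth, flammability, wind distribution) are not reproduced here.
ACTIONS = ("do_nothing", "thin", "harvest")

class SpatialForestMDP:
    """Toy spatial MDP: an n-by-n grid of stands, each holding a timber volume.

    At each step the agent picks one stand and one action. Wildfire ignites
    stochastically with a probability that grows with stand volume and
    spreads one stand downwind -- a rough stand-in for a
    volume/flammability/wind-dependent fire model.
    """

    def __init__(self, n=4, base_ignition=0.01, wind=(1, 0), seed=0):
        self.n = n
        self.base_ignition = base_ignition
        self.wind = wind  # dominant wind direction as (row, col) offset
        self.rng = np.random.default_rng(seed)
        self.volume = self.rng.uniform(0.0, 1.0, size=(n, n))

    def step(self, stand, action):
        r, c = stand
        reward = 0.0
        if action == "harvest":        # sell all timber in the stand
            reward += self.volume[r, c]
            self.volume[r, c] = 0.0
        elif action == "thin":         # sell part, lowering fire risk
            reward += 0.3 * self.volume[r, c]
            self.volume[r, c] *= 0.7
        self.volume = np.minimum(self.volume + 0.05, 1.0)  # growth, capped
        reward -= self._wildfire()     # burned volume is lost value
        return self.volume.copy(), reward

    def _wildfire(self):
        burned = 0.0
        # Denser stands ignite more easily (toy flammability model).
        ignition_p = self.base_ignition * (1.0 + self.volume)
        dr, dc = self.wind
        for r in range(self.n):
            for c in range(self.n):
                if self.rng.random() < ignition_p[r, c]:
                    burned += self.volume[r, c]
                    self.volume[r, c] = 0.0
                    # Fire spreads one stand downwind with 50% probability.
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < self.n and 0 <= cc < self.n \
                            and self.rng.random() < 0.5:
                        burned += self.volume[rr, cc]
                        self.volume[rr, cc] = 0.0
        return burned

env = SpatialForestMDP(n=4)
state, reward = env.step((0, 0), "harvest")
print(state.shape)  # (4, 4)
```

The per-stand action structure is what keeps the action space linear in the number of stands, even though the joint state space over all stands grows exponentially — the dimensionality problem the DRL agent is meant to sidestep.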
In three case studies, the approach shows good scalability in providing large-scale forest management plans. The expected return and the computed DRL policy are identical to the exact optimal MDP solution when that exact solution is available, i.e., for low-dimensional models. DRL is also found to outperform genetic algorithm (GA) solutions, which were used as benchmarks for the large-scale model policy.

Subject: deep reinforcement learning; dynamic programming; forest management; Markov Decision Process; wildfire mitigation

To reference this document use: http://resolver.tudelft.nl/uuid:f4cc2b8f-b805-4d11-96e6-6a61a19037eb

DOI: https://doi.org/10.3389/ffgc.2022.734330

Source: Frontiers in Forests and Global Change, 5

Part of collection: Institutional Repository

Document type: journal article

Rights: © 2022 Abdulelah Altamimi, Constantino Lagoa, José G. Borges, Marc E. McDill, C. Andriotis, K. G. Papakonstantinou

Files: ffgc_05_734330.pdf (PDF, 2.35 MB)