Reliable Offline Policy Evaluation for Individualized Mechanical Ventilation

Master Thesis (2024)
Author(s)

W.S. Volkers (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Jesse Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Jim M. Smit – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

Marcel J.T. Reinders – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)

F.A. Oliehoek – Graduation committee member (TU Delft - Sequential Decision Making)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2024 Bas Volkers
Publication Year
2024
Language
English
Graduation Date
13-03-2024
Awarding Institution
Delft University of Technology
Programme
Computer Science
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Individualizing mechanical ventilation treatment regimes remains a challenge in the intensive care unit (ICU). Reinforcement Learning (RL) offers the potential to improve patient outcomes and reduce mortality risk by optimizing ventilation treatment regimes. We focus on the Offline RL setting, using Offline Policy Evaluation (OPE), specifically importance sampling (IS), to evaluate policies learned from observational data. Using a running example, we illustrate how a large difference between the learned policy and actual clinical behavior (the behavior policy) limits the reliability of IS-based OPE. To assess this reliability, we use the Effective Sample Size (ESS) as a diagnostic. To achieve reliable evaluation, we apply policy shaping by incorporating a divergence constraint in the policy learning objective, aiming to reduce the difference between the evaluation and behavior policies. We consider a Kullback-Leibler (KL) divergence constraint and introduce a new constraint, the ESS divergence. Since effective OPE relies on an accurate estimate of the true behavior policy, we address how such an estimate is acquired: we systematically evaluate various classifiers for estimating the behavior policy, focusing on both discrimination and calibration performance. Empirical results show the difficulty of learning policies that outperform existing clinical practice and generalize well to unseen patients. Although policy shaping improves the reliability of policy evaluations, we found no policies that consistently outperform clinical practice. The KL divergence constraint generalized better to unseen patients than the ESS divergence, which achieved a large ESS without actually reducing the difference between the evaluation and behavior policies. We underscore the necessity of a cautious approach to applying RL in healthcare, and advocate that assessing OPE reliability and behavior policy calibration becomes standard practice, to ensure that only effective and reliable RL policies are considered for real-world clinical trials.
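To make the quantities mentioned above concrete, the following minimal Python sketch shows self-normalized (weighted) importance sampling for OPE, the Kish effective sample size diagnostic, and a KL-based policy-shaping penalty. This is an illustration under stated assumptions, not the thesis implementation; all function and variable names are hypothetical, and the exact estimator variants and constraint formulations used in the thesis may differ.

import numpy as np

def weighted_is_value_and_ess(eval_probs, behav_probs, returns):
    """Self-normalized importance sampling estimate with an ESS diagnostic.

    eval_probs, behav_probs: (n_trajectories, horizon) arrays giving the
        probability of each logged action under the evaluation policy and the
        (estimated) behavior policy.
    returns: (n_trajectories,) array of observed trajectory returns.
    """
    # Per-trajectory importance weight: product of per-step probability ratios.
    weights = np.prod(eval_probs / behav_probs, axis=1)

    # Self-normalized IS estimate of the evaluation policy's value.
    value = np.sum(weights * returns) / np.sum(weights)

    # Kish effective sample size: (sum w)^2 / sum(w^2). A value much smaller
    # than n_trajectories means a few trajectories dominate the estimate,
    # signaling unreliable OPE.
    ess = np.sum(weights) ** 2 / np.sum(weights ** 2)
    return value, ess

def kl_shaping_penalty(eval_dists, behav_dists):
    """Mean KL(pi_e || pi_b) over logged states, usable as a policy-shaping
    penalty in the policy learning objective (hypothetical helper).

    eval_dists, behav_dists: (n_states, n_actions) action distributions,
        assumed strictly positive to keep the logarithm well defined.
    """
    kl_per_state = np.sum(eval_dists * np.log(eval_dists / behav_dists), axis=1)
    return kl_per_state.mean()

A shaped learning objective of the form "estimated return minus lambda times the KL penalty" would then trade off expected performance against closeness to clinical behavior; whether the thesis uses exactly this form, and how the ESS divergence constraint is parameterized, is not specified here and should be taken from the thesis itself.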
