Interventional Normalizing Flows (INFs) are a recently proposed method for estimating interventional outcome distributions from observational data. A central component of this approach is the nuisance flow, which estimates the propensity score and the conditional outcome distribution. INFs are claimed to be doubly robust, meaning they can yield valid estimates as long as at least one of these two components is correctly specified. This study investigates the practical limits of this robustness by asking two questions: (1) How do interventional estimates behave when nuisance flow components are entirely misspecified? and (2) How sensitive are these estimates to more realistic imperfections, such as suboptimal hyperparameters or injected noise? Through experiments on four benchmark datasets with varying levels of confounding and distributional complexity, we find that INFs remain robust under low-confounding conditions even when both nuisance components are misspecified. However, in high-confounding settings, even partial misspecification can cause estimates to degrade substantially, undermining the doubly robust property. These results highlight the importance of carefully validating nuisance components and suggest that the theoretical guarantees of INFs may not always hold in practice.
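
To make the doubly robust property concrete, the following is a minimal sketch on a toy problem. It is not the INF method and does not reproduce the experiments in this study: it swaps in the classical augmented inverse probability weighting (AIPW) estimator of a scalar mean, E[Y(1)], rather than a full interventional density. The data-generating process, the `confounding` parameter, and all function names are assumptions introduced purely for illustration.

```python
import numpy as np


def simulate(n=200_000, confounding=1.5, seed=0):
    """Toy observational data (hypothetical setup): one confounder x drives
    both treatment assignment and the outcome. The true E[Y(1)] is 1.0."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    propensity = 1.0 / (1.0 + np.exp(-confounding * x))  # true P(A=1 | x)
    a = rng.binomial(1, propensity)
    y1 = 1.0 + 2.0 * x + rng.normal(size=n)  # potential outcome under treatment
    y0 = rng.normal(size=n)                  # potential outcome under control
    y = a * y1 + (1 - a) * y0                # observed outcome
    return x, a, y, propensity


def aipw_mean(y, a, propensity_hat, outcome_hat):
    """AIPW estimate of E[Y(1)]: outcome-model prediction plus an
    inverse-propensity-weighted correction on the treated residuals."""
    return float(np.mean(outcome_hat + a * (y - outcome_hat) / propensity_hat))


def run_scenarios(confounding):
    """Estimate E[Y(1)] under each combination of correct and deliberately
    wrong nuisance models (wrong propensity: constant 0.5; wrong outcome
    model: constant 0)."""
    x, a, y, true_prop = simulate(confounding=confounding)
    good_outcome = 1.0 + 2.0 * x          # correctly specified E[Y | x, A=1]
    bad_outcome = np.zeros_like(x)        # misspecified outcome model
    bad_prop = np.full_like(x, 0.5)       # misspecified propensity model
    return {
        "both_correct": aipw_mean(y, a, true_prop, good_outcome),
        "outcome_wrong": aipw_mean(y, a, true_prop, bad_outcome),
        "propensity_wrong": aipw_mean(y, a, bad_prop, good_outcome),
        "both_wrong": aipw_mean(y, a, bad_prop, bad_outcome),
    }


if __name__ == "__main__":
    for strength in (0.1, 1.5):
        print(f"confounding={strength}:", run_scenarios(strength))
```

In this toy setting, the estimate should stay close to the true value of 1.0 whenever at least one nuisance model is correct, and the bias incurred when both are wrong should grow with the confounding strength. This only mirrors the abstract's qualitative pattern on a simple scalar estimand; it says nothing about how INFs behave on the benchmark datasets.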