Empirical Evaluation of the Performance of CEVAE under Misspecification of the Latent Dimensionality

Abstract

Causal machine learning deals with the inference of causal relationships between variables in observational datasets.
For certain datasets, it is appropriate to assume a causal graph in which information about unobserved confounders can be obtained only through noisy proxies; CEVAE aims to address this case.
The number of dimensions of the latent space modelled by CEVAE must be specified ahead of time, and this paper investigates how misspecifying this dimensionality affects the performance of CEVAE.
The results support the idea that both underspecification and overspecification degrade the performance of CEVAE, but indicate that underspecification is worse, at least when there are few confounders.
More generally, the model does not always achieve its best performance when the model dimensionality matches the data dimensionality.
Finally, the conclusions drawn from data with linear-Gaussian proxies are the same as those drawn from data with nonlinear-Gaussian proxies, which indicates that these conclusions generalize across datasets to some extent.