Empirical Study on the Impact of Network Architecture on Causal Effect Estimation with TARNet

Bachelor Thesis (2025)
Author(s)

M.M. Witczak (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

JH Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

R.K.A. Karlsson – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

R. Guerra Marroquim – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Estimating the Conditional Average Treatment Effect (CATE) with neural networks adapted for causal inference, such as TARNet, is a promising approach, yet the impact of model architecture on estimation performance remains underexplored.
This paper systematically investigates how the depth and width of TARNet affect CATE estimation across diverse simulated data environments. It addresses two central questions: how TARNet's performance varies across data regimes (e.g., confounding strength, sample size), and how its optimal architecture changes in response to these conditions.
A comprehensive set of simulation-based experiments is conducted using the CATENets framework, isolating and varying factors such as sample size, feature dimensionality, confounding strength, and the presence of noise. The results show that deeper architectures generally perform better in complex or high-dimensional scenarios, whereas narrower networks are preferable in small-sample or high-noise settings because of their regularizing effect. The findings further suggest that there is no universally optimal architecture: the best configuration depends on the specific characteristics of the data. The study concludes with practical recommendations for architecture selection based on these experiments.
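For readers unfamiliar with the architecture being varied, the following is a minimal NumPy sketch of a TARNet-style forward pass. It assumes fully connected ReLU layers: a shared representation network Phi(x) feeds two separate outcome heads, mu_0 and mu_1, and the CATE estimate is taken as tau_hat(x) = mu_1(x) - mu_0(x). Here "depth" and "width" are modeled as the number of hidden layers and the number of units per layer. The names TARNetSketch, init_mlp, mlp and n_params, the He-style initialization, and the default sizes are hypothetical illustrations. They are not the thesis's code or the CATENets API. Weights are randomly initialized, and training is omitted.

```python
import numpy as np


def init_mlp(rng, sizes):
    """He-initialized (weight, bias) pairs for a fully connected net with the given layer sizes."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
        for m, n in zip(sizes[:-1], sizes[1:])
    ]


def mlp(params, x, relu_last=True):
    """Forward pass; ReLU after every layer, optionally skipping it on the last layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if relu_last or i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def n_params(params):
    """Number of trainable weights and biases in one sub-network."""
    return sum(w.size + b.size for w, b in params)


class TARNetSketch:
    """Hypothetical TARNet-style model: shared representation plus two outcome heads."""

    def __init__(self, n_features, depth_r=3, width_r=200, depth_out=2, width_out=100, seed=0):
        rng = np.random.default_rng(seed)
        rep_sizes = [n_features] + [width_r] * depth_r           # depth/width of Phi
        head_sizes = [rep_sizes[-1]] + [width_out] * depth_out + [1]
        self.phi = init_mlp(rng, rep_sizes)   # shared representation Phi(x)
        self.h0 = init_mlp(rng, head_sizes)   # outcome head for control (t = 0)
        self.h1 = init_mlp(rng, head_sizes)   # outcome head for treated (t = 1)

    def predict_cate(self, x):
        rep = mlp(self.phi, x)                          # shared features
        mu0 = mlp(self.h0, rep, relu_last=False)[:, 0]  # linear output layer
        mu1 = mlp(self.h1, rep, relu_last=False)[:, 0]
        return mu1 - mu0                                # tau_hat(x) = mu_1(x) - mu_0(x)

    def size(self):
        return n_params(self.phi) + n_params(self.h0) + n_params(self.h1)


# Usage: compare a deep-narrow and a shallow-wide configuration on random inputs.
x = np.random.default_rng(1).normal(size=(5, 10))
deep_narrow = TARNetSketch(n_features=10, depth_r=4, width_r=32, depth_out=2, width_out=16)
shallow_wide = TARNetSketch(n_features=10, depth_r=1, width_r=256, depth_out=1, width_out=128)
print(deep_narrow.predict_cate(x).shape, deep_narrow.size(), shallow_wide.size())
```

In actual TARNet training, each head is fit only on the units observed in its own treatment arm. The representation is therefore shared, while the two outcome models stay separate. Comparing size() across configurations illustrates that parameter count grows roughly quadratically with width but only linearly with depth. This is one reason width and depth can act differently as regularization knobs.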

Files

Research_paper_mwitczak.pdf
(pdf | 1.92 MB)
License info not available