Analyzing the Impact of Depth and Leaf Size on CATE Estimation in Honest Causal Trees

A Study of Model Accuracy and Generalization Across Simulated and Real-World Data

Bachelor Thesis (2025)

Author(s)

R. Prodan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H. Krijthe – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R.K.A. Karlsson – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

R. Guerra Marroquim – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

To reference this document use

https://resolver.tudelft.nl/uuid:8e137ffc-afa2-4fa1-ae31-83f96f7f6c49

Publication Year

2025

Language

English

Graduation Date

24-06-2025

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Causal inference, particularly the estimation of the Conditional Average Treatment Effect (CATE), is necessary for understanding the impact of interventions beyond simple predictions. This study analyzes how two key hyperparameters, maximum tree depth and minimum leaf size, influence the accuracy and generalization of CATE estimates derived from honest and adaptive causal trees. The research explores how these hyperparameters affect the bias-variance trade-off and the model's tendency to overfit or underfit across various simulated data scenarios and a real-world dataset.

The results reveal that optimal hyperparameter configurations depend on data characteristics such as dimensionality, noise level, and the complexity of the true causal effects. Honest causal trees perform better in high-dimensional and noisy environments because they control variance effectively. Conversely, in simpler, low-noise settings or with complex CATE structures, adaptive causal trees or baseline models frequently achieve better results by reducing bias. The study also highlights the challenges of using moderately sized datasets, where the limitations of sample splitting can lead to higher estimation errors. This work provides suggestions for hyperparameter selection and emphasizes that tuning must be based on the underlying characteristics of the data to achieve the best possible CATE estimates.
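
The idea of honest estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names (`has_enough`, `leaf_effect`, `honest_stump`), the single-split tree, the alternating half split, the splitting criterion (largest gap between leaf effects), and the reading of minimum leaf size as a minimum count of treated and of control units per leaf are all assumptions made for illustration. The sketch chooses a split on one half of the data and estimates the leaf effects on the other half.

```python
from statistics import mean


def has_enough(rows, min_leaf_size):
    """True if the leaf holds at least min_leaf_size treated and control rows."""
    treated = sum(1 for _, t, _ in rows if t == 1)
    control = len(rows) - treated
    return treated >= min_leaf_size and control >= min_leaf_size


def leaf_effect(rows):
    """Difference in mean outcome between treated and control rows."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)


def honest_stump(rows, thresholds, min_leaf_size):
    """Depth-1 honest tree over rows of (x, t, y) with t in {0, 1}.

    Hypothetical simplification: the split is chosen on the structure half
    by maximizing the gap between leaf effects, and the reported effects
    come only from the estimation half.
    """
    structure, estimation = rows[0::2], rows[1::2]  # assumed half split

    best = None  # (gap, threshold)
    for c in thresholds:
        left = [r for r in structure if r[0] <= c]
        right = [r for r in structure if r[0] > c]
        if not (has_enough(left, min_leaf_size) and has_enough(right, min_leaf_size)):
            continue  # the leaf size constraint rejects this split
        gap = abs(leaf_effect(left) - leaf_effect(right))
        if best is None or gap > best[0]:
            best = (gap, c)
    if best is None:
        return None  # no admissible split

    c = best[1]
    est_left = [r for r in estimation if r[0] <= c]
    est_right = [r for r in estimation if r[0] > c]
    if not (has_enough(est_left, min_leaf_size) and has_enough(est_right, min_leaf_size)):
        return None  # too few estimation rows in a leaf
    return c, leaf_effect(est_left), leaf_effect(est_right)


# Toy data (made up): the treatment effect is 0 for x <= 0.5 and 2 above.
rows = []
for i in range(40):
    x = i / 40
    t = (i // 2) % 2
    y = 1.0 + (2.0 * t if x > 0.5 else 0.0)
    rows.append((x, t, y))

result = honest_stump(rows, [0.25, 0.5, 0.75], min_leaf_size=2)
too_strict = honest_stump(rows, [0.25, 0.5, 0.75], min_leaf_size=50)
```

In this toy setting each half contains only 20 rows, so a large minimum leaf size leaves no admissible split. This mirrors, in a very simplified way, the sample splitting limitation for moderately sized datasets that the abstract describes.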

Files

Analyzing_the_Impact_of_Depth_... (pdf)

(pdf | 3.12 Mb)

License info not available