When Causal Forests Mislead

Evaluating the precision of Confidence Intervals

Bachelor Thesis (2025)
Author(s)

R. Iordan (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

JH Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

R.K.A. Karlsson – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

R. Guerra Marroquim – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
Publication Year
2025
Language
English
Graduation Date
24-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This study evaluates the reliability of confidence intervals in causal forests by examining how data characteristics and hyperparameters affect their empirical coverage relative to the nominal (theoretical) level. Using synthetic datasets with polynomial treatment effects, Sobol sampling, High-Dimensional Model Representation (HDMR), and comprehensive grid searches, it assesses causal forest performance across a range of data settings.
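To make "actual coverage rate" concrete: for a nominal 95% interval, coverage is the fraction of units whose true treatment effect falls inside the estimated interval, which can only be measured on synthetic data where the true effect is known. The sketch below computes it in plain Python; the function names and toy intervals are hypothetical. In practice the bounds would come from a fitted forest, for instance via EconML's `effect_interval` (an assumption, since the abstract does not name the implementation used).

```python
from typing import Sequence


def empirical_coverage(
    true_effects: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Fraction of units whose true effect lies inside [lower, upper]."""
    if not (len(true_effects) == len(lower) == len(upper)):
        raise ValueError("inputs must have equal length")
    if len(true_effects) == 0:
        raise ValueError("inputs must be non-empty")
    hits = sum(lb <= tau <= ub for tau, lb, ub in zip(true_effects, lower, upper))
    return hits / len(true_effects)


def coverage_gap(coverage: float, alpha: float = 0.05) -> float:
    """Signed distance from the nominal level 1 - alpha; negative means under-coverage."""
    return coverage - (1.0 - alpha)


# Toy example (hypothetical numbers): the third interval misses its true effect.
tau_true = [1.0, 2.0, 3.0, 4.0]
lower_bounds = [0.5, 1.5, 3.2, 3.0]
upper_bounds = [1.5, 2.5, 3.8, 5.0]
cov = empirical_coverage(tau_true, lower_bounds, upper_bounds)
print(cov)                          # 0.75
print(round(coverage_gap(cov), 2))  # -0.2
```

Averaging this quantity over many simulated datasets gives the coverage rate that is compared against the nominal 95%.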

A primary finding is a practical limit on reliable confidence interval coverage: when the combined number of confounders and effect modifiers exceeds 4, coverage drops well below 80%, even for simple treatment effect functions. This limit persists despite substantial increases in computational resources.
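To illustrate what "confounders plus effect modifiers" means in a synthetic setup, the sketch below generates a hypothetical dataset in which confounders drive both treatment assignment and the outcome, while effect modifiers enter only a polynomial treatment effect. The functional forms (uniform covariates, logistic propensity, `1 + sum(m ** degree)` as the effect) and all names are illustrative assumptions, not the study's actual data-generating process; `beyond_reported_limit` simply encodes the threshold of 4 stated above.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Unit:
    confounders: list[float]
    modifiers: list[float]
    treatment: int
    outcome: float
    true_effect: float  # known only because the data are synthetic


def polynomial_effect(modifiers: list[float], degree: int = 2) -> float:
    """Hypothetical polynomial treatment effect: 1 + sum_j m_j ** degree."""
    return 1.0 + sum(m ** degree for m in modifiers)


def simulate(n: int, n_confounders: int, n_modifiers: int,
             degree: int = 2, noise_sd: float = 1.0, seed: int = 0) -> list[Unit]:
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n_confounders)]
        m = [rng.uniform(-1.0, 1.0) for _ in range(n_modifiers)]
        # Confounders shift the treatment probability through a logistic link...
        propensity = 1.0 / (1.0 + math.exp(-sum(w)))
        t = 1 if rng.random() < propensity else 0
        tau = polynomial_effect(m, degree)
        # ...and also the baseline outcome, so a naive treated/control comparison is biased.
        y = sum(w) + t * tau + rng.gauss(0.0, noise_sd)
        units.append(Unit(w, m, t, y, tau))
    return units


def beyond_reported_limit(n_confounders: int, n_modifiers: int) -> bool:
    """True when the setting exceeds the threshold of 4 reported in the abstract."""
    return n_confounders + n_modifiers > 4


data = simulate(n=1000, n_confounders=3, n_modifiers=2)
print(beyond_reported_limit(3, 2))  # True
```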

The hyperparameter analysis showed that the two most influential parameters are the maximum tree depth and the balance tolerance of splits: both produced substantial changes in performance, and both performed best at their maximum values (unlimited depth and 0.5, respectively). Further recommendations are to increase the fraction of training data used per tree from 0.45 to 0.5, to keep the minimum impurity decrease threshold at 0.0, and to use at least roughly 2400 trees to reach the theoretically expected coverage.
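If the forest in question is EconML's `CausalForestDML` (an assumption; the abstract does not name the library), these recommendations map roughly onto the constructor arguments `max_depth`, `min_balancedness_tol`, `max_samples`, `min_impurity_decrease` and `n_estimators`. The sketch below only builds and sanity-checks keyword dictionaries, so it runs without EconML installed. The validation rules (balance tolerance in [0, 0.5], `max_samples` at most 0.5 when inference is enabled, tree count divisible by a subforest size of 4) are assumptions based on EconML's documented ranges, and the search space is hypothetical.

```python
from itertools import product

# Parameter names follow EconML's CausalForestDML (assumed mapping, not confirmed by the study).
RECOMMENDED = {
    "max_depth": None,             # unlimited tree depth
    "min_balancedness_tol": 0.5,   # upper end of the allowed [0, 0.5] range
    "max_samples": 0.5,            # fraction of data per tree, raised from 0.45
    "min_impurity_decrease": 0.0,  # no minimum impurity decrease
    "n_estimators": 2400,          # roughly the minimum tree count reported
}

SUBFOREST_SIZE = 4  # assumed default; n_estimators must be a multiple of it


def validate(params: dict) -> list[str]:
    """Return a list of constraint violations (empty if the config looks valid)."""
    problems = []
    if not 0.0 <= params["min_balancedness_tol"] <= 0.5:
        problems.append("min_balancedness_tol must lie in [0, 0.5]")
    if not 0.0 < params["max_samples"] <= 0.5:
        problems.append("max_samples must lie in (0, 0.5] when inference is enabled")
    if params["n_estimators"] % SUBFOREST_SIZE != 0:
        problems.append("n_estimators must be divisible by subforest_size")
    if params["min_impurity_decrease"] < 0.0:
        problems.append("min_impurity_decrease must be non-negative")
    return problems


def grid(space: dict) -> list[dict]:
    """Full-factorial grid over the listed values, as in an exhaustive grid search."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


# Hypothetical search space around the recommended settings.
space = {
    "max_depth": [None, 10],
    "min_balancedness_tol": [0.45, 0.5],
    "max_samples": [0.45, 0.5],
    "min_impurity_decrease": [0.0],
    "n_estimators": [800, 2400],
}
configs = [c for c in grid(space) if not validate(c)]
print(len(configs))  # 16
```

With EconML available, each dictionary could then be unpacked into the estimator's constructor together with the nuisance models; that wiring is left out because it depends on the actual setup.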

In addition, the study found no noteworthy interaction between tree count and sample size, so these two factors can be tuned independently of each other.
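One way to check for such an interaction is a two-factor ANOVA decomposition of a full grid of coverage values, which is the exact ANOVA-HDMR expansion when only two factors vary: a grand mean, one main effect per factor, and a residual interaction term. The sketch below is a generic illustration of that idea, not the study's analysis; the toy surfaces are made up and are not results from the paper.

```python
from statistics import fmean


def two_way_decomposition(table: list[list[float]]):
    """Decompose y[i][j] = mu + a[i] + b[j] + ab[i][j] on a full grid.

    Rows index one factor (e.g. tree count), columns the other (e.g. sample size).
    """
    n_rows, n_cols = len(table), len(table[0])
    mu = fmean(v for row in table for v in row)
    a = [fmean(row) - mu for row in table]
    b = [fmean(table[i][j] for i in range(n_rows)) - mu for j in range(n_cols)]
    ab = [[table[i][j] - mu - a[i] - b[j] for j in range(n_cols)] for i in range(n_rows)]
    return mu, a, b, ab


def interaction_share(table: list[list[float]]) -> float:
    """Share of total variance carried by the interaction term (0 means purely additive)."""
    mu, _, _, ab = two_way_decomposition(table)
    total = sum((v - mu) ** 2 for row in table for v in row)
    if total == 0.0:
        return 0.0
    return sum(x ** 2 for row in ab for x in row) / total


# Toy surfaces (not data from the study): additive vs. multiplicative in the two factors.
trees = [1, 2, 3]
samples = [1, 2]
additive = [[0.5 + 0.1 * t + 0.05 * s for s in samples] for t in trees]
multiplicative = [[0.1 * t * s for s in samples] for t in trees]
print(round(interaction_share(additive), 6))    # 0.0
print(interaction_share(multiplicative) > 0.0)  # True
```

A share close to zero means the coverage surface is essentially additive in the two factors, which is what justifies optimizing each one separately.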

These findings provide systematic guidelines for practitioners to assess when causal forest confidence intervals are reliable and how to optimize them, bridging the gap between theoretical guarantees and practical performance.

Files

Research_paper-93.pdf
(pdf | 12.4 Mb)
License info not available