This study addresses an important issue in evaluating the reliability of confidence intervals from causal forests: how data characteristics and hyperparameters affect empirical coverage rates relative to theoretical benchmarks. Using synthetic datasets with polynomial treatment effects, Sobol sampling, High-Dimensional Model Representation (HDMR), and comprehensive grid searches, the study assesses causal forest performance across a range of data-generating contexts.
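As a rough illustration of the kind of synthetic design described above, the sketch below draws quasi-random covariates from a Sobol sequence and overlays a polynomial treatment effect. The dimensions, coefficients, and functional forms are illustrative assumptions, not the study's actual data-generating process.

```python
import numpy as np
from scipy.stats import qmc

rng = np.random.default_rng(0)
d = 6                                   # total number of covariates (assumed)
m = 12                                  # 2**m = 4096 Sobol points (assumed)

# Quasi-random covariates from a scrambled Sobol sequence on [0, 1]^d
sobol = qmc.Sobol(d=d, scramble=True, seed=0)
X = sobol.random_base2(m=m)
n = X.shape[0]

# Polynomial treatment effect driven by two effect modifiers (assumed form)
tau = 1.0 + 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2

# Confounded treatment assignment and outcome (assumed simple structure)
propensity = 1.0 / (1.0 + np.exp(-(X[:, 2] - 0.5)))
T = rng.binomial(1, propensity)
Y = X[:, 3] + tau * T + rng.normal(scale=1.0, size=n)
```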
A central finding is a practical limit for reliable confidence interval coverage: when the combined number of confounders and effect modifiers exceeds four, coverage rates drop considerably below 80%, even for simple treatment effect functions. This breakdown persists despite substantial increases in computational resources.
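Coverage here is the fraction of units whose true effect lies inside its estimated interval, which is straightforward to compute in a simulation where the true effect is known. The sketch below is illustrative; the function name and interface are not the study's code.

```python
import numpy as np

def empirical_coverage(tau_true, ci_lower, ci_upper):
    """Fraction of units whose true effect lies inside its estimated interval."""
    inside = (ci_lower <= tau_true) & (tau_true <= ci_upper)
    return float(np.mean(inside))

# With a nominal 95% interval, a result well below 0.80 corresponds to the
# breakdown described above for more than four confounders plus modifiers.
```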
The hyperparameter analysis identified maximum tree depth and the split balance tolerance as the most influential parameters, with the largest effects on performance; both performed best at their maximum settings (unlimited depth and 0.5, respectively). Further recommendations include raising the training data fraction per tree from 0.45 to 0.5, keeping the minimum impurity decrease threshold at 0.0, and using at least roughly 2,400 trees to meet theoretical expectations.
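A minimal sketch of these recommended settings is shown below, assuming EconML's CausalForestDML as the implementation and that max_depth, min_balancedness_tol, max_samples, min_impurity_decrease, and n_estimators correspond to the hyperparameters named above; the toy data and nuisance model choices are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from econml.dml import CausalForestDML

# Toy data just to make the sketch runnable (replace with the real dataset)
rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 4))
T = rng.binomial(1, 0.5, size=500)
Y = X[:, 0] * T + rng.normal(size=500)

est = CausalForestDML(
    model_y=RandomForestRegressor(random_state=0),   # nuisance models (assumed choice)
    model_t=RandomForestClassifier(random_state=0),
    discrete_treatment=True,
    n_estimators=2400,            # at least ~2,400 trees
    max_depth=None,               # unlimited tree depth
    min_balancedness_tol=0.5,     # maximal split balance tolerance
    max_samples=0.5,              # training fraction per tree raised from 0.45 to 0.5
    min_impurity_decrease=0.0,    # keep the minimum impurity decrease at 0.0
    random_state=0,
)
est.fit(Y, T, X=X)
ci_lower, ci_upper = est.effect_interval(X, alpha=0.05)   # nominal 95% intervals
```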
In addition, the study found no noteworthy interaction between tree count and sample size, so these two parameters can be optimized independently of each other.
These findings provide systematic guidelines for practitioners to assess when causal forest confidence intervals are reliable and how to optimize them, bridging the gap between theoretical guarantees and practical performance.