Honesty in Causal Forests, is it worth it?

Abstract

Causal machine learning is a relatively new field that aims to estimate the causal effect of a treatment on an outcome, rather than a correlation between the features and the outcome. Many models have been proposed for this task, one of which is the causal forest. A causal forest is built on a random forest with a different estimation function in the leaf nodes, and it therefore inherits the same weaknesses, such as a tendency to overfit. Honesty was introduced to give a mathematical guarantee that such forests do not overfit as easily: the training data is split so that one half determines the tree structure and the other half estimates the treatment effects in the leaves. That work, however, only provided preliminary results, and no thorough empirical evaluation was carried out in a causal-inference setting. In this paper, three scenarios are tested in which a causal forest with honesty is compared to one without it. Based on the results, honesty does indeed help trees avoid overfitting. However, in a general setting it hurts the model, because each tree is grown with only half of the available data, which makes the honest causal forest less accurate when little training data is available. When a large amount of data is provided, honesty does not change the performance, meaning it provides a theoretical guarantee against overfitting with no repercussions for predictive accuracy.
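To make the comparison concrete, the sketch below illustrates the kind of experiment described above (it is not the paper's exact setup): an honest and a non-honest causal forest are fit on synthetic data with a known treatment effect and compared on held-out error. It assumes the `CausalForest` estimator from the econml package, which exposes an `honest` flag for this purpose.

```python
# Minimal sketch of an honest vs. non-honest causal forest comparison.
# Assumes econml's generalized random forest implementation (econml.grf.CausalForest);
# the synthetic data-generating process here is purely illustrative.
import numpy as np
from econml.grf import CausalForest

rng = np.random.default_rng(0)

# Synthetic data with a known heterogeneous effect tau(x) = x_0
n, d = 2000, 5
X = rng.normal(size=(n, d))
T = rng.binomial(1, 0.5, size=n)              # randomized binary treatment
tau = X[:, 0]                                  # true conditional treatment effect
y = X[:, 1] + tau * T + rng.normal(size=n)     # outcome with confound-free noise

X_test = rng.normal(size=(500, d))
tau_test = X_test[:, 0]

for honest in (True, False):
    cf = CausalForest(n_estimators=500, honest=honest, random_state=0)
    cf.fit(X, T, y)
    mse = np.mean((cf.predict(X_test).ravel() - tau_test) ** 2)
    print(f"honest={honest}: held-out CATE MSE = {mse:.3f}")
```

With a small `n` the honest forest tends to look worse here, since each tree effectively estimates effects from half the sample; with a large `n` the gap narrows, which is the trade-off the abstract describes.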
