Honesty in Causal Forests, is it worth it?

Bachelor Thesis (2022)
Author(s)

M. Havelka (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.R. Bongers – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Jesse H. Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Rafael Bidarra – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Matej Havelka
Publication Year
2022
Language
English
Graduation Date
23-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Repository used as the codebase for the obtained results.

https://github.com/MatejHav/causal-methods-evaluation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Causal machine learning is a relatively new field that aims to estimate the causal effect of a treatment on an outcome, rather than a correlation between features and the outcome. Many models have been proposed to this end, one of which is the causal forest. A causal forest is a random forest with a different estimation function in its leaf nodes, so it inherits the same weaknesses, such as a tendency to overfit. Honesty was introduced to guarantee mathematically that such forests do not overfit as easily; the original work, however, provided only preliminary results, with no thorough empirical evaluation in a causal-inference setting. In this paper, three scenarios are tested in which causal forests with and without honesty are compared. The results suggest that honesty does indeed help trees avoid overfitting. In a general setting, however, it hurts the model, since only half of the available data is used for training, making honest causal forests less accurate when training data is scarce. When a large amount of data is available, honesty does not change performance, meaning it provides a theoretical guarantee against overfitting at no cost to performance.
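The honesty mechanism described above can be illustrated with a minimal sketch. This is not the thesis's code (the actual experiments live in the linked repository); it is a hypothetical, simplified "honest stump" on synthetic data: the sample is split in half, one half is reserved for choosing the tree structure, and only the held-out half is used to estimate the treatment effect in each leaf. The data-generating process, variable names, and fixed split threshold are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: one covariate X, a randomized binary
# treatment W, and outcome Y. The true treatment effect is 2.0 for
# X > 0 and 0.0 otherwise.
n = 4000
X = rng.uniform(-1, 1, n)
W = rng.integers(0, 2, n)
tau = np.where(X > 0, 2.0, 0.0)
Y = tau * W + rng.normal(0, 1, n)

def leaf_effect(y, w):
    """Difference-in-means treatment effect estimate within one leaf."""
    return y[w == 1].mean() - y[w == 0].mean()

def honest_stump(X, Y, W, threshold=0.0):
    """Honest one-split tree: half the data is set aside for choosing the
    structure, and the *other* half estimates the leaf effects.
    (A real causal tree would search this half for the best threshold;
    we keep it fixed to stay minimal.)"""
    idx = rng.permutation(len(X))
    split_half, est_half = idx[: len(X) // 2], idx[len(X) // 2 :]
    left = est_half[X[est_half] <= threshold]
    right = est_half[X[est_half] > threshold]
    return leaf_effect(Y[left], W[left]), leaf_effect(Y[right], W[right])

left_eff, right_eff = honest_stump(X, Y, W)
print(f"left leaf effect:  {left_eff:.2f}")   # near the true effect 0.0
print(f"right leaf effect: {right_eff:.2f}")  # near the true effect 2.0
```

The cost the abstract identifies is visible here: each leaf estimate uses only the estimation half, so its variance is that of a sample half the size, which matters when data is scarce but washes out when data is plentiful.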

Files

Final_paper.pdf
(PDF, 1.58 MB)
License info not available