Honesty in Causal Forests, is it worth it?

Bachelor Thesis (2022)
Author(s)

M. Havelka (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

S.R. Bongers – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Jesse H. Krijthe – Mentor (TU Delft - Pattern Recognition and Bioinformatics)

Rafael Bidarra – Graduation committee member (TU Delft - Computer Graphics and Visualisation)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2022 Matej Havelka
Publication Year
2022
Language
English
Graduation Date
23-06-2022
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Related content

Repository used as the codebase for the obtained results.

https://github.com/MatejHav/causal-methods-evaluation
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Causal machine learning is a relatively new field that aims to estimate the causal effect of a treatment on an outcome, rather than a correlation between features and the outcome. Many models have been proposed to this end, one of which is the causal forest. A causal forest is a random forest with a different estimation function in its leaf nodes, so it inherits the same weaknesses, such as a tendency to overfit. Honesty was introduced to guarantee mathematically that such forests do not overfit as easily; the original work, however, provided only preliminary results, with no thorough empirical evaluation in a causal-inference setting. In this paper, three scenarios are tested in which causal forests with and without honesty are compared. The results suggest that honesty does indeed help trees avoid overfitting. In a general setting, however, it hurts the model, since only half of the available data is used for training, making honest causal forests less accurate when training data is scarce. When a large amount of data is available, honesty does not change performance, meaning it provides a theoretical guarantee against overfitting at no cost to performance.
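The honesty mechanism described above can be illustrated with a minimal sketch. This is not the thesis's code (the actual experiments live in the linked repository); it is a hypothetical, simplified "honest stump" on synthetic data: the sample is split in half, one half is reserved for choosing the tree structure, and only the held-out half is used to estimate the treatment effect in each leaf. The data-generating process, variable names, and fixed split threshold are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: one covariate X, a randomized binary
# treatment W, and outcome Y. The true treatment effect is 2.0 for
# X > 0 and 0.0 otherwise.
n = 4000
X = rng.uniform(-1, 1, n)
W = rng.integers(0, 2, n)
tau = np.where(X > 0, 2.0, 0.0)
Y = tau * W + rng.normal(0, 1, n)

def leaf_effect(y, w):
    """Difference-in-means treatment effect estimate within one leaf."""
    return y[w == 1].mean() - y[w == 0].mean()

def honest_stump(X, Y, W, threshold=0.0):
    """Honest one-split tree: half the data is set aside for choosing the
    structure, and the *other* half estimates the leaf effects.
    (A real causal tree would search this half for the best threshold;
    we keep it fixed to stay minimal.)"""
    idx = rng.permutation(len(X))
    split_half, est_half = idx[: len(X) // 2], idx[len(X) // 2 :]
    left = est_half[X[est_half] <= threshold]
    right = est_half[X[est_half] > threshold]
    return leaf_effect(Y[left], W[left]), leaf_effect(Y[right], W[right])

left_eff, right_eff = honest_stump(X, Y, W)
print(f"left leaf effect:  {left_eff:.2f}")   # near the true effect 0.0
print(f"right leaf effect: {right_eff:.2f}")  # near the true effect 2.0
```

The cost the abstract identifies is visible here: each leaf estimate uses only the estimation half, so its variance is that of a sample half the size, which matters when data is scarce but washes out when data is plentiful.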

Files

Final_paper.pdf
(PDF, 1.58 MB)
License info not available