An empirical study of the effects of unconfoundedness on the performance of Propensity Score Matching

Bachelor Thesis (2022)

Author(s)

A. Erdelský (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

J.H. Krijthe – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

S.R. Bongers – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A.R. Bidarra – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Machine Learning, Causal Inference, Propensity, Causality

To reference this document use

https://resolver.tudelft.nl/uuid:a765961f-2bca-4e7c-a65f-dcade576025c

Publication Year

2022

Language

English

Graduation Date

23-06-2022

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Collections

thesis

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

This research analyzes the performance of Propensity Score Matching (PSM), a causal inference method for estimating causal effects. More specifically, it investigates how PSM reacts when the unconfoundedness assumption, one of its core conceptual pillars, is violated. To this end, PSM was first run on synthetic data that satisfies the unconfoundedness condition. These results were then compared with measurements obtained by running the algorithm on data containing confounding features with varying contributions to the values of other variables, where these features were hidden either individually or in progressively larger numbers. The results were also compared against Linear Regression, a generic machine learning algorithm, as a performance baseline. The results indicate that when the hidden variable contributes only to the main effect, only to the treatment effect, or only to the treatment propensity calculation, PSM performs with the same error regardless of which of the three the hidden feature affects, making them equivalent in their error contribution. In addition, in all experimental scenarios used in this work, PSM performed very similarly to Linear Regression and did not appear to offer any advantage over it in these specific situations.
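
The kind of experiment the abstract describes can be illustrated with a small sketch. This is not the thesis's actual setup: the data-generating process, the coefficients, the sample size, and every name below (`simulate`, `propensity_scores`, `psm_att`, `TRUE_EFFECT`) are assumptions made for illustration only. The sketch generates synthetic data with two covariates, where `x2` confounds treatment and outcome, and then runs one-nearest-neighbour propensity score matching twice: once with both covariates visible and once with `x2` hidden.

```python
import bisect
import math
import random

TRUE_EFFECT = 2.0  # assumed constant treatment effect (illustrative value)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=2000, seed=0):
    """Hypothetical synthetic data: x1 and x2 both drive treatment and outcome."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
        t = 1 if rng.random() < sigmoid(1.0 * x1 + 1.5 * x2) else 0
        y = TRUE_EFFECT * t + 1.0 * x1 + 3.0 * x2 + rng.gauss(0, 1)
        data.append(((x1, x2), t, y))
    return data


def propensity_scores(features, treatments, steps=200, lr=0.5):
    """Logistic regression fitted by plain batch gradient descent."""
    k, n = len(features[0]), len(features)
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, t in zip(features, treatments):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - t
            grad_b += err
            for i in range(k):
                grad_w[i] += err * x[i]
        b -= lr * grad_b / n
        w = [wi - lr * gi / n for wi, gi in zip(w, grad_w)]
    return [sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) for x in features]


def psm_att(features, treatments, outcomes):
    """Effect on the treated via 1-NN matching on the score, with replacement."""
    scores = propensity_scores(features, treatments)
    controls = sorted(
        (s, y) for s, t, y in zip(scores, treatments, outcomes) if t == 0
    )
    control_scores = [s for s, _ in controls]
    diffs = []
    for s, t, y in zip(scores, treatments, outcomes):
        if t == 1:
            # The closest control score is one of the two neighbours of the
            # insertion point in the sorted list of control scores.
            i = bisect.bisect_left(control_scores, s)
            neighbours = [j for j in (i - 1, i) if 0 <= j < len(controls)]
            j = min(neighbours, key=lambda idx: abs(control_scores[idx] - s))
            diffs.append(y - controls[j][1])
    return sum(diffs) / len(diffs)


data = simulate()
treatments = [t for _, t, _ in data]
outcomes = [y for _, _, y in data]

# Unconfoundedness holds: both covariates are available for matching.
att_full = psm_att([x for x, _, _ in data], treatments, outcomes)
# Unconfoundedness is broken: the confounder x2 is hidden from PSM.
att_hidden = psm_att([(x[0],) for x, _, _ in data], treatments, outcomes)

print("true effect:", TRUE_EFFECT)
print("PSM estimate, all covariates observed:", round(att_full, 3))
print("PSM estimate, x2 hidden:", round(att_hidden, 3))
```

Under these assumed coefficients, hiding `x2` leaves a backdoor path open between treatment and outcome, so the second estimate should drift away from the true effect while the first stays close to it. The thesis additionally compares against Linear Regression, which this sketch leaves out.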

Files

Research_Paper_PSM_Andrej_Erde... (pdf)

(pdf | 0.539 Mb)

License info not available