Leave-Multiple-Out Informal Benchmarking

Understanding the Behavior of Informal Benchmarking for Multivariate Confounding

Bachelor Thesis (2026)

Author(s)

N.T. Borodjiev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Havelka – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.H. Krijthe – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Anand – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty

Electrical Engineering, Mathematics and Computer Science

Causal machine learning; Sensitivity analysis; Causal inference; Informal benchmarking; Hidden confounding

To reference this document use

https://resolver.tudelft.nl/uuid:6462a6f1-fd36-4c01-90c6-f9d8bce529f4

Publication Year

2026

Language

English

Graduation Date

23-06-2026

Awarding Institution

Delft University of Technology

Project

CSE3000 Research Project

Programme

Computer Science and Engineering

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Informal benchmarking is a popular approach for calibrating sensitivity bounds for hidden confounding: observed covariates are treated as if they were unobserved, and the confounding strength they account for is used as a reference. While leave-one-out (LOO) benchmarking removes a single covariate, leave-multiple-out (LMO) benchmarking removes sets of covariates to approximate multidimensional confounding. In this study, we examine whether LMO benchmarking recovers the true confounding strength as the number of omitted covariates increases. Using a synthetic dataset with bounded covariates and a known confounding structure, we compare empirical bounds with an oracle-like benchmark and the true theoretical value. The theoretical bound increases monotonically as more covariates are omitted, but the empirical LMO bound does not follow this pattern: it plateaus and then declines. The experiments show that this behavior is not explained by estimation error alone. Rather, it is a consequence of informal benchmarking being restricted to the given sample: large bounds arise only from individuals with specific covariate values, and the sample may contain few or none of them. This issue becomes more important as larger subsets are omitted, because the strongest theoretical benchmarks depend on increasingly specific patterns in the omitted covariates. As a result, LMO benchmarking may be more reliable for small omitted subsets but should be interpreted with increasing caution for larger ones. We conclude that LMO informal benchmarking results should be read as sample-realized benchmarks rather than as the maximum confounding strength possible over the full covariate space.
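The gap between a sample-realized benchmark and the extremes of a bounded covariate space can be illustrated with a small simulation. This is a sketch under stated assumptions, not the thesis's implementation. The abstract does not name a sensitivity model, so here we assume a marginal-sensitivity-model style measure: the confounding strength of an omitted set S is the odds ratio between the full propensity e(x) and the reduced propensity e_{-S}(x_{-S}) = E[e(X) | x_{-S}]. We further assume independent Uniform(-1, 1) covariates and a logistic propensity with hypothetical coefficients beta, and we always omit the first k covariates. The function name lmo_benchmarks and all parameter values are illustrative. For each k, the function returns the sample-realized benchmark, which is the largest odds ratio over the observed individuals. It also returns a corner-completed value, in which each individual's omitted covariates are pushed to the bounds of their range.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logit(p):
    return np.log(p) - np.log1p(-p)


def lmo_benchmarks(X, beta, k, n_mc=2000, rng=None):
    """Hypothetical helper: sample-realized vs. corner-completed odds-ratio
    benchmark when the first k covariates are omitted.

    Assumes covariates bounded in [-1, 1], independent Uniform(-1, 1), and a
    logistic propensity e(x) = sigmoid(beta . x). These are illustrative
    assumptions, not the thesis's setup.
    """
    rng = np.random.default_rng(rng)
    S, R = slice(0, k), slice(k, None)
    a = X[:, R] @ beta[R]   # log-odds contribution of the kept covariates
    full = X @ beta         # full-covariate log-odds beta . x
    # Reduced propensity e_{-S}(x_{-S}) = E[sigmoid(beta . X) | x_{-S}],
    # estimated by Monte Carlo over X_S ~ Uniform(-1, 1)^k.
    z = rng.uniform(-1.0, 1.0, size=(n_mc, k)) @ beta[S]
    reduced = logit(sigmoid(a[:, None] + z[None, :]).mean(axis=1))
    # Per-individual |log odds ratio| between full and reduced propensities,
    # i.e. log of max(OR, 1/OR).
    sample_log_lam = np.abs(full - reduced)
    # Push x_S to the bounds of [-1, 1]^k, where beta_S . x_S = +/- ||beta_S||_1;
    # x_{-S} is still taken from the sample.
    b = np.abs(beta[S]).sum()
    corner_log_lam = np.maximum(np.abs(a + b - reduced), np.abs(a - b - reduced))
    return np.exp(sample_log_lam.max()), np.exp(corner_log_lam.max())


if __name__ == "__main__":
    data_rng = np.random.default_rng(0)
    d, n = 6, 2000
    beta = np.ones(d)  # hypothetical equal-strength coefficients
    X = data_rng.uniform(-1.0, 1.0, size=(n, d))
    for k in range(1, d + 1):
        emp, corner = lmo_benchmarks(X, beta, k, rng=1)
        print(f"k={k}: sample-realized={emp:.2f}, corner-completed={corner:.2f}")
```

For every individual, the corner-completed value is at least as large as the sample-realized one. The gap between them therefore shows how far the sample falls short of the covariate extremes. Under these assumptions, the probability that a single individual has all k omitted covariates within ε of the relevant bounds is (ε/2)^k, and this shrinks quickly as k grows. The corner-completed value still uses the observed x_{-S}, so it is not the supremum over the full covariate space.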

Files

Research_paper.pdf

(PDF, 2.44 MB)

License info not available