Leave-Multiple-Out Informal Benchmarking

Understanding the Behavior of Informal Benchmarking for Multivariate Confounding

Bachelor Thesis (2026)
Author(s)

N.T. Borodjiev (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

M. Havelka – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

J.H. Krijthe – Mentor (TU Delft - Electrical Engineering, Mathematics and Computer Science)

A. Anand – Graduation committee member (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Faculty
Electrical Engineering, Mathematics and Computer Science
More Info
expand_more
Publication Year
2026
Language
English
Graduation Date
23-06-2026
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Faculty
Electrical Engineering, Mathematics and Computer Science
Downloads counter
14
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Informal benchmarking is a popular approach for calibrating sensitivity bounds for hidden confounding by treating observed covariates as if they were unobserved. While leave-one-out (LOO) benchmarking removes a single covariate, leave-multiple-out (LMO) benchmarking removes sets of covariates to approximate multidimensional confounding. In this study, we examine whether LMO benchmarking recovers the confounding strength as the number of features dropped increases. Using a synthetic dataset with bounded covariates and known confounding structure, we compare empirical bounds with an Oracle-like benchmark and the true theoretical value. The theoretical bound increases monotonically as more covariates are omitted, but the empirical LMO bound does not follow this pattern - it plateaus and then declines. The experiments show that this behavior is not explained by estimation error alone. Rather, it is a consequence of informal benchmarking being restricted by the given sample: large bounds are obtained from individuals with certain covariate values. This issue becomes more important as larger subsets are omitted, because the strongest theoretical benchmarks depend on increasingly specific patterns in the omitted covariates. As a result, LMO benchmarking may be more reliable for small omitted subsets, but should be interpreted with increasing caution for larger ones. We conclude that LMO informal benchmarking results should be read as sample-realized benchmarks rather than as the maximum confounding strength possible over the full covariate space.

Files

Research_paper.pdf
(pdf | 2.44 Mb)
License info not available