S.R. Bongers

Contributed

14 records found

This paper addresses the issue of double-dipping in off-policy evaluation (OPE) in behaviour-agnostic reinforcement learning, where the same dataset is used for both training and estimation, leading to overfitting and inflated performance metrics, especially for variance. We intro ...
Learning algorithms can perform poorly in unseen environments when they learn spurious correlations. This is known as the out-of-domain (OOD) generalization problem. Invariant Risk Minimization (IRM) is a method that attempts to solve this problem by learning invariant relationsh ...
Out-of-Domain (OOD) generalization is a challenging problem in machine learning that concerns learning a model from one or more domains such that it performs well on an unseen domain. Empirical Risk Minimization (ERM), the standard machine learning method, suffers from learning sp ...
Out-of-domain (OOD) generalization refers to learning, from one or more different but related domains, a model that can be used in an unknown test domain. It is challenging for existing machine learning models. Several methods have been proposed to solve this problem, and multi-d ...
Generalizing models to new, unseen datasets is a common problem in machine learning. Algorithms that perform well on test instances drawn from the same distribution as their training dataset often perform poorly on new datasets with a different distribution. This problem is caused ...
The purpose of this research is to analyze the performance of Propensity Score Matching, a causal inference method for causal effect estimation. More specifically, we investigate how Propensity Score Matching reacts to violations of the unconfoundedness assumption, one of its core concep ...
Causal machine learning deals with the inference of causal relationships between variables in observational datasets. For certain datasets, it is reasonable to assume a causal graph in which information about unobserved confounders can only be obtained through noisy proxies, and CEVAE ...
The large amounts of observational data available nowadays have sparked considerable interest in learning causal relations from such data using machine learning methods. One recent method for doing this, which provided promising results, is the DragonNet (Shi et al., 2019), which ...
An empirical study is performed exploring the sensitivity of GANITE, a method for Individualized Treatment Effect (ITE) estimation, to hidden confounders. Most real-world datasets do not measure all confounders, and it is therefore important to know how crucial this is in order to obtai ...
Causal machine learning is a relatively new field which tries to find a causal relation between the treatment and the outcome, rather than a correlation between the features and the outcome. To achieve this, many different models have been proposed, one of which is the causal forest. ...
In the field of reinforcement learning (RL), effectively leveraging behavior-agnostic data to train and evaluate policies without explicit knowledge of the behavior policies that generated the data is a significant challenge. This research investigates the impact of state visitat ...
In offline reinforcement learning, deriving a policy from a pre-collected set of experiences is challenging due to the limited sample size and the mismatched state-action distribution between the target policy and the behavioral policy that generated the data. Learning a dynamic ...
Off-policy evaluation faces several key problems, one of which is the “curse of horizon”. With recent breakthroughs [1] [2], new estimators have emerged that utilise importance sampling over individual state-action pairs and rewards rather than over whole trajectories. With t ...
Behavior-agnostic reinforcement learning is a rapidly expanding research area focusing on developing algorithms capable of learning effective policies without explicit knowledge of the environment's dynamics or the specific behavior policies that generated the data. The field has produced robust techniques to perform ...