Estimating the effect of an intervention on an outcome is a central challenge across science and society. In medicine, we may ask whether a drug effectively treats a disease; in economics, whether a new policy reduces unemployment. Estimating such effects from data, a process known as causal inference, is essential but inherently difficult, because it often relies on untestable assumptions to ensure unbiased identification of treatment effects. A key example of such an untestable assumption is the absence of unmeasured confounding, meaning that no hidden variable influences both the treatment and the outcome. Because this assumption cannot be directly verified, treatment effect estimates may be biased when it fails. This can ultimately lead to untrustworthy conclusions and, in the worst case, unsafe decisions, such as prescribing the wrong drug to a patient. The central question of this dissertation is therefore whether we can develop methods for safer causal inference that either detect violations of its underlying assumptions or remain robust when those assumptions are violated.
In Part One, we address the first aspect: detecting violations of causal identification assumptions. We focus on settings with data from multiple sources, such as hospitals or locations, where distributional shifts naturally occur. Under specific independence conditions on the causal mechanisms driving these shifts, we first present a nonparametric test to falsify the assumption of no unmeasured confounding. To obtain these results, we introduce a novel technique based on hierarchical causal graphical models. We then improve the statistical efficiency of this test by reformulating the independence condition using parameterized linear models. Finally, we extend the hierarchical modeling approach to other identification settings, testing the validity of the mediators and instrumental variables used in two additional common identification strategies.
In Parts Two and Three, we develop methods that are instead robust when causal identification assumptions are violated. We revisit two common problem settings in causal inference and demonstrate that it is possible to develop methods that either remove the need for the traditional assumptions or rely on weaker, more plausible ones. In the first setting, we study the problem of augmenting randomized trials with external data to improve efficiency in treatment effect estimation. Such approaches typically rely on a transportability assumption that relates the populations underlying the trial and the external data. When this assumption is violated, however, integrating external data can introduce substantial bias. To address this, we propose a novel and efficient estimator that incorporates external data, and we show that it improves inference on the average treatment effect while guaranteeing that it never performs worse, and sometimes performs better, than the estimator that relies solely on trial data. We further adapt this estimator to learn heterogeneous treatment effects within the trial population and show that similar safety guarantees hold for this problem.
In the second setting, we examine the evaluation of treatment allocation strategies using Qini curves. Standard methods for estimating Qini curves assume no interference between treated units, meaning that the treatment of one unit does not affect others. When interference is present, however, the resulting Qini curves can be misleading, yielding incorrect evaluations of treatment allocation strategies. We therefore propose multiple estimators that handle interference, specifically in settings where units within a cluster may affect one another but not units in other clusters. We identify a bias-variance trade-off among these estimators and, through both theoretical and empirical results, provide practical guidance on how practitioners can choose among them.

The dissertation concludes with a discussion of broader considerations, limitations of the presented research, and potential directions for future work. We find that it is indeed possible to make causal inference safer by detecting assumption violations and by reducing reliance on untestable assumptions. Nonetheless, many open and important questions remain, offering promising avenues for further research on this topic.