
J.H. Krijthe

55 records found

Review (2026) - Wouter M.R. Kant, Wieske K. de Swart, Jim M. Smit, Marco Loog, Jesse H. Krijthe
Dementia risk scores are commonly used tools to estimate the risk of developing Alzheimer's disease and dementia. We lack an overview of what risk scores are used for, what is claimed they ought to be used for, and whether they are suitable for these applications. To address this, we use the ‘Lifestyle for Brain Health’ (LIBRA) score as a representative example risk score and conduct a literature review to study its applications. The goals of this study are (1) to create an overview of how the LIBRA score has been utilized in scientific articles, (2) to record other applications that these same articles mention, and (3) to critically assess whether LIBRA is suitable for these applications. Of the 66 articles included in our review, 36 involved analyzing associations of LIBRA with dementia, cognition, or other outcomes. We also identified several other applications: 32 articles mentioned LIBRA as an estimate of ‘dementia prevention potential’, 6 articles used LIBRA as a surrogate outcome for their trial or intervention, and 7 articles mentioned that it could help support clinician decisions in practice. Although there is a clear need for tools that can be used for these applications, the amount of evidence supporting the suitability of dementia risk scores for many of these applications is limited. We recommend that researchers transparently report the purposes of these dementia risk scores, which may include causal tasks, and that research is done to evaluate whether it is valid to use these scores in this way. ...

Development and External Validation Using Individual Participant Data From the World COACH Consortium

Journal article (2026) - Myrthe A. van den Berg, Fleur Boel, Michiel M.A. van Buuren, Noortje S. Riedstra, Jinchi Tang, Harbeer Ahedi, Nigel K. Arden, J.H. Krijthe, Rintje Agricola, More Authors...
Objective
This study aims to develop hip morphology-based radiographic hip osteoarthritis (RHOA) risk prediction models and investigates the added predictive value of hip morphology measurements and the generalizability to different populations.

Methods
We combined data from nine prospective cohort studies participating in the Worldwide Collaboration on OsteoArthritis prediCtion for the Hip (World COACH) consortium. RHOA grades were harmonized, and incident RHOA was defined as hips without definite RHOA at baseline that developed definite RHOA within four to eight years. Baseline hip morphology was quantified with automatically and uniformly determined lateral center edge angle and alpha angle measurements on anteroposterior radiographs. Discriminative performance of generalized linear mixed model (GLMM) definitions with and without hip morphology measurements was determined with stratified cross-validation. With leave-one-cohort-out cross-validation, the generalizability to unseen populations of hip morphology–based GLMMs and random forest (RF) models was evaluated.

Results
From the included 35,984 hips without definite RHOA at baseline, 4.7% developed incident RHOA within four to eight years. The GLMM with cohort-specific intercept, considering baseline demographics, RHOA grade, and hip morphology measurements, showed a mean area under the receiver operating characteristic curve (AUC) of 0.80 (±0.01) in stratified cross-validation. Using a marginal intercept decreased performance by 0.1 in AUC. Similar results were found for a GLMM without hip morphology measurements. Leave-one-cohort-out cross-validation showed comparable discrimination (AUC between 0.56–0.88) and calibration performance for hip morphology-based GLMMs and RF models.

Conclusion
In hips free of definite RHOA, our AUCs for the incident RHOA models showed good predictive performance in similar populations. However, the added predictive value of the morphology measurements was small, and model performance was heterogeneous in leave-one-cohort-out cross-validation. ...
Journal article (2026) - Jim M. Smit, Jesse H. Krijthe, Melanie Lloyd, Antoni Torres, Pierre François Dequin, Philip A. van der Zee
Journal article (2026) - M. A. van den Berg, E. Panfilov, S. M.A. Bierma-Zeinstra, J. H. Krijthe, R. Agricola, A. Tiulpin
Objective: Osteoarthritis (OA) is typically studied in isolated joints, but humans are interconnected systems. This raises the question of how multi-joint OA manifests, and whether it forms a distinct subgroup. This study aimed to investigate whether individuals with OA worsening in both the hip and the knee exhibit unique clinical, structural, or demographic characteristics compared to those with isolated OA worsening or no worsening. Design: We conducted a retrospective analysis using data from the Osteoarthritis Initiative, including 1958 participants with radiographic assessments of hip and knee joints at baseline and 48-month follow-up. Participants were categorized into four groups based on joint space narrowing: no worsening, hip-only worsening, knee-only worsening, or combined worsening in 48 months. Univariate comparisons and multivariate logistic regression analyses were performed to compare the combined worsening group to the other groups. Results: Combined worsening occurred in 12.5% of participants. Compared to those with no worsening, the combined worsening group had more severe baseline radiographic knee OA (aOR: 1.38 (1.15–1.64)). Compared to hip-only OA worsening, the combined group had more severe knee OA (aOR: 1.36 (1.11–1.67)). Compared to those with knee-only OA worsening, combined OA worsening was associated with female sex (aOR: 1.92 (1.31–2.76)). Conclusions: Our findings show differences between individuals with combined or isolated OA worsening, which may reflect accumulation of single-joint risk factors rather than a distinct trajectory. This research provides a foundation for large-scale investigations into multi-joint OA subtypes to improve patient stratification and inform targeted interventions. ...

Causal Blind Spots When Using Prediction Models for Treatment Decisions

Journal article (2025) - Nan van Geloven, Ruth H. Keogh, Wouter van Amsterdam, Giovanni Cinà, Jesse H. Krijthe, Niels Peek, Kim Luijken, Sara Magliacane, Paweł Morzywołek, More authors...
Clinicians increasingly rely on prediction models to guide treatment choices. Most prediction models, however, are developed using observational data that include some patients who have already received the treatment the prediction model is meant to inform. Special attention to the causal role of those earlier treatments is required when interpreting the resulting predictions.

“Causal blind spots” were identified in 3 common approaches to handling treatment when developing a prediction model: including treatment as a predictor, restricting to persons taking a certain treatment, and ignoring treatment. Through several real examples, this article illustrates how the risks obtained from models developed using such approaches may be misinterpreted and can lead to misinformed decision making. The discussion covers issues attributable to confounding, selection, mediation, and changes in treatment protocols over time.

An extension of guidelines for the development, reporting, and evaluation of prediction models is advocated to avoid such misinterpretations. Developers must ensure that the intended target population for the model, and the treatment conditions under which predictions hold, are clearly communicated. When prediction models are intended to inform treatment decisions, they need to provide estimates of risk under the specific treatment (or intervention) options being considered, known as “prediction under interventions.” In addition to suitable data, this requires causal reasoning and causal inference techniques during model development and evaluation. Being clear about what a given prediction model can and cannot be used for prevents misinformed treatment decisions and thereby prevents potential harm to patients. ...
Journal article (2025) - Thomas F. Kok, Navin Suthahar, Jesse H. Krijthe, Rudolf A. De Boer, Eric Boersma, Isabella Kardys
Aims: We aimed to compare performances of conventional survival models with machine learning (ML) survival models for incident heart failure (HF) in men and women without prevalent HF, cardiomyopathy (CM) or ischaemic heart disease (IHD), and to identify potential high-risk precursors overlooked by conventional survival models. Methods and results: We predicted 10-year risk of incident HF in 266 306 women (2894 events) and 212 061 men (4213 events). We constructed multivariable Cox models, first using ∼ 400 baseline characteristics, and subsequently only those remaining after LASSO stability selection. We also used Random Survival Forest (RSF) and eXtreme Gradient Survival Boosting (XGBoost). Performances were assessed using internal cross validation and hold-out sets, with C-indices, calibration curves and net-benefit analyses. Model performances were comparable during internal validation: XGBoost (C-index ± SE) (men: 0.79 ± 0.0040, women: 0.83 ± 0.0023) showed similar performance to the multivariable Cox model (men: 0.80 ± 0.0031, women: 0.83 ± 0.0022) and Cox models after LASSO stability selection, while RSF showed numerically slightly lower performance (men: 0.78 ± 0.0025, women: 0.81 ± 0.0015). Findings were similar in the hold-out sets. Age, cystatin-C, lifetime treatments/medications, other heart disease, systolic blood pressure, and spirometry measures were identified as high-risk factors in both model types for both sexes. Additionally, sex-specific and model-specific risk factors were identified. Conclusion: Machine learning models and Cox proportional hazard models performed well and similarly for 10-year incident HF risk prediction in the general population. However, sex-specific and model-specific risk predictors were found. Spirometry measures, rarely included in existing models, were identified as important risk factors. Our results suggest that ML models for HF prediction in the general population reveal insights that would otherwise remain unnoticed. ...

A multi-center retrospective study (SWITCH)

Journal article (2025) - Jim M. Smit, Jasper Van Bommel, Diederik A.M.P.J. Gommers, Marcel J.T. Reinders, Michel E. Van Genderen, Jesse H. Krijthe, Annemijn H. Jonkman
Background
Switching from controlled to assisted ventilation is crucial in the trajectory of intensive care unit (ICU) stay, but no guidelines exist. We described current practices, analyzed patient characteristics associated with switch success or failure, and explored the feasibility to predict switch failure.

Methods
In this retrospective study, we obtained highly granular longitudinal ICU data sets from three medical centers, covering demographics, severity scores, vital signs, ventilation, and laboratory parameters. The primary endpoint was switch success, considering a switch attempt to be successful if a patient did not return to controlled ventilation for the next 72 h while alive, and to be failed otherwise. We compared the characteristics of patients with successful vs. failed first switch attempts at ICU admission, immediately before, and 3 h after the attempt. We trained LASSO logistic regression models to predict switch failure.

Results
In 4524/6715 (67%) patients attempting a switch, the first attempt failed. The first switch attempt, regardless of success or failure, was generally made at normalized PaCO2 and pH levels, with PEEP < 10 cmH2O and PaO2/FiO2 indicating mild injury. Despite very similar baseline disease severity, switch failure was associated with significantly worse outcomes, including a 28-day mortality of 27% vs. 16% and median ventilator-free days of 16 vs. 22 (p < 0.001). Failed attempts were initiated significantly earlier than successful ones (median 1.8 vs. 1.3 days, p < 0.001). Before the switch, PaO2/FiO2, if measured at PEEP > 10 cmH2O, and respiratory system compliance were lower in patients with switch failure (median 185 vs. 205 mmHg, p < 0.001; 39 vs. 41 mL/cmH2O, p = 0.001). Post-switch, patients with switch failure experienced greater deterioration in gas exchange and minimal improvement in ventilatory parameters. Contrary to our hypotheses, patient characteristics for failed vs. successful switches were surprisingly similar, resulting in prediction models with limited discriminative performance.

Conclusions
Approximately two-thirds of attempts to switch patients to assisted ventilation fail, and failed attempts are associated with significantly worse clinical outcomes, despite similar baseline disease severity. Contrary to our hypotheses, patients with successful and failed attempts showed similar characteristics, making switch failure difficult to predict. These findings underscore the importance of preventing switch failures and, given the retrospective nature of this study, highlight the need for prospective studies to better understand the reasons for switch failure and when spontaneous breathing can be safely initiated. ...
Journal article (2025) - Jim M. Smit, Philip A. Van Der Zee, Dominic Snijders, Wim G. Boersma, Paola Confalonieri, Francesco Salton, Diederik A.M.P.J. Gommers, Marcel J.T. Reinders, Jesse H. Krijthe, More Authors...
Background: Despite several randomised controlled trials (RCTs) on the use of adjuvant treatment with corticosteroids in patients with community-acquired pneumonia (CAP), the effect of this intervention on mortality remains controversial. We aimed to evaluate heterogeneity of treatment effect (HTE) of adjuvant treatment with corticosteroids on 30-day mortality in patients with CAP. Methods: In this individual patient data meta-analysis, we included RCTs published before July 1, 2024, comparing adjuvant treatment with corticosteroids versus placebo in patients hospitalised with CAP. The primary endpoint was 30-day all-cause mortality, collected across all trials, and analyses followed the intention-to-treat principle. We analysed HTE using risk and effect modelling. For risk modelling, patients were classified as having less severe or severe CAP based on the pneumonia severity index (PSI), comparing PSI class I–III versus class IV–V. For effect modelling, we trained a corticosteroid-effect model on six trials and externally validated it using data from two trials, received after model preregistration. This model classified patients into two groups: no predicted benefit and predicted benefit from adjuvant treatment with corticosteroids. The literature search was registered on PROSPERO, CRD42022380746. Findings: We included eight RCTs with 3224 patients. Across all eight trials, 246 (7·6%) patients died within 30 days (106 [6·6%] of 1618 in the corticosteroid group vs 140 [8·7%] of 1606 in the placebo group; odds ratio [OR] 0·72 [95% CI 0·56–0·94], p=0·017). The corticosteroid-effect model, which selected C-reactive protein (CRP), showed significant HTE during external validation in the two most recent trials. In these trials, 154 (11·4%) of 1355 patients died within 30 days (88 [13·1%] of 671 in the placebo group vs 66 [9·6%] of 684 in the corticosteroid group; OR 0·71 [95% CI 0·50–0·99], p=0·044). 
Among patients predicted to have no benefit (CRP ≤204 mg/L, n=725), no significant effect was observed (OR 0·98 [95% CI 0·63–1·50]), whereas for those with predicted benefit (CRP >204 mg/L, n=630), 39 (13·0%) of 301 patients died in the placebo group compared with 20 (6·1%) of 329 in the corticosteroid group (0·43 [0·25–0·76], p for interaction=0·026). No significant HTE was found between less severe CAP (PSI class I–III, n=229) and severe CAP (PSI class IV–V, n=1126). Corticosteroid therapy significantly increased hyperglycaemia risk (44 [12·8%] of 344 in the placebo group vs 84 [24·8%] of 339 in the corticosteroid group; OR 2·50 [95% CI 1·63–3·83], p<0·0001) and hospital re-admission risk (30 [3·7%] of 814 in the placebo group vs 57 [7·0%] of 819 in the corticosteroid group; 1·95 [1·24–3·07], p=0·0038). Interpretation: Overall, adjuvant therapy with corticosteroids significantly reduces 30-day mortality in patients hospitalised with CAP. The treatment effect varied significantly among subgroups based on CRP concentrations, with a substantial mortality reduction observed only in patients with high baseline CRP. Funding: None. ...
Journal article (2025) - Maurice N. Korf, Nan Van Geloven, Jesse H. Krijthe, Jeremy A. Labrecque

Charting a personalised approach – Authors’ reply

Journal article (2025) - Jim M. Smit, Jesse H. Krijthe, Gianfranco U. Meduri, Pierre François Dequin, Harin Karunajeewa, Antoni Torres, Marcel J.T. Reinders, Henrik Endeman, Philip A. Van Der Zee
We appreciate the opportunity to further clarify our findings in response to the insightful comments from Shota Yamamoto and colleagues and Luis Felipe Reyes and Ignacio Martin-Loeches regarding our recent community-acquired pneumonia (CAP) study. [...] ...
Journal article (2025) - Maaike R. Schagen, Alvaro Assis de Souza, Karin Boer, Jesse H. Krijthe, Rachida Bouamar, Andrew P. Stubbs, Dennis A. Hesselink, Brenda C.M. de Winter
Background: Reports regarding the relationship between tacrolimus exposure and the risk of acute kidney allograft rejection are conflicting. This may be explained by the previous use of methodological approaches that disregarded important factors in the analysis of longitudinal measurements and time-to-event data. Therefore, in this study, joint models were used to investigate the relationship between repeated measurements of tacrolimus predose concentrations (C0) and time to biopsy-proven acute rejection (BPAR). Methods: This was a post hoc analysis of a randomized controlled trial in which living-donor kidney transplant recipients (KTRs) received either a standard, bodyweight-based or CYP3A5 genotype-based tacrolimus starting dose. Joint modeling was performed by coupling a mixed-effects model for tacrolimus C0 with a Cox proportional hazards model for the risk of rejection. Only the first episode of rejection was considered. Results: A total of 229 KTRs were included, among whom the incidence of BPAR was 10.5% (n = 24 KTRs) in the first 3 months posttransplant. A total of 3069 tacrolimus measurements were available for the analysis. A joint model adjusted for recipient age and peak panel reactive antibodies demonstrated that tacrolimus C0 was associated with risk of rejection. A 1-unit increase in the time-normalized area under the curve for logarithmically (log)-transformed C0 represented a change of −2.65 in the log of the relative hazard (95% credible interval: −5.05 to −0.36, P = 0.022). Conclusions: A negative association between the cumulative effect of tacrolimus C0 and BPAR was observed using joint modeling. This demonstrated that KTRs with lower tacrolimus exposure were at a higher risk of rejection. ...
Journal article (2025) - Rickard K.A. Karlsson, Jesse H. Krijthe
A major challenge in estimating treatment effects in observational studies is the reliance on untestable conditions such as the assumption of no unmeasured confounding. In this work, we propose an algorithm that can falsify the assumption of no unmeasured confounding in a setting with observational data from multiple heterogeneous sources, which we refer to as environments. Our proposed falsification strategy leverages a key observation that unmeasured confounding can cause observed causal mechanisms to appear dependent. Building on this observation, we develop a novel two-stage procedure that detects these dependencies with high statistical power while controlling false positives. The algorithm does not require access to randomized data and, in contrast to other falsification approaches, functions even under transportability violations when the environment has a direct effect on the outcome of interest. To showcase the practical relevance of our approach, we show that our method is able to efficiently detect confounding on both simulated and semi-synthetic data. ...
Conference paper (2025) - Ilinca Rențea, Gosia Migut, Jesse Krijthe
With the fast integration of Machine Learning (ML) across industries, effective pedagogical strategies are essential for teaching this complex and evolving field. Machine Learning is now widely integrated into various university programs and introduced at earlier educational stages, including high school and secondary school. However, ML pedagogy lacks standardized teaching methods compared to other science-related subjects, which have established norms for topic introduction, teaching tools, and assessment methods. Inspired by other fields, this research explores the use of interactive visualizations in teaching ML topics, more specifically in teaching Gradient Descent (GD) and Principal Component Analysis (PCA). The target population consists of Computer Science and Engineering Bachelor students who have not yet followed any Machine Learning courses but have foundational knowledge in calculus, linear algebra, and statistics. The evaluation measures knowledge gained and student motivation, compared to a static version of the materials. Results show a significant positive effect in knowledge related to PCA with interactive visualizations, but no differences in knowledge gain for GD or in learning motivation for either topic. With these results, we contribute to the body of evidence-based teaching methods in Machine Learning and identify further research needed to generalize the effect of interactive visualizations as a teaching method for teaching ML basic concepts. ...

A valuable strategy or methodologically fragile path?

Journal article (2025) - Jim M. Smit, Annemijn H. Jonkman, Jesse H. Krijthe
In her pioneering work, Calfee et al. [1] addressed the clinical and biological heterogeneity of acute respiratory distress syndrome (ARDS), a factor likely contributing to the poor track record of randomized trials (RCTs) in this patient population. Using latent class (or profile) analysis (LCA), a method for identifying unobserved subgroups from observed data, they identified two distinct ARDS sub-phenotypes (hypo- and hyperinflammatory), which showed association with clinical outcomes and, crucially, heterogeneity of treatment effect (HTE) [2], demonstrating different responses to higher vs. lower PEEP regimes. [...] ...
Journal article (2025) - Luc J.W. Evers, Yordan P. Raykov, Tom M. Heskes, Jesse H. Krijthe, Bastiaan R. Bloem, Max A. Little
Objective and continuous monitoring of Parkinson’s disease (PD) tremor in free-living conditions could benefit both individual patient care and clinical trials, by overcoming the snapshot nature of clinical assessments. To enable robust detection of tremor in the context of limited amounts of labeled training data, we propose to use prototypical networks, which can embed domain expertise about the heterogeneous tremor and non-tremor sub-classes. We evaluated our approach using data from the Parkinson@Home Validation study, including 8 PD patients with tremor, 16 PD patients without tremor, and 24 age-matched controls. We used wrist accelerometer data and synchronous expert video annotations for the presence of tremor, captured during unscripted daily life activities in and around the participants’ own homes. Based on leave-one-subject-out cross-validation, we demonstrate the ability of prototypical networks to capture free-living tremor episodes. Specifically, we demonstrate that prototypical networks can be used to enforce robust performance across domain-informed sub-classes, including different tremor phenotypes and daily life activities. ...
Journal article (2025) - Wieske K. de Swart, Marco Loog, Jesse H. Krijthe
Introduction: Dynamic survival analysis has become an effective approach for predicting time-to-event outcomes based on longitudinal data in neurology, cognitive health, and other health-related domains. With advancements in machine learning, several new methods have been introduced, often using a two-stage approach: first extracting features from longitudinal trajectories and then using these to predict survival probabilities.

Methods: This work compares several combinations of longitudinal and survival models, assessing their predictive performance across different training strategies. Using synthetic and real-world cognitive health data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we explore the strengths and limitations of each model.

Results: Among the considered survival models, the Random Survival Forest consistently delivered strong results across different datasets, longitudinal models, and training strategies. On the ADNI dataset, the best performing method was Random Survival Forest with the last visit benchmark and super landmarking, with an average tdAUC of 0.96 and Brier score of 0.07. Several other methods, including Cox Proportional Hazards and the Recurrent Neural Network, achieved similar scores. While the tested longitudinal models often struggled to outperform simple benchmarks, neural network models showed some improvement in simulated scenarios with sufficiently informative longitudinal trajectories.

Discussion: Our findings underscore the importance of aligning model selection and training strategies with the specific characteristics of the data and the target application, providing valuable insights that can inform future developments in dynamic survival analysis. ...
Journal article (2025) - Wouter A.C. van Amsterdam, Nan van Geloven, Jesse H. Krijthe, Rajesh Ranganath, Giovanni Cinà
Prediction models are popular in medical research and practice. Many expect that by predicting patient-specific outcomes, these models have the potential to inform treatment decisions, and they are frequently lauded as instruments for personalized, data-driven healthcare. We show, however, that using prediction models for decision-making can lead to harm, even when the predictions exhibit good discrimination after deployment. These models are harmful self-fulfilling prophecies: their deployment harms a group of patients, but the worse outcome of these patients does not diminish the discrimination of the model. Our main result is a formal characterization of a set of such prediction models. Next, we show that models that are well calibrated before and after deployment are useless for decision-making, as they make no change in the data distribution. These results call for a reconsideration of standard practices for validation and deployment of prediction models that are used in medical decisions. ...

Mind the interaction with PEEP!

Journal article (2025) - J. M. Smit, J. H. Krijthe, J. Van Bommel, M. E. Van Genderen, M. J.T. Reinders, A. H. Jonkman

Estimands for Sequential Prediction Under Interventions

Journal article (2024) - Kim Luijken, Paweł Morzywołek, Wouter van Amsterdam, Giovanni Cinà, Jeroen Hoogland, Ruth Keogh, Jesse H. Krijthe, Sara Magliacane, Nan van Geloven, More Authors...
Prediction models are used, among other things, to inform medical decisions on interventions. Typically, individuals with high risks of adverse outcomes are advised to undergo an intervention while those at low risk are advised to refrain from it. Standard prediction models do not always provide risks that are relevant to inform such decisions: for example, an individual may be estimated to be at low risk because similar individuals in the past received an intervention which lowered their risk. Therefore, prediction models supporting decisions should target risks belonging to defined intervention strategies. Previous works on prediction under interventions assumed that the prediction model was used only at one time point to make an intervention decision. In clinical practice, intervention decisions are rarely made only once: they might be repeated, deferred, and reevaluated. This requires estimated risks under interventions that can be reconsidered at several potential decision moments. In the current work, we highlight key considerations for formulating estimands in sequential prediction under interventions that can inform such intervention decisions. We illustrate these considerations by giving examples of estimands for a case study about choosing between vaginal delivery and cesarean section for women giving birth. Our formalization of prediction tasks in a sequential, causal, and estimand context provides guidance for future studies to ensure that the right question is answered and appropriate causal estimation approaches are chosen to develop sequential prediction models that can inform intervention decisions. ...