J.H. Krijthe
55 records found
Applications and implicit assumptions in dementia risk scores
A scoping review of the LIBRA score
Dementia risk scores are commonly used tools to estimate the risk of developing Alzheimer's disease and dementia. We lack an overview of what risk scores are used for, what it is claimed they ought to be used for, and whether they are suitable for these applications. To address this, we use the ‘Lifestyle for Brain Health’ (LIBRA) score as a representative example risk score and conduct a literature review to study its applications. The goals of this study are (1) to create an overview of how the LIBRA score has been utilized in scientific articles, (2) to record other applications that these same articles mention, and (3) to critically assess whether LIBRA is suitable for these applications. Of the 66 articles included in our review, 36 involved analyzing associations of LIBRA with dementia, cognition, or other outcomes. We also identified several other applications: 32 articles mentioned LIBRA as an estimate of ‘dementia prevention potential’, 6 used LIBRA as a surrogate outcome for their trial or intervention, and 7 mentioned that it could help support clinician decisions in practice. Although there is a clear need for tools that can be used for these applications, the evidence supporting the suitability of dementia risk scores for many of these applications is limited. We recommend that researchers transparently report the purposes of these dementia risk scores, which may include causal tasks, and that research is done to evaluate whether it is valid to use these scores in this way.
Hip Morphology–Based Osteoarthritis Risk Prediction Models
Development and External Validation Using Individual Participant Data From the World COACH Consortium
This study aims to develop hip morphology-based radiographic hip osteoarthritis (RHOA) risk prediction models and investigates the added predictive value of hip morphology measurements and the generalizability to different populations.
Methods
We combined data from nine prospective cohort studies participating in the Worldwide Collaboration on OsteoArthritis prediCtion for the Hip (World COACH) consortium. RHOA grades were harmonized, and incident RHOA was defined as hips without definite RHOA at baseline that developed definite RHOA within four to eight years. Baseline hip morphology was quantified with automatically and uniformly determined lateral center edge angle and alpha angle measurements on anteroposterior radiographs. Discriminative performance of generalized linear mixed model (GLMM) definitions with and without hip morphology measurements was determined with stratified cross-validation. With leave-one-cohort-out cross-validation, the generalizability to unseen populations of hip morphology–based GLMMs and random forest (RF) models was evaluated.
Results
Of the 35,984 included hips without definite RHOA at baseline, 4.7% developed incident RHOA within four to eight years. The GLMM with cohort-specific intercept, considering baseline demographics, RHOA grade, and hip morphology measurements, showed a mean area under the receiver operating characteristic curve (AUC) of 0.80 (±0.01) in stratified cross-validation. Using a marginal intercept decreased performance by 0.1 in AUC. Similar results were found for a GLMM without hip morphology measurements. Leave-one-cohort-out cross-validation showed comparable discrimination (AUC range 0.56–0.88) and calibration performance for hip morphology-based GLMMs and RF models.
Conclusion
In hips free of definite RHOA, our AUCs for the incident RHOA models showed good predictive performance in similar populations. However, the added predictive value of the morphology measurements was small, and model performance was heterogeneous in leave-one-cohort-out cross-validation.
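The leave-one-cohort-out procedure described in the Methods can be illustrated with a small sketch. It is not the authors' pipeline: the toy records, the cohort labels, and the stand-in "model" (which scores a hip by the event rate of its baseline grade in the training cohorts) are all assumptions made for illustration. The AUC is computed with the rank-based (Mann-Whitney) formula.

```python
from collections import defaultdict


def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def leave_one_cohort_out(cohorts):
    """Yield (held_out_cohort, train_idx, test_idx), holding each cohort out once."""
    by_cohort = defaultdict(list)
    for i, c in enumerate(cohorts):
        by_cohort[c].append(i)
    for held_out, test_idx in by_cohort.items():
        train_idx = [i for i in range(len(cohorts)) if cohorts[i] != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: (cohort, baseline grade, incident RHOA yes/no).
records = [
    ("A", 0, 0), ("A", 1, 1), ("A", 0, 0), ("A", 1, 0),
    ("B", 0, 0), ("B", 1, 1), ("B", 1, 1), ("B", 0, 0),
    ("C", 0, 1), ("C", 1, 0), ("C", 0, 0), ("C", 1, 1),
]
cohorts = [r[0] for r in records]


def fit(train_idx):
    """Stand-in model: event rate per baseline grade in the training cohorts."""
    events, counts = defaultdict(int), defaultdict(int)
    for i in train_idx:
        _, grade, y = records[i]
        events[grade] += y
        counts[grade] += 1
    return {g: events[g] / counts[g] for g in counts}


auc_per_cohort = {}
for held_out, train_idx, test_idx in leave_one_cohort_out(cohorts):
    model = fit(train_idx)  # fitted without any data from the held-out cohort
    scores = [model[records[i][1]] for i in test_idx]
    labels = [records[i][2] for i in test_idx]
    auc_per_cohort[held_out] = auc(scores, labels)

for cohort, value in sorted(auc_per_cohort.items()):
    print(f"held-out cohort {cohort}: AUC = {value:.2f}")
```

Because each cohort is scored by a model that never saw it, the per-cohort AUCs can differ considerably, which is the kind of heterogeneity this validation scheme is meant to expose.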
Objective: Osteoarthritis (OA) is typically studied in isolated joints, but humans are interconnected systems. This raises the question of how multi-joint OA manifests, and whether it forms a distinct subgroup. This study aimed to investigate whether individuals with OA worsening in both the hip and the knee exhibit unique clinical, structural, or demographic characteristics compared to those with isolated OA worsening or no worsening. Design: We conducted a retrospective analysis using data from the Osteoarthritis Initiative, including 1958 participants with radiographic assessments of hip and knee joints at baseline and 48-month follow-up. Participants were categorized into four groups based on joint space narrowing: no worsening, hip-only worsening, knee-only worsening, or combined worsening in 48 months. Univariate comparisons and multivariate logistic regression analyses were performed to compare the combined worsening group to the other groups. Results: Combined worsening occurred in 12.5% of participants. Compared to those with no worsening, the combined worsening group had more severe baseline radiographic knee OA (aOR: 1.38 (1.15–1.64)). Compared to hip-only OA worsening, the combined group had more severe knee OA (aOR: 1.36 (1.11–1.67)). Compared to those with knee-only OA worsening, combined OA worsening was associated with female sex (aOR: 1.92 (1.31–2.76)). Conclusions: Our findings show differences between individuals with combined or isolated OA worsening, which may reflect accumulation of single-joint risk factors rather than a distinct trajectory. This research provides a foundation for large-scale investigations into multi-joint OA subtypes to improve patient stratification and inform targeted interventions.
The Risks of Risk Assessment
Causal Blind Spots When Using Prediction Models for Treatment Decisions
“Causal blind spots” were identified in 3 common approaches to handling treatment when developing a prediction model: including treatment as a predictor, restricting to persons taking a certain treatment, and ignoring treatment. Through several real examples, this article illustrates how the risks obtained from models developed using such approaches may be misinterpreted and can lead to misinformed decision making. The discussion covers issues attributable to confounding, selection, mediation, and changes in treatment protocols over time.
An extension of guidelines for the development, reporting, and evaluation of prediction models is advocated to avoid such misinterpretations. Developers must ensure that the intended target population for the model, and the treatment conditions under which predictions hold, are clearly communicated. When prediction models are intended to inform treatment decisions, they need to provide estimates of risk under the specific treatment (or intervention) options being considered, known as “prediction under interventions.” Next to suitable data, this requires causal reasoning and causal inference techniques during model development and evaluation. Being clear about what a given prediction model can and cannot be used for prevents misinformed treatment decisions and thereby prevents potential harm to patients.
Aims: We aimed to compare performances of conventional survival models with machine learning (ML) survival models for incident heart failure (HF) in men and women without prevalent HF, cardiomyopathy (CM) or ischaemic heart disease (IHD), and to identify potential high-risk precursors overlooked by conventional survival models. Methods and results: We predicted 10-year risk of incident HF in 266 306 women (2894 events) and 212 061 men (4213 events). We constructed multivariable Cox models, first using ∼400 baseline characteristics, and subsequently only those remaining after LASSO stability selection. We also used Random Survival Forest (RSF) and eXtreme Gradient Survival Boosting (XGBoost). Performances were assessed using internal cross-validation and hold-out sets, with C-indices, calibration curves and net-benefit analyses. Model performances were comparable during internal validation: XGBoost (C-index ± SE) (men: 0.79 ± 0.0040, women: 0.83 ± 0.0023) showed similar performance to the multivariable Cox model (men: 0.80 ± 0.0031, women: 0.83 ± 0.0022) and Cox models after LASSO stability selection, while RSF showed numerically slightly lower performance (men: 0.78 ± 0.0025, women: 0.81 ± 0.0015). Findings were similar in the hold-out sets. Age, cystatin-C, lifetime treatments/medications, other heart disease, systolic blood pressure, and spirometry measures were identified as high-risk factors in both model types for both sexes. Additionally, sex-specific and model-specific risk factors were identified. Conclusion: Machine learning models and Cox proportional hazard models performed well and similarly for 10-year incident HF risk prediction in the general population. However, sex-specific and model-specific risk predictors were found. Spirometry measures, rarely included in existing models, were identified as important risk factors. Our results suggest that ML models for HF prediction in the general population reveal insights that would otherwise remain unnoticed.
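For readers unfamiliar with the C-index used to compare these models, the following is a minimal sketch of Harrell's concordance index on made-up data. The function name, the toy follow-up times, and the predicted risks are illustrative assumptions. The study's own implementation is not described in the abstract, and this simple version does not handle tied event times.

```python
def harrell_c(times, events, risks):
    """Share of comparable pairs in which the subject with the earlier event has the higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on an observed event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i had the event
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: follow-up in years, event indicator (1 = incident HF), predicted risk.
times = [2, 4, 5, 7, 10]
events = [1, 0, 1, 1, 0]
risks = [0.9, 0.4, 0.6, 0.7, 0.1]

c_index = harrell_c(times, events, risks)
print(f"C-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported values of roughly 0.78 to 0.83 should be read.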
Switching from controlled to assisted mechanical ventilation
A multi-center retrospective study (SWITCH)
Switching from controlled to assisted ventilation is crucial in the trajectory of intensive care unit (ICU) stay, but no guidelines exist. We described current practices, analyzed patient characteristics associated with switch success or failure, and explored the feasibility of predicting switch failure.
Methods
In this retrospective study, we obtained highly granular longitudinal ICU data sets from three medical centers, covering demographics, severity scores, vital signs, ventilation, and laboratory parameters. The primary endpoint was switch success, considering a switch attempt to be successful if a patient did not return to controlled ventilation for the next 72 h while alive, and to be failed otherwise. We compared the characteristics of patients with successful vs. failed first switch attempts at ICU admission, immediately before, and 3 h after the attempt. We trained LASSO logistic regression models to predict switch failure.
Results
In 4524/6715 (67%) patients attempting a switch, the first attempt failed. The first switch attempt, regardless of success or failure, was generally made at normalized PaCO2 and pH levels, with PEEP < 10 cmH2O and PaO2/FiO2 indicating mild injury. Despite very similar baseline disease severity, switch failure was associated with significantly worse outcomes, including a 28-day mortality of 27% vs. 16% and median ventilator-free days of 16 vs. 22 (p < 0.001). Failed attempts were initiated significantly earlier than successful ones (median 1.3 vs. 1.8 days, p < 0.001). Before the switch, PaO2/FiO2 (if measured at PEEP > 10 cmH2O) and respiratory system compliance were lower in patients with switch failure (median 185 vs. 205 mmHg, p < 0.001; 39 vs. 41 mL/cmH2O, p = 0.001). After the switch, patients with switch failure experienced greater deterioration in gas exchange and minimal improvement in ventilatory parameters. Contrary to our hypotheses, patient characteristics for failed vs. successful switches were surprisingly similar, resulting in prediction models with limited discriminative performance.
Conclusions
Approximately two-thirds of first attempts to switch patients to assisted ventilation fail, and failure is associated with significantly worse clinical outcomes despite similar baseline disease severity. Contrary to our hypotheses, patients with successful and failed attempts showed similar characteristics, making switch failure difficult to predict. These findings underscore the importance of preventing switch failures and, given the retrospective nature of this study, highlight the need for prospective studies to better understand the reasons for switch failure and when spontaneous breathing can be safely initiated.
Predicting benefit from adjuvant therapy with corticosteroids in community-acquired pneumonia
A data-driven analysis of randomised trials
Background: Despite several randomised controlled trials (RCTs) on the use of adjuvant treatment with corticosteroids in patients with community-acquired pneumonia (CAP), the effect of this intervention on mortality remains controversial. We aimed to evaluate heterogeneity of treatment effect (HTE) of adjuvant treatment with corticosteroids on 30-day mortality in patients with CAP. Methods: In this individual patient data meta-analysis, we included RCTs published before July 1, 2024, comparing adjuvant treatment with corticosteroids versus placebo in patients hospitalised with CAP. The primary endpoint was 30-day all-cause mortality, collected across all trials, and analyses followed the intention-to-treat principle. We analysed HTE using risk and effect modelling. For risk modelling, patients were classified as having less severe or severe CAP based on the pneumonia severity index (PSI), comparing PSI class I–III versus class IV–V. For effect modelling, we trained a corticosteroid-effect model on six trials and externally validated it using data from two trials, received after model preregistration. This model classified patients into two groups: no predicted benefit and predicted benefit from adjuvant treatment with corticosteroids. The literature search was registered on PROSPERO, CRD42022380746. Findings: We included eight RCTs with 3224 patients. Across all eight trials, 246 (7·6%) patients died within 30 days (106 [6·6%] of 1618 in the corticosteroid group vs 140 [8·7%] of 1606 in the placebo group; odds ratio [OR] 0·72 [95% CI 0·56–0·94], p=0·017). The corticosteroid-effect model, which selected C-reactive protein (CRP), showed significant HTE during external validation in the two most recent trials. In these trials, 154 (11·4%) of 1355 patients died within 30 days (88 [13·1%] of 671 in the placebo group vs 66 [9·6%] of 684 in the corticosteroid group; OR 0·71 [95% CI 0·50–0·99], p=0·044). 
Among patients predicted to have no benefit (CRP ≤204 mg/L, n=725), no significant effect was observed (OR 0·98 [95% CI 0·63–1·50]), whereas for those with predicted benefit (CRP >204 mg/L, n=630), 39 (13·0%) of 301 patients died in the placebo group compared with 20 (6·1%) of 329 in the corticosteroid group (OR 0·43 [0·25–0·76], p for interaction=0·026). No significant HTE was found between less severe CAP (PSI class I–III, n=229) and severe CAP (PSI class IV–V, n=1126). Corticosteroid therapy significantly increased hyperglycaemia risk (44 [12·8%] of 344 in the placebo group vs 84 [24·8%] of 339 in the corticosteroid group; OR 2·50 [95% CI 1·63–3·83], p<0·0001) and hospital re-admission risk (30 [3·7%] of 814 in the placebo group vs 57 [7·0%] of 819 in the corticosteroid group; OR 1·95 [1·24–3·07], p=0·0038). Interpretation: Overall, adjuvant therapy with corticosteroids significantly reduces 30-day mortality in patients hospitalised with CAP. The treatment effect varied significantly among subgroups based on CRP concentrations, with a substantial mortality reduction observed only in patients with high baseline CRP. Funding: None.
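As a rough plausibility check on the mortality figures quoted above, one can compute crude (unadjusted) odds ratios directly from the event counts. This is only a sketch: the abstract does not state how the reported odds ratios were estimated, so a pooled 2×2 calculation with a Wald interval is an assumption and need not reproduce the published values exactly. The function name is hypothetical.

```python
import math


def crude_odds_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Unadjusted odds ratio (treatment vs control) with a Wald 95% CI on the log scale."""
    a, b = events_trt, n_trt - events_trt  # treatment arm: events, non-events
    c, d = events_ctl, n_ctl - events_ctl  # control arm: events, non-events
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# 30-day deaths in all eight trials: corticosteroid 106/1618 vs placebo 140/1606.
or_all, lo_all, hi_all = crude_odds_ratio(106, 1618, 140, 1606)

# High-CRP subgroup (CRP >204 mg/L) in the two validation trials: 20/329 vs 39/301.
or_high, lo_high, hi_high = crude_odds_ratio(20, 329, 39, 301)

print(f"all trials: OR {or_all:.2f} ({lo_all:.2f} to {hi_all:.2f})")
print(f"high CRP:   OR {or_high:.2f} ({lo_high:.2f} to {hi_high:.2f})")
```

The crude estimates land close to the reported odds ratios. Small differences are expected if the published analysis accounted for trial or other covariates.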
C-reactive protein-guided treatment in pneumonia
Charting a personalised approach – Authors’ reply
Tacrolimus Exposure is Associated with Acute Rejection in the Early Phase After Kidney Transplantation
A Joint Modeling Approach
Background: Reports regarding the relationship between tacrolimus exposure and the risk of acute kidney allograft rejection are conflicting. This may be explained by the previous use of methodological approaches that disregarded important factors in the analysis of longitudinal measurements and time-to-event data. Therefore, in this study, joint models were used to investigate the relationship between repeated measurements of tacrolimus predose concentrations (C0) and time to biopsy-proven acute rejection (BPAR). Methods: This was a post hoc analysis of a randomized controlled trial in which living-donor kidney transplant recipients (KTRs) received either a standard bodyweight-based or a CYP3A5 genotype-based tacrolimus starting dose. Joint modeling was performed by coupling a mixed-effects model for tacrolimus C0 with a Cox proportional hazards model for the risk of rejection. Only the first episode of rejection was considered. Results: A total of 229 KTRs were included; the incidence of BPAR was 10.5% (n = 24 KTRs) in the first 3 months posttransplant. A total of 3069 tacrolimus measurements were available for the analysis. A joint model adjusted for recipient age and peak panel reactive antibodies demonstrated that tacrolimus C0 was associated with risk of rejection. A 1-unit increase in the time-normalized area under the curve for logarithmically (log)-transformed C0 represented a change of −2.65 in the log of the relative hazard (95% credible interval: −5.05 to −0.36, P = 0.022). Conclusions: A negative association between the cumulative effect of tacrolimus C0 and BPAR was observed using joint modeling. This demonstrated that KTRs with lower tacrolimus exposure were at a higher risk of rejection.
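The effect size above is reported on the log-hazard scale. Exponentiating it gives the corresponding hazard ratio, as in this short sketch. The variable names are illustrative, and the conversion assumes the usual proportional-hazards reading of the coefficient.

```python
import math

# Values quoted in the Results: change in log relative hazard per 1-unit increase
# in the time-normalized AUC of log-transformed C0, with its 95% credible interval.
log_hr = -2.65
log_hr_interval = (-5.05, -0.36)

hazard_ratio = math.exp(log_hr)
hr_interval = tuple(math.exp(bound) for bound in log_hr_interval)

print(f"hazard ratio: {hazard_ratio:.3f} "
      f"(95% CrI {hr_interval[0]:.3f} to {hr_interval[1]:.3f})")
```

Note that one unit here is one unit on the log-concentration scale, which is a large change in exposure. That is why the hazard ratio is so far below 1 and why the interval is wide.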
A major challenge in estimating treatment effects in observational studies is the reliance on untestable conditions such as the assumption of no unmeasured confounding. In this work, we propose an algorithm that can falsify the assumption of no unmeasured confounding in a setting with observational data from multiple heterogeneous sources, which we refer to as environments. Our proposed falsification strategy leverages a key observation that unmeasured confounding can cause observed causal mechanisms to appear dependent. Building on this observation, we develop a novel two-stage procedure that detects these dependencies with high statistical power while controlling false positives. The algorithm does not require access to randomized data and, in contrast to other falsification approaches, functions even under transportability violations when the environment has a direct effect on the outcome of interest. To showcase the practical relevance of our approach, we show that our method is able to efficiently detect confounding on both simulated and semi-synthetic data.
Sub-phenotyping in critical care
A valuable strategy or methodologically fragile path?
Passive Monitoring of Parkinson Tremor in Daily Life
A Prototypical Network Approach
Methods: This work compares several combinations of longitudinal and survival models, assessing their predictive performance across different training strategies. Using synthetic and real-world cognitive health data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), we explore the strengths and limitations of each model.
Results: Among the considered survival models, the Random Survival Forest consistently delivered strong results across different datasets, longitudinal models, and training strategies. On the ADNI dataset, the best-performing method was Random Survival Forest with the last visit benchmark and super landmarking, with an average tdAUC of 0.96 and a Brier score of 0.07. Several other methods, including Cox Proportional Hazards and the Recurrent Neural Network, achieved similar scores. While the tested longitudinal models often struggled to outperform simple benchmarks, neural network models showed some improvement in simulated scenarios with sufficiently informative longitudinal trajectories.
Discussion: Our findings underscore the importance of aligning model selection and training strategies with the specific characteristics of the data and the target application, providing valuable insights that can inform future developments in dynamic survival analysis.
Analyzing PaO2/FiO2?
Mind the interaction with PEEP!
Risk-Based Decision Making
Estimands for Sequential Prediction Under Interventions