G. Jongbloed
Background: Early prognostication of the outcome in pediatric cardiac arrest (CA) patients is crucial for clinical decision-making. Heart rate variability (HRV) has shown potential in predicting outcomes after CA in adult patients. This study investigates whether HRV can be used to predict survival outcomes after pediatric CA using machine learning techniques. Methods: This retrospective study included children with CA, who achieved return of spontaneous circulation (ROSC), and were admitted to the pediatric intensive care unit (PICU) of a tertiary hospital between 2012 and 2021. A 5-min electrocardiogram (ECG) segment acquired at 24 h after CA was used to calculate HRV parameters (time-, frequency-, and non-linear domains). These parameters were used to train a random forest model. The primary outcome was 12-month survival or death. Model performance was evaluated using receiver-operating characteristics (ROC) analysis and predictive values. Feature importance was assessed using Shapley values. Results: A total of 76 patients (male: 63.2%, median age: 2.5 [IQR: 0.4–8.0] years) were divided into survival (34) or death (42) groups based on 12-month outcomes. The machine learning model achieved an accuracy of 77.6% and a positive predictive value of 0.879 for mortality prediction. The most influential features for model predictions were the frequency-domain parameters total power and very-low frequency (VLF) power, with lower values associated with an increased probability of death. Conclusions: Analysis of HRV at 24 h after ROSC may serve as a strong predictor of 12-month survival after pediatric CA.
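The sketch below shows roughly how the HRV parameters mentioned above can be computed from a series of RR intervals. It covers two common time-domain measures (SDNN, RMSSD) and frequency-domain band powers, including VLF and total power. It rests on several assumptions that are not taken from the study. RR intervals are in milliseconds. The series is resampled evenly at 4 Hz by linear interpolation. The spectrum is a plain, unwindowed periodogram. The band limits are the adult ones from the 1996 Task Force guidelines, which pediatric analyses may replace. The function names are hypothetical.

```python
import numpy as np

# Adult frequency bands in Hz (1996 Task Force guidelines); an assumption here,
# since pediatric analyses may use other cut-offs.
BANDS = {"VLF": (0.0033, 0.04), "LF": (0.04, 0.15), "HF": (0.15, 0.4)}


def time_domain_hrv(rr_ms):
    """SDNN and RMSSD (both in ms) of a sequence of RR intervals."""
    rr = np.asarray(rr_ms, dtype=float)
    sdnn = rr.std(ddof=1)                        # standard deviation of RR intervals
    rmssd = np.sqrt(np.mean(np.diff(rr) ** 2))   # root mean square of successive differences
    return {"SDNN": sdnn, "RMSSD": rmssd}


def band_powers(rr_ms, fs=4.0):
    """Band powers (ms^2) from a periodogram of the evenly resampled RR series."""
    rr = np.asarray(rr_ms, dtype=float)
    beat_times = np.cumsum(rr) / 1000.0                   # beat times in seconds
    grid = np.arange(beat_times[0], beat_times[-1], 1.0 / fs)
    x = np.interp(grid, beat_times, rr)                   # linear resampling at fs Hz
    x = x - x.mean()
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)          # density at non-negative frequencies
    psd[1:] *= 2.0                                        # one-sided (Nyquist bin lies outside all bands)
    df = freqs[1] - freqs[0]
    powers = {name: psd[(freqs >= lo) & (freqs < hi)].sum() * df
              for name, (lo, hi) in BANDS.items()}
    powers["TP"] = sum(powers.values())                   # total power over VLF + LF + HF
    return powers
```

Frequency-domain features of this kind, together with the time-domain ones, would form the feature vector that is passed to the classifier.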
In this paper, we propose a novel Bayesian approach for nonparametric estimation in Wicksell's problem. The problem has important applications in astronomy, for estimating the distribution of the positions of the stars in a galaxy from projected stellar positions, and in materials science, for determining the 3D microstructure of a material from its 2D cross-sections. We deviate from the classical Bayesian nonparametric approach, which would place a Dirichlet Process (DP) prior on the distribution function of the unobservables, by placing a DP prior directly on the distribution function of the observables. Our method offers computational simplicity due to the conjugacy of the posterior and allows for asymptotically efficient estimation by projecting the posterior in $L_2$ onto the set of increasing, right-continuous functions. Indeed, the resulting Isotonized Inverse Posterior (IIP) satisfies a Bernstein–von Mises (BvM) phenomenon with minimax asymptotic variance $g_0(x)/(2\gamma)$, where $\gamma > 1/2$ reflects the degree of Hölder continuity of the true cdf at $x$. Since the IIP gives automatic uncertainty quantification, it eliminates the need to estimate $\gamma$. Our results provide the first semiparametric Bernstein–von Mises theorem for projection-based posteriors with a DP prior in inverse problems.
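The abstract names two computational ingredients: a conjugate Dirichlet process posterior and an $L_2$ projection onto nondecreasing functions. The sketch below illustrates both under simplifying assumptions and is not the authors' implementation. The prior mass parameter is taken to be negligible. A posterior draw of the observable cdf then reduces to Dirichlet(1, ..., 1) weights on the observations, which is the Bayesian bootstrap. The projection is applied on a finite grid using the pool-adjacent-violators algorithm. The Wicksell inversion step, which maps the observable cdf to the unobservable one, is omitted. The function names are hypothetical.

```python
import numpy as np


def posterior_cdf_draw(obs, grid, rng):
    """One draw of the observable cdf from the DP posterior as the prior mass -> 0.

    With prior DP(alpha * H), the posterior is DP(alpha * H + sum_i delta_{X_i});
    for alpha -> 0 a draw puts Dirichlet(1, ..., 1) weights on the observations.
    """
    obs = np.asarray(obs, dtype=float)
    weights = rng.dirichlet(np.ones(len(obs)))
    # G(t) = total weight of observations <= t, evaluated on the grid
    return np.array([weights[obs <= t].sum() for t in grid])


def pava(y, w=None):
    """Weighted L2 projection of y onto nondecreasing sequences (pool adjacent violators)."""
    y = np.asarray(y, dtype=float)
    w = np.ones_like(y) if w is None else np.asarray(w, dtype=float)
    means, weights, sizes = [], [], []
    for yi, wi in zip(y, w):
        means.append(yi)
        weights.append(wi)
        sizes.append(1)
        # pool the last two blocks while they violate monotonicity
        while len(means) > 1 and means[-2] > means[-1]:
            wt = weights[-2] + weights[-1]
            m = (weights[-2] * means[-2] + weights[-1] * means[-1]) / wt
            means[-2:] = [m]
            weights[-2:] = [wt]
            sizes[-2:] = [sizes[-2] + sizes[-1]]
    return np.repeat(means, sizes)
```

In an IIP-style procedure, a non-monotone estimate of the unobservable cdf derived from such a draw would be passed through `pava` to restore monotonicity.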
In this paper, we consider statistical inference for Poisson-Laguerre tessellations in (Formula presented.). The object of interest is a distribution function (Formula presented.) which describes the distribution of the arrival times of the generator points. The function (Formula presented.) uniquely determines the intensity measure of the underlying Poisson process. Two nonparametric estimators for (Formula presented.) are introduced, which depend only on the points of the Poisson process that generate non-empty cells and the actual cells corresponding to these points. The proposed estimators are proven to be strongly consistent as the observation window expands unboundedly to the whole space. We also consider a stereological setting, where one is interested in estimating the distribution function associated with the Poisson process of a higher-dimensional Poisson-Laguerre tessellation, given that a corresponding sectional Poisson-Laguerre tessellation is observed.
Battery lifetime prediction is crucial in industrial applications. However, the lack of diversity in training data often limits the robustness and generalization of lifetime predictions for batteries from different batches. Motivated by early-cycle data from lithium-ion batteries, this article proposes a robust transfer learning method based on a model averaging framework, in which the weights are determined by the distance between the source domain and the target domain. Kernel regression is used to predict battery lifetime from early-cycle data, and transfer component analysis is used to transfer knowledge between domains. A case study on lithium iron phosphate/graphite cells demonstrates that the proposed method can mitigate the impact of negative transfer and outperforms traditional methods.
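The sketch below illustrates two of the building blocks named above: kernel regression on early-cycle features and a weighted average of per-source models. It rests on assumptions that are not taken from the article. The kernel regression is a Gaussian-kernel Nadaraya-Watson estimator. The domain distance is the squared difference of feature means, which is a linear-kernel MMD. The weights decay exponentially with that distance. Transfer component analysis is not reproduced. All function names and the `tau` parameter are hypothetical.

```python
import numpy as np


def _as_2d(a):
    """Reshape 1-D input (n,) to (n, 1); leave (n, p) arrays unchanged."""
    a = np.asarray(a, dtype=float)
    return a.reshape(len(a), -1)


def nadaraya_watson(x_train, y_train, x_new, bandwidth):
    """Gaussian-kernel Nadaraya-Watson regression with (possibly) multivariate inputs."""
    xt, xn = _as_2d(x_train), _as_2d(x_new)
    # squared distances between every new point and every training point
    d2 = ((xn[:, None, :] - xt[None, :, :]) ** 2).sum(axis=2)
    k = np.exp(-0.5 * d2 / bandwidth ** 2)
    return k @ np.asarray(y_train, dtype=float) / k.sum(axis=1)


def domain_weights(sources, target, tau=1.0):
    """Averaging weights that shrink with the source-target distance (illustrative form)."""
    mu_t = _as_2d(target).mean(axis=0)
    d = np.array([((_as_2d(s).mean(axis=0) - mu_t) ** 2).sum() for s in sources])
    w = np.exp(-(d - d.min()) / tau)   # subtract the minimum for numerical stability
    return w / w.sum()


def averaged_prediction(source_sets, target_x, x_new, bandwidth, tau=1.0):
    """Weighted average of per-source kernel regressions evaluated at x_new."""
    w = domain_weights([x for x, _ in source_sets], target_x, tau)
    preds = np.array([nadaraya_watson(x, y, x_new, bandwidth) for x, y in source_sets])
    return w @ preds
```

Sources that lie closer to the target batch receive more weight. This is the mechanism by which a model average can damp negative transfer from dissimilar batches.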
Novel miniaturised microbial electrosynthesis reactor: A study on replicability
Carbon capture and utilisation are crucial for reducing fossil fuel dependence and transforming the chemical and energy industries. Microbial electrosynthesis (MES) is a promising technology where electrotrophic microorganisms convert CO2 into valuable biochemicals using electricity. Despite recent advancements, replicability in MES remains poorly understood, with scarce pre-inoculation abiotic data and limited exploration of abiotic and biotic performance correlations. This study introduces a novel miniaturised reactor, modelled after a state-of-the-art flat-plate directed-flow-through bioelectrochemical reactor (DFBR). Four miniaturised reactors were tested in parallel under abiotic conditions to evaluate the impact of electrode material, reactor design, and assembly on replicability of electrochemical behaviour. Using the dynamic time warping (DTW) algorithm, reactor similarity was quantified for the first time based on electrochemical performance. Kernel scatterplot smoothing on micro-CT data revealed that electrodes, particularly the commonly used carbon felt, are a significant source of variability in electrochemical performance, as further supported by additional abiotic electrochemical tests. Additionally, the miniaturised reactors were inoculated with an enriched mixed culture to examine microbial activity's effect on replicability, achieving concentrations up to 4.55 g L⁻¹ acetate, 0.96 g L⁻¹ butyrate, and 0.38 g L⁻¹ caproate after 60 days. Variations in abiotic conditions, including maximum reachable current density, onset potential, and porosity, influence biofilm growth and performance. The miniaturised DFBR effectively represents the serpentine DFBR, while the adaptable reactor design and proposed statistical methods set a new benchmark for MES research.
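The study uses dynamic time warping (DTW) to quantify reactor similarity, and DTW can be written as a short dynamic program. The sketch below computes the classic DTW distance between two one-dimensional curves, for example current density recorded over time, using the absolute difference as the local cost. The cost function and normalisation used in the study are not given here, and neither is any warping-window constraint, so these choices are assumptions.

```python
import numpy as np


def dtw_distance(a, b):
    """Classic DTW distance between 1-D sequences a and b (absolute-difference cost)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # acc[i, j] = minimal cumulative cost of aligning a[:i] with b[:j]
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j],      # a advances, b repeats
                                   acc[i, j - 1],      # b advances, a repeats
                                   acc[i - 1, j - 1])  # both advance
    return acc[n, m]
```

Smaller values indicate more similar curves. Computing the distance for every pair of reactors yields a distance matrix that can be used to compare replicates.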
In this study, the Expected Threat model is analyzed from a theoretical perspective, and simulations based on the model's Markov chain are performed to examine its behavior in practice. Our theoretical results establish an upper bound on the error of the Expected Threat model for different degrees of model flexibility. Based on the simulations, a more accurate characterization of the model's error is provided, improving on the theoretical bound. Finally, these insights are converted into a practical rule of thumb to help practitioners choose the right balance between model flexibility and the desired accuracy of the Expected Threat model.
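In the commonly used formulation of Expected Threat, each pitch zone $z$ is assigned a value $\mathrm{xT}(z) = s(z)\,g(z) + m(z) \sum_{z'} T(z, z')\,\mathrm{xT}(z')$. Here $s$ and $m$ are the probabilities of shooting and moving from $z$, $g$ is the probability of scoring from a shot in $z$, and $T$ is the transition matrix of successful moves. The sketch below solves this fixed point by iteration over a user-supplied set of zones. It illustrates the Markov-chain structure analysed in the study and is not the authors' simulation code. The argument names are illustrative.

```python
import numpy as np


def expected_threat(shoot_prob, goal_prob, move_prob, transition, tol=1e-10, max_iter=10_000):
    """Expected Threat per pitch zone, computed by fixed-point iteration.

    shoot_prob[z], move_prob[z]: probability that possession in zone z ends in a shot / a move
    goal_prob[z]: probability that a shot from zone z is scored
    transition[z, z2]: probability that a move from z ends in z2 (rows may sum to < 1,
    the remainder being lost possession)
    """
    s, g, m = (np.asarray(v, dtype=float) for v in (shoot_prob, goal_prob, move_prob))
    T = np.asarray(transition, dtype=float)
    xt = np.zeros_like(s)
    for _ in range(max_iter):
        new = s * g + m * (T @ xt)   # value of shooting now + value of moving on
        if np.max(np.abs(new - xt)) < tol:
            return new
        xt = new
    return xt
```

The iteration converges when every row of `m[:, None] * T` sums to less than one. Its limit agrees with the direct linear solve of $(I - \mathrm{diag}(m)\,T)\,\mathrm{xT} = s \odot g$.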
Background and Objectives: Early neuroprognostication in children with reduced consciousness after cardiac arrest (CA) is a major clinical challenge. EEG is frequently used for neuroprognostication in adults but has not been sufficiently validated for this indication in children. Using machine learning techniques, we studied the predictive value of quantitative EEG (qEEG) features for survival 12 months after CA, based on EEG recordings obtained 24 hours after CA in children. The results were confirmed through visual analysis of EEG background patterns. Methods: This is a retrospective single-center study including children (0–17 years) with CA who were admitted to the pediatric intensive care unit (PICU) of a tertiary care hospital between 2012 and 2021 after return of circulation (ROC) and were monitored using EEG at 24 hours after ROC. Signal features were extracted from a 30-minute EEG segment 24 hours after CA and used to train a random forest model. The background pattern of the same EEG fragment was visually classified. The primary outcome was survival or death 12 months after CA. Analysis of the prognostic accuracy of the model included calculation of receiver-operating characteristic curves and predictive values. Feature contributions to the model were analyzed using Shapley values. Results: Eighty-six children were included (in-hospital CA 27%, out-of-hospital CA 73%). The median age at CA was 2.6 years; 53 (62%) were male. Mortality at 12 months was 56%; the main causes of death on the PICU were withdrawal of life-sustaining therapies because of poor neurologic prognosis (52%) and brain death (31%). The random forest model predicted death at 12 months with an accuracy of 0.77 and a positive predictive value of 1.0. Continuity and amplitude of the EEG signal were the signal parameters contributing most to the model classification. Visual analysis showed that no patients with a background pattern other than continuous with amplitudes exceeding 20 μV were alive after 12 months. Discussion: Both qEEG and visual EEG background classification of registrations obtained 24 hours after ROC are strong predictors of nonsurvival 12 months after CA in children.
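The sketch below computes the evaluation quantities named in the abstract above (accuracy, positive predictive value, and the area under the ROC curve) from model outputs. It assumes that death is coded as the positive class (1) and that the model outputs a predicted probability of death. The 0.5 threshold and the function name are illustrative. The AUC is computed through its Mann-Whitney rank interpretation instead of by tracing the curve.

```python
import numpy as np


def prognostic_metrics(y_true, p_death, threshold=0.5):
    """Accuracy, PPV and ROC AUC for a binary death (1) / survival (0) outcome."""
    y = np.asarray(y_true, dtype=int)
    p = np.asarray(p_death, dtype=float)
    pred = (p >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1))
    fp = np.sum((pred == 1) & (y == 0))
    accuracy = np.mean(pred == y)
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    # AUC = P(score of a random positive > score of a random negative), ties count 1/2
    pos, neg = p[y == 1], p[y == 0]
    diff = pos[:, None] - neg[None, :]
    auc = np.mean((diff > 0) + 0.5 * (diff == 0))
    return {"accuracy": accuracy, "ppv": ppv, "auc": auc}
```

A PPV of 1.0, as reported above, means that none of the children predicted to die survived at the chosen threshold. It does not mean that every death was predicted.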
Consider an opaque medium that contains 3D particles. All particles are convex bodies of the same shape, but they vary in size. The particles are randomly positioned and oriented within the medium and cannot be observed directly. Taking a planar section of the medium we obtain a sample of observed 2D section profile areas of the intersected particles. In this paper, the distribution of interest is the underlying 3D particle size distribution for which an identifiability result is obtained. Moreover, a non-parametric estimator is proposed for this size distribution. The estimator is proven to be consistent and its performance is assessed in a simulation study.
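To make the observation scheme concrete, the sketch below simulates section profile areas in the simplest special case, where the common particle shape is a sphere and orientation therefore plays no role. Two effects appear in the simulation. First, a plane hits larger particles more often, since the hit probability is proportional to the radius (size-biased sampling). Second, given a hit, the distance from the centre to the plane is uniform, so a profile is usually smaller than the particle's maximal cross-section. The radius distribution used for testing is arbitrary, and the function name is hypothetical. For other convex shapes, the setting above also involves random orientation, which this sketch does not cover.

```python
import numpy as np


def simulate_section_areas(radii, rng):
    """Section profile areas of spheres hit by a random plane (sphere special case).

    radii: radii of the particles in the medium. A particle is hit with probability
    proportional to its radius; given a hit, the distance from its centre to the
    plane is uniform on [0, R].
    """
    radii = np.asarray(radii, dtype=float)
    hit = rng.random(len(radii)) < radii / radii.max()   # size-biased selection
    r_hit = radii[hit]
    h = rng.random(len(r_hit)) * r_hit                    # centre-to-plane distance
    return np.pi * (r_hit ** 2 - h ** 2)                  # areas of the circular profiles
```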
Testing for no effect in regression problems: A permutation approach
Often the question arises whether (Formula presented.) can be predicted based on (Formula presented.) using a certain model. Especially for highly flexible models such as neural networks, one may ask whether a seemingly good prediction is actually better than fitting pure noise or whether it has to be attributed to the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is instead based on generating new pairings of (Formula presented.). It introduces a new formulation of the null hypothesis and a rigorous justification for the test, which distinguishes it from the previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves in an experimental context. The simulation study underscores how the available information affects the test. It shows that the less informative the predictors are, the lower the probability of rejecting the null hypothesis of fitting pure noise, and it emphasizes that detecting weaker dependence between variables requires a sufficient sample size.
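The basic mechanics of a permutation test of this kind are as follows. The model is fitted to the observed pairs, refitted to pairs in which the responses have been randomly re-paired with the predictors, and the two fits are compared. The sketch below is generic and is not the paper's test. Its assumptions are simple: the model is ordinary least squares, the statistic is the in-sample $R^2$ (no sample splitting), and the p-value uses the usual (1 + count)/(B + 1) correction. The function names are hypothetical.

```python
import numpy as np


def r_squared_ols(x, y):
    """In-sample R^2 of a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)


def permutation_test(x, y, n_perm=999, rng=None):
    """p-value for 'the fit is no better than a fit to randomly re-paired data'."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = r_squared_ols(x, y)
    # refit on new pairings of (x, y): permuting y breaks any dependence on x
    null = np.array([r_squared_ols(x, rng.permutation(y)) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Because the model is refitted on every permuted data set, the null distribution already reflects how well the model can fit pure noise. This is why no held-out data is needed.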
We construct bootstrap confidence intervals for a monotone regression function. It has been shown that the ordinary nonparametric bootstrap, based on the nonparametric least squares estimator (LSE) (Formula presented.), is inconsistent in this situation. We show that an (Formula presented.)-consistent bootstrap can instead be based on the smoothed (Formula presented.), to be called the SLSE (Smoothed Least Squares Estimator). The asymptotic pointwise distribution of the SLSE is derived. The confidence intervals based on the smoothed bootstrap are compared to intervals based on the (not necessarily monotone) Nadaraya-Watson estimator, and the effect of Studentization is investigated. We also give a method for automatic bandwidth choice, correcting work in Sen and Xu (2015). Analogous methods for constructing confidence intervals in the current status model are discussed, improving on work in Groeneboom and Hendrickx (2018).
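The sketch below gives a rough picture of the two estimators involved. The first is the monotone least squares estimator, computed with the pool-adjacent-violators algorithm. The second is a kernel-smoothed version of it. The smoothing is a Gaussian-kernel weighted average of the fitted LSE values at the design points, with a fixed bandwidth; both are simplifying assumptions and may differ from the SLSE as defined in the paper. Boundary corrections, bootstrap resampling, and the automatic bandwidth choice are not reproduced. The function names are hypothetical.

```python
import numpy as np


def isotonic_lse(y):
    """Least squares nondecreasing fit to y (pool adjacent violators, equal weights)."""
    blocks = []  # each block: [mean, size]
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # pool the last two blocks while they violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, n2 = blocks.pop()
            m1, n1 = blocks.pop()
            blocks.append([(n1 * m1 + n2 * m2) / (n1 + n2), n1 + n2])
    return np.concatenate([np.full(n, m) for m, n in blocks])


def smoothed_lse(x, y, t, bandwidth):
    """Kernel-smoothed monotone LSE evaluated at points t (x assumed sorted)."""
    x = np.asarray(x, dtype=float)
    fit = isotonic_lse(y)                                 # LSE values at the design points
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.exp(-0.5 * ((t[:, None] - x[None, :]) / bandwidth) ** 2)  # Gaussian weights
    return k @ fit / k.sum(axis=1)
```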
The past decade has seen an increased interest in human activity recognition based on sensor data. Most often, the sensor data come unannotated, creating the need for fast labelling methods. For assessing the quality of the labelling, an appropriate performance measure has to be chosen. Our main contribution is a novel post-processing method for activity recognition. It improves the accuracy of the classification methods by correcting for unrealistic short activities in the estimate. We also propose a new performance measure, the Locally Time-Shifted Measure (LTS measure), which addresses uncertainty in the times of state changes. The effectiveness of the post-processing method is evaluated, using the novel LTS measure, on the basis of a simulated dataset and a real application on sensor data from football. The simulation study is also used to discuss the choice of the parameters of the post-processing method and the LTS measure.
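The sketch below shows one simple way to correct unrealistically short activities. It is a hypothetical rule and not necessarily the paper's method. Each maximal run of identical predicted labels is treated as a segment. Any segment shorter than a minimum duration is relabelled with the label of the preceding segment, or of the following segment at the start of the sequence. The minimum length `min_len`, measured in samples, and the function name are hypothetical.

```python
from itertools import groupby


def remove_short_segments(labels, min_len):
    """Relabel runs shorter than min_len with the label of a neighbouring run."""
    runs = [[lab, len(list(grp))] for lab, grp in groupby(labels)]
    changed = True
    while changed and len(runs) > 1:
        changed = False
        for i, (lab, length) in enumerate(runs):
            if length < min_len:
                # absorb into the previous run, or into the next run at the start
                j = i - 1 if i > 0 else i + 1
                runs[j][1] += length
                del runs[i]
                # merge neighbouring runs that now carry the same label
                merged = []
                for lab2, n2 in runs:
                    if merged and merged[-1][0] == lab2:
                        merged[-1][1] += n2
                    else:
                        merged.append([lab2, n2])
                runs = merged
                changed = True
                break  # restart the scan on the updated run list
    return [lab for lab, n in runs for _ in range(n)]
```

Each pass removes at least one run, so the loop terminates. The cost of this simplicity is that a genuinely short activity is absorbed just as readily as a spurious one.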
Random forest is a popular prediction approach for handling high-dimensional covariates. However, the resulting high-dimensional, nonparametric model is often infeasible to interpret. Aiming for an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure selects at each step the variable that minimizes the CRPS risk, and a stopping criterion for selection is designed based on an estimate of the CRPS risk difference between two consecutive steps. We provide mathematical motivation for our method by proving that, in a population sense, the method attains the optimal set. In a simulation study, we compare the performance of our method with an existing variable selection method for different sample sizes and correlation strengths of the covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power.
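The sketch below illustrates two ingredients of the procedure. The first is the CRPS of an empirical predictive distribution, such as the ensemble of values a forest returns for one case, using the identity $\mathrm{CRPS}(F, y) = \mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$. The second is a greedy forward selection loop. The loop takes a user-supplied risk function, for example a cross-validated mean CRPS. Its fixed `min_gain` stopping rule is a simple stand-in for the estimated risk-difference criterion described above. The function names are hypothetical.

```python
import numpy as np


def crps_ensemble(members, y):
    """CRPS of an empirical predictive distribution (ensemble) for observation y."""
    x = np.asarray(members, dtype=float)
    term1 = np.mean(np.abs(x - y))                              # E|X - y|
    term2 = 0.5 * np.mean(np.abs(x[:, None] - x[None, :]))      # 0.5 * E|X - X'|
    return term1 - term2


def forward_select(candidates, risk, min_gain=0.0):
    """Greedy forward selection: add the variable that lowers risk(subset) the most."""
    selected, remaining = [], list(candidates)
    current = risk(selected)
    while remaining:
        scores = {v: risk(selected + [v]) for v in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] <= min_gain:   # stop when the risk no longer drops enough
            break
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected
```

For a single-member ensemble, the CRPS reduces to the absolute error. This is why the CRPS is often described as a generalization of the mean absolute error to probabilistic forecasts.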
Statistical integration of heterogeneous omics data: Probabilistic two-way partial least squares (PO2PLS)
The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS outperforms alternatives in feature selection and prediction. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case–control study. Besides recovering known relationships, PO2PLS also yielded novel findings. The methods are implemented in our R-package PO2PLS.
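The sketch below simulates data with the general structure described above: two datasets linked through joint latent variables, each with its own data-specific latent variables. It uses an O2PLS-type decomposition, $X = T W^\top + T_x W_x^\top + E$ and $Y = U C^\top + U_y C_y^\top + F$ with $U = T B + H$. It is not the PO2PLS R-package. The following choices are illustrative assumptions: $B = bI$, mutually orthonormal joint and specific loadings, standard normal latent variables, and the noise levels. The function name is hypothetical.

```python
import numpy as np


def simulate_two_block_data(n, p, q, r, rx, ry, rng, b=1.0, noise=0.1):
    """Simulate X (n x p) and Y (n x q) sharing r joint latent components.

    X = T W' + Tx Wx' + E,  Y = U C' + Uy Cy' + F,  U = b T + H.
    Joint and specific loadings are drawn together, so their columns are orthonormal.
    """
    def loadings(rows, k1, k2):
        basis, _ = np.linalg.qr(rng.normal(size=(rows, k1 + k2)))
        return basis[:, :k1], basis[:, k1:]

    W, Wx = loadings(p, r, rx)    # joint / X-specific loadings (requires p >= r + rx)
    C, Cy = loadings(q, r, ry)    # joint / Y-specific loadings (requires q >= r + ry)
    T = rng.normal(size=(n, r))                    # joint latent scores of X
    U = b * T + 0.1 * rng.normal(size=(n, r))      # joint latent scores of Y (H is the noise)
    Tx = rng.normal(size=(n, rx))                  # X-specific latent scores
    Uy = rng.normal(size=(n, ry))                  # Y-specific latent scores
    X = T @ W.T + Tx @ Wx.T + noise * rng.normal(size=(n, p))
    Y = U @ C.T + Uy @ Cy.T + noise * rng.normal(size=(n, q))
    return X, Y
```

Under this structure, the cross-covariance of $X$ and $Y$ equals $W B C^\top$ and therefore has rank $r$. The data-specific parts add variance within each dataset but not covariance between them.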
Suppose $X_1, \ldots, X_n$ is a random sample from a bounded and decreasing density $f_0$ on $[0, \infty)$. We are interested in estimating such $f_0$, with special interest in $f_0(0)$. This problem is encountered in various statistical applications and has received considerable attention in the statistical literature. It is well known that the maximum likelihood estimator is inconsistent at zero. This has led several authors to propose alternative estimators that are consistent. As any decreasing density can be represented as a scale mixture of uniform densities, a Bayesian estimator is obtained by endowing the mixture distribution with a Dirichlet process prior. Assuming this prior, we derive contraction rates of the posterior density at zero by carefully revising arguments presented in Salomond (Electronic Journal of Statistics 8 (2014) 1380–1404). Several choices of base measure are numerically evaluated and compared. In a simulation study, various frequentist methods and a Bayesian estimator are compared. Finally, the Bayesian procedure is applied to current duration data described in Slama et al. (Human Reproduction 27 (2012) 1489–1498).
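For comparison with the Bayesian approach, the maximum likelihood estimator mentioned above (the Grenander estimator) is the left derivative of the least concave majorant of the empirical cdf. The sketch below computes it from the upper convex hull of the points $(0, 0)$ and $(X_{(i)}, i/n)$. Its value near zero is the slope of the first majorant segment, which is the quantity known to be inconsistent. The function name is illustrative, and the data are assumed to contain no ties.

```python
import numpy as np


def grenander(sample):
    """Grenander estimator of a decreasing density on [0, inf), assuming no ties.

    Returns the knots of the least concave majorant (LCM) of the empirical cdf and
    the slopes (density values) on the intervals between consecutive knots.
    """
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    pts = [(0.0, 0.0)] + [(xi, (i + 1) / n) for i, xi in enumerate(x)]
    hull = []
    for px, py in pts:
        # drop the last hull point while it lies on or below the chord (keeps concavity)
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (y2 - y1) * (px - x1) <= (py - y1) * (x2 - x1):
                hull.pop()
            else:
                break
        hull.append((px, py))
    knots = np.array([h[0] for h in hull])
    values = np.array([h[1] for h in hull])
    slopes = np.diff(values) / np.diff(knots)   # nonincreasing density values
    return knots, slopes
```

Because the first slope is driven by the few smallest observations, it does not settle down at $f_0(0)$ as $n$ grows. This is the inconsistency that motivates the alternative estimators discussed above.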