G. Jongbloed | TU Delft Repository

Semiparametric Bernstein–Von Mises Phenomenon Via Isotonized Posterior In Wicksell’s Problem

Journal article (2026) - Francesco Gili, Geurt Jongbloed, Aad van der Vaart

In this paper, we propose a novel Bayesian approach for nonparametric estimation in Wicksell’s problem. This has important applications in astronomy, for estimating the distribution of the positions of the stars in a galaxy given projected stellar positions, and in materials science, for determining the 3D microstructure of a material from its 2D cross-sections. We deviate from the classical Bayesian nonparametric approach, which would place a Dirichlet Process (DP) prior on the distribution function of the unobservables, by directly placing a DP prior on the distribution function of the observables. Our method offers computational simplicity due to the conjugacy of the posterior and allows for asymptotically efficient estimation by projecting the posterior onto the L₂ subspace of increasing, right-continuous functions. Indeed, the resulting Isotonized Inverse Posterior (IIP) satisfies a Bernstein–von Mises (BvM) phenomenon with minimax asymptotic variance g₀(x)/2γ, where γ > 1/2 reflects the degree of Hölder continuity of the true cdf at x. Since the IIP gives automatic uncertainty quantification, it eliminates the need to estimate γ. Our results provide the first semiparametric Bernstein–von Mises theorem for projection-based posteriors with a DP prior in inverse problems. ...

Asymptotically efficient estimation under local constraint in Wicksell's problem

Journal article (2026) - Francesco Gili, Geurt Jongbloed, Aad van der Vaart

We consider nonparametric estimation of the distribution function F of squared sphere radii in the classical Wicksell problem. Under smoothness conditions on F in a neighborhood of x, in Gili et al. (2024) it is shown that the Isotonic Inverse Estimator (IIE) is asymptotically efficient and attains rate of convergence n/logn. If F is constant on an interval containing x, the optimal rate of convergence increases to n and the IIE attains this rate adaptively, i.e. without explicitly using the knowledge of local constancy. However, in this case, the asymptotic distribution is not normal. In this paper, we introduce three informed projection-type estimators of F, which use knowledge on the interval of constancy and show these are all asymptotically equivalent and normal. Furthermore, we establish a local asymptotic minimax lower bound in this setting, proving that the three informed estimators are asymptotically efficient and a convolution result showing that the IIE is not efficient. We also derive the asymptotic distribution of the difference of the IIE with the efficient estimators, demonstrating that the IIE is not asymptotically equivalent to the informed estimators. Through a simulation study, we provide evidence that the performance of the IIE closely resembles that of its competitors, supporting the use of the IIE as the standard choice when no information about F is available. ...

Prediction of survival after pediatric cardiac arrest using heart rate variability and machine learning

Journal article (2026) - Daishi Xu, Eris van Twist, Marit Verboom, Maayke Hunfeld, Corinne Buysse, Geurt Jongbloed, Natasja M.S. de Groot, Robert van den Berg

Background: Early prognostication of the outcome in pediatric cardiac arrest (CA) patients is crucial for clinical decision-making. Heart rate variability (HRV) has shown potential in predicting outcomes after CA in adult patients. This study investigates whether HRV can be used to predict survival outcomes after pediatric CA using machine learning techniques. Methods: This retrospective study included children with CA, who achieved return of spontaneous circulation (ROSC), and were admitted to the pediatric intensive care unit (PICU) of a tertiary hospital between 2012 and 2021. A 5-min electrocardiogram (ECG) segment acquired at 24 h after CA was used to calculate HRV parameters (time-, frequency-, and non-linear domains). These parameters were used to train a random forest model. The primary outcome was 12-month survival or death. Model performance was evaluated using receiver-operating characteristics (ROC) analysis and predictive values. Feature importance was assessed using Shapley values. Results: A total of 76 patients (male: 63.2%, median age: 2.5 [IQR: 0.4–8.0] years) were divided into survival (34) or death (42) groups based on 12-month outcomes. The machine learning model achieved an accuracy of 77.6% and a positive predictive value of 0.879 for mortality prediction. The most influential features for model predictions were the frequency-domain parameters total power and very-low frequency (VLF) power, with lower values associated with an increased probability of death. Conclusions: Analysis of HRV at 24 h after ROSC may serve as a strong predictor of 12-month survival after pediatric CA. ...

Estimation of 3D grain size distributions from 2D sections in real and simulated microstructures

Journal article (2025) - Thomas van der Jagt, Martina Vittorietti, Karo Sedighiani, Cornelis Bos, Geurt Jongbloed

Obtaining information about the 3D grain size distribution of metallic microstructures is crucial for understanding the mechanical behavior of metals. This paper addresses the problem of estimating the 3D grain size distribution from 2D cross sections. This is a well-known stereological problem and different estimators have been proposed in the literature. We propose a statistical estimation procedure that provides consistent estimates without relying on arbitrary binning choices. When applying this procedure to space filling structures, we investigate the impact of the choice of grain shape and propose a heuristic to choose the best grain shape. To validate our approach, we employ simulations using Laguerre–Voronoi diagrams and apply our methodology to a sample of Interstitial-Free steel, obtained via EBSD. ...

Novel miniaturised microbial electrosynthesis reactor

A study on replicability

Journal article (2025) - Marika A.J. Zegers, Eva Augustijn, Geurt Jongbloed, Ludovic Jourdin

Carbon capture and utilisation are crucial for reducing fossil fuel dependence and transforming the chemical and energy industries. Microbial electrosynthesis (MES) is a promising technology where electrotrophic microorganisms convert CO₂ into valuable biochemicals using electricity. Despite recent advancements, replicability in MES remains poorly understood, with scarce pre-inoculation abiotic data and limited exploration of abiotic and biotic performance correlations. This study introduces a novel miniaturised reactor, modelled after a state-of-the-art flat-plate directed-flow-through bioelectrochemical reactor (DFBR). Four miniaturised reactors were tested in parallel under abiotic conditions to evaluate the impact of electrode material, reactor design, and assembly on replicability of electrochemical behaviour. Using the dynamic time warping (DTW) algorithm, reactor similarity was quantified for the first time based on electrochemical performance. Kernel scatterplot smoothing on micro-CT data revealed that electrodes, particularly the commonly used carbon felt, are a significant source of variability in electrochemical performance, as further supported by additional abiotic electrochemical tests. Additionally, the miniaturised reactors were inoculated with an enriched mixed culture to examine microbial activity's effect on replicability, achieving concentrations up to 4.55 g L⁻¹ acetate, 0.96 g L⁻¹ butyrate, and 0.38 g L⁻¹ caproate after 60 days. Variations in abiotic conditions, including maximum reachable current density, onset potential, and porosity, influence biofilm growth and performance. The miniaturised DFBR effectively represents the serpentine DFBR, while the adaptable reactor design and proposed statistical methods set a new benchmark for MES research. ...
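The abstract above names dynamic time warping (DTW) as the similarity measure between reactors. A minimal sketch of the textbook DTW recursion follows; the function name `dtw_distance`, the absolute-difference local cost, and the toy input series are assumptions made for illustration and are not taken from the article.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Illustrative sketch only: the local cost |a_i - b_j| and the absence of
    a warping window are assumptions, not choices reported in the article.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat b[j-1]
                cost[i][j - 1],      # repeat a[i-1]
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# Two hypothetical current traces that differ only in timing.
trace_a = [1.0, 2.0, 3.0]
trace_b = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(trace_a, trace_b)
```

Because DTW allows one series to be locally stretched against the other, traces with the same shape but different timing receive a small distance, which is what makes it a natural candidate for comparing replicate runs.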

Robust Transfer Learning for Battery Lifetime Prediction Using Early Cycle Data

Journal article (2025) - Wenda Kang, Dianpeng Wang, Geurt Jongbloed, Jiawen Hu, Piao Chen

Battery lifetime prediction is crucial in industrial applications. However, the lack of diversity in training data often poses challenges regarding the robustness and generalization of lifetime predictions for batteries from different batches. Motivated by the early cycle data from lithium-ion batteries, this article proposes a robust transfer learning method by employing a model average framework, where the weights are determined based on the distance between the source domain and the target domain. Kernel regression is used to build the prediction of battery lifetime using early cycle data, and transfer component analysis is utilized to transfer knowledge between different domains. The case study on lithium-ion phosphate/graphite cells demonstrates that the proposed method can mitigate the impact of negative transfer and has superior performance compared to traditional methods. ...

Nonparametric inference for Poisson-Laguerre tessellations

Journal article (2025) - Thomas van der Jagt, Geurt Jongbloed, Martina Vittorietti

In this paper, we consider statistical inference for Poisson-Laguerre tessellations in (Formula presented.). The object of interest is a distribution function (Formula presented.) which describes the distribution of the arrival times of the generator points. The function (Formula presented.) uniquely determines the intensity measure of the underlying Poisson process. Two nonparametric estimators for (Formula presented.) are introduced, which depend only on the points of the Poisson process that generate non-empty cells and the actual cells corresponding to these points. The proposed estimators are proven to be strongly consistent as the observation window expands unboundedly to the whole space. We also consider a stereological setting, where one is interested in estimating the distribution function associated with the Poisson process of a higher-dimensional Poisson-Laguerre tessellation, given that a corresponding sectional Poisson-Laguerre tessellation is observed. ...

The trade-off between model flexibility and accuracy of the Expected Threat model in football

Book chapter (2025) - K.W. van Arem, Jakob Söhl, Mirjam Bruinsma, Geurt Jongbloed

With an average football (soccer) match recording over 3,000 on-ball events, effective use of this event data is essential for practitioners at football clubs to obtain meaningful insights. Models can extract more information from this data, and explainable methods can make them more accessible to practitioners. The Expected Threat model has been praised for its explainability and offers an accessible option. However, selecting the grid size is a challenging key design choice that has to be made when applying the Expected Threat model. Using a finer grid leads to a more flexible model that can better distinguish between different situations, but the accuracy of the estimates deteriorates with a more flexible model. Consequently, practitioners face challenges in balancing the trade-off between model flexibility and model accuracy.
In this study, the Expected Threat model is analyzed from a theoretical perspective and simulations are performed based on the Markov chain of the model to examine its behavior in practice. Our theoretical results establish an upper bound on the error of the Expected Threat model for different flexibilities. Based on the simulations, a more accurate characterization of the model’s error is provided, improving over the theoretical bound. Finally, these insights are converted into a practical rule of thumb to help practitioners choose the right balance between the model flexibility and the desired accuracy of the Expected Threat model. ...
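For readers unfamiliar with the Markov-chain view mentioned above, the sketch below iterates the commonly used Expected Threat recursion xT(z) = s(z) g(z) + m(z) Σ_w T(z, w) xT(w) on a grid of zones. This formulation, the function name `expected_threat`, and the two-zone toy numbers are assumptions for illustration; the chapter's own notation and estimation procedure may differ.

```python
def expected_threat(shoot_prob, goal_prob, move_prob, transition, n_iter=50):
    """Iterate the Expected Threat recursion on a finite set of zones.

    shoot_prob[z]    : probability of shooting from zone z
    goal_prob[z]     : probability that a shot from zone z scores
    move_prob[z]     : probability of successfully moving the ball from z
    transition[z][w] : probability that a successful move from z ends in w
    All inputs are hypothetical; in practice they are estimated from event data.
    """
    n_zones = len(shoot_prob)
    xt = [0.0] * n_zones
    for _ in range(n_iter):
        xt = [
            shoot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][w] * xt[w] for w in range(n_zones))
            for z in range(n_zones)
        ]
    return xt


# Toy two-zone pitch: zone 0 only passes to zone 1, zone 1 only shoots.
xt = expected_threat(
    shoot_prob=[0.0, 1.0],
    goal_prob=[0.0, 0.5],
    move_prob=[0.8, 0.0],
    transition=[[0.0, 1.0], [0.0, 0.0]],
)
```

A finer grid means more zones and therefore more entries of these probability tables to estimate from the same number of events, which is the flexibility versus accuracy trade-off the chapter studies.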

Degradation index-based prediction for remaining useful life using multivariate sensor data

Journal article (2024) - Wenda Kang, Geurt Jongbloed, Yubin Tian, Piao Chen

The prediction of remaining useful life (RUL) is a critical component of prognostic and health management for industrial systems. In recent decades, there has been a surge of interest in RUL prediction based on degradation data of a well-defined degradation index (DI). However, in many real-world applications, the DI may not be readily available and must be constructed from complex source data, rendering many existing methods inapplicable. Motivated by multivariate sensor data from industrial induction motors, this paper proposes a novel prognostic framework that develops a nonlinear DI, serving as an ensemble of representative features, and employs a similarity-based method for RUL prediction. The proposed framework enables online prediction of RUL by dynamically updating information from the in-service unit. Simulation studies and a case study on three-phase industrial induction motors demonstrate that the proposed framework can effectively extract reliability information from various channels and predict RUL with high accuracy. ...

Testing for no effect in regression problems

A permutation approach

Journal article (2024) - Michał G. Ciszewski, Jakob Söhl, Ton Leenen, Bart van Trigt, Geurt Jongbloed

Often the question arises whether (Formula presented.) can be predicted based on (Formula presented.) using a certain model. Especially for highly flexible models such as neural networks one may ask whether a seemingly good prediction is actually better than fitting pure noise or whether it has to be attributed to the flexibility of the model. This paper proposes a rigorous permutation test to assess whether the prediction is better than the prediction of pure noise. The test avoids any sample splitting and is based instead on generating new pairings of (Formula presented.). It introduces a new formulation of the null hypothesis and rigorous justification for the test, which distinguishes it from the previous literature. The theoretical findings are applied both to simulated data and to sensor data of tennis serves in an experimental context. The simulation study underscores how the available information affects the test. It shows that the less informative the predictors, the lower the probability of rejecting the null hypothesis of fitting pure noise and emphasizes that detecting weaker dependence between variables requires a sufficient sample size. ...
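To make the idea of re-pairing observations concrete, here is a generic permutation test sketch. It is not the test proposed in the article: the simple linear fit, the residual sum of squares statistic, and the names `fit_rss` and `permutation_p_value` are all assumptions chosen to keep the example self-contained.

```python
import random


def fit_rss(x, y):
    """Residual sum of squares of a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))


def permutation_p_value(x, y, n_perm=199, seed=0):
    """Share of random re-pairings that fit at least as well as the data."""
    rng = random.Random(seed)
    observed = fit_rss(x, y)
    y_perm = list(y)
    at_least_as_good = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # new pairing of responses with predictors
        if fit_rss(x, y_perm) <= observed:
            at_least_as_good += 1
    # The +1 terms count the observed pairing as one of the permutations.
    return (1 + at_least_as_good) / (1 + n_perm)


# Hypothetical data with a clear linear signal and small deterministic noise.
x = [i / 10 for i in range(30)]
y = [2 * xi + 0.05 * (-1) ** i for i, xi in enumerate(x)]
p_value = permutation_p_value(x, y)
```

A small p-value indicates that the fit on the actual pairing is better than what the same model achieves on shuffled pairings, that is, better than fitting pure noise in this simplified setting.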

Prediction of Survival After Pediatric Cardiac Arrest Using Quantitative EEG and Machine Learning Techniques

Journal article (2024) - Maayke Hunfeld, Marit Verboom, Sabine Josemans, Annemiek van Ravensberg, Dirk Straver, Femke Lückerath, Geurt Jongbloed, Corinne Buysse, Robert van den Berg

Background and Objectives: Early neuroprognostication in children with reduced consciousness after cardiac arrest (CA) is a major clinical challenge. EEG is frequently used for neuroprognostication in adults, but has not been sufficiently validated for this indication in children. Using machine learning techniques, we studied the predictive value of quantitative EEG (qEEG) features for survival 12 months after CA, based on EEG recordings obtained 24 hours after CA in children. The results were confirmed through visual analysis of EEG background patterns. Methods: This is a retrospective single-center study including children (0–17 years) with CA, who were subsequently admitted to the pediatric intensive care unit (PICU) of a tertiary care hospital between 2012 and 2021 after return of circulation (ROC) and were monitored using EEG at 24 hours after ROC. Signal features were extracted from a 30-minute EEG segment 24 hours after CA and used to train a random forest model. The background pattern from the same EEG fragment was visually classified. The primary outcome was survival or death 12 months after CA. Analysis of the prognostic accuracy of the model included calculation of receiver-operating characteristic and predictive values. Feature contribution to the model was analyzed using Shapley values. Results: Eighty-six children were included (in-hospital CA 27%, out-of-hospital CA 73%). The median age at CA was 2.6 years; 53 (62%) were male. Mortality at 12 months was 56%; main causes of death on the PICU were withdrawal of life-sustaining therapies because of poor neurologic prognosis (52%) and brain death (31%). The random forest model was able to predict death at 12 months with an accuracy of 0.77 and positive predictive value of 1.0. Continuity and amplitude of the EEG signal were the signal parameters most contributing to the model classification.
Visual analysis showed that no patients with a background pattern other than continuous with amplitudes exceeding 20 μV were alive after 12 months. Discussion: Both qEEG and visual EEG background classification for registrations obtained 24 hours after ROC form a strong predictor of nonsurvival 12 months after CA in children. ...

Confidence intervals in monotone regression

Journal article (2024) - Piet Groeneboom, Geurt Jongbloed

We construct bootstrap confidence intervals for a monotone regression function. It has been shown that the ordinary nonparametric bootstrap, based on the nonparametric least squares estimator (LSE) (Formula presented.), is inconsistent in this situation. We show that an (Formula presented.) -consistent bootstrap can be based on the smoothed (Formula presented.), to be called the SLSE (Smoothed Least Squares Estimator). The asymptotic pointwise distribution of the SLSE is derived. The confidence intervals, based on the smoothed bootstrap, are compared to intervals based on the (not necessarily monotone) Nadaraya Watson estimator and the effect of Studentization is investigated. We also give a method for automatic bandwidth choice, correcting work in Sen and Xu (2015). Analogous methods for constructing confidence intervals in the current status model are discussed, improving on work in Groeneboom and Hendrickx (2018). ...
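The nonparametric least squares estimator (LSE) of a monotone regression function, on which the abstract builds, can be computed with the pool adjacent violators algorithm. The sketch below shows only that unsmoothed LSE, assuming responses already sorted by increasing covariate and equal weights; the smoothing step, the bootstrap, and the bandwidth choice of the article are not reproduced, and the name `isotonic_lse` is hypothetical.

```python
def isotonic_lse(y):
    """Nondecreasing least-squares fit to y via pool adjacent violators.

    Assumes y is ordered by increasing covariate value with equal weights.
    """
    blocks = []  # each block stores [sum of responses, number of points]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (
            len(blocks) > 1
            and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# The violation 3 > 2 is pooled into the common mean 2.5.
fit = isotonic_lse([1, 3, 2, 4])
```

The resulting fit is a step function, which is one reason a smoothed version of the estimator is attractive as a basis for the bootstrap.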

Stereological determination of particle size distributions for similar convex bodies

Journal article (2024) - Thomas van der Jagt, Geurt Jongbloed, Martina Vittorietti

Consider an opaque medium that contains 3D particles. All particles are convex bodies of the same shape, but they vary in size. The particles are randomly positioned and oriented within the medium and cannot be observed directly. Taking a planar section of the medium we obtain a sample of observed 2D section profile areas of the intersected particles. In this paper, the distribution of interest is the underlying 3D particle size distribution for which an identifiability result is obtained. Moreover, a non-parametric estimator is proposed for this size distribution. The estimator is proven to be consistent and its performance is assessed in a simulation study. ...

Improving state estimation through projection post-processing for activity recognition with application to football

Journal article (2023) - Michał Ciszewski, Jakob Söhl, Geurt Jongbloed

The past decade has seen an increased interest in human activity recognition based on sensor data. Most often, the sensor data come unannotated, creating the need for fast labelling methods. For assessing the quality of the labelling, an appropriate performance measure has to be chosen. Our main contribution is a novel post-processing method for activity recognition. It improves the accuracy of the classification methods by correcting for unrealistic short activities in the estimate. We also propose a new performance measure, the Locally Time-Shifted Measure (LTS measure), which addresses uncertainty in the times of state changes. The effectiveness of the post-processing method is evaluated, using the novel LTS measure, on the basis of a simulated dataset and a real application on sensor data from football. The simulation study is also used to discuss the choice of the parameters of the post-processing method and the LTS measure. ...

Existence and approximation of densities of chord length- and cross section area distributions

Journal article (2023) - Thomas van der Jagt, Geurt Jongbloed, Martina Vittorietti

In various stereological problems an n-dimensional convex body is intersected with an (n−1)-dimensional Isotropic Uniformly Random (IUR) hyperplane. In this paper the cumulative distribution function associated with the (n−1)-dimensional volume of such a random section is studied. This distribution is also known as chord length distribution and cross section area distribution in the planar and spatial case respectively. For various classes of convex bodies it is shown that these distribution functions are absolutely continuous with respect to Lebesgue measure. A Monte Carlo simulation scheme is proposed for approximating the corresponding probability density functions. ...
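As a small illustration of the planar case, the sketch below simulates chord lengths of a disk cut by IUR lines and compares the empirical distribution function with the closed form available for this shape. It uses the fact that, for a disk of radius R, the distance p from the centre to an IUR line hitting the disk is uniform on [0, R], so the chord length is 2√(R² − p²). This is a toy special case and not the simulation scheme of the article; all function names are hypothetical.

```python
import math
import random


def simulate_chord_lengths(radius, n_samples, seed=0):
    """Chord lengths of a disk intersected by IUR lines."""
    rng = random.Random(seed)
    chords = []
    for _ in range(n_samples):
        p = rng.uniform(0.0, radius)  # distance of the line to the centre
        chords.append(2.0 * math.sqrt(radius ** 2 - p ** 2))
    return chords


def exact_cdf(length, radius):
    """P(chord <= length) for the disk: 1 - sqrt(1 - (length / (2R))^2)."""
    if length <= 0.0:
        return 0.0
    if length >= 2.0 * radius:
        return 1.0
    return 1.0 - math.sqrt(1.0 - (length / (2.0 * radius)) ** 2)


def empirical_cdf(sample, t):
    """Fraction of simulated values not exceeding t."""
    return sum(1 for value in sample if value <= t) / len(sample)


chords = simulate_chord_lengths(radius=1.0, n_samples=50_000)
mean_chord = sum(chords) / len(chords)  # theory: pi * R / 2
gap_at_one = abs(empirical_cdf(chords, 1.0) - exact_cdf(1.0, 1.0))
```

For bodies without such a closed form, the same idea applies once IUR hyperplanes hitting the body can be sampled, with the simulated section sizes then fed to a density estimator.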

Statistical integration of heterogeneous omics data

Probabilistic two-way partial least squares (PO2PLS)

Journal article (2022) - Said el Bouhaddani, Hae Won Uh, Geurt Jongbloed, Jeanine Houwing-Duistermaat

The availability of multi-omics data has revolutionized the life sciences by creating avenues for integrated system-level approaches. Data integration links the information across datasets to better understand the underlying biological processes. However, high dimensionality, correlations and heterogeneity pose statistical and computational challenges. We propose a general framework, probabilistic two-way partial least squares (PO2PLS), that addresses these challenges. PO2PLS models the relationship between two datasets using joint and data-specific latent variables. For maximum likelihood estimation of the parameters, we propose a novel fast EM algorithm and show that the estimator is asymptotically normally distributed. A global test for the relationship between two datasets is proposed, specifically addressing the high dimensionality, and its asymptotic distribution is derived. Notably, several existing data integration methods are special cases of PO2PLS. Via extensive simulations, we show that PO2PLS performs better than alternatives in feature selection and prediction performance. In addition, the asymptotic distribution appears to hold when the sample size is sufficiently large. We illustrate PO2PLS with two examples from commonly used study designs: a large population cohort and a small case–control study. Besides recovering known relationships, PO2PLS also identified novel findings. The methods are implemented in our R-package PO2PLS. ...

Coastal environmental and atmospheric data reduction in the Southern North Sea supporting ecological impact studies

Journal article (2022) - L. Mészáros, F.H. van der Meulen, G. Jongbloed, G.Y.H. El Serafy

Coastal climate impact studies make increasing use of multi-source and multi-dimensional atmospheric and environmental datasets to investigate relationships between climate signals and the ecological response. The large quantity of numerically simulated data may, however, include redundancy, multicollinearity and excess information not relevant to the studied processes. In such cases techniques for feature extraction and identification of latent processes prove useful. Using dimensionality reduction techniques this research provides a statistical underpinning of variable selection to study the impacts of atmospheric processes on coastal chlorophyll-a concentrations, taking the Dutch Wadden Sea as case study. Dimension reduction techniques are applied to environmental data simulated by the Delft3D coastal water quality model, the HIRLAM numerical weather prediction model and the Euro-CORDEX climate modelling experiment. The dimension reduction techniques were selected for their ability to incorporate (1) spatial correlation via multi-way methods, (2) temporal correlation through Dynamic Factor Analysis, and (3) functional variability using Functional Data Analysis. The data reduction potential and explanatory value of these methods are showcased and important atmospheric variables affecting the chlorophyll-a concentration are identified. Our results indicate room for dimensionality reduction in the atmospheric variables (2 principal components can explain the majority of variance instead of 7 variables), in the chlorophyll-a time series at different locations (two characteristic patterns can describe the 10 locations), and in the climate projection scenarios of solar radiation and air temperature variables (a single principal component function explains 77% of the variation for solar radiation and 57% of the variation for air temperature).
It was also found that solar radiation followed by air temperature are the most important atmospheric variables related to coastal chlorophyll-a concentration, noting that regional differences exist, for instance the importance of air temperature is greater in the Eastern Dutch Wadden Sea at Dantziggat than in the Western Dutch Wadden Sea at Marsdiep Noord. Common trends and different regional system characteristics have also been identified through dynamic factor analysis between the deeper channels and the shallower intertidal zones, where the onset of spring blooms occurs earlier. The functional analysis of climate data showed clusters of atmospheric variables with similar functional features. Moreover, functional components of Euro-CORDEX climate scenarios have been identified for radiation and temperature variables, which provide information on the dominant mode (pattern) of variation and its uncertainties. The findings suggest that radiation and temperature projections of different Euro-CORDEX scenarios share similar characteristics and mainly differ in their amplitudes and seasonal patterns, offering opportunities to construct statistical models that do not assume independence between climate scenarios but instead borrow information (“borrow strength”) from the larger pool of climate scenarios. The presented results were used in follow up studies to construct a Bayesian stochastic generator to complement existing Euro-CORDEX climate change scenarios and to quantify climate change induced trends and uncertainties in phytoplankton spring bloom dynamics in the Dutch Wadden Sea. ...

Forward variable selection for random forest models

Journal article (2022) - Jasper Velthoen, Juan Juan Cai, Geurt Jongbloed

Random forest is a popular prediction approach for handling high dimensional covariates. However, it often becomes infeasible to interpret the obtained high dimensional and non-parametric model. Aiming for an interpretable predictive model, we develop a forward variable selection method using the continuous ranked probability score (CRPS) as the loss function. Our stepwise procedure selects at each step a variable that minimizes the CRPS risk and a stopping criterion for selection is designed based on an estimation of the CRPS risk difference of two consecutive steps. We provide mathematical motivation for our method by proving that in a population sense, the method attains the optimal set. In a simulation study, we compare the performance of our method with an existing variable selection method, for different sample sizes and correlation strength of covariates. Our method is observed to have a much lower false positive rate. We also demonstrate an application of our method to statistical post-processing of daily maximum temperature forecasts in the Netherlands. Our method selects about 10% of the covariates while retaining the same predictive power. ...
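
The stepwise idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the predictive distribution is a k-nearest-neighbour ensemble rather than a random forest, the CRPS risk is estimated on a held-out validation set, and the stopping rule is a plain threshold `tol` on the risk decrease instead of the estimated risk difference used in the paper. All function names are hypothetical.

```python
import numpy as np

def crps_ensemble(samples, y):
    """Empirical CRPS of an ensemble for one observation y: E|X - y| - 0.5 * E|X - X'|."""
    samples = np.asarray(samples, dtype=float)
    term1 = np.mean(np.abs(samples - y))
    term2 = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return term1 - term2

def knn_crps_risk(X_tr, y_tr, X_va, y_va, cols, k=20):
    """Mean validation CRPS when the ensemble is the k nearest training responses."""
    if len(cols) == 0:
        # No covariates selected yet: use the unconditional training sample.
        return float(np.mean([crps_ensemble(y_tr, y) for y in y_va]))
    A, B = X_tr[:, cols], X_va[:, cols]
    risks = []
    for b, y in zip(B, y_va):
        d = np.sum((A - b) ** 2, axis=1)         # squared distances in the selected covariates
        idx = np.argpartition(d, k - 1)[:k]      # indices of the k nearest training points
        risks.append(crps_ensemble(y_tr[idx], y))
    return float(np.mean(risks))

def forward_select(X_tr, y_tr, X_va, y_va, k=20, tol=1e-3):
    """Greedily add the covariate that lowers the CRPS risk most; stop when the gain is below tol."""
    selected = []
    best = knn_crps_risk(X_tr, y_tr, X_va, y_va, selected, k)
    remaining = list(range(X_tr.shape[1]))
    while remaining:
        scores = {j: knn_crps_risk(X_tr, y_tr, X_va, y_va, selected + [j], k)
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if best - scores[j_best] < tol:          # assumed stopping rule, not the paper's
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best = scores[j_best]
    return selected, best
```

Calling `forward_select(X_train, y_train, X_valid, y_valid)` returns the indices of the chosen covariates in the order they were added, together with the final validation CRPS. Swapping the nearest-neighbour ensemble for a forest-based predictive distribution would bring the sketch closer to the setting of the abstract.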

A Data-Driven Approach for Studying the Influence of Carbides on Work Hardening of Steel

Journal article (2022) - M. Vittorietti, J. Hidalgo Garcia, J. Galan Lopez, J. Sietsma, G. Jongbloed

This study proposes a new approach to determine phenomenological or physical relations between microstructure features and the mechanical behavior of metals, bridging advanced statistics and materials science in a study of the effect of hard precipitates on the hardening of metal alloys. Synthetic microstructures were created using multi-level Voronoi diagrams in order to control microstructure variability and were then used as samples for virtual tensile tests in a full-field crystal plasticity solver. A data-driven model based on Functional Principal Component Analysis (FPCA) was compared with the classical Voce law for the description of uniaxial tensile curves of synthetic AISI 420 steel microstructures consisting of a ferritic matrix and increasing volume fractions of M23C6 carbides. The parameters of the two models were interpreted in terms of carbide volume fractions and texture using linear mixed-effects models. ...
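
For readers unfamiliar with the Voce law mentioned above, the sketch below fits one common three-parameter form, σ(ε) = σ_s − (σ_s − σ_0) exp(−ε/ε_c), to a tensile curve by nonlinear least squares. The parameterization, the parameter names (`sigma_0`, `sigma_s`, `eps_c`) and the synthetic curve are assumptions for illustration; the abstract does not state which form or fitting routine the study used.

```python
import numpy as np
from scipy.optimize import curve_fit

def voce(strain, sigma_0, sigma_s, eps_c):
    """Voce hardening: stress rises from sigma_0 and saturates at sigma_s on strain scale eps_c."""
    return sigma_s - (sigma_s - sigma_0) * np.exp(-strain / eps_c)

def fit_voce(strain, stress):
    """Least-squares estimate of (sigma_0, sigma_s, eps_c) from one stress-strain curve."""
    # Starting values read off the curve itself: first stress, last stress, a quarter of the strain range.
    p0 = (stress[0], stress[-1], 0.25 * (strain[-1] - strain[0]))
    popt, _ = curve_fit(voce, strain, stress, p0=p0)
    return popt

# Synthetic, noise-free curve with assumed parameters (stress in MPa, strain dimensionless).
strain = np.linspace(0.0, 0.2, 60)
true_params = (400.0, 700.0, 0.04)
stress = voce(strain, *true_params)

params = fit_voce(strain, stress)
print("fitted sigma_0, sigma_s, eps_c:", params)
```

A data-driven alternative such as FPCA would instead describe each curve by its scores on a few empirical basis functions estimated from the whole collection of curves, without imposing a parametric shape.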

Climate change induced trends and uncertainties in phytoplankton spring bloom dynamics

Journal article (2021) - Lőrinc Mészáros, Frank van der Meulen, Geurt Jongbloed, Ghada El Serafy

Spring phytoplankton blooms in the southern North Sea substantially contribute to annual primary production and largely influence food web dynamics. Studying long-term changes in spring bloom dynamics is therefore crucial for understanding future climate responses and predicting implications on the marine ecosystem. This paper aims to study long term changes in spring bloom dynamics in the Dutch coastal waters, using historical coastal in-situ data and satellite observations as well as projected future solar radiation and air temperature trajectories from regional climate models as driving forces covering the twenty-first century. The main objective is to derive long-term trends and quantify climate induced uncertainties in future coastal phytoplankton phenology. The three main methodological steps to achieve this goal include (1) developing a data fusion model to interlace coastal in-situ measurements and satellite chlorophyll-a observations into a single multi-decadal signal; (2) applying a Bayesian structural time series model to produce long-term projections of chlorophyll-a concentrations over the twenty-first century; and (3) developing a feature extraction method to derive the cardinal dates (beginning, peak, end) of the spring bloom to track the historical and the projected changes in its dynamics. The data fusion model produced an enhanced chlorophyll-a time series with improved accuracy by correcting the satellite observed signal with in-situ observations. The applied structural time series model proved to have sufficient goodness-of-fit to produce long term chlorophyll-a projections, and the feature extraction method was found to be robust in detecting cardinal dates when spring blooms were present. The main research findings indicate that at the study site location the spring bloom characteristics are impacted by the changing climatic conditions. 
Our results suggest that toward the end of the twenty-first century spring blooms will steadily shift earlier, resulting in longer spring bloom duration. Spring bloom magnitudes are also projected to increase with a 0.4% year⁻¹ trend. Based on the ensemble simulation, the largest uncertainty lies in the timing of the beginning and end of the spring bloom, while the peak timing has less variation. Further studies would be required to link the findings of this paper and ecosystem behavior to better understand possible consequences to the ecosystem. ...
