D. Kurowicka | TU Delft Repository

Sensitivity Analysis in Stochastic Scheduling

Master thesis (2026) - C. Attanasio, D. Kurowicka, J. Söhl, Alvaro Piedrafita Postigo

Stochastic scheduling plays a fundamental role in understanding how uncertainty propagates in machine logistics. However, existing research lacks explicit treatment of exact gradients for the project duration with respect to specific task parameters. This thesis develops a framework to calculate the sensitivity of the project duration distribution in stochastic directed acyclic graphs (DAGs).

Initially, closed-form analytical expressions for the gradients of the expected project duration are derived under Gaussian assumptions. The research then extends this foundation through a Generalized Sensitivity Theorem, accommodating a broader class of probability distributions and enabling the use of shared parameters across multiple nodes. Furthermore, the framework is applied to analyze internal machine logistics and component waiting times. By introducing parameterized artificial delays, the study formulates an optimization approach to approximate Just In Time (JIT) behavior within stochastic environments. ...
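As a toy illustration of a closed-form gradient under Gaussian assumptions, the sketch below takes the smallest non-trivial DAG: two independent parallel activities, so the project duration is the maximum of their durations. It uses the standard formula for the expected maximum of two independent normal variables; the function names are illustrative and are not taken from the thesis, whose general theorem is not reproduced here.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_max(mu1, s1, mu2, s2):
    """E[max(X1, X2)] for independent X1 ~ N(mu1, s1^2), X2 ~ N(mu2, s2^2)."""
    theta = math.sqrt(s1 * s1 + s2 * s2)
    alpha = (mu1 - mu2) / theta
    return mu1 * norm_cdf(alpha) + mu2 * norm_cdf(-alpha) + theta * norm_pdf(alpha)


def grad_mu1(mu1, s1, mu2, s2):
    """Derivative of E[max(X1, X2)] with respect to mu1.

    It equals P(X1 > X2), the probability that activity 1 is the critical one.
    """
    theta = math.sqrt(s1 * s1 + s2 * s2)
    return norm_cdf((mu1 - mu2) / theta)


if __name__ == "__main__":
    # Hypothetical activity parameters, chosen only for illustration.
    print(expected_max(10.0, 2.0, 9.0, 1.5))
    print(grad_mu1(10.0, 2.0, 9.0, 1.5))
```

In this two-activity case the sensitivity to a mean duration is the probability that the activity lies on the critical path, which gives the gradient a direct interpretation.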

Model-Agnostic Prediction Density Methods

Master thesis (2026) - D.A. Tudor, D. Kurowicka, Bálint Négyesi, J. Söhl, Valerii Zoller

The present work focuses on constructing predictive densities, conditional on a set of features, for one-dimensional real-valued random variables. We approach the problem in a model-agnostic manner, aiming for methods that can be applied to arbitrary models without imposing parametric assumptions on the underlying distribution.
We first present the standard kernel density estimation approach and discuss its limitations. In this context, the conformal framework, originally developed for prediction intervals, is particularly appealing due to its finite-sample marginal validity guarantees. We then introduce conformal predictive distributions, a recent development in the literature that ensures the associated predictive system is marginally Unif[0,1] under the Probability Integral Transform (PIT).
However, these distributions tend to be highly fuzzy. As a result, directly applying finite differencing to conformal predictive distributions produces noisy predictive densities that may obscure important underlying features.
To address this issue, we propose two solutions. First, Gaussian filtering yields the smoothest densities and empirically maintains PIT perturbations within a satisfactory range, although no theoretical bounds on the perturbation are derived. Second, we introduce a new method, termed quantile-matching, which produces less fuzzy densities while providing a sharp theoretical upper bound on PIT perturbation.
Furthermore, we show that when the number of quantiles is allowed to equal the size of the calibration set, the distribution induced by quantile-matching coincides with the crisp modification of conformal predictive distributions, thereby yielding an upper bound on their PIT perturbation as well.
Finally, we evaluate the proposed methods on a large simulated real estate transactions dataset based on the Hierarchical Trend Model. Our results indicate that the quantile-matching approach outperforms competing methods across several metrics, including the Mean Absolute Error of the associated tail means and running time per transaction price. ...

Generating random correlation matrices with constraints

Bachelor thesis (2025) - I.H. van der Brug, D. Kurowicka, N. Parolya

Correlation matrices play a central role in multivariate modelling across fields such as finance and statistics. However, generating valid correlation matrices remains a non-trivial problem due to the global positive definiteness condition they must satisfy. This thesis investigates two methods for generating correlation matrices, with extensions on how to control or influence the average correlation. The first method relies on square root decomposition of the correlation matrix, parametrizing it as the product of a matrix with unit-norm rows and its transpose. A recent extension of this by Tuitman et al. is explored, which enables the generation of matrices with a fixed average correlation. This is achieved through iterative construction of the decomposition, ensuring the weighted sum of vectors has a prescribed norm, corresponding to the target average correlation. The algorithm's geometric structure, feasibility conditions, and statistical properties are analysed.
The second method is based on the C-vine construction using partial correlations, as introduced by Joe and Kurowicka. There exists a one-to-one mapping from a set of partial correlations to a full correlation matrix. This approach parametrizes the matrix through a structured sequence of partial correlations. The distribution from which these partial correlations are sampled can be adjusted to achieve specific properties in the resulting matrices; for example, sampling from specific Beta distributions yields matrices following the LKJ distribution. The extension by Joe and Kurowicka is investigated, which allows the expected value of each correlation to be fixed across samples.
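The basic square root parametrization can be sketched as follows: draw a matrix with unit-norm rows and multiply it by its transpose. This is a minimal sketch of the unconstrained construction only. It does not implement the fixed-average-correlation extension, and the choice of Gaussian directions for the rows is an assumption made for illustration.

```python
import math
import random


def random_correlation_matrix(d, rng):
    """Return a d x d correlation matrix R = A A^T, where A has unit-norm rows."""
    rows = []
    for _ in range(d):
        # A Gaussian vector scaled to unit length (assumed sampling scheme).
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        rows.append([x / norm for x in v])
    # Entry (i, j) is the inner product of rows i and j.
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(d)]
            for i in range(d)]


def average_correlation(R):
    """Mean of the off-diagonal entries of R."""
    d = len(R)
    total = sum(R[i][j] for i in range(d) for j in range(d) if i != j)
    return total / (d * (d - 1))


if __name__ == "__main__":
    R = random_correlation_matrix(4, random.Random(0))
    for row in R:
        print(["%.3f" % x for x in row])
    print("average correlation:", average_correlation(R))
```

Because each diagonal entry is the squared norm of a row, the diagonal equals one by construction, and positive semi-definiteness follows from the product form. The average correlation is left uncontrolled here and varies from draw to draw.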
A comparison of both methods is provided in terms of construction, flexibility, numerical stability, and statistical properties of the resulting matrices. While the square root decomposition method offers strict per-matrix control over the average correlation, the C-vine approach provides greater flexibility, enabling finer control over marginal distributions. The thesis concludes with a discussion on practical trade-offs and potential directions for future work.
...

Number of Directed Acyclic Graphs with Extra Constraints

Bachelor thesis (2025) - M.T. Schuurman, D. Kurowicka, A.F.F. Derumigny

This report investigates the enumeration of labeled directed acyclic graphs (DAGs) under various structural constraints, extending an inclusion–exclusion recurrence introduced by R.W. Robinson. Starting from the enumeration of general DAGs via out-point partitioning, the recursive method is adapted to count more specialized classes such as DAGs with a fixed number of arcs or out-points. The Robinson method is also applied to create a formula for the enumeration of rooted directed trees, polytrees, and a special triangle structure. For each of these constrained graph classes, both the closed-form expressions and the derived recursive enumeration formulas are explained, showcasing the versatility and mathematical elegance of Robinson’s technique. A main focus of this report is the derivation and interpretation of the Robinson recurrence that adjusts the choose-attach model while using the local attachment rules. This report also presents visual illustrations of the original Robinson method and its adaptations. The results of these formulas are shown in graphs and tables to show the exponential growth of each class. The results bring together several enumeration problems in graph theory under a single recursive framework, offering both theoretical insight and practical enumeration formulas. The contributions of this thesis provide an analysis of DAG enumeration problems, bridging theoretical insights and practical relevance. These results have important implications for a wide range of applications, including Bayesian networks, causal inference modeling, scheduling, and network design. Finally, promising directions for future research are mentioned, emphasizing potential extensions to more complex graph structures and advanced enumeration methodologies. ...
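For the unconstrained case, Robinson's inclusion-exclusion recurrence for the number a(n) of labeled DAGs on n nodes reads a(n) = sum over k = 1..n of (-1)^(k+1) C(n, k) 2^(k(n-k)) a(n-k), with a(0) = 1. The sketch below implements only this base recurrence; the constrained variants studied in the report are not reproduced, and the function name is illustrative.

```python
from math import comb


def count_labeled_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence."""
    a = [1]  # a(0) = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            # Choose k of the m nodes as a distinguished set, allow any arcs
            # between that set and the remaining m - k nodes, and correct
            # the overcounting by inclusion-exclusion.
            sign = 1 if k % 2 == 1 else -1
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]


if __name__ == "__main__":
    print([count_labeled_dags(n) for n in range(6)])
    # prints [1, 1, 3, 25, 543, 29281]
```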

A Vine Copula Approach for Portfolio Optimisation

Exploring the Effect of Copulas and Vine Models on Optimal Investment Allocation of Stock Index Returns

Master thesis (2024) - J. Godard, D. Kurowicka, A.F.F. Derumigny

This thesis explores the growing complexity of contemporary financial markets, which is a consequence of a world that is increasingly interconnected and correlated. This evolution highlights the necessity of understanding and accurately modeling these underlying relationships, which translates into the need to incorporate more complex models into portfolio optimization, breaking away from Harry Markowitz’s foundational Portfolio Optimization Theory. While Markowitz’s model has been effective, the complexity of modern financial instruments demands more sophisticated approaches. This study focuses on the application of copulas and vine models to portfolio optimization, aiming to understand how these advanced models can enhance the optimization process by accurately capturing dependencies among financial assets. In particular, this thesis investigates the benefits of integrating copula-GARCH models, in which time series models are fitted and their residuals are modelled using copulas or vine models, into portfolio theory. Through this approach, the research aims to extend existing knowledge and highlight the specific advantages provided by these models in portfolio optimization. ...

Symmetries in Gaussian Graphical Models

Bachelor thesis (2024) - D. Wolff, D. Kurowicka, N. Parolya

This thesis explores the application of Gaussian Graphical Models (GGMs) with a specific focus on identifying symmetries within EEG data. A significant challenge in using GGMs is achieving accurate estimations with limited observations, which is common in medical data. This research proposes a methodology for detecting and integrating symmetric structures in GGMs, thereby reducing the number of parameters and improving model interpretability. We follow developments presented by Højsgaard and Lauritzen (2008).
The study includes a detailed explanation of the multivariate Gaussian distribution, Maximum Likelihood Estimation (MLE), and the implementation of Iterative Partial Maximization for parameter estimation. Additionally, hierarchical clustering is employed to systematically identify symmetry classes. Results indicate that incorporating symmetry constraints enhances the accuracy and interpretability of GGMs.
The research shows that symmetry constraints simplify models, making them more robust and easier to understand.
A simulation study was conducted to test the efficiency and accuracy of the developed algorithm for symmetry detection. The findings from these simulations validate the proposed approach, demonstrating significant improvements in model performance.
Finally, the methodology was applied to an EEG dataset, highlighting practical applications in neuroscience. The results from the EEG data analysis further confirm that symmetry constraints can reveal underlying patterns in brain connectivity, offering valuable insights into the neural dynamics.
This thesis contributes to the existing literature by providing a systematic approach to detect symmetries in high-dimensional data models and by demonstrating its practical utility with real-world EEG data.
...

Inference Methods in Pair Copula Bayesian Networks

Master thesis (2024) - P. Basiouras Serrano, D. Kurowicka, A.F.F. Derumigny

The aim of this thesis is the study of inference problems in Pair Copula Bayesian Networks (PCBNs). To this end, certain sub-structures called arteries are identified in the PCBNs and Arterial Sample Propagation, a sample-based extension of Pearl's Belief Propagation Algorithm, is developed for single arteries. This proposed inference methodology incorporates properties unique to PCBNs as well as information on the graph structure, thus avoiding unnecessary computations and boosting the algorithm's performance. Furthermore, an extension of Arterial Sample Propagation is proposed for PCBNs with multiple arteries under some additional assumptions on the graph structure.

This thesis also explores the structural properties of PCBNs, with this examination moving in two separate directions. First, we analyze inference problem reduction through pruning, building up to a pruning algorithm for PCBNs that removes a larger number of variables than existing BN pruning methods. Second, we study the implications that the existence of arcs in arteries has for the PCBN's structure. We prove a theorem that underpins the Arterial Sample Propagation algorithms developed in this thesis and has potential applications in the development stage of PCBNs. ...

Site-Response Models for Induced Earthquakes in Groningen

Master thesis (2024) - M. Delboo, D. Kurowicka, Ö. Şahin, A.F.F. Derumigny, Dirk Kraaijpoel

This thesis investigates the Ground Motion Model (GMM) for the Groningen Seismic Hazard & Risk Analysis. We look at various aspects of the model to see where improvements can be made. We start by looking at model calibration and validation, where we check to what extent the proposed model, along with its parameters, fits the data. Here we come to the conclusion that both the parameters and the model itself have room for further optimization to reflect the data set more accurately. In addition to this, a new model for correlations of site-response amplifications is presented. The topic of how the dependence of these quantities should be modeled is still unclear. The lack of a coherent solution to this modeling problem makes the proposed model valuable, as it is simple in nature and reflects the available data best.
Lastly, the main improvements that come out of this research are on the computational front: a novel method for calculating average spectral accelerations, the main quantity used in risk assessment. This method generates the distribution of this quantity in one step using numerical integration rather than the previously used Monte Carlo method. The method speeds up computations by a factor of 550. ...

Predicting the Swap Spread with a Dynamic Nelson-Siegel Model

A Novel Approach to Predict the Spread between Two Correlated Interest Rates

Master thesis (2023) - J.V. Swanenburg, D. Kurowicka

This thesis aims to develop a methodology for predicting the swap spread, which is defined as the difference between the German government bond interest rate and the Euribor swap rate. Thus far, the prediction of interest rates is limited to the prediction of a single interest rate. This thesis introduces the simultaneous prediction of the spread between two correlated interest rate curves. The methodology developed in this thesis considers the dependence between the bond rate and the swap rate. The study utilizes a Dynamic Nelson-Siegel (DNS) model, which is extended to incorporate the correlation between these two rates. The simulation studies reveal that the variants simultaneously predicting both the swap and bond rates using a restricted VAR(1) model for factor dynamics outperform the other variants in predicting the swap spread. Another important aspect considered is the stationarity of the latent factors. The simulation studies demonstrate that the stationarity of the empirical DNS factors accurately represents the stationarity of the true DNS factors. This motivates the reformulation of the DNS model into a new variant where the first-order differences of both the swap and bond rate latent factors are modeled by a restricted VAR(1) model.
A case study validates the developed new variant of the DNS model, demonstrating predictions for the swap and bond curves that have an accuracy comparable to the accuracy of the benchmark model. The key advantage of the DNS model over this benchmark model is that the DNS model predicts the swap curve and bond curve over the whole maturity spectrum. The prediction over the whole maturity spectrum is crucial to compute the spread between the two rates, which emphasizes the relevance of the new model presented in this thesis. ...

On the restrictions of Pair-Copula Bayesian Networks for integration-free computations

Master thesis (2023) - N.J. Horsman, D. Kurowicka, A.F.F. Derumigny

The pair-copula Bayesian network (PCBN) is a Bayesian network (BN) where the conditional probability functions are modeled using pair-copula constructions. By assigning bivariate conditional copulas to the arcs of the BN, one finds a proper joint density which can flexibly model all kinds of dependence structures. It is a known problem that the PCBN may require numerical integration to perform computations such as sampling and likelihood-inference. To address this issue we propose novel restrictions on the graphical structure and assignment of copulas such that integration will not be required. The resulting restricted PCBN offers significant computational benefits. We establish how to estimate and conduct a structure search for the restricted PCBN. A simulation study shows that a restricted PCBN is able to model non-Gaussian dependence structures more accurately than the widely used Gaussian Bayesian network. ...

Energy Efficiency Valuation

Estimating the increase in expected transaction price due to improved energy efficiency in the Dutch housing market

Master thesis (2022) - M.L. Groot Beumer, D. Kurowicka, N. Parolya

Recent advancements in causal inference and machine learning research have brought forward methods to estimate effects of interventions from observational data. The augmented inverse probability weighted (AIPW) estimator is such a method, which can be used to obtain estimates of potential outcomes. Potential outcomes are defined as a hypothetical outcome pair {Y⁽¹⁾,Y⁽⁰⁾}, of which only one outcome is observed in the data. Estimation of intervention effects boils down to effectively estimating these potential outcomes.
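The standard AIPW estimate of an average effect combines an outcome model with a propensity model. The sketch below assumes that predictions from both models are already available as plain lists; the argument names (`m1`, `m0`, `e`) are illustrative, and the fitting of these nuisance models as done in the thesis is not shown.

```python
def aipw_ate(y, a, m1, m0, e):
    """Augmented inverse probability weighted estimate of E[Y(1) - Y(0)].

    y  : observed outcomes
    a  : treatment indicators (1 = treated, 0 = control)
    m1 : outcome-model predictions under treatment, one per unit
    m0 : outcome-model predictions under control, one per unit
    e  : estimated propensity scores P(A = 1 | X), strictly between 0 and 1
    """
    n = len(y)
    total = 0.0
    for yi, ai, m1i, m0i, ei in zip(y, a, m1, m0, e):
        # Outcome-model contrast plus inverse-probability-weighted residuals.
        treated_term = m1i + ai * (yi - m1i) / ei
        control_term = m0i + (1 - ai) * (yi - m0i) / (1.0 - ei)
        total += treated_term - control_term
    return total / n


if __name__ == "__main__":
    # Toy numbers, made up for illustration only.
    y = [3.0, 1.0, 4.0, 2.0]
    a = [1, 0, 1, 0]
    m1 = [3.0, 2.0, 4.0, 3.0]
    m0 = [2.0, 1.0, 3.0, 2.0]
    e = [0.5, 0.5, 0.5, 0.5]
    print(aipw_ate(y, a, m1, m0, e))
```

The residual terms correct the outcome model where it misses the observed data, which is why the estimator stays consistent when either the outcome model or the propensity model is correctly specified.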
Using the AIPW estimator, we aim to evaluate the average effect of increasing the energy efficiency of houses in the Netherlands on their expected transaction price. Moreover, we investigate how this expected effect changes when we condition on a certain subset of the data.
Given that our assumptions hold, we find that on average, the estimated expected increase in transaction price is positive when improving the energy efficiency of a house. Improving an energy inefficient house to moderately energy efficient is expected to increase the transaction price by approximately €97.70 per m², while the improvement from moderately energy efficient to energy efficient increases the expected transaction price by approximately €20.96 per m². In general, older, smaller and more energy inefficient houses increase most in expected transaction price per m² when their energy efficiency is improved.
...

Introduction to structure learning for Gaussian and pair copula Bayesian networks

Master thesis (2022) - A. Villar Guardia, D. Kurowicka

Due to technological breakthroughs in recent decades and the rapid increase in the availability of multidimensional data, data science has become one of the most important areas of research. Within this field, modeling dependence of random variables is gaining great interest. To cope with this task, the use of graphical models is often advocated. In this dissertation, we study Bayesian Networks (BNs), a particular type of graphical model. Concretely, structure learning algorithms for two types of continuous BNs are investigated: Gaussian Bayesian Networks (GBNs) and Pair Copula Bayesian Networks (PCBNs).

We present an overview of these two types of BNs, illustrating their properties and differences. An outline of the existing structure learning algorithms is provided, showing their efficiency in the Gaussian case and their limitations in the copula-based case. The problems of structure learning for PCBNs are then addressed. First, we investigate the performance of Gaussian structure learning algorithms for PCBNs. Based on a simulation study, we show that these procedures are not completely efficient, but prove beneficial. Second, a new approximation of the score based on the log-likelihood of PCBNs is explored. We propose to solve the computational inefficiency of the exact log-likelihood by estimating the necessary copulas from data such that the copula terms in the PCBN decomposition can be computed without the need for integration. A simulation study suggests that this log-likelihood approximation yields better results than the approximation used by Pircalabelu et al. (2017). Finally, an algorithm to learn the structure of PCBNs is proposed, based on the two previous procedures. ...

Threshold tuning of transaction monitoring models

A risk-based approach to combat money laundering

Master thesis (2022) - S. Vis, D. Kurowicka, K.S. Postek, J. Goudsmit, W. van Willigen

Money laundering is an increasing problem for the global economy. To combat money laundering, banks use transaction monitoring models with particular thresholds to detect unusual transaction behaviour. However, it is a challenge to determine and evaluate the suitability of a threshold level to ensure that the risk of misclassification of transactions falls within the bank’s risk appetite. In the threshold tuning process, the suitability of a threshold level can be evaluated with a sample of the transactions below or above a threshold level which are reviewed by an analyst. One problem is that the review process of transactions during the threshold tuning process is time-consuming. In addition, banks want to be able to quantify the risk of misclassification of transactions to determine whether this falls within their risk appetite. This underlines the need to develop a threshold tuning strategy to accelerate the threshold tuning process in which the risk of misclassification of transactions can be quantified to determine whether it falls within the bank’s risk appetite. To accelerate the threshold tuning process, a framework was developed and five threshold tuning strategies were established which evaluate the suitability of different threshold levels with a given strategy. In addition, several methods to determine a confidence interval were examined to quantify the risk of misclassification and to ensure that it falls within the bank’s risk appetite. The threshold tuning strategies were compared and evaluated on the required number of reviews of transactions and the difference between the found and true threshold level using synthetic data sets. Overall, the bisection threshold tuning strategy is recommended, since this strategy resulted in the lowest number of required reviews of transactions and resulted in a small difference between the found and true threshold level.
The results of the synthetic data sets were promising, but more experiments with preferably real transaction data or other distributions are required to further evolve and fully validate the framework and proposed bisection strategy. The work presented in this thesis contributed to a more risk-based approach to enhance the efficiency and effectiveness of the threshold tuning process of transaction monitoring models. ...
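The idea behind a bisection strategy can be sketched generically: each review round halves the interval that must contain the suitable threshold level, so the number of rounds grows only logarithmically with the search range. The sketch below is a generic bisection under stated assumptions, not the procedure from the thesis. The predicate `is_too_high` is a hypothetical stand-in for an analyst review of a transaction sample, assumed to be monotone in the threshold level.

```python
def bisect_threshold(is_too_high, lo, hi, tol):
    """Locate the boundary level of a monotone predicate by bisection.

    is_too_high(level) is assumed to be False at lo, True at hi, and to
    switch exactly once in between. Returns the midpoint of the final
    interval and the number of predicate evaluations (review rounds).
    """
    reviews = 0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        reviews += 1
        if is_too_high(mid):
            hi = mid  # the boundary lies below mid
        else:
            lo = mid  # the boundary lies at or above mid
    return (lo + hi) / 2.0, reviews


if __name__ == "__main__":
    TRUE_LEVEL = 37.0  # synthetic ground truth, for illustration only
    found, reviews = bisect_threshold(lambda t: t > TRUE_LEVEL, 0.0, 100.0, 1.0)
    print(found, reviews)
```

In practice each review is noisy and sample-based, which is where the confidence intervals mentioned in the abstract would enter; the sketch ignores that noise.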

A probabilistic framework for the quantification of vegetation effects on the failure mechanisms of Dutch river dikes

Master thesis (2021) - L.M. Wopereis, J.P. Aguilar Lopez, R.C. Lanzafame, D. Kurowicka, Ellis Penning

River floods are becoming increasingly devastating because of climate change (more frequent and extreme rainfall), population growth and the increasing economic importance of river basins. This situation requires maintenance and strengthening of flood-defence systems.

Adding certain types of vegetation at precise locations for their positive impact may be a cheaper, more flexible, and more environment-friendly way to strengthen dikes than the traditional increase in height. However, this nature-based (NB) option is not yet widely implemented due to the lack of precise knowledge of the potential of vegetation effects and their uncertainty.

This study uses a probabilistic method to better understand the effects of vegetation by including vegetation in the computation of the failure probabilities of Dutch river dikes. A framework was established to combine all these vegetation effects simultaneously in the computation of the total failure probability, considering different magnitudes of each effect. This enables the consideration of a wide range of vegetation scenarios, from which conclusions were drawn.

Overall, this thesis provides a useful and versatile tool for assessing the influence of vegetation on dikes that has a lot of potential and can be easily enhanced in the future. ...

The Lamperti Transform

Applications to Stochastic Local Volatility Models

Master thesis (2020) - S.G. de Boer, C.W. Oosterlee, D. Kurowicka, L.A. Grzelak, P. Chebolu

This thesis showcases a rather contemporary method of solving a generalized system of stochastic differential equations (SDEs) comparable to the SABR model. The solution is derived from a stochastic-local volatility (SLV) model in which the local volatility (LV) component is kept general. This generality is maintained throughout all derivations, eventually yielding a model containing an undefined LV function. This function can then be specified however the user of the model deems suitable, as long as minor constraints are satisfied. This makes the model highly customizable, which is a very valuable quality. The solution consists of a set of pricing functions that seemingly possess all the desirable properties, i.e. fast evaluation, computational tractability, flexibility etc., with few disadvantages. The generalized SLV model that is used is typically denoted in the form of two SDEs, though in the majority of this thesis an atypical three-SDE form is used. This extended system is used to isolate the LV component, in turn enabling the appropriate application of an SDE transformation called the Lamperti transform, which provides the key to solving the entire system. The Lamperti transform is a highly versatile method for transforming SDEs into new equations typically more suitable for simulation and parameter estimation procedures, and its inner workings and various applications are the main focus of this thesis. ...

Modelling finite mixture joint distributions

Bachelor thesis (2020) - B.J. Bakker, D. Kurowicka

This thesis builds models for an electric vehicle usage data set in order to simulate virtual populations of electric vehicle users. With these simulated populations, charging strategies could be developed to improve the use of electric vehicles. The modeling is done with copula functions, and the resulting models are compared using different tests. ...

Performance of the copula-based Morris Method

Bachelor thesis (2019) - Rutger van Beek, Dorota Kurowicka, Jeroen Spandaw, Ghada El Serafy

The Morris method is a widely used screening method in sensitivity analysis. The method assumes that the input parameters are independent of each other. To relax this assumption, a copula-based Morris method is proposed. In this report the results of taking the dependencies into account are analyzed for the Morris method. For two examples, sensitivity analysis is performed with the Morris method, with the copula-based Morris method and by calculating sample correlations with a Monte Carlo simulation. From the analysis it follows that taking dependencies into account can have varying effects for different methods. It turns out that a straightforward implementation often makes the method practically unusable, because the sampling of model evaluation points becomes too computationally expensive. The number of copula evaluations grows exponentially with the dimension, and for copulas without an analytic expression each evaluation is already slow. The computational cost can be reduced in two ways. First, one can approximate the probabilities. Different ways of approximating the probabilities are investigated. Numerical integration with the midpoint rule seems to be the best way of approximating the probabilities in the copula-based Morris method. Second, one can exploit independent groups when implementing the method. When the input parameters are correlated, there are usually a few groups of correlated parameters rather than all parameters being correlated with each other. This can be used to implement the copula-based Morris method more efficiently. When the group sizes do not grow, the computational cost depends linearly rather than exponentially on the number of model parameters. By using both improvements, the method can generally be applied to tens or hundreds of parameters in reasonable time, which is desirable for a screening method. ...
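
As background, the standard Morris screening idea for independent inputs can be sketched in a few lines. This is not the copula-based variant studied in the report: it assumes independent inputs on the unit hypercube and uses a simplified one-at-a-time design around random base points instead of full Morris trajectories. The function names, the toy model and the settings `n_base` and `delta` are all hypothetical.

```python
import random
import statistics


def elementary_effects(model, dim, n_base=50, delta=0.25, seed=0):
    """Return (mu_star, sigma) per input from one-at-a-time perturbations."""
    rng = random.Random(seed)
    effects = [[] for _ in range(dim)]
    for _ in range(n_base):
        # Base point chosen so that x_i + delta stays inside [0, 1].
        x = [rng.uniform(0.0, 1.0 - delta) for _ in range(dim)]
        fx = model(x)
        for i in range(dim):
            x_step = list(x)
            x_step[i] += delta
            # Elementary effect of input i at this base point.
            effects[i].append((model(x_step) - fx) / delta)
    # mu_star: mean absolute effect (overall importance).
    mu_star = [statistics.fmean([abs(e) for e in ee]) for ee in effects]
    # sigma: spread of the effects (nonlinearity or interactions).
    sigma = [statistics.stdev(ee) for ee in effects]
    return mu_star, sigma


def toy_model(x):
    # Hypothetical test function: x[0] linear, x[1] nonlinear, x[2] inert.
    return 2.0 * x[0] + x[1] ** 2 + 0.0 * x[2]


mu_star, sigma = elementary_effects(toy_model, dim=3)
for i, (m, s) in enumerate(zip(mu_star, sigma)):
    print(f"input {i}: mu* = {m:.3f}, sigma = {s:.3f}")
```

Each base point costs `dim + 1` model evaluations, so the cost of this independent-input version is linear in the number of parameters. The exponential growth discussed in the abstract comes from the copula evaluations needed once dependence is taken into account.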

Can we predict the Eredivisie?

Predicting rankings in football using the Bradley-Terry model

Bachelor thesis (2019) - Cor-Jan Heijlema, Dorota Kurowicka

Bayesian Variable Selection in Probability of Default Models

Master thesis (2019) - Koen Carmiggelt, Dorota Kurowicka, Berend Ritzema, Jakob Söhl

Banks are financial institutions that borrow money from other parties and provide loans to individuals and organisations at a higher interest rate. Lending out money is associated with the risk that debtors are not able to fully or partially repay the loans. This is called credit risk. Banks have to estimate the credit risk in their portfolios and have to keep reserves for potential losses. The way this risk is to be determined is decided by the government where the bank is established. In Europe, the United States, Russia and China, among others, the legislation on credit risk is derived from Basel III. Basel III is an international framework to homogenise banking regulation across the world. There are three important factors that determine credit risk in Basel III, namely Probability of Default, Loss Given Default and Exposure at Default. In this thesis I investigate Probability of Default (PD) modelling. The size of the portfolio for which the Probability of Default has to be estimated can vary greatly. When the number of defaults in a portfolio is low and the number of explanatory variables is high, there is a risk of overfitting. Variable selection methods can be used to counteract overfitting and give an understanding of the important predictors. I apply variable selection methods to a logistic regression. I look at three Frequentist variable selection methods, namely Forward Selection, Lasso and Relaxed Lasso. I compare these three methods with Predictive Projection combined with a Horseshoe prior, which is a Bayesian approach to variable selection. Forward Selection starts with only the intercept in the model and adds variables to the model one by one. The variables are added in such a way that each step increases the estimated performance the most. The Horseshoe prior and Lasso regression are types of regularisation, where the estimates of the regression coefficients of the logistic regression are shrunk towards zero.
In Lasso regression, this is done by adding an L1 penalty on the regression coefficients to the logistic regression. This causes weak signals to be pulled to zero. Lasso shrinks all regression coefficients towards zero to some degree, even those with a strong signal.
Lasso can also be used to find an order of importance for the regression coefficients by varying the strength of the L1 penalty. Regression coefficients are set to zero one by one as the penalty increases. Relaxed Lasso uses this ranking and refits the variables without regularisation.
In Bayesian statistics, regularisation is added via the prior. The Horseshoe prior can adjust to the average sparsity in the model, and it either shrinks a signal aggressively to zero or leaves the signal almost unchanged. The posterior of the model is never truly sparse. Predictive Projection can induce sparsity by setting the Monte Carlo samples of the posterior to zero for certain variables. This is done in such a way that the Kullback-Leibler divergence between the full posterior and the projected sparser posterior is minimised. I investigate the behaviour of the variable selection methods. The main focus is on the predictive performance, the sparsity, the computation time and the reliability of the estimated performance for the selected models. I apply the methods to various types of simulated data to compare the variable selection methods. The simulated data include data with independent predictors, collinear predictors and non-normal predictors, among others. The simulation studies show that Lasso and Predictive Projection lead to models with the highest performance overall, and their predictive performance is more stable over different realisations of the data. For the same performance, Predictive Projection produces models with fewer variables. This makes Predictive Projection the most attractive method. I also apply the techniques to FreddieMac data, a data set on single-family mortgages. The results are similar to those for the simulated data, and Predictive Projection with the Horseshoe prior is again the most attractive variable selection method. Both the simulation studies and the FreddieMac application imply that the estimated performance of Predictive Projection and Lasso is better than that of Forward Selection and Relaxed Lasso. However, the behaviour of the estimated performance remains unclear to a certain degree. More simulations per data type and more data types are needed for more insight into the estimated performance.
Additional resources are needed to achieve this. ...
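
The two Lasso properties described above (every coefficient is shrunk, and coefficients reach zero one by one as the penalty grows) can be illustrated with the soft-thresholding operator. This is a deliberately simplified sketch: it assumes least-squares regression with an orthonormal design, where the Lasso solution has this closed form, and not the penalised logistic regression used in the thesis. The coefficient values and all function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Lasso solution for one coefficient under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical unpenalised estimates, chosen only for illustration.
unpenalised = {"strong": 3.0, "medium": 1.2, "weak": 0.3}


def lasso_path(estimates, penalties):
    """Shrunken coefficients for each penalty strength."""
    return {
        lam: {name: soft_threshold(z, lam) for name, z in estimates.items()}
        for lam in penalties
    }


def drop_order(estimates):
    """Names in the order their coefficients reach zero as the penalty grows."""
    return sorted(estimates, key=lambda name: abs(estimates[name]))


path = lasso_path(unpenalised, penalties=(0.0, 0.5, 1.5, 3.5))
ranking = drop_order(unpenalised)

for lam, coefs in path.items():
    print(lam, coefs)
print("dropped first to last:", ranking)
```

In this simplified setting a Relaxed Lasso step would take the variables that survive at a chosen penalty and refit them without the penalty, which removes the shrinkage applied to the strong signal.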

Characterizations of Multivariate Tail Dependence

On theory and inference to assess extremal dependence structures

Master thesis (2018) - Carina van der Zee, Dorota Kurowicka, Juanjuan Cai

This thesis gathers, develops and evaluates several characterizations of multivariate tail dependence. It is established that the stable tail dependence function (STDF) is a suitable copula-based dependence function that fully captures the multivariate extremal dependence structure in all dimensions d≥2 and can be used to visualize the tail dependence structure for bivariate and trivariate problems. Based on the STDF, we propose a multivariate tail dependence coefficient (TDC) as an extension of the well-known bivariate TDC. Importantly, we show that the proposed measure can identify tail independence in all dimensions d≥2, similar to its bivariate variant. The performance of nonparametric estimators for the STDF and, inherently, the multivariate TDC, is assessed with an extensive simulation study, including smoothed and bias-corrected versions of the empirical STDF. Based on the estimators for the STDF and the multivariate TDC, test statistics under the null hypothesis of tail independence are developed and evaluated in another simulation study. The STDF-based estimation and testing procedures are applied to foreign exchange (FX) data to characterize the tail dependence structure between three European FX rates and five worldwide FX rates. ...
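
As an illustration of the kind of nonparametric estimator assessed here, the standard rank-based empirical STDF can be sketched as below. This is the plain empirical version only, without the smoothing or bias correction mentioned in the abstract, and it is not the thesis's multivariate TDC, whose definition is not given here. The bivariate relation between the upper TDC and the STDF, namely TDC = 2 - l(1, 1), is used only as a check. The function names, the sample size and the choice of `k` are assumptions.

```python
import random


def ranks(values):
    """Ranks 1..n of the values (ties are assumed not to occur)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def empirical_stdf(sample, x, k):
    """Rank-based empirical STDF at the point x, using the k largest ranks.

    Counts the observations for which at least one component j has a rank
    above n + 1/2 - k * x[j], then divides the count by k.
    """
    n = len(sample)
    d = len(x)
    col_ranks = [ranks([row[j] for row in sample]) for j in range(d)]
    count = sum(
        1
        for i in range(n)
        if any(col_ranks[j][i] > n + 0.5 - k * x[j] for j in range(d))
    )
    return count / k


rng = random.Random(1)
u = [rng.random() for _ in range(1000)]

comonotone = [(v, v) for v in u]         # perfect positive dependence
countermonotone = [(v, -v) for v in u]   # extremes never occur together

stdf_co = empirical_stdf(comonotone, (1.0, 1.0), k=50)
stdf_counter = empirical_stdf(countermonotone, (1.0, 1.0), k=50)

# Bivariate upper tail dependence coefficient: 2 - l(1, 1).
print("comonotone TDC:", 2.0 - stdf_co)
print("countermonotone TDC:", 2.0 - stdf_counter)
```

The two extreme cases bracket the possible values of the bivariate STDF at (1, 1): 1 under complete tail dependence and 2 under tail independence.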