D. Kurowicka
22 records found
Initially, closed-form analytical expressions for the gradients of the expected project duration are derived under Gaussian assumptions. The research then extends this foundation through a Generalized Sensitivity Theorem, accommodating a broader class of probability distributions and enabling the use of shared parameters across multiple nodes. Furthermore, the framework is applied to analyze internal machine logistics and component waiting times. By introducing parameterized artificial delays, the study formulates an optimization approach to approximate Just In Time (JIT) behavior within stochastic environments.
We first present the standard kernel density estimation approach and discuss its limitations. In this context, the conformal framework, originally developed for prediction intervals, is particularly appealing due to its finite-sample marginal validity guarantees. We then introduce conformal predictive distributions, a recent development in the literature that ensures the associated predictive system is marginally Unif[0,1] under the Probability Integral Transform (PIT).
However, these distributions tend to be highly fuzzy. As a result, directly applying finite differencing to conformal predictive distributions produces noisy predictive densities that may obscure important underlying features.
To address this issue, we propose two solutions. First, Gaussian filtering yields the smoothest densities and empirically maintains PIT perturbations within a satisfactory range, although no theoretical bounds on the perturbation are derived. Second, we introduce a new method, termed quantile-matching, which produces less fuzzy densities while providing a sharp theoretical upper bound on PIT perturbation.
Furthermore, we show that when the number of quantiles is allowed to equal the size of the calibration set, the distribution induced by quantile-matching coincides with the crisp modification of conformal predictive distributions, thereby yielding an upper bound on their PIT perturbation as well.
Finally, we evaluate the proposed methods on a large simulated real estate transactions dataset based on the Hierarchical Trend Model. Our results indicate that the quantile-matching approach outperforms competing methods across several metrics, including the Mean Absolute Error of the associated tail means and running time per transaction price.
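The fuzziness problem described in this abstract can be illustrated with a minimal sketch. It assumes the usual split conformal predictive distribution built from calibration residuals, fixes the randomisation variable `tau` at 0.5, and uses a plain discrete Gaussian kernel. The function names, the simulated residuals, the grid, and the bandwidth `sigma_steps` are all hypothetical and not taken from the thesis; the quantile-matching method is not reproduced here.

```python
import numpy as np

def split_cpd(grid, y_hat, cal_residuals, tau=0.5):
    """Split conformal predictive distribution evaluated on a grid of y values."""
    # Shift the point prediction by every calibration residual.
    c = np.sort(y_hat + np.asarray(cal_residuals, dtype=float))
    n = len(c)
    below = np.searchsorted(c, grid, side="left")          # count of c_i < y
    ties = np.searchsorted(c, grid, side="right") - below  # count of c_i == y
    return (below + tau * ties + tau) / (n + 1)

def gaussian_smooth(values, sigma_steps):
    """Convolve with a normalised Gaussian kernel (width given in grid steps)."""
    half = int(4 * sigma_steps)
    kernel = np.exp(-0.5 * (np.arange(-half, half + 1) / sigma_steps) ** 2)
    kernel /= kernel.sum()
    return np.convolve(values, kernel, mode="same")

rng = np.random.default_rng(1)
residuals = rng.normal(scale=1.0, size=200)  # hypothetical y_i - y_hat_i on the calibration set
grid = np.linspace(-10.0, 14.0, 1201)        # wide grid around the hypothetical prediction 2.0
dx = grid[1] - grid[0]

cdf = split_cpd(grid, 2.0, residuals)
raw_density = np.diff(cdf) / dx              # finite differencing: isolated spikes
smooth_density = gaussian_smooth(raw_density, sigma_steps=15)
```

The raw finite-difference density is zero almost everywhere with a spike at each shifted calibration point, which is the noise the abstract refers to. The smoothed version spreads the same total mass, n/(n+1), over a continuous curve.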
The second method is based on the C-vine construction using partial correlations, as introduced by Joe and Kurowicka. There exists a one-to-one mapping from a set of partial correlations to a full correlation matrix, so this approach parametrizes the matrix through a structured sequence of partial correlations. The distribution from which these partial correlations are sampled can be adjusted to achieve specific properties in the resulting matrices; for example, sampling from specific Beta distributions yields matrices that follow the LKJ distribution. The extension by Joe and Kurowicka, which allows the expected value of each correlation to be fixed across samples, is also investigated.
A comparison of both methods is provided in terms of construction, flexibility, numerical stability, and statistical properties of the resulting matrices. While the square root decomposition method offers strict per-matrix control over the average correlation, the C-vine approach provides greater flexibility, enabling finer control over marginal distributions. The thesis concludes with a discussion on practical trade-offs and potential directions for future work.
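The mapping from C-vine partial correlations to a correlation matrix can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name is hypothetical, and the Beta parameter schedule (starting at `eta + (d - 1) / 2` and decreasing by 1/2 per tree level) is an assumption taken from the usual presentation of the vine method for LKJ sampling.

```python
import numpy as np

def sample_cvine_correlation(d, eta=1.0, rng=None):
    """Draw a d x d correlation matrix from C-vine partial correlations."""
    rng = np.random.default_rng() if rng is None else rng
    partial = np.zeros((d, d))   # partial[k, i] = rho_{k,i ; 0..k-1}
    corr = np.eye(d)
    beta = eta + (d - 1) / 2.0   # assumed schedule for an LKJ(eta) target
    for k in range(d - 1):
        beta -= 0.5
        for i in range(k + 1, d):
            # Partial correlation on (-1, 1) from a symmetric Beta distribution.
            partial[k, i] = 2.0 * rng.beta(beta, beta) - 1.0
            # Peel off the conditioning variables k-1, ..., 0 one at a time
            # to convert the partial correlation into an ordinary correlation.
            rho = partial[k, i]
            for l in range(k - 1, -1, -1):
                rho = (rho * np.sqrt((1 - partial[l, i] ** 2) * (1 - partial[l, k] ** 2))
                       + partial[l, i] * partial[l, k])
            corr[k, i] = corr[i, k] = rho
    return corr

R = sample_cvine_correlation(6, eta=2.0, rng=np.random.default_rng(0))
```

Because every partial correlation lies strictly inside (-1, 1), the resulting matrix is a valid correlation matrix by construction; no rejection step or projection is needed.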
A Vine Copula Approach for Portfolio Optimisation
Exploring the Effect of Copulas and Vine Models on Optimal Investment Allocation of Stock Index Returns
The study includes a detailed explanation of the multivariate Gaussian distribution, Maximum Likelihood Estimation (MLE), and the implementation of Iterative Partial Maximization for parameter estimation. Additionally, hierarchical clustering is employed to systematically identify symmetry classes. Results indicate that incorporating symmetry constraints enhances the accuracy and interpretability of GGMs.
The research shows that symmetry constraints simplify models, making them more robust and easier to understand.
A simulation study was conducted to test the efficiency and accuracy of the developed algorithm for symmetry detection. The findings from these simulations validate the proposed approach, demonstrating significant improvements in model performance.
Finally, the methodology was applied to an EEG dataset, highlighting practical applications in neuroscience. The results from the EEG data analysis further confirm that symmetry constraints can reveal underlying patterns in brain connectivity, offering valuable insights into the neural dynamics.
This thesis contributes to the existing literature by providing a systematic approach to detecting symmetries in high-dimensional data models and by demonstrating its practical utility on real-world EEG data.
This thesis also explores the structural properties of PCBNs in two separate directions. First, we analyze inference problem reduction through pruning, building up to a pruning algorithm for PCBNs that removes a larger number of variables than existing BN pruning methods. Second, we study the implications that the existence of arcs in arteries has for the PCBN's structure. We prove a theorem that underlies the Arterial Sample Propagation algorithms developed in this thesis and that has potential applications in the development stage of PCBNs.
Lastly, the main improvements that come out of this research are on the computational front: a novel method is found for calculating average spectral accelerations, the main quantity used in risk assessment. This method generates the distribution of this quantity in one step using numerical integration rather than the previously used Monte Carlo method, speeding up computations by a factor of 550.
Predicting the Swap Spread with a Dynamic Nelson-Siegel Model
A Novel Approach to Predict the Spread between Two Correlated Interest Rates
A case study validates the new variant of the DNS model, demonstrating predictions for the swap and bond curves with an accuracy comparable to that of the benchmark model. The key advantage of the DNS model over this benchmark is that it predicts the swap curve and the bond curve over the whole maturity spectrum. Prediction over the whole maturity spectrum is crucial for computing the spread between the two rates, which underlines the relevance of the new model presented in this thesis.
Energy Efficiency Valuation
Estimating the increase in expected transaction price due to improved energy efficiency in the Dutch housing market
Using the AIPW estimator, we aim to evaluate the average effect of increasing the energy efficiency of houses in the Netherlands on their expected transaction price. Moreover, we investigate how this expected effect changes when we condition on a certain subset of the data.
Given that our assumptions hold, we find that on average, the estimated expected increase in transaction price is positive when improving the energy efficiency of a house. Improving an energy inefficient house to moderately energy efficient is expected to increase the transaction price by approximately €97.70 per m2, while the improvement from moderately energy efficient to energy efficient increases the expected transaction price by approximately €20.96 per m2. In general, older, smaller and more energy inefficient houses increase most in expected transaction price per m2 when their energy efficiency is improved.
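For readers unfamiliar with the AIPW (augmented inverse probability weighting) estimator, the sketch below shows its standard form for a binary treatment. Everything in it is illustrative: the simulated data, the true effect of 2.0, the known propensity score, and the helper names are assumptions and are unrelated to the housing data or the nuisance models used in the thesis.

```python
import numpy as np

def aipw_ate(y, t, mu0, mu1, e):
    """AIPW estimate of the average treatment effect E[Y(1) - Y(0)].

    mu0, mu1: predicted outcomes under control / treatment for every unit.
    e:        propensity scores P(T = 1 | X) for every unit.
    """
    y, t = np.asarray(y, dtype=float), np.asarray(t, dtype=float)
    treated_part = mu1 + t * (y - mu1) / e
    control_part = mu0 + (1 - t) * (y - mu0) / (1 - e)
    return float(np.mean(treated_part - control_part))

def fit_linear(x, y):
    """Least-squares line; returns a prediction function (hypothetical helper)."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda x_new: np.column_stack([np.ones(len(x_new)), x_new]) @ coef

# Simulated example with one confounder and a true effect of 2.0.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-0.5 * x))   # propensity score, assumed known here
t = rng.binomial(1, e)
y = 1.0 + 0.8 * x + 2.0 * t + rng.normal(size=n)

mu0 = fit_linear(x[t == 0], y[t == 0])(x)   # outcome model fitted on controls
mu1 = fit_linear(x[t == 1], y[t == 1])(x)   # outcome model fitted on treated
ate = aipw_ate(y, t, mu0, mu1, e)
```

The estimator combines an outcome model with propensity weighting and remains consistent if either of the two is correctly specified. Conditional effects of the kind mentioned in the abstract can be approximated by averaging the same per-unit terms over a subset of the data.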
We present an overview of these two types of BNs, illustrating their properties and differences. An outline of the existing structure learning algorithms is provided, showing their efficiency in the Gaussian case and their limitations in the copula-based case. The problems of structure learning for PCBNs are then addressed. First, we investigate the performance of Gaussian structure learning algorithms for PCBNs. Based on a simulation study, we show that these procedures are not completely efficient, but prove beneficial. Second, a new approximation of the score based on the log-likelihood of PCBNs is explored. We propose to solve the computational inefficiency of the exact log-likelihood by estimating the necessary copulas from data, such that the copula terms in the PCBN decomposition can be computed without the need for integration. A simulation study suggests that this log-likelihood approximation yields better results than the approximation used by Pircalabelu et al. (2017). Finally, an algorithm to learn the structure of PCBNs is proposed, based on the two previous procedures.
Threshold tuning of transaction monitoring models
A risk-based approach to combat money laundering
Adding certain types of vegetation at precise locations for their positive impact may be a cheaper, more flexible, and more environmentally friendly way to strengthen dikes than the traditional increase in height. However, this nature-based (NB) option is not yet widely implemented because precise knowledge of the potential vegetation effects and their uncertainty is lacking.
This study uses a probabilistic method to better understand the effects of vegetation by including vegetation in the computation of the failure probabilities of Dutch river dikes. A framework was established to combine all these vegetation effects simultaneously in the computation of the total failure probability, considering different magnitudes of each effect. This enables the consideration of a wide range of vegetation scenarios, from which conclusions were drawn.
Overall, this thesis provides a useful and versatile tool for assessing the influence of vegetation on dikes that has a lot of potential and can be easily enhanced in the future.
The Lamperti Transform
Applications to Stochastic Local Volatility Models
Lasso can also be used to find an order of importance for the regression coefficients by varying the strength of the L1 penalty. Regression coefficients are set to zero one by one as the penalty increases. Relaxed Lasso uses this ranking and refits the selected variables without regularisation.
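The ranking idea can be sketched with a small coordinate-descent Lasso. This is only an illustration under stated assumptions: the penalty grid, the simulated data, and all function names are hypothetical, and the thesis may well use a different solver. The penalty is decreased step by step and the order in which coefficients become non-zero is recorded, which is the reverse of the order in which they are set to zero as the penalty increases.

```python
import numpy as np

def soft_threshold(z, gamma):
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, beta0, n_sweeps=100):
    """Coordinate descent for (1/2n) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = beta0.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            resid += X[:, j] * beta[j]        # remove predictor j from the fit
            z = X[:, j] @ resid / n
            beta[j] = soft_threshold(z, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]        # add the updated contribution back
    return beta

def lasso_entry_order(X, y, n_lambdas=50):
    """Order in which predictors become non-zero as the penalty decreases."""
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n     # smallest penalty with all-zero fit
    beta, order = np.zeros(p), []
    for lam in lam_max * np.logspace(0, -2, n_lambdas):
        beta = lasso_cd(X, y, lam, beta)      # warm start from the previous fit
        for j in np.flatnonzero(beta):        # ties within one step: index order
            if j not in order:
                order.append(int(j))
    return order

def relaxed_refit(X, y, selected):
    """Relaxed-Lasso style refit: ordinary least squares on selected columns."""
    coef, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return coef

# Simulated data: independent standardised predictors, three true signals.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([3.0, 0.0, 1.5, 0.0, 0.75]) + rng.normal(size=1000)
y = y - y.mean()

order = lasso_entry_order(X, y)
refit = relaxed_refit(X, y, order[:3])
```

Refitting without the penalty removes the shrinkage that the Lasso applies to the selected coefficients, which is the motivation for the relaxed variant.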
In Bayesian statistics, regularisation is added via the prior. The Horseshoe prior can adjust to the average sparsity in the model: it either shrinks a signal aggressively to zero or leaves the signal almost unchanged. The posterior of the model is never truly sparse. Predictive Projection can induce sparsity by setting the Monte Carlo samples of the posterior to zero for certain variables. This is done in such a way that the Kullback-Leibler divergence between the full posterior and the projected, sparser posterior is minimised.
I investigate the behaviour of the variable selection methods. The main focus is on the predictive performance, the sparsity, the computation time and the reliability of the estimated performance for the selected models. I apply the methods to various types of simulated data to compare the variable selection methods. The simulated data include data with independent predictors, collinear predictors and non-normal predictors, among others. The simulation studies show that Lasso and Predictive Projection lead to models with the highest performance overall, and that their predictive performance is more stable over different realisations of the data. For the same performance, Predictive Projection produces models with fewer variables. This makes Predictive Projection the most attractive method.
I also apply the techniques to FreddieMac data, a data set on single-family mortgages. The results are similar to those for the simulated data, and Predictive Projection with the Horseshoe prior is the most attractive variable selection method. Both the simulation studies and the FreddieMac application imply that the estimated performance of Predictive Projection and Lasso is better than that of Forward Selection and Relaxed Lasso. However, the behaviour of the estimated performance remains unclear to a certain degree. More simulations per data type and more data types are needed for more insight into the estimated performance, and additional resources are needed to achieve this.
Characterizations of Multivariate Tail Dependence
On theory and inference to assess extremal dependence structures