Ö. Şahin | TU Delft Repository

Sampling from Conditional Distributions of Simplified Vines

Journal article (2025) - Ariane Hanebeck, Özge Şahin, Petra Havlíčková, Claudia Czado

Simplified vine copulas are more flexible tools than standard multivariate distributions for modeling and understanding different dependence properties in high-dimensional data. Their conditional distributions are of utmost importance, from statistical learning to graphical models. However, the conditional densities of vine copulas and, thus, vine distributions cannot be obtained in closed form without integration for all possible sets of conditioning variables. We propose a Markov chain Monte Carlo approach, based on Hamiltonian Monte Carlo, to sample from any conditional distribution of arbitrarily specified simplified vine copulas and thus vine distributions. We show its accuracy through simulation studies and analyze data of multiple maize traits such as flowering times, plant height, and vigor. Use cases from predicting traits to estimating conditional Kendall’s tau are presented. ...
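
As a brief illustration of why a sampling approach helps here (a sketch in generic notation, not taken from the article): write f for a joint density, x_A for the variables to be sampled, and x_B for the conditioning variables. These symbols are assumptions introduced only for this note.

```latex
f(x_A \mid x_B)
  \;=\; \frac{f(x_A, x_B)}{\int f(x_A', x_B)\,\mathrm{d}x_A'}
  \;\propto\; f(x_A, x_B)
```

The integral in the denominator is the part that is generally unavailable in closed form. Markov chain Monte Carlo methods, including Hamiltonian Monte Carlo, only need the target density up to a normalizing constant, so evaluating the joint density (and, for Hamiltonian Monte Carlo, its gradient with respect to x_A) is enough.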

Vine Copula-Based Classifiers with Applications

Journal article (2024) - Özge Şahin, Harry Joe

The vine pair-copula construction can be used to fit flexible non-Gaussian multivariate distributions to a mix of continuous and discrete variables. With multiple classes, fitting univariate distributions and a vine to each class leads to posterior probabilities over classes that can be used for discriminant analysis. This is more flexible than methods with the Gaussian and/or independence assumptions, such as quadratic discriminant analysis and naive Bayes. Some variable selection methods are studied to accompany the vine copula-based classifier because unimportant variables can make discrimination worse. Simple numerical performance metrics cannot give a full picture of how well a classifier is doing. We introduce categorical prediction intervals and other summary measures to assess the difficulty of discriminating classes. Through extensive experiments on real data, we demonstrate the superior performance of our approaches compared to traditional discriminant analysis methods and random forests when features have different dependence structures for different classes. ...
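
The step from class-wise fits to posterior probabilities can be sketched with Bayes' rule. The notation below is assumed for illustration and is not taken from the article: π_k is the prior probability of class k, f_k the fitted class-k density, c_k the class-k vine copula density, and F_{k,i}, f_{k,i} the class-k marginal distribution and density of variable i. The second formula is written for continuous variables only; discrete variables would require differences of the copula distribution function instead.

```latex
P(Y = k \mid x) \;=\; \frac{\pi_k\, f_k(x)}{\sum_{j=1}^{K} \pi_j\, f_j(x)},
\qquad
f_k(x) \;=\; c_k\bigl(F_{k,1}(x_1), \dots, F_{k,d}(x_d)\bigr) \prod_{i=1}^{d} f_{k,i}(x_i)
```

Under this reading, naive Bayes corresponds to setting c_k ≡ 1 (independence within each class), which is one way to see why a vine-based classifier is the more flexible choice.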

High-dimensional sparse vine copula regression with application to genomic prediction

Journal article (2023) - Ö. Şahin, Claudia Czado

High-dimensional data sets are often available in genome-enabled predictions. Such data sets include nonlinear relationships with complex dependence structures. For such situations, vine copula-based (quantile) regression is an important tool. However, the current vine copula-based regression approaches do not scale up to high and ultra-high dimensions. To perform high-dimensional sparse vine copula-based regression, we propose two methods. First, we show their superiority regarding computational complexity over the existing methods. Second, we define relevant, irrelevant, and redundant explanatory variables for quantile regression. Then, we show our methods' power in selecting relevant variables and prediction accuracy in high-dimensional sparse data sets via simulation studies. Next, we apply the proposed methods to high-dimensional real data, aiming at the genomic prediction of maize traits. Some data processing and feature extraction steps for the real data are further discussed. Finally, we show the advantage of our methods over linear models and quantile regression forests in simulation studies and real data applications. ...

ESG, risk, and (tail) dependence

Journal article (2023) - K. Bax, Ö. Sahin, Claudia Czado, S. Paterlini

While environmental, social, and governance (ESG) trading activity has been a distinctive feature of financial markets, the debate over whether ESG scores can also convey information regarding a company's riskiness remains open. Regulatory authorities, such as the European Banking Authority (EBA), have acknowledged that ESG factors can contribute to risk. Therefore, it is important to model such risk dependencies and quantify what part of a company's riskiness can be attributed to the ESG scores. This paper aims to question whether ESG scores can be used to provide information on (tail) riskiness. By analyzing the (tail) dependence structure of companies with a range of ESG scores, that is, within an ESG rating class, using high-dimensional vine copula modeling, we are able to show that risk can also depend on and be directly associated with a specific ESG rating class. Empirical findings on real-world data show positive, non-negligible ESG risks determined by ESG scores, especially during the 2008 crisis. ...

Environmental, Social, Governance scores and the Missing pillar—Why does missing information matter?

Journal article (2022) - Özge Sahin, Karoline Bax, Claudia Czado, Sandra Paterlini

Environmental, Social, and Governance (ESG) scores measure companies' performance concerning sustainability and are organized in three pillars: Environmental, Social, and Governance. These complementary non-financial ESG scores should provide information about companies' ESG performance and risks. However, the extent of not yet published ESG information makes the reliability of ESG scores questionable. To explicitly capture the not yet published information on ESG category scores, a new pillar, the so-called Missing (M) pillar, is proposed and added to the new definition of the Environmental, Social, Governance, and Missing (ESGM) scores. By relying on the data provided by Refinitiv, we show that the ESGM scores strengthen the companies' risk relationship. These new scores could benefit investors and practitioners as ESG exclusion strategies using only ESG scores might exclude assets with a low score solely because of their missing information and not necessarily because of a low ESG merit. ...

The pitfalls of (non-definitive) Environmental, Social, and Governance scoring methodology

Journal article (2022) - Özge Sahin, Karoline Bax, Sandra Paterlini, Claudia Czado

Evaluating companies' sustainability performance embraces environmental, social, and governance (ESG) activities. Data providers assign companies ESG scores as a quantitative measure based on available information. Refinitiv (previously ASSET4) is a key data provider whose scores are used extensively by researchers and companies; however, their ESG scoring methodology allows the scores from the five most recent years to change post-publication without any announcements. Such ESG scores are called non-definitive. As a result, ESG research findings and companies' sustainability performance using the ESG data from the same data provider might be inconsistent. Optimization and exploratory data mining approaches show that it is possible to change ESG scores to exhibit stronger risk dependence. We discuss how the initial disclosure of ESG information and updating the published ESG information alter the way ESG scores are computed in a given industry group, impacting ESG research findings significantly. Moreover, the initial disclosure of ESG information and an update in the published ESG information might allow some companies to appear more sustainable, even though nothing has changed. Finally, our work highlights a critical issue that should be addressed to improve comparability across research studies and companies' sustainability performance relying on data from the same ESG providers. ...

Vine copula mixture models and clustering for non-Gaussian data

Journal article (2022) - Özge Sahin, Claudia Czado

The majority of finite mixture models suffer from not allowing asymmetric tail dependencies within components and not capturing non-elliptical clusters in clustering applications. Since vine copulas are very flexible in capturing these dependencies, a novel vine copula mixture model for continuous data is proposed. The model selection and parameter estimation problems are discussed, and further, a new model-based clustering algorithm is formulated. The use of vine copulas in clustering allows for a range of shapes and dependency structures for the clusters. The simulation experiments illustrate a significant gain in clustering accuracy when notably asymmetric tail dependencies and/or non-Gaussian margins within the components exist. The analysis of real data sets accompanies the proposed method. The model-based clustering algorithm with vine copula mixture models outperforms others, especially for the non-Gaussian multivariate data. ...

Vine copula based dependence modeling in sustainable finance

Journal article (2022) - Claudia Czado, Karoline Bax, Özge Sahin, Thomas Nagler, Aleksey Min, Sandra Paterlini

Climate change and sustainability have become societal focal points in the last decade. Consequently, companies have been increasingly characterized by non-financial information, such as environmental, social, and governance (ESG) scores, based on which companies can be grouped into ESG classes. While many scholars have questioned the relationship between financial performance and risks of assets belonging to different ESG classes, the question about dependence among ESG classes is still open. Here, we focus on understanding the dependence structures of different ESG class indices and the market index through the lens of copula models. After a thorough introduction to vine copula models, we explain how cross-sectional and temporal dependencies can be captured by models based on vine copulas, more specifically, using ARMA-GARCH and stationary vine copula models. Using real-world ESG data over a long period with different economic states, we find that assets with medium ESG scores tend to show weaker dependence to the market, while assets with extremely high or low ESG scores tend to show stronger, non-Gaussian dependence. ...

Modeling the pharmacodynamics of nandrolone doping drug and implications for anti‐doping testing

Journal article (2020) - Özge Sahin, Feyyaz Senturk, Yaman Barlas, Hakan Yasarcan

We model the pathways of nandrolone in the body, an anabolic steroid widely used as a performance-enhancing drug (PED). The model generates the dynamics of nandrolone and its metabolites. PED tests check for the presence of a primary metabolite of nandrolone, 19-NA, in urine. To cheat in these tests, PED users typically use inhibitors that reduce the urinary concentration of 19-NA. One such inhibitor is finasteride. Finasteride’s main effect in the body is the inhibition of reductase enzymes that turn nandrolone into its metabolite 19-NA. To capture this effect, we include structures for finasteride and reductase enzymes in the model. The model is tested by fundamental structure validity tests. We also show that the model behavior is consistent with experimental data in the literature. We finally investigate the potential ways by which the drug users may cheat in PED tests and make suggestions for improved testing as counter-measures. ...