High-dimensional sparse vine copula regression with application to genomic prediction

Journal Article (2023)

Author(s)

Ö. Şahin (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Claudia Czado

Research Group

Applied Probability

DOI related publication

https://doi.org/10.1093/biomtc/ujad042 (final published version)

To reference this document use

https://resolver.tudelft.nl/uuid:75f658d0-b8e9-431e-b1ac-63cc6d83195b

Publication Year

2023

Language

English

Journal title

Biometrics

Issue number

1

Volume number

80

Downloads counter

218

Collections

Institutional Repository

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

High-dimensional data sets are common in genome-enabled prediction. Such data sets often involve nonlinear relationships with complex dependence structures. For such situations, vine copula-based (quantile) regression is an important tool. However, current vine copula-based regression approaches do not scale to high and ultra-high dimensions. To perform high-dimensional sparse vine copula-based regression, we propose two methods. First, we show that they have lower computational complexity than existing methods. Second, we define relevant, irrelevant, and redundant explanatory variables for quantile regression. Then, via simulation studies, we show our methods' power in selecting relevant variables and their prediction accuracy in high-dimensional sparse data sets. Next, we apply the proposed methods to high-dimensional real data, aiming at the genomic prediction of maize traits, and discuss some data processing and feature extraction steps for these data. Finally, we show the advantage of our methods over linear models and quantile regression forests in both simulation studies and real data applications.
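As a rough illustration of the building block behind copula-based quantile regression, consider a single predictor. For a bivariate copula C with h-function h(v | u) = ∂C(u, v)/∂u, the conditional α-quantile of Y given X = x is F_Y^{-1}(h^{-1}(α | F_X(x))). The sketch below assumes a Gaussian pair-copula with correlation ρ and user-supplied marginals. It is not the authors' high-dimensional sparse method, which combines many pair-copulas in a vine and selects variables; the function name `gaussian_copula_quantile` is hypothetical. Setting ρ = 0 loosely mirrors an uninformative predictor: the predicted quantile then does not depend on x.

```python
from statistics import NormalDist
from typing import Callable

STD_NORMAL = NormalDist()  # standard normal, used for the Gaussian copula scale


def gaussian_copula_quantile(
    x: float,
    alpha: float,
    rho: float,
    cdf_x: Callable[[float], float],
    inv_cdf_y: Callable[[float], float],
) -> float:
    """Conditional alpha-quantile of Y given X = x under a bivariate Gaussian copula.

    Hypothetical helper for illustration; the marginals are passed in as callables.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    u = cdf_x(x)                   # probability integral transform of the predictor
    z = STD_NORMAL.inv_cdf(u)      # map u to the normal scale
    # Inverse h-function of the Gaussian copula: solve h(v | u) = alpha for v
    z_q = rho * z + (1.0 - rho ** 2) ** 0.5 * STD_NORMAL.inv_cdf(alpha)
    v = STD_NORMAL.cdf(z_q)        # conditional quantile on the uniform (copula) scale
    return inv_cdf_y(v)            # transform back to the original scale of Y


if __name__ == "__main__":
    x_dist = NormalDist(0, 1)    # assumed marginal of the predictor X
    y_dist = NormalDist(10, 2)   # assumed marginal of the response Y
    # With normal marginals the conditional median is 10 + 2 * 0.5 * 1.0, i.e. close to 11.0
    print(gaussian_copula_quantile(1.0, 0.5, 0.5, x_dist.cdf, y_dist.inv_cdf))
```

Working on the copula scale separates the dependence model (here a single ρ) from the marginals, which is what lets vine-based approaches combine flexible margins with pair-copulas of different families.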

Files

Ujad042.pdf

(PDF, 0.895 MB)