Flexible co-data learning for high-dimensional prediction

Journal Article (2021)

Author(s)

Lodewyk F. Wessels (Oncode Institute, Nederlands Kanker Instituut - Antoni van Leeuwenhoek ziekenhuis, TU Delft - Pattern Recognition and Bioinformatics)

Mark A. van de Wiel (University of Cambridge, Amsterdam UMC)

Research Group

Pattern Recognition and Bioinformatics

DOI related publication

https://doi.org/10.1002/sim.9162

Keywords

Omics, Empirical Bayes, Clinical prediction, Penalized generalized linear models, Prior information

To reference this document use:

https://resolver.tudelft.nl/uuid:adf44a30-a225-4eb0-b46a-194716b2f494

Publication Year

2021

Language

English

Issue number

26

Volume number

40

Pages (from-to)

5910-5925

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Clinical research often focuses on complex traits in which many variables play a role in mechanisms driving, or curing, diseases. Clinical prediction is hard when data are high-dimensional, but additional information, such as domain knowledge and previously published studies, may help to improve predictions. Such complementary data, or co-data, provide information on the covariates, such as genomic location or P-values from external studies. We use multiple and various co-data to define possibly overlapping or hierarchically structured groups of covariates. These are then used to estimate adaptive multi-group ridge penalties for generalized linear and Cox models. Available group-adaptive methods primarily target settings with few groups, and therefore likely overfit for non-informative, correlated or many groups, and do not account for known structure at the group level. To handle these issues, our method combines empirical Bayes estimation of the hyperparameters with an extra level of flexible shrinkage. This renders a uniquely flexible framework, as any type of shrinkage can be used at the group level. We describe various types of co-data and propose suitable forms of hypershrinkage. The method is very versatile, as it allows for integration and weighting of multiple co-data sets, inclusion of unpenalized covariates and posterior variable selection. For three cancer genomics applications we demonstrate improvements compared to other models in terms of performance, variable selection stability and validation.
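To make the idea of multi-group ridge penalties concrete, the following is a minimal sketch and not the authors' implementation. It assumes a linear model, non-overlapping groups, and penalties that are supplied by hand; the empirical Bayes estimation of the penalties and the hypershrinkage described in the abstract are not shown. The function name `group_ridge` and its arguments are hypothetical.

```python
import numpy as np


def group_ridge(X, y, groups, penalties):
    """Linear ridge regression with one penalty per covariate group.

    groups[k] is the (integer) group index of covariate k, and
    penalties[g] is the ridge penalty shared by all covariates in group g.
    Assumption: penalties are given, not estimated from the data.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Expand the group-level penalties to one penalty per covariate.
    lam = np.asarray(penalties, dtype=float)[np.asarray(groups)]
    # Solve (X'X + diag(lam)) beta = X'y.
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)


# Illustrative use on simulated data: group 0 carries signal, group 1 is noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 8))
groups_demo = [0, 0, 0, 0, 1, 1, 1, 1]
y_demo = X_demo[:, :4] @ np.ones(4) + rng.normal(size=50)

# A small penalty on the informative group and a large one on the noise group
# shrinks the noise coefficients much more strongly.
beta_demo = group_ridge(X_demo, y_demo, groups_demo, penalties=[1.0, 100.0])
print(np.round(beta_demo, 3))
```

With equal penalties in every group this reduces to ordinary ridge regression; the group structure only matters once the penalties differ, which is where co-data would come in.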

Files

Sim.9162.pdf

(pdf | 1.33 Mb)