Probabilistic partial least squares model

Identifiability, estimation and application

Journal Article (2018)

Author(s)

Said el Bouhaddani (Leiden University Medical Center)

Hae Won Uh (Leiden University Medical Center, University Medical Center Utrecht)

Caroline Hayward (The University of Edinburgh)

Geurt Jongbloed (TU Delft - Delft Institute of Applied Mathematics)

Jeanine Houwing-Duistermaat (Leiden University Medical Center, University of Leeds)

Department

Delft Institute of Applied Mathematics

DOI related publication

https://doi.org/10.1016/j.jmva.2018.05.009

Inference, Identifiability, Dimension reduction, EM algorithm, Probabilistic partial least squares

To reference this document use:

https://resolver.tudelft.nl/uuid:eb1256ff-9878-4c0e-a94c-6da03f4bfed1

Publication Year

2018

Language

English

Bibliographical Note

Accepted Author Manuscript

Volume number

167

Pages (from-to)

331-346

Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

With a rapid increase in volume and complexity of data sets, there is a need for methods that can extract useful information, for example the relationship between two data sets measured for the same persons. The Partial Least Squares (PLS) method can be used for this dimension reduction task. Within life sciences, results across studies are compared and combined. Therefore, parameters need to be identifiable, which is not the case for PLS. In addition, PLS is an algorithm, while epidemiological study designs are often outcome-dependent and methods to analyze such data require a probabilistic formulation. Moreover, a probabilistic model provides a statistical framework for inference. To address these issues, we develop Probabilistic PLS (PPLS). We derive maximum likelihood estimators that satisfy the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. We show that the PPLS parameters are identifiable up to sign. A simulation study is conducted to study the performance of PPLS compared to existing methods. The PPLS estimates performed well in various scenarios, even in high dimensions. Most notably, the estimates seem to be robust against departures from normality. To illustrate our method, we applied it to IgG glycan data from two cohorts. Our PPLS model provided insight as well as interpretable results across the two cohorts.

Files

45582177_PPLS_JMA_acc2.pdf

(pdf | 0.64 MB)

- Embargo expired on 18-06-2019