Integrating omics datasets with the OmicsPLS package

Journal Article (2018)
Author(s)

S. el Bouhaddani (TU Delft - Statistics, Leiden University Medical Center)

H.-W. Uh (University Medical Center Utrecht)

G. Jongbloed (TU Delft - Delft Institute of Applied Mathematics)

Caroline Hayward (The University of Edinburgh)

Lucija Klarić (The University of Edinburgh, Genos Glycobiology Laboratory)

Szymon M. Kielbasa (Leiden University Medical Center)

Jeanine Houwing-Duistermaat (University of Leeds)

Research Group
Statistics
Copyright
© 2018 S. el Bouhaddani, Hae-Won Uh, G. Jongbloed, Caroline Hayward, Lucija Klarić, Szymon M. Kielbasa, Jeanine Houwing-Duistermaat
DOI related publication
https://doi.org/10.1186/s12859-018-2371-3
Publication Year
2018
Language
English
Issue number
1
Volume number
19
Pages (from-to)
1-9
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Background: With the exponential growth in available biomedical data, there is a need for data integration methods that can extract information about relationships between the data sets. However, these data sets might have very different characteristics. For interpretable results, data-specific variation needs to be quantified. For this task, Two-way Orthogonal Partial Least Squares (O2PLS) has been proposed. To facilitate application and development of the methodology, free and open-source software is required. However, no free and open-source implementation of O2PLS has been available.

Results: We introduce OmicsPLS, an open-source implementation of the O2PLS method in R. It can handle both low- and high-dimensional datasets efficiently. Generic methods for inspecting and visualizing results are implemented. Both a standard and a faster alternative cross-validation method are available to determine the number of components. A simulation study shows good performance of OmicsPLS compared to alternatives, in terms of accuracy and CPU runtime. We demonstrate OmicsPLS by integrating genetic and glycomic data.

Conclusions: We propose the OmicsPLS R package: a free and open-source implementation of O2PLS for statistical data integration. OmicsPLS is available at https://cran.r-project.org/package=OmicsPLS and can be installed in R via install.packages("OmicsPLS").
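
For readers unfamiliar with O2PLS, the decomposition that the abstract refers to (joint variation shared by the two data sets versus data-specific variation) can be sketched as follows. The notation here is an assumption chosen for illustration and is not taken from this page: X (N × p) and Y (N × q) are the two data matrices measured on the same N samples, T and U hold the joint scores, W and C the joint loadings, the ⊥-subscripted terms capture data-specific variation, and E, F, H are residuals.

```latex
\begin{aligned}
X &= T W^{\top} + T_{\perp} P_{\perp}^{\top} + E, \\
Y &= U C^{\top} + U_{\perp} Q_{\perp}^{\top} + F, \\
U &= T B + H.
\end{aligned}
```

Under this reading, the "number of components" selected by cross-validation corresponds to three integers: the number of columns of T and U (joint components), of T_⊥ (X-specific components), and of U_⊥ (Y-specific components).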