A Distribution Dependent and Independent Complexity Analysis of Manifold Regularization

Conference Paper (2020)
Author(s)

Alexander Mey (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Tom Julian Viering (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Marco Loog (University of Copenhagen, TU Delft - Electrical Engineering, Mathematics and Computer Science)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.1007/978-3-030-44584-3_26 Final published version
More Info
expand_more
Publication Year
2020
Language
English
Research Group
Pattern Recognition and Bioinformatics
Bibliographical Note
Virtual/online event due to COVID-19
Volume number
12080
Pages (from-to)
326-338
ISBN (print)
9783030445836
Event
18th International Conference on Intelligent Data Analysis, IDA 2020 (2020-04-27 - 2020-04-29), Konstanz, Germany
Downloads counter
295
Collections
Institutional Repository
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Manifold regularization is a commonly used technique in semi-supervised learning. It enforces the classification rule to be smooth with respect to the data-manifold. Here, we derive sample complexity bounds based on pseudo-dimension for models that add a convex data dependent regularization term to a supervised learning process, as is in particular done in Manifold regularization. We then compare the bound for those semi-supervised methods to purely supervised methods, and discuss a setting in which the semi-supervised method can only have a constant improvement, ignoring logarithmic terms. By viewing Manifold regularization as a kernel method we then derive Rademacher bounds which allow for a distribution dependent analysis. Finally we illustrate that these bounds may be useful for choosing an appropriate manifold regularization parameter in situations with very sparsely labeled data.