The Peaking Phenomenon in Semi-supervised Learning

Conference Paper (2016)
Author(s)

Jesse Krijthe (TU Delft - Pattern Recognition and Bioinformatics, Leiden University Medical Center)

Marco Loog (TU Delft - Pattern Recognition and Bioinformatics, University of Copenhagen)

Research Group
Pattern Recognition and Bioinformatics
DOI related publication
https://doi.org/10.1007/978-3-319-49055-7_27 (final published version)
Publication Year
2016
Language
English
Pages (from-to)
299-309
Publisher
Springer
ISBN (print)
978-3-319-49054-0
ISBN (electronic)
978-3-319-49055-7
Event
SSPR Joint IAPR International Workshops on Statistical Techniques in Pattern Recognition (SPR) and Structural and Syntactic Pattern Recognition (SSPR) (2016-11-29 - 2016-12-02), Mérida, Mexico

Abstract

For the supervised least squares classifier, when the number of training objects is smaller than the dimensionality of the data, adding more data to the training set may first increase the error rate before decreasing it. This possibly counterintuitive phenomenon is known as peaking. In this work, we observe that a similar but more pronounced version of this phenomenon also occurs in the semi-supervised setting, where unlabeled objects, instead of labeled objects, are added to the training set. Through simulation studies and by applying an approximation of the learning curve based on the work by Raudys and Duin, we explain why the learning curve has a steeper incline and a more gradual decline in this setting.
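
The supervised peaking effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the experimental setup of the paper: it assumes two Gaussian classes with identity covariance and means placed symmetrically around the origin, fits the least squares classifier without an intercept through the pseudo-inverse (the minimum-norm solution when there are fewer training objects than dimensions), and estimates the error rate on fresh test data. The function name `error_rate` and all parameter values are hypothetical choices made for illustration. The semi-supervised variant studied in the paper is not reproduced here.

```python
import numpy as np


def error_rate(n_train, dim=50, n_test=2000, repeats=50, seed=0):
    """Mean test error of a pseudo-inverse least squares classifier.

    Assumed setup (illustrative only): two Gaussian classes with identity
    covariance and means at +mean and -mean, labels coded as +1 / -1.
    """
    rng = np.random.default_rng(seed)
    # Distance between the class means is 4, independent of the dimension.
    mean = np.full(dim, 2.0 / np.sqrt(dim))
    errors = []
    for _ in range(repeats):
        y_train = rng.choice([-1.0, 1.0], size=n_train)
        x_train = rng.standard_normal((n_train, dim)) + y_train[:, None] * mean
        y_test = rng.choice([-1.0, 1.0], size=n_test)
        x_test = rng.standard_normal((n_test, dim)) + y_test[:, None] * mean
        # Least squares weights; the pseudo-inverse gives the minimum-norm
        # solution when n_train < dim and ordinary least squares otherwise.
        w = np.linalg.pinv(x_train) @ y_train
        errors.append(np.mean(np.sign(x_test @ w) != y_test))
    return float(np.mean(errors))


if __name__ == "__main__":
    # Learning curve over the number of labeled training objects.
    for n in (10, 25, 40, 50, 60, 100, 200):
        print(n, round(error_rate(n), 3))
```

Under these assumptions, the estimated error should be highest when the number of training objects is close to the dimensionality (here 50) and lower on either side of that point, which is the peaking behavior the abstract refers to.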