Safe Semi-Supervised Learning

Title: Safe Semi-Supervised Learning
Author: Bertazzi, Andrea (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Loog, Marco (mentor); Bierkens, Joris (mentor)
Degree granting institution: Delft University of Technology
Programme: Applied Mathematics
Date: 2018-12-10

Abstract:
Semi-supervised algorithms can perform worse than the corresponding supervised model. This may be due to violations of the assumptions on the data that most classification systems introduce. We study an approach previously shown to guarantee improvement for the LDA classifier in terms of log-likelihood on the full data-set of labeled and unlabeled observations. The method rests on two key concepts: contrast and pessimism. We extend this approach to a broader class of probabilistic generative models, in which the class-conditional distributions can be modeled with any parametric class belonging to the exponential family. In this setting, we prove that the classifier is never worse and, under mild assumptions, strictly improves the log-likelihood on the complete data-set. The case of Gaussian densities is analyzed in detail, for both LDA and QDA. We also study this method in the case of least squares classification: in terms of squared loss, we prove that the contrastive pessimistic classifier is guaranteed not to degrade performance and, under a further requirement on the data, strictly outperforms the supervised model. Finally, we apply contrast and pessimism to the task of parameter estimation of a multivariate Gaussian density in a missing-data framework. We fully characterize the case of a missing block of data and show that a strictly increased likelihood on the complete data-set is obtained for any monotone sample.
In other words, the contrastive pessimistic estimates are guaranteed to provide a better fit to the complete data-set composed of both observed and hidden components.

Subject: semi-supervised learning; contrast; pessimism; parameter estimation; monotone sample; maximum likelihood estimation
To reference this document use: http://resolver.tudelft.nl/uuid:5784060f-b67c-406c-ba30-20d456e503af
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Andrea Bertazzi
Files: MSc_Thesis_Bertazzi.pdf (PDF, 1.25 MB)
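The contrast-and-pessimism idea described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the thesis's actual construction: it assumes a 1D two-class Gaussian model with shared variance (an LDA-like setting) and uses hard pessimistic labelings of the unlabeled data, whereas the thesis works with soft labelings; the function names (`fit_params`, `contrastive_pessimistic_fit`) are hypothetical. The pessimistic step picks, per unlabeled point, the labeling that minimizes the semi-supervised model's gain over the supervised one; the maximization step then refits the model on labeled plus pessimistically labeled data. Because the final step is a maximum-likelihood refit, the resulting log-likelihood gain on the (pessimistically completed) data-set is non-negative.

```python
import numpy as np

def fit_params(X, resp):
    # Weighted MLE for a two-class 1D Gaussian model with shared variance.
    n = resp.sum()
    priors = resp.sum(axis=0) / n
    means = (resp * X[:, None]).sum(axis=0) / resp.sum(axis=0)
    var = (resp * (X[:, None] - means) ** 2).sum() / n
    return priors, means, var

def log_joint(X, params):
    # Per-point, per-class log p(x, c | params).
    priors, means, var = params
    return (np.log(priors)
            - 0.5 * np.log(2 * np.pi * var)
            - 0.5 * (X[:, None] - means) ** 2 / var)

def loglik(X, resp, params):
    # Expected complete-data log-likelihood under soft labels `resp`.
    return (resp * log_joint(X, params)).sum()

def contrastive_pessimistic_fit(X_lab, y_lab, X_unl, n_iter=50):
    resp_lab = np.eye(2)[y_lab]              # one-hot labels
    theta_sup = fit_params(X_lab, resp_lab)  # supervised baseline
    X_all = np.concatenate([X_lab, X_unl])
    theta = theta_sup
    for _ in range(n_iter):
        # Pessimistic step: per unlabeled point, choose the labeling that
        # minimises the current model's gain over the supervised model.
        scores = log_joint(X_unl, theta) - log_joint(X_unl, theta_sup)
        q = np.eye(2)[scores.argmin(axis=1)]
        # Maximisation step: weighted MLE on labeled + pessimistically
        # labeled unlabeled data.
        theta = fit_params(X_all, np.vstack([resp_lab, q]))
    return theta, theta_sup, q

# Toy data: two Gaussian classes, few labels, many unlabeled points.
rng = np.random.default_rng(0)
X_lab = np.concatenate([rng.normal(-1, 1, 10), rng.normal(1, 1, 10)])
y_lab = np.repeat([0, 1], 10)
X_unl = np.concatenate([rng.normal(-1, 1, 100), rng.normal(1, 1, 100)])

theta, theta_sup, q = contrastive_pessimistic_fit(X_lab, y_lab, X_unl)
X_all = np.concatenate([X_lab, X_unl])
resp_all = np.vstack([np.eye(2)[y_lab], q])
gain = loglik(X_all, resp_all, theta) - loglik(X_all, resp_all, theta_sup)
# gain >= 0: under the pessimistic labeling, the semi-supervised fit is
# never worse than the supervised one in complete-data log-likelihood.
```

This only illustrates the safety mechanism; the thesis's guarantees are stated over the full simplex of soft labelings and for the broader exponential-family and least-squares settings.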