A. Mey | TU Delft Repository

Improved Generalization in Semi-Supervised Learning

A Survey of Theoretical Results

Journal article (2022) - Alexander Mey, Marco Loog

Semi-supervised learning is the learning setting in which we have both labeled and unlabeled data at our disposal. This survey covers theoretical results for this setting and maps out the benefits of unlabeled data in classification and regression tasks. Most methods that use unlabeled data rely on certain assumptions about the data distribution. When those assumptions are not met, including unlabeled data may actually decrease performance. For all practical purposes, it is therefore instructive to have an understanding of the underlying theory and the possible learning behavior that comes with it. This survey gathers results about the possible gains one can achieve when using semi-supervised learning as well as results about the limits of such methods. Specifically, it aims to answer the following questions: what are, in terms of improving supervised methods, the limits of semi-supervised learning? What are the assumptions of different methods? What can we achieve if the assumptions are true? As, indeed, the precise assumptions made are of the essence, this is where the survey's particular attention goes out to. ...

Environment Shift Games: Are Multiple Agents the Solution, and not the Problem, to Non-Stationarity?

Conference paper (2021) - A. Mey, F.A. Oliehoek

Machine learning and artificial intelligence models that interact with and in an environment will unavoidably have impact on this environment and change it. This is often a problem as many methods do not anticipate such a change in the environment and thus may start acting sub-optimally. Although efforts are made to deal with this problem, we believe that a lot of potential is unused. Driven by the recent success of predictive machine learning, we believe that in many scenarios one can predict when and how a change in the environment will occur. In this paper we introduce a blueprint that intimately connects this idea to the multiagent setting, showing that the multiagent community has a pivotal role to play in addressing the challenging problem of changing environments. ...

Consistency and Finite Sample Behavior of Binary Class Probability Estimation

Conference paper (2021) - A. Mey, M. Loog

We investigate to which extent one can recover class probabilities within the empirical risk minimization (ERM) paradigm. We extend existing results and emphasize the tight relations between empirical risk minimization and class probability estimation. Following previous literature on excess risk bounds and proper scoring rules, we derive a class probability estimator based on empirical risk minimization. We then derive conditions under which this estimator will converge with high probability to the true class probabilities with respect to the L1-norm. One of our core contributions is a novel way to derive finite sample L1-convergence rates of this estimator for different surrogate loss functions. We also study in detail which commonly used loss functions are suitable for this estimation problem and briefly address the setting of model-misspecification. ...

Loss Bounds for Approximate Influence-Based Abstraction

Conference paper (2021) - E. Congeduti, A. Mey, F.A. Oliehoek

Sequential decision making techniques hold great promise to improve the performance of many real-world systems, but computational complexity hampers their principled application. Influencebased abstraction aims to gain leverage by modeling local subproblems together with the ‘influence’ that the rest of the system exerts on them. While computing exact representations of such influence might be intractable, learning approximate representations offers a promising approach to enable scalable solutions. This paper investigates the performance of such approaches from a theoretical perspective. The primary contribution is the derivation of sufficient conditions on approximate influence representations that can guarantee solutions with small value loss. In particular we show that neural networks trained with cross entropy are well suited to learn approximate influence representations. Moreover, we provide a sample based formulation of the bounds, which reduces the gap to applications. Finally, driven by our theoretical insights, we propose approximation error estimators, which empirically reveal to correlate well with the value loss. ...

A Distribution Dependent and Independent Complexity Analysis of Manifold Regularization

Conference paper (2020) - Alexander Mey, Tom Julian Viering, Marco Loog

Manifold regularization is a commonly used technique in semi-supervised learning. It enforces the classification rule to be smooth with respect to the data-manifold. Here, we derive sample complexity bounds based on pseudo-dimension for models that add a convex data dependent regularization term to a supervised learning process, as is in particular done in Manifold regularization. We then compare the bound for those semi-supervised methods to purely supervised methods, and discuss a setting in which the semi-supervised method can only have a constant improvement, ignoring logarithmic terms. By viewing Manifold regularization as a kernel method we then derive Rademacher bounds which allow for a distribution dependent analysis. Finally we illustrate that these bounds may be useful for choosing an appropriate manifold regularization parameter in situations with very sparsely labeled data. ...

Making Learners (More) Monotone

Conference paper (2020) - Tom Julian Viering, Alexander Mey, Marco Loog

Learning performance can show non-monotonic behavior. That is, more data does not necessarily lead to better models, even on average. We propose three algorithms that take a supervised learning model and make it perform more monotone. We prove consistency and monotonicity with high probability, and evaluate the algorithms on scenarios where non-monotone behaviour occurs. Our proposed algorithm MT_HT makes less than 1% non-monotone decisions on MNIST while staying competitive in terms of error rate compared to several baselines. Our code is available at https://github.com/tomviering/monotone. ...

Assumptions & Expectations in Semi-Supervised Machine Learning

Doctoral thesis (2020) - Alex Mey

The goal of this thesis is to investigate theoretical results in the field of semi-supervised learning, while also linking them to problems in related subjects as class probability estimation. ...

Semi-supervised learning, causality, and the conditional cluster assumption

Conference paper (2020) - Julius von Kügelgen, Alexander Mey, Marco Loog, Bernhard Schölkopf

While the success of semi-supervised learning (SSL) is still not fully understood, Schölkopf et al. (2012) have established a link to the principle of independent causal mechanisms. They conclude that SSL should be impossible when predicting a target variable from its causes, but possible when predicting it from its effects. Since both these cases are restrictive, we extend their work by considering classification using cause and effect features at the same time, such as predicting a disease from both risk factors and symptoms. While standard SSL exploits information contained in the marginal distribution of all inputs (to improve the estimate of the conditional distribution of the target given inputs), we argue that in our more general setting we should use information in the conditional distribution of effect features given causal features. We explore how this insight generalises the previous understanding, and how it relates to and can be exploited algorithmically for SSL. ...

Minimizers of the empirical risk and risk monotonicity

Conference paper (2019) - M. Loog, T.J. Viering, A. Mey

Plotting a learner’s average performance against the number of training samples results in a learning curve. Studying such curves on one or more data sets is a way to get to a better understanding of the generalization properties of this learner. The behavior of learning curves is, however, not very well understood and can display (for most researchers) quite unexpected behavior. Our work introduces the formal notion of risk monotonicity, which asks the risk to not deteriorate with increasing training set sizes in expectation over the training samples. We then present the surprising result that various standard learners, specifically those that minimize the empirical risk, can act nonmonotonically irrespective of the training sample size. We provide a theoretical underpinning for specific instantiations from classification, regression, and density estimation. Altogether, the proposed monotonicity notion opens up a whole new direction of research. ...

A soft-labeled self-training approach

Conference paper (2016) - Alexander Mey, Marco Loog

Semi-supervised classification methods try to improve a supervised learned classifier with the help of unlabeled data. In many cases one assumes a certain structure on the data, as for example the manifold assumption, the smoothness assumption or the cluster assumption. Self-training is a method that does not need any assumptions on the data itself. The idea is to use the supervised trained classifier to label the unlabeled points and to enlarge this way the training data. This paper aims to show that a self-training approach with soft-labeling is preferable in many cases in terms of expected loss (risk) minimization. The main idea is to use a soft-labeling to minimize the risk on labeled and unlabeled data together, in which the hard-labeled self-training is an extreme case. ...