Label-efficient model selection for pretrained classifiers

Abstract

Recent advances in machine learning (ML) have automated large parts of model development and deployment, enabling high-quality models to be maintained in production through continuous retraining and yielding a variety of models for the same problem setting. The fast-moving progress of large language models and other research areas has led to a surge of interest in ML, expanding both the range of applications and the market for ML models. Researchers, enthusiasts, programming frameworks, and major companies such as Amazon actively develop and evaluate models for a wide variety of tasks, contributing to the growing number of available pretrained models. These models are accessible through various platforms, including Hugging Face and AWS SageMaker.
The most common cause for retraining models in production is a distribution shift in the data. Consequently, continuous retraining on changing production data results in a wide variety of models addressing the same problem, each with different strengths. Each retraining iteration demands substantial amounts of labeled data and a significant time investment. Given the increasing availability of pretrained models with distinct strengths and weaknesses, there is strong reason to believe that selecting the most suitable pretrained model, instead of retraining, can yield sufficient performance. The primary challenge in model selection is the need for evidence to assess the quality of the candidate models, and querying labels is the most effective way to obtain such evidence. Despite the progress in ML research, acquiring labeled data remains a significant challenge. Labeling is mostly performed by human annotators, which is a costly, time-consuming, and error-prone process. Estimates based on AWS Mechanical Turk indicate that the expense of labeling large datasets or complex tasks, such as image segmentation, can easily reach six figures or more; labeling the entire ImageNet dataset with ten workers costs approximately €190,000. This makes it necessary to minimize the number of labels required as evidence for model selection.
The existing literature on model selection primarily focuses on selecting a learning strategy together with its optimal hyperparameters to best fit the data. Although the objective of choosing the best model for the data remains the same, the setting in this thesis differs significantly: the architectures and hyperparameters of the models are not relevant here, as they are predefined by the pretrained candidate models, and the data used to assess the models is unlabeled. This leads to a discussion of alternative methods more closely aligned with this setting, such as active learning (AL), which focuses on intelligently labeling data points for training.
Given the lack of appropriate methods for this problem, this thesis introduces the Model Picker algorithm for selecting pretrained classifiers. The algorithm aims to minimize labeling cost by employing a probabilistic model to adaptively sample the most valuable instances from the unlabeled data. The informativeness of a data point is estimated via Shannon's mutual information between the latent variable representing the decision about the true best model and the unknown label of each data point, given the evidence sampled so far (a sketch of this criterion is given below).
The Model Picker algorithm was rigorously evaluated against fundamental baselines such as Query-by-Committee, Active Comparison of Prediction Models, GALAXY, and QDD, which were adapted to the setting where necessary. The evaluation uses a wide variety of well-established datasets, including ImageNet, with up to 114 different models per dataset. The results demonstrate that Model Picker consistently outperforms all baselines by a significant margin; at its peak, it enables accurate model selection with only a quarter of the labels required by the next best method. Additionally, this thesis investigates methods for efficiently optimizing Model Picker's single hyperparameter, ϵ, including strategies such as labeling a small subset of the data and generating a noisy oracle.
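To make the selection criterion concrete, the following minimal sketch illustrates a mutual-information-based query rule of the kind described above. It is an illustrative approximation rather than the thesis implementation: the function names, the simple ϵ-noise likelihood (the best model's prediction is correct with probability 1 − ϵ, otherwise uniform over the remaining classes), and the pool-based query loop are assumptions made for this example.

```python
import numpy as np

def mutual_information(posterior, preds, eps, num_classes):
    """I(best model; label of one pool point | evidence collected so far).

    posterior   : (K,) current belief over which of the K candidate models is best
    preds       : (K,) integer class predicted by each model for this data point
    eps         : assumed noise level -- the best model is correct with prob. 1 - eps
    num_classes : number of classes C
    """
    K = len(posterior)
    # p(y | m): correct label with prob. 1 - eps, any other label with prob. eps / (C - 1)
    p_y_given_m = np.full((K, num_classes), eps / (num_classes - 1))
    p_y_given_m[np.arange(K), preds] = 1.0 - eps

    p_y = posterior @ p_y_given_m                                  # marginal p(y)
    h_y = -np.sum(p_y * np.log(p_y + 1e-12))                       # H(Y)
    h_y_given_m = -np.sum(posterior[:, None] * p_y_given_m
                          * np.log(p_y_given_m + 1e-12))           # H(Y | M)
    return h_y - h_y_given_m

def select_and_update(posterior, all_preds, labeled_mask, eps, num_classes, oracle):
    """Query the most informative unlabeled point and update the belief over models.

    all_preds : (K, N) predictions of the K candidate models on the N pool points
    oracle    : callable returning the true label of a pool index (human annotator)
    """
    K, N = all_preds.shape
    scores = np.array([
        mutual_information(posterior, all_preds[:, i], eps, num_classes)
        if not labeled_mask[i] else -np.inf
        for i in range(N)
    ])
    i_star = int(np.argmax(scores))          # most informative unlabeled point
    y = oracle(i_star)                       # query its label
    labeled_mask[i_star] = True

    # Bayesian update: p(m) ∝ p(m) * p(y | m)
    likelihood = np.where(all_preds[:, i_star] == y, 1.0 - eps, eps / (num_classes - 1))
    posterior = posterior * likelihood
    return posterior / posterior.sum(), labeled_mask
```

Starting from a uniform posterior over the candidate models, the query step is repeated until the labeling budget is exhausted, and the model with the highest posterior mass is returned; the likelihood model and update rule used in the thesis may differ from this simplified sketch.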
The proposed framework is intended to serve as a fundamental stepping stone for future research in the domain of model selection. The thesis concludes with a discussion of the implications of the findings for the Model Picker algorithm and for future research on the selection of pretrained classifiers. Furthermore, the limitations of this study are discussed, and potential future work to further enhance the Model Picker algorithm is proposed.