Sampling settings in active learning for investigating inconsistency
M. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)
Jan van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Active learning has the potential to reduce labeling costs in terms of time and money. In practice, active learning serves as an efficient data-labeling strategy. Another way to look at active learning is as a learning problem in which the training data are queried by the active learner itself. Under this perspective, an important question is inconsistency: do classifiers trained with active learning converge to the same result as classifiers trained with random sampling, given an infinite amount of data? In this paper, we discuss the possibility and potential consequences of using sampling settings other than sampling without replacement in active learning to analyze the inconsistency problem. Moreover, a third sampling setting is defined to simulate the infinite-data scenario underlying inconsistency. We compare the traditional setting, sampling without replacement, with sampling with replacement and with true active learning. The two unusual sampling settings provide insight into the inconsistency problem: (1) a regularization parameter that is not adjusted as the training set grows can lead to inconsistency, and (2) querying data "really" close to the decision boundary can also pose a threat to active learning.
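The comparison of sampling settings can be made concrete with a small sketch. The code below is not the thesis implementation; it is a minimal illustration, assuming uncertainty sampling with a logistic-regression learner on a synthetic two-class Gaussian problem, of how the three settings differ only in how the unlabeled pool is maintained between queries (names such as draw_fresh_pool and the setting labels are hypothetical).

# Minimal sketch (not the thesis code) of uncertainty sampling under the
# three sampling settings discussed in the abstract. The data distribution,
# function names, and setting labels are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def draw_fresh_pool(n):
    # Hypothetical underlying distribution: two 2-D Gaussian classes.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

def active_learn(setting, rounds=50, pool_size=500):
    X_pool, y_pool = draw_fresh_pool(pool_size)
    # Seed with one example per class so the classifier can be fitted.
    idx = [int(np.argmax(y_pool == 0)), int(np.argmax(y_pool == 1))]
    X_train, y_train = X_pool[idx], y_pool[idx]
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X_train, y_train)
        if setting == "true":
            # "True" active learning: each round queries from a freshly
            # drawn pool, simulating an effectively infinite data supply.
            X_pool, y_pool = draw_fresh_pool(pool_size)
        # Uncertainty sampling: query the pool point closest to the
        # current decision boundary (smallest absolute margin).
        margin = np.abs(clf.decision_function(X_pool))
        q = int(np.argmin(margin))
        X_train = np.vstack([X_train, X_pool[q]])
        y_train = np.append(y_train, y_pool[q])
        if setting == "without":
            # Sampling without replacement: the queried point leaves
            # the pool and can never be queried again.
            X_pool = np.delete(X_pool, q, axis=0)
            y_pool = np.delete(y_pool, q)
        # setting == "with": the point stays and may be re-queried.
    return clf

for s in ("without", "with", "true"):
    active_learn(s)

In this sketch, the "true" setting approximates the infinite-data scenario by redrawing the pool from the underlying distribution every round, so the learner can keep querying points arbitrarily close to its current decision boundary rather than exhausting a finite pool.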