Sampling settings in active learning for investigating inconsistency
M. Li (TU Delft - Electrical Engineering, Mathematics and Computer Science)
M. Loog – Mentor (TU Delft - Pattern Recognition and Bioinformatics)
Sicco Verwer – Graduation committee member (TU Delft - Cyber Security)
Jan van Gemert – Graduation committee member (TU Delft - Pattern Recognition and Bioinformatics)
Abstract
Active learning has the potential to reduce labeling costs in terms of time and money. In practice, active learning serves as an efficient data-labeling strategy. Another way to look at active learning is as a learning problem in which the training data are queried by the active learner itself. Under this perspective, an important question is inconsistency: do classifiers trained with active learning converge to the same result as classifiers trained with random sampling, given an infinite amount of data? In this paper, we discuss the possibility and potential consequences of using sampling settings other than sampling without replacement in active learning to analyze the inconsistency problem. Moreover, a third sampling setting is defined to simulate the infinite-data scenario underlying inconsistency. We compare the traditional setting, sampling without replacement, with sampling with replacement and with true active learning. The two unusual sampling settings provide insight into the inconsistency problem: (1) a regularization parameter that is not adjusted as the training set grows can lead to inconsistency, and (2) querying data "really" close to the decision boundary can also pose a threat to active learning.
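The comparison of sampling settings can be made concrete with a small sketch. The code below is not the thesis implementation; it is a minimal illustration, assuming uncertainty sampling with a logistic-regression learner on a synthetic two-class Gaussian problem, of how the three settings differ only in how the unlabeled pool is maintained between queries (names such as draw_fresh_pool and the setting labels are hypothetical).

# Minimal sketch (not the thesis code) of uncertainty sampling under the
# three sampling settings discussed in the abstract. The data distribution,
# function names, and setting labels are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def draw_fresh_pool(n):
    # Hypothetical underlying distribution: two 2-D Gaussian classes.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)
    return X, y

def active_learn(setting, rounds=50, pool_size=500):
    X_pool, y_pool = draw_fresh_pool(pool_size)
    # Seed with one example per class so the classifier can be fitted.
    idx = [int(np.argmax(y_pool == 0)), int(np.argmax(y_pool == 1))]
    X_train, y_train = X_pool[idx], y_pool[idx]
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(X_train, y_train)
        if setting == "true":
            # "True" active learning: each round queries from a freshly
            # drawn pool, simulating an effectively infinite data supply.
            X_pool, y_pool = draw_fresh_pool(pool_size)
        # Uncertainty sampling: query the pool point closest to the
        # current decision boundary (smallest absolute margin).
        margin = np.abs(clf.decision_function(X_pool))
        q = int(np.argmin(margin))
        X_train = np.vstack([X_train, X_pool[q]])
        y_train = np.append(y_train, y_pool[q])
        if setting == "without":
            # Sampling without replacement: the queried point leaves
            # the pool and can never be queried again.
            X_pool = np.delete(X_pool, q, axis=0)
            y_pool = np.delete(y_pool, q)
        # setting == "with": the point stays and may be re-queried.
    return clf

for s in ("without", "with", "true"):
    active_learn(s)

In this sketch, the "true" setting approximates the infinite-data scenario by redrawing the pool from the underlying distribution every round, so the learner can keep querying points arbitrarily close to its current decision boundary rather than exhausting a finite pool.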