Robust Multi-label Active Learning for Missing Labels

Bachelor Thesis (2021)
Author(s)

J. Rozen (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Lydia Chen – Mentor (TU Delft - Data-Intensive Systems)

T. Younesian – Graduation committee member (TU Delft - Data-Intensive Systems)

S. Ghiassi – Graduation committee member (TU Delft - Data-Intensive Systems)

F.A. Kuipers – Coach (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Jonathan Rozen
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-label classification has gained a lot of traction in the field of computer vision over the past couple of years. Here, each instance belongs to multiple class labels simultaneously. There are numerous methods for multi-label classification, but all of them assume that either the training images are completely labelled or that label correlations are given. Since Active Learning is frequently used when little data is available, it can be used to determine the missing labels by querying an oracle. This paper proposes a novel solution that combines the current state of the art for multi-label classification with Active Learning to infer the missing labels. This is done with sampling strategies that try to select the most informative sample from the dataset by taking the number of missing labels into account. With these strategies, we try to minimize the relabelling cost for all samples while maximizing the information gained. The chosen method, called Hard sampling with entropy, selects those samples that both the model and we find informative. This measure, along with the other measures, is then explored and evaluated on a subset of the MSCOCO dataset at 20%, 40% and 60% noise. Hard sampling with entropy outperforms the state of the art by more than 30%, and the baseline sampling method by 2%, at 60% noise.
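To make the abstract's idea concrete, below is a minimal sketch of entropy-based sample selection for multi-label active learning. It assumes a trained multi-label model that outputs independent per-label probabilities and a mask marking which labels are missing; the function names and the masking scheme are illustrative assumptions, not the thesis's actual Hard sampling with entropy implementation. The sketch scores each sample by the summed binary entropy of its missing labels and queries the oracle for the highest-scoring one.

import numpy as np

def label_entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Binary entropy of each predicted label probability.

    probs: array of shape (n_samples, n_labels) holding the model's
    per-label probabilities.
    """
    p = np.clip(probs, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def select_most_informative(probs: np.ndarray, missing_mask: np.ndarray) -> int:
    """Pick the sample whose missing labels the model is most uncertain about.

    missing_mask: boolean array of shape (n_samples, n_labels), True where a
    label is missing and could be queried from the oracle. Only the entropy
    of missing labels contributes, so samples with many uncertain missing
    labels are preferred.
    """
    ent = label_entropy(probs) * missing_mask
    scores = ent.sum(axis=1)
    return int(np.argmax(scores))

# Toy example: 3 samples, 4 labels each.
probs = np.array([[0.90, 0.10, 0.80, 0.95],
                  [0.50, 0.60, 0.40, 0.55],   # uncertain on every label
                  [0.99, 0.50, 0.90, 0.90]])
missing = np.array([[False, True,  False, False],
                    [True,  True,  True,  False],
                    [False, True,  False, False]])
print(select_most_informative(probs, missing))  # -> 1

In an active-learning loop, the selected sample's missing labels would be filled in by the oracle and the model retrained, which is the cost/information trade-off the abstract describes.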
