Multi-AL: Robust Active Learning for Multi-label Classifiers

Bachelor Thesis (2021)
Author(s)

M.J. Basting (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

Lydia Chen – Graduation committee member (TU Delft - Data-Intensive Systems)

Taraneh Younesian – Mentor (TU Delft - Data-Intensive Systems)

Amirmasoud Ghiassi – Mentor (TU Delft - Data-Intensive Systems)

F.A. Kuipers – Graduation committee member (TU Delft - Embedded Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Copyright
© 2021 Mark Basting
Publication Year
2021
Language
English
Graduation Date
02-07-2021
Awarding Institution
Delft University of Technology
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Multi-label learning is becoming more and more important as real-world data often carries multiple labels. The dataset used to train such a classifier is of great importance, yet acquiring a correctly labelled dataset is a difficult task. Active learning is a method which can, given a noisy dataset, identify important instances for an expert to label. This greatly reduces the number of instances needed to train an accurate classifier, and thus reduces the cost of cleaning a noisy dataset. This paper therefore presents an active learning algorithm, focused on wrongly labelled data, combined with a deep neural network for multi-label image classification. The proposed active learning solution is divided into two measures, a mislabelling likelihood and an informativeness measure, together with an option to identify and use highly probable clean instances in the dataset. Experiments performed on the real-world Microsoft COCO dataset, with 20, 40 and 60% injected label noise, show that Multi-AL outperforms ASL, the current state-of-the-art multi-label learning algorithm, by 28% while using only 600 labelled instances in total and 250 extracted 'clean' instances. Multi-AL additionally outperforms random sampling by 3% on average for 20 and 40% random label noise when sampling from a wrongly labelled dataset of 23k instances.
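The abstract describes ranking noisy instances for expert relabelling by combining two per-instance measures. The sketch below illustrates one plausible way such a combination could work; the thesis's actual definitions of the two measures are not given here, so the entropy-based informativeness, the prediction–label disagreement score, and the weighting parameter `alpha` are all illustrative assumptions, not the author's method.

```python
import numpy as np

def entropy_informativeness(probs):
    """Informativeness as mean binary entropy over the label dimension
    (illustrative choice). probs: (n_instances, n_labels) in [0, 1]."""
    p = np.clip(probs, 1e-12, 1 - 1e-12)
    ent = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    return ent.mean(axis=1)

def mislabelling_likelihood(probs, noisy_labels):
    """Mislabelling likelihood as mean disagreement between predicted
    probabilities and the (possibly noisy) 0/1 labels (illustrative)."""
    return np.abs(probs - noisy_labels).mean(axis=1)

def select_for_expert(probs, noisy_labels, budget, alpha=0.5):
    """Rank instances by a weighted sum of the two measures and return
    the indices of the top-`budget` candidates to hand to the expert."""
    score = (alpha * mislabelling_likelihood(probs, noisy_labels)
             + (1 - alpha) * entropy_informativeness(probs))
    return np.argsort(score)[::-1][:budget]
```

In a full active-learning loop, the classifier would be retrained after each relabelling round and the scores recomputed on the remaining noisy pool, with the expert budget (e.g. the 600 labelled instances mentioned above) spread across rounds.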
