Robust (Deep) learning framework against dirty labels and beyond

Conference Paper (2019)

Author(s)

Amirmasoud Ghiassi (Université Grenoble Alpes, TU Delft - Electrical Engineering, Mathematics and Computer Science)

Taraneh Younesian (TU Delft - Electrical Engineering, Mathematics and Computer Science, Université Grenoble Alpes)

Zhilong Zhao (ABB Future Labs)

Robert Birke (University of Neuchâtel)

Valerio Schiavoni (University of Neuchâtel)

Lydia Y. Chen (TU Delft - Electrical Engineering, Mathematics and Computer Science, Université Grenoble Alpes)

Research Group

Data-Intensive Systems

Deep neural networks, Adversarial learning, Active learning, Data filtering, Dirty labels, Trusted execution

DOI related publication

https://doi.org/10.1109/TPS-ISA48467.2019.00038 Final published version

To reference this document use

https://resolver.tudelft.nl/uuid:21a69f0d-b26c-4202-bbeb-792809141f54

More Info

Publication Year

2019

Language

English

Article number

9014352

Pages (from-to)

236-244

ISBN (electronic)

9781728167411

Event

1st IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications, TPS-ISA 2019 (2019-12-12 - 2019-12-14), Los Angeles, United States

Abstract

Data is generated at unprecedented speed, owing to the flourishing of social media and open platforms. However, because such data is rarely scrutinized, both clean and dirty data spread widely. For instance, a significant portion of images are tagged with corrupted, dirty class labels. Such dirty data sets are not only detrimental to learning outcomes, e.g., images misclassified into the wrong classes, but also costly: bad data has been estimated to cost the U.S. up to a daunting 3 trillion dollars per year. In this paper, we address the following question: how can prevailing (deep) machine learning models be trained robustly given a non-negligible presence of corrupted labeled data? Dirty labels significantly increase the complexity of existing learning problems, as the ground truth of a label's quality is not easily assessed. Here, we advocate rigorously incorporating human experts into one learning framework where artificial and human intelligence collaborate. To this end, we combine three strategies to enhance the robustness of deep and regular machine learning algorithms, namely, (i) data filtering through an additional quality model, (ii) data selection via active learning from an expert, and (iii) imitating the expert's correction process. We demonstrate the three strategies sequentially with examples and apply them to widely used benchmarks, such as CIFAR10 and CIFAR100. Our initial results show the effectiveness of the proposed strategies in combating dirty labels, e.g., the resulting classification accuracy can be up to 50% higher than that of state-of-the-art AI-only solutions. Finally, we extend the discussion of robust learning from trusted data to the trusted execution environment.
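
Illustrative sketch (not taken from the paper): the abstract does not specify how the quality model or the expert queries are implemented, so the following Python toy only suggests how strategies (i) and (ii) might fit together. The nearest-centroid quality model, the function names `fit_quality_model` and `clean_labels`, the `budget` parameter, and the toy data are all assumptions made for illustration; strategy (iii), imitating the expert's corrections, is left out.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
Sample = Tuple[Features, int]  # (features, possibly dirty label)


def fit_quality_model(trusted: List[Sample]) -> Callable[[Features], Tuple[int, float]]:
    """Hypothetical quality model: a nearest-centroid classifier fitted on a
    small trusted set. Returns a predictor giving (label, confidence margin)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in trusted:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x: Features) -> Tuple[int, float]:
        # Squared distance to every class centroid, closest first.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, c)), y) for y, c in centroids.items()
        )
        best_dist, best_label = dists[0]
        # Margin = gap to the runner-up class; a small margin means low confidence.
        margin = dists[1][0] - best_dist if len(dists) > 1 else float("inf")
        return best_label, margin

    return predict


def clean_labels(
    data: List[Sample],
    predict: Callable[[Features], Tuple[int, float]],
    ask_expert: Callable[[Features], int],
    budget: int,
) -> List[Sample]:
    """(i) Keep samples whose label the quality model agrees with.
    (ii) Spend a limited expert budget on the least certain rejected samples."""
    kept: List[Sample] = []
    rejected: List[Tuple[float, Features]] = []
    for x, y in data:
        label, margin = predict(x)
        if label == y:
            kept.append((x, y))
        else:
            rejected.append((margin, x))
    rejected.sort(key=lambda item: item[0])  # most uncertain first
    for _, x in rejected[:budget]:
        kept.append((x, ask_expert(x)))  # the expert supplies the label
    return kept


# Toy data: two well-separated classes, two of four noisy labels are flipped.
trusted = [([0.0, 0.0], 0), ([0.2, 0.1], 0), ([1.0, 1.0], 1), ([0.9, 1.1], 1)]
noisy = [([0.1, 0.0], 0), ([1.0, 0.9], 0), ([0.1, 0.2], 1), ([0.95, 1.0], 1)]
predict = fit_quality_model(trusted)
cleaned = clean_labels(noisy, predict, ask_expert=lambda x: int(x[0] > 0.5), budget=1)
```

With a budget of one query, the two consistent samples are kept, one suspicious sample is relabeled by the stand-in expert, and the other is dropped. In a real setting the expert would be a human annotator and the quality model a trained network; both are simplified here.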