Enhancing Robustness of On-line Learning Models on Highly Noisy Data

Journal Article (2021)

Author(s)

Zilong Zhao (Université Grenoble Alpes)

Robert Birke (ABB Research)

Rui Han (Beijing Institute of Technology)

Bogdan Robu (Université Grenoble Alpes)

Sara Bouchenak (INSA Lyon)

Sonia Ben Mokhtar (INSA Lyon)

Y. Chen (TU Delft - Data-Intensive Systems)

Research Group

Data-Intensive Systems

DOI related publication

https://doi.org/10.1109/TDSC.2021.3063947

Machine Learning, Anomaly Detection, Attacks, Failures, Unreliable Data

To reference this document use:

https://resolver.tudelft.nl/uuid:d31b020a-dfb6-473d-9955-c61a8cd9a9f5

Publication Year

2021

Language

English

Issue number

5

Volume number

18

Pages (from-to)

2177 - 2192

Abstract

Classification algorithms are widely used to detect anomalies in various systems, e.g., IoT, cloud, and face recognition, under the common assumption that the data source is clean, i.e., that features and labels are correctly set. However, data collected in the wild can be unreliable because of careless annotation or malicious data transformation intended to cause incorrect anomaly detection. In this article, we extend a two-layer on-line data selection framework, Robust Anomaly Detector (RAD), with a newly designed ensemble prediction in which both layers contribute to the final anomaly detection decision. To adapt to the on-line nature of anomaly detection, we consider three additional features: conflicting opinions of classifiers, repetitive cleaning, and oracle knowledge. We learn on-line from incoming data streams and continuously cleanse the data, so as to adapt to the learning capacity that increases as the accumulated data set grows. Moreover, we explore the concept of oracle learning, which provides additional information on the true labels of difficult data points. We focus on three use cases: (i) detecting 10 classes of IoT attacks, (ii) predicting 4 classes of task failures of big data jobs, and (iii) recognising the faces of 100 celebrities. Our evaluation results show that RAD robustly improves the accuracy of anomaly detection, reaching up to 98.95% for IoT device attacks (+7%) and up to 85.03% for cloud task failures (+14%) under 40% label noise, while its extension reaches up to 77.51% for face recognition (+39%) under 30% label noise. The proposed RAD and its extensions are general and can be applied to different anomaly detection algorithms.
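
The two-layer idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy models (CentroidModel, KnnModel), the synthetic two-class stream, the agreement-based selection rule, and the probability-averaging ensemble are all assumptions chosen for brevity. Layer 1 keeps only the incoming samples whose given label matches its own prediction, both layers are retrained on the accumulated selected data, and both contribute to the final prediction.

```python
import math
import random


class CentroidModel:
    """Toy layer-1 model (assumption): nearest class centroid."""

    def fit(self, xs, ys):
        groups = {}
        for x, y in zip(xs, ys):
            groups.setdefault(y, []).append(x)
        self.centroids = {y: [sum(col) / len(pts) for col in zip(*pts)]
                          for y, pts in groups.items()}
        return self

    def proba(self, x):
        # Softmax over negative squared distances to each class centroid.
        logits = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
                  for y, c in self.centroids.items()}
        top = max(logits.values())
        exps = {y: math.exp(v - top) for y, v in logits.items()}
        total = sum(exps.values())
        return {y: e / total for y, e in exps.items()}


class KnnModel:
    """Toy layer-2 model (assumption): k-nearest-neighbour vote fractions."""

    def __init__(self, k=7):
        self.k = k

    def fit(self, xs, ys):
        self.data = list(zip(xs, ys))
        return self

    def proba(self, x):
        def dist(p):
            return sum((a - b) ** 2 for a, b in zip(x, p[0]))
        nearest = sorted(self.data, key=dist)[: self.k]
        votes = {}
        for _, y in nearest:
            votes[y] = votes.get(y, 0) + 1
        return {y: n / len(nearest) for y, n in votes.items()}


def predict(model, x):
    p = model.proba(x)
    return max(p, key=p.get)


def select_clean(quality_model, xs, noisy_ys):
    """Indices of samples whose given label agrees with layer 1's prediction."""
    return [i for i, (x, y) in enumerate(zip(xs, noisy_ys))
            if predict(quality_model, x) == y]


def ensemble_predict(layer1, layer2, x):
    """Both layers contribute: pick the class with the highest summed probability."""
    p1, p2 = layer1.proba(x), layer2.proba(x)
    classes = sorted(set(p1) | set(p2))
    return max(classes, key=lambda c: p1.get(c, 0.0) + p2.get(c, 0.0))


def make_batch(rng, n, noise_rate):
    """Two Gaussian blobs; each label is flipped with probability noise_rate."""
    xs, true_ys, noisy_ys = [], [], []
    for _ in range(n):
        y = rng.choice([0, 1])
        xs.append([rng.gauss(4.0 * y, 1.0), rng.gauss(4.0 * y, 1.0)])
        true_ys.append(y)
        noisy_ys.append(1 - y if rng.random() < noise_rate else y)
    return xs, true_ys, noisy_ys


def run_stream(seed=0, batches=5, batch_size=200, noise_rate=0.4):
    rng = random.Random(seed)
    train_x, train_y, _ = make_batch(rng, 50, 0.0)  # small trusted start-up set
    layer1 = CentroidModel().fit(train_x, train_y)
    layer2 = KnnModel().fit(train_x, train_y)
    kept_total = kept_correct = 0
    for _ in range(batches):
        xs, true_ys, noisy_ys = make_batch(rng, batch_size, noise_rate)
        keep = select_clean(layer1, xs, noisy_ys)
        train_x += [xs[i] for i in keep]
        train_y += [noisy_ys[i] for i in keep]
        kept_total += len(keep)
        kept_correct += sum(noisy_ys[i] == true_ys[i] for i in keep)
        layer1.fit(train_x, train_y)  # retrain on the growing cleansed set
        layer2.fit(train_x, train_y)
    test_x, test_y, _ = make_batch(rng, 300, 0.0)
    hits = sum(ensemble_predict(layer1, layer2, x) == y
               for x, y in zip(test_x, test_y))
    return {"accuracy": hits / len(test_x),
            "kept_clean_fraction": kept_correct / kept_total}


if __name__ == "__main__":
    print(run_stream())
```

Calling run_stream() returns the ensemble's accuracy on a clean synthetic test set and the fraction of selected training samples whose labels were actually correct. The features the abstract lists beyond this (conflicting opinions of classifiers, repetitive cleaning, oracle knowledge) are deliberately left out of the sketch.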

Metadata only record. There are no files for this record.