Evaluating the Robustness of Neuro-Symbolic Networks Against Backdoor Threats with WaNet and Semantic Loss
F. Hamar (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Agiollo – Mentor (TU Delft - Cyber Security)
Kaitai Liang – Mentor (TU Delft - Cyber Security)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.
Abstract
Backdoor attacks on neural networks meet little to no resistance: an injected trigger is enough to cause misclassifications. Neuro-symbolic architectures combine such networks with symbolic components to introduce semantic knowledge into purely connectionist designs. This paper benchmarks the robustness of such models against state-of-the-art backdoor attacks. In doing so, it explores how semantic knowledge can be extracted from datasets and how various constraint sets fare against attacks of differing strength. The paper concludes that building knowledge into the models can indeed induce robustness against adversarial poisoning attacks, but it also reflects on the conditions necessary for success.
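To make the idea of building symbolic knowledge into a network concrete, the following minimal sketch computes a semantic loss in its commonly cited form: the negative log of the probability mass that the network's independent output probabilities place on assignments satisfying a logical constraint. It is not the thesis's implementation. The function names, the brute-force enumeration, and the "exactly one label is true" constraint are illustrative assumptions, and the constraints the thesis extracts from datasets are not shown here.

```python
import math
from itertools import product


def semantic_loss(probs, constraint):
    """Negative log of the probability mass on assignments that satisfy `constraint`.

    `probs` holds independent Bernoulli probabilities, one per output variable.
    `constraint` is a hypothetical predicate over a 0/1 assignment tuple.
    Enumeration is exponential in len(probs), so this is for illustration only.
    """
    satisfied_mass = 0.0
    for assignment in product([0, 1], repeat=len(probs)):
        if constraint(assignment):
            weight = 1.0
            for p, x in zip(probs, assignment):
                weight *= p if x else (1.0 - p)
            satisfied_mass += weight
    # Guard against log(0) when no satisfying assignment has any mass.
    return -math.log(max(satisfied_mass, 1e-12))


def exactly_one(assignment):
    """Illustrative constraint (an assumption): exactly one variable is true."""
    return sum(assignment) == 1


# A prediction close to a valid one-hot assignment incurs a small loss,
# while an undecided prediction incurs a larger one.
confident = semantic_loss([0.98, 0.01, 0.01], exactly_one)
ambiguous = semantic_loss([0.5, 0.5, 0.5], exactly_one)
```

In training, a term like this would typically be added to the usual classification loss with a weighting factor, so that predictions violating the constraint are penalised even when a poisoned label would otherwise be fit.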