Benchmarking the Robustness of Neuro-Symbolic Learning against Backdoor Attacks
Semantic Loss vs BadNets Poisoning Attack
D. Becerra Merodio (TU Delft - Electrical Engineering, Mathematics and Computer Science)
A. Agiollo – Mentor (TU Delft - Cyber Security)
Kaitai Liang – Mentor (TU Delft - Cyber Security)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Abstract
Neuro-Symbolic (NeSy) models combine the generalization ability of neural networks with the interpretability of symbolic reasoning. While the vulnerability of neural networks to backdoor data poisoning attacks is well-documented, the implications of such attacks for NeSy models remain underexplored. This paper investigates whether adding a semantic loss component to a neural network improves its robustness against BadNets backdoor attacks. We evaluate multiple semantic loss models trained on the CelebA dataset with varying constraints, semantic loss weights, and backdoor trigger configurations. Our results show that incorporating a semantic loss component whose constraints involve the target label significantly reduces the attack success rate. Additionally, we find that increasing the weight of the semantic loss component can enhance robustness, although at the cost of balanced accuracy. Interestingly, changes in the size and placement of the trigger have minimal effect on attack performance. These findings suggest that while semantic loss can improve robustness to some extent, its effectiveness is highly dependent on the nature and relevance of the constraints used, as well as on the weight assigned to the semantic loss component.
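As a concrete illustration of the setup described in the abstract, the sketch below shows (i) a BadNets-style poisoning step that stamps a small trigger patch on a fraction of training images and forces the attacker-chosen target attribute, and (ii) a semantic loss term for a single implication constraint added to the task loss with a tunable weight. This is a minimal sketch under assumed choices: the function names, the implication constraint, the bottom-right trigger placement, and the default poisoning rate and weight are hypothetical placeholders, not the exact configuration evaluated in this work.

```python
# Hypothetical sketch (PyTorch); not the exact code used in this work.
import torch
import torch.nn.functional as F


def poison_badnets(images, labels, target_idx, rate=0.1, patch=4):
    """Stamp a small white square in the bottom-right corner of a random
    fraction of the images and force the target attribute label to 1."""
    images, labels = images.clone(), labels.clone()
    n_poison = int(rate * images.shape[0])
    idx = torch.randperm(images.shape[0])[:n_poison]
    images[idx, :, -patch:, -patch:] = 1.0   # the backdoor trigger patch
    labels[idx, target_idx] = 1.0            # attacker-chosen target label
    return images, labels


def semantic_loss_implication(p_a, p_b, eps=1e-12):
    """Semantic loss for the constraint A -> B: the negative log of the
    weighted model count 1 - p_a * (1 - p_b), where p_a and p_b are the
    predicted probabilities of attributes A and B."""
    wmc = 1.0 - p_a * (1.0 - p_b)
    return -torch.log(wmc.clamp_min(eps)).mean()


def total_loss(logits, targets, idx_a, idx_b, sl_weight=0.5):
    """Multi-label task loss plus a weighted semantic loss term."""
    task = F.binary_cross_entropy_with_logits(logits, targets)
    probs = torch.sigmoid(logits)
    sl = semantic_loss_implication(probs[:, idx_a], probs[:, idx_b])
    return task + sl_weight * sl
```

Varying sl_weight corresponds to the semantic loss weight sweep mentioned in the abstract, and making the constraint involve the target attribute corresponds to the constraint-relevance finding.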