Benchmarking the Robustness of Neuro-Symbolic Learning against Backdoor Attacks

Semantic Loss vs BadNets Poisoning Attack

Bachelor Thesis (2025)
Author(s)

D. Becerra Merodio (TU Delft - Electrical Engineering, Mathematics and Computer Science)

Contributor(s)

A. Agiollo – Mentor (TU Delft - Cyber Security)

Kaitai Liang – Mentor (TU Delft - Cyber Security)

A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)

Faculty
Electrical Engineering, Mathematics and Computer Science
Publication Year
2025
Language
English
Graduation Date
23-06-2025
Awarding Institution
Delft University of Technology
Project
CSE3000 Research Project
Programme
Computer Science and Engineering
Reuse Rights

Other than for strictly personal use, it is not permitted to download, forward or distribute the text or part of it, without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license such as Creative Commons.

Abstract

Neuro-Symbolic (NeSy) models combine the generalization ability of neural networks with the interpretability of symbolic reasoning. While the vulnerability of neural networks to backdoor data poisoning attacks is well documented, the implications of such attacks for NeSy models remain underexplored. This paper investigates whether adding a semantic loss component to a neural network improves its robustness against BadNets backdoor attacks. We evaluate multiple models trained with semantic loss on the CelebA dataset, varying the logical constraints, the semantic loss weight, and the backdoor trigger configuration. Our results show that incorporating semantic loss with constraints that involve the target label significantly reduces the attack success rate. Additionally, we find that increasing the weight of the semantic loss component can enhance robustness, although at the cost of balanced accuracy. Interestingly, changes in the size and placement of the trigger had minimal effect on attack performance. These findings suggest that while semantic loss can improve robustness to some extent, its effectiveness depends strongly on the nature and relevance of the constraints used, as well as on the weight assigned to the semantic loss component.
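The weighted combination described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the standard formulation of semantic loss (the negative log-probability that the model's independent Bernoulli outputs satisfy a propositional constraint) and uses a hypothetical two-attribute constraint A → B as the example.

```python
import math

def semantic_loss_implies(p_a, p_b):
    """Semantic loss for the constraint A -> B.

    Under independent Bernoulli predictions with probabilities
    p_a and p_b, the constraint is violated only by the
    assignment (A=1, B=0), so the probability of satisfaction
    is 1 - p_a * (1 - p_b). Semantic loss is its negative log.
    """
    p_sat = 1.0 - p_a * (1.0 - p_b)
    return -math.log(p_sat)

def total_loss(task_loss, p_a, p_b, weight):
    """Task loss (e.g. cross-entropy) plus weighted semantic loss.

    `weight` is the semantic loss weight varied in the experiments:
    larger values enforce the constraint more strongly, which the
    abstract notes can trade balanced accuracy for robustness.
    """
    return task_loss + weight * semantic_loss_implies(p_a, p_b)
```

For example, predictions that nearly satisfy the constraint (high p_b whenever p_a is high) incur almost no penalty, while a confident violation (p_a high, p_b low) incurs a large one.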
