The Effect of Adversarial Attacks on Neuro-Symbolic Reasoning Shortcuts
A Comparative Analysis for DeepProbLog
I.S.I. Schaaf (TU Delft - Technology, Policy and Management)
Kaitai Liang – Mentor (TU Delft - Cyber Security)
A. Agiollo – Mentor (TU Delft - Cyber Security)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Abstract
The growing reliance on Artificial Intelligence (AI) systems increases the need for their understandability and explainability. In response, Neuro-Symbolic (NeSy) models have been introduced to separate neural classification from symbolic logic. Traditional deep learning models are known to be susceptible to adversarial attacks, such as data poisoning. However, the impact of these attacks on NeSy models remains under-explored. Most work on the subject records attack effects by measuring Attack Success Rate (ASR) or Benign Accuracy (BA). Because of the separate neural and symbolic components within NeSy models, a backdoor attack can specifically target a model's reasoning capabilities, yet how the reasoning of such a model is affected after an attack remains unknown. This research examines how BadNets backdoor attacks influence the reasoning of the DeepProbLog (DPL) NeSy framework.
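To make the attack setting concrete, the following is a minimal illustrative sketch of BadNets-style data poisoning, assuming an MNIST-digit classification task such as DeepProbLog's MNIST-addition benchmark. The trigger pattern, poison rate, and target label are hypothetical choices for illustration only, not the configuration used in this study.

```python
import numpy as np

def apply_badnets_trigger(image, target_label, trigger_size=3, trigger_value=1.0):
    """Stamp a small square trigger in the bottom-right corner and relabel the sample.

    `image` is assumed to be a 2-D array of pixel intensities in [0, 1]
    (e.g. a 28x28 MNIST digit). Trigger size, position, and value are
    illustrative, not those used in the thesis.
    """
    poisoned = image.copy()
    poisoned[-trigger_size:, -trigger_size:] = trigger_value  # patch the trigger
    return poisoned, target_label

def poison_dataset(images, labels, target_label, poison_rate=0.1, seed=0):
    """Poison a random fraction of the training set, BadNets-style."""
    images, labels = images.copy(), labels.copy()
    rng = np.random.default_rng(seed)
    n_poison = int(poison_rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i], labels[i] = apply_badnets_trigger(images[i], target_label)
    return images, labels
```

At inference time, any input carrying the trigger is steered towards the attacker's target label, while clean inputs remain largely unaffected, which is what ASR and BA respectively measure.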
This study employed a novel, generalisable benchmarking suite to quantify the upper bound of the Reasoning Shortcut Risk for various tasks. Experiments were conducted across multiple model instances to perform a comparative review of the Reasoning Shortcut Risk across these settings.
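The exact definition of the Reasoning Shortcut Risk used by the benchmarking suite is not reproduced here. The sketch below only illustrates the general idea behind measuring reasoning shortcuts: counting cases where the final symbolic output is correct even though the intermediate (neural) concept predictions are wrong. All function and variable names are hypothetical.

```python
def reasoning_shortcut_rate(model, concept_model, dataset):
    """Estimate how often the final answer is right while the intermediate
    concepts are wrong -- a proxy for reasoning-shortcut behaviour.

    `dataset` is assumed to yield (inputs, true_concepts, true_output) tuples;
    `model(inputs)` returns the symbolic task output and `concept_model(inputs)`
    the predicted intermediate concepts. The thesis' Reasoning Shortcut Risk
    metric may be defined differently.
    """
    shortcut, correct = 0, 0
    for inputs, true_concepts, true_output in dataset:
        output_ok = model(inputs) == true_output
        concepts_ok = concept_model(inputs) == true_concepts
        if output_ok:
            correct += 1
            if not concepts_ok:
                shortcut += 1  # right answer reached through wrong concepts
    return shortcut / max(correct, 1)
```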
The findings reveal that BadNets attacks generally increase the upper bound of the Reasoning Shortcut Risk in DPL models. This means that the presence of such a backdoor in a model can be identified from this metric. Additionally, even hyperparameter tuning of the DPL model itself was found to increase the Reasoning Shortcut Risk, suggesting that optimisation for higher accuracies may inadvertently lead these models to exploit new reasoning shortcuts. No significant correlation was observed between the accuracy of a DPL model and its upper bound of the Reasoning Shortcut Risk. The results indicate that standard metrics fail to reveal whether a DPL model behaves as desired: DPL models can appear functionally correct while internally suffering from faulty reasoning.
This research found a higher upper bound of the Reasoning Shortcut Risk after a BadNets attack for tasks that rely more heavily on the neural component of the DPL NeSy model. Furthermore, optimising the poisoning parameters was found to influence the upper bound of the Reasoning Shortcut Risk. This highlights the importance of the threat model under analysis when studying reasoning in DPL NeSy models after a backdoor attack.
In conclusion, BadNets backdoor attacks fundamentally compromise the reasoning process of DPL NeSy models, and the resulting increase in Reasoning Shortcut Risk is often worsened by routine model optimisation. The research highlights the need for integrity metrics in addition to traditional performance indicators. These insights are vital for creating NeSy models that behave as intended: robust, trustworthy, and explainable.