How Robust Is Neural-Symbolic Model Logic Tensor Networks Against Clean-Label Data Poisoning Backdoor Attacks?
Benchmarking Benign Accuracy and Attack Success Rate
A. Chiru (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Kaitai Liang – Mentor (TU Delft - Cyber Security)
A. Agiollo – Mentor (TU Delft - Cyber Security)
A. Hanjalic – Graduation committee member (TU Delft - Intelligent Systems)
Abstract
Neuro-Symbolic (NeSy) models promise better interpretability and robustness than conventional neural networks, yet their resilience to data-poisoning backdoors is largely untested. This work investigates that gap by attacking a Logic Tensor Network (LTN) with clean-label triggers. Two attack strategies are benchmarked on MNIST addition and modulo tasks: (i) a targeted Projected Gradient Descent (PGD) variant that minimises the loss towards a target class, and (ii) a weighted pixel-blending (naïve) method. Three task-appropriate trigger placements (left image, right image, or both), poison rates (0.5%-20%), and blend ratios (10%-90%) are benchmarked, reporting benign accuracy and attack success rate (ASR). Results show that PGD can reach ≈ 15% ASR on the harder modulo task when both images are poisoned, but has negligible impact on the simpler addition task, while the naïve attack never exceeds 5% ASR unless the blend is large enough to be recognisable on visual inspection. Increasing the poison rate beyond 10% does not raise ASR further. Extending this work to dirty-label poisoning reveals a sharp trade-off: ASR rises to ≈ 75% on the modulo task at the cost of reduced stealth, without affecting benign accuracy, whereas clean-label poisoning reduced addition-task accuracy by roughly 35% while keeping ASR near 10%. Overall, clean-label backdoors remain low-yield yet stealthy against LTNs, whereas dirty-label strategies achieve higher efficacy but expose the attack to detection through accuracy degradation. Even modest attack success rates pose risks in safety-critical deployments, and the results demonstrate that backdoor potency and collateral effects are governed by task structure, underscoring the need for task-aware defence strategies.
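For illustration only, the two poisoning strategies can be sketched as below. This is a minimal PyTorch sketch, not the thesis implementation: the classifier (model), the fixed trigger image, and the function names and hyperparameter values are hypothetical placeholders.

    import torch
    import torch.nn.functional as F

    def blend_poison(image, trigger, alpha=0.3):
        # Naive attack: weighted pixel-blending of a fixed trigger into a
        # clean image; alpha is the blend ratio (10%-90% in the experiments).
        return (1.0 - alpha) * image + alpha * trigger

    def targeted_pgd_poison(model, image, target_class, eps=0.3, step=0.01, iters=40):
        # Targeted PGD variant: perturb the image so the loss towards
        # target_class decreases, staying in an L-infinity ball of radius eps.
        x = image.clone().detach()
        for _ in range(iters):
            x.requires_grad_(True)
            logits = model(x.unsqueeze(0))  # add a batch dimension
            loss = F.cross_entropy(logits, torch.tensor([target_class]))
            grad = torch.autograd.grad(loss, x)[0]
            x = (x - step * grad.sign()).detach()  # descend towards the target class
            x = torch.max(torch.min(x, image + eps), image - eps).clamp(0.0, 1.0)
        return x

In a clean-label setting, only images whose true label already equals the target class would be perturbed this way, so the labels themselves remain correct and the poison stays visually inconspicuous.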