Are BERT-based fact-checking models robust against adversarial attack?
E.E. Afriat (TU Delft - Electrical Engineering, Mathematics and Computer Science)
Abstract
We examine the vulnerability of BERT-based fact-checking models to adversarial attacks. We implement a gradient-based adversarial attack strategy based on HotFlip, which swaps individual tokens of the input, and apply it to a pre-trained ExPred model for fact-checking. We find that gradient-based adversarial attacks are ineffective against ExPred. However, uncertainty about the semantic similarity between the adversarial examples generated by our implementation and the original inputs casts doubt on these results.
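To make the attack strategy concrete, the sketch below implements a HotFlip-style single-token swap: it takes the gradient of the loss with respect to the input embeddings and selects the position/replacement pair that a first-order approximation predicts will increase the loss the most. The generic bert-base-uncased classifier from Hugging Face and the hotflip_one_token helper are illustrative assumptions; they are not the ExPred model or code evaluated in this thesis.

# A minimal sketch of a HotFlip-style, gradient-based single-token swap.
# Assumptions not taken from the thesis: a generic Hugging Face BERT
# classifier stands in for ExPred, and its classification head is
# untrained here, so the output is illustrative only.
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

def hotflip_one_token(text: str, label: int) -> str:
    """Swap the one token whose replacement most increases the loss,
    scored with the first-order approximation used by HotFlip."""
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]

    # Embed the input manually so we can take gradients w.r.t. it.
    embed_matrix = model.get_input_embeddings().weight      # (vocab, dim)
    inputs_embeds = embed_matrix[input_ids].detach().requires_grad_(True)

    loss = model(inputs_embeds=inputs_embeds,
                 attention_mask=enc["attention_mask"],
                 labels=torch.tensor([label])).loss
    loss.backward()
    grad = inputs_embeds.grad[0]                            # (seq_len, dim)

    with torch.no_grad():
        # First-order estimate of the loss change when position i is
        # flipped to word w: (e_w - e_i) . grad_i, for all i and w at once.
        scores = grad @ embed_matrix.T                      # (seq_len, vocab)
        scores -= (grad * inputs_embeds[0]).sum(dim=1, keepdim=True)

        # Never "swap" a token for itself, and leave [CLS]/[SEP] alone.
        seq_len = input_ids.size(1)
        scores[torch.arange(seq_len), input_ids[0]] = -float("inf")
        scores[0] = scores[-1] = -float("inf")

        pos = scores.max(dim=1).values.argmax().item()
        new_id = int(scores[pos].argmax())

    flipped = input_ids.clone()
    flipped[0, pos] = new_id
    return tokenizer.decode(flipped[0], skip_special_tokens=True)

print(hotflip_one_token("the earth orbits the sun", label=1))

As in HotFlip, a full attack would apply such swaps greedily or with beam search until the model's prediction flips; the single-swap version above only illustrates the gradient-based scoring step.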