Are BERT-based fact-checking models robust against adversarial attack?

We examine the vulnerability of BERT-based fact-checking models. We implement a gradient-based adversarial attack strategy, based on HotFlip, that swaps individual tokens in the input, and apply it to a pre-trained ExPred model for fact-checking. We find that these gradient-based adversarial attacks are ineffective against ExPred. However, uncertainty about the semantic similarity between the original inputs and the examples generated by our attack implementation casts doubt on this result.
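The core of a HotFlip-style attack is a first-order approximation: the change in loss from replacing a token's embedding e_old with a candidate e_new is estimated as (e_new − e_old) · ∂L/∂e_old, and the candidate with the highest score is chosen. The sketch below illustrates this scoring rule on a toy vocabulary; the function name and the NumPy setup are illustrative, not the paper's actual code.

```python
import numpy as np

def hotflip_candidate(grad, token_id, embedding_matrix):
    """Pick the vocabulary token whose substitution most increases the
    loss, under the first-order estimate (e_new - e_old) . dL/de_old."""
    e_old = embedding_matrix[token_id]
    scores = (embedding_matrix - e_old) @ grad  # one score per vocab token
    scores[token_id] = -np.inf                  # forbid swapping to itself
    return int(np.argmax(scores))

# Toy example: 4-token vocabulary with 3-dimensional embeddings.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 3))
grad = np.array([1.0, 0.0, 0.0])  # stand-in for dL/de at position of token 2
best = hotflip_candidate(grad, token_id=2, embedding_matrix=emb)
```

In a real attack the gradient would come from backpropagating the model's loss to the input embeddings, and the swap would typically be restricted to candidates that preserve the input's meaning, which is exactly the property the abstract flags as uncertain.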

