Are BERT-based fact-checking models robust against adversarial attack?
Abstract
We examine the vulnerability of BERT-based fact-checking models to adversarial attack. We implement a gradient-based adversarial attack strategy based on HotFlip, which swaps individual tokens in the input, and apply it to a pre-trained ExPred model for fact-checking. We find that gradient-based adversarial attacks are ineffective against ExPred. However, uncertainty about the semantic similarity of the examples generated by our adversarial attack implementation casts doubt on this result.
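The core of a HotFlip-style attack is a first-order estimate of how much the loss changes if the token at a position is swapped for another vocabulary token: the gradient of the loss with respect to the input embedding at that position, dotted with the difference between the candidate and original embeddings. A minimal NumPy sketch of this scoring step, with hypothetical names and toy dimensions (this is an illustration of the general technique, not the paper's implementation):

```python
import numpy as np

def hotflip_candidates(embedding_matrix, input_ids, grads):
    """First-order HotFlip scoring (hypothetical helper, not from the paper).

    For each input position i with current token w, approximate the loss
    change of swapping w for candidate w' as  g_i . (e_w' - e_w),  where
    g_i is the gradient of the loss w.r.t. the embedding at position i.
    Returns the best candidate token id and its estimated gain per position.
    """
    # Dot every position's gradient with every vocabulary embedding:
    # scores[i, w'] = g_i . e_w'          shape: (seq_len, vocab_size)
    scores = grads @ embedding_matrix.T
    # Score of keeping the original token: g_i . e_w   shape: (seq_len,)
    orig = np.sum(grads * embedding_matrix[input_ids], axis=1)
    # Estimated loss increase for each (position, candidate) swap
    gains = scores - orig[:, None]
    # Forbid "swapping" a token for itself
    gains[np.arange(len(input_ids)), input_ids] = -np.inf
    best = gains.argmax(axis=1)
    return best, gains[np.arange(len(input_ids)), best]

# Toy example: vocab of 4 tokens with 2-d embeddings, one input token.
E = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
ids = np.array([0])                 # input is token 0
g = np.array([[-1.0, 0.0]])         # gradient of the loss at that position
best, gain = hotflip_candidates(E, ids, g)
```

In a full attack, the highest-gain swap would be applied (possibly beam-searched over positions) and the model re-queried to check whether the prediction flips.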