Detecting Textual Adversarial Examples through Randomized Substitution and Vote

Wang, Xiaosen; Xiong, Yifeng; He, Kun

arXiv:2109.05698 (cs)

[Submitted on 13 Sep 2021 (v1), last revised 20 Jul 2022 (this version, v2)]

Abstract: A line of work has shown that natural text processing models are vulnerable to adversarial examples. Correspondingly, various defense methods have been proposed to mitigate the threat of textual adversarial examples, e.g., adversarial training, input transformations, and detection. In this work, we treat the optimization process for synonym-substitution-based textual adversarial attacks as a specific sequence of word replacements, in which each word mutually influences the others. We identify that we can destroy such mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes on the prediction label by accumulating the logits of k samples generated by randomly substituting the words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural network without modification of the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that RS&V detects textual adversarial examples more successfully than existing detection methods while maintaining high classification accuracy on benign samples.
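The voting step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The synonym table, the toy classifier `toy_logits`, the substitution rate `rate`, and the function names are all hypothetical stand-ins. The decision rule used here (flag an input when the voted label differs from the model's prediction on the unmodified input) is an assumption, since the abstract does not spell it out.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical toy synonym table; a real system would use a lexical resource.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["great"],
    "fine": ["good", "great"],
    "movie": ["film"],
}


def toy_logits(words: Sequence[str]) -> List[float]:
    """Stand-in classifier: index 0 = negative, index 1 = positive.

    The word "fine" is deliberately scored as negative to mimic a
    blind spot that a synonym-substitution attack could exploit.
    """
    pos = sum(w in {"good", "great"} for w in words)
    neg = sum(w in {"bad", "poor", "fine"} for w in words)
    return [float(neg), float(pos)]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def random_substitute(words, synonyms, rate, rng) -> List[str]:
    """Replace each word that has synonyms with a random one, with probability `rate`."""
    out = []
    for w in words:
        candidates = synonyms.get(w)
        if candidates and rng.random() < rate:
            out.append(rng.choice(candidates))
        else:
            out.append(w)
    return out


def rsv_vote(words: Sequence[str],
             logits_fn: Callable[[Sequence[str]], List[float]],
             synonyms: Dict[str, List[str]],
             k: int = 25, rate: float = 0.5, seed: int = 0) -> int:
    """Accumulate the logits of k randomly substituted samples and return the voted label."""
    rng = random.Random(seed)
    total = [0.0] * len(logits_fn(words))
    for _ in range(k):
        sample = random_substitute(words, synonyms, rate, rng)
        total = [a + b for a, b in zip(total, logits_fn(sample))]
    return argmax(total)


def is_adversarial(words, logits_fn, synonyms, k=25, rate=0.5, seed=0) -> bool:
    """Assumed decision rule: flag the input if the vote disagrees with the original prediction."""
    original_label = argmax(logits_fn(words))
    return rsv_vote(words, logits_fn, synonyms, k, rate, seed) != original_label
```

In this toy setup, "a fine movie" plays the role of an adversarial rewrite of "a good movie": substituting "fine" with its synonyms moves the accumulated logits back to the positive class, so the vote disagrees with the model's prediction on the unmodified input. The benign sentence keeps its label under substitution and is not flagged.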

Comments:	Accepted by UAI 2022; code is available at this https URL
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2109.05698 [cs.CL]
	(or arXiv:2109.05698v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2109.05698

Submission history

From: Xiaosen Wang
[v1] Mon, 13 Sep 2021 04:17:58 UTC (465 KB)
[v2] Wed, 20 Jul 2022 03:33:00 UTC (329 KB)
