BAE: BERT-based Adversarial Examples for Text Classification

Siddhant Garg, Goutham Ramakrishnan

Machine Learning for NLP Short Paper

Gather-4B: Nov 18, Gather-4B: Nov 18 (02:00-04:00 UTC) [Join Gather Meeting]

You can open the pre-recorded video in a separate window.

Abstract: Modern text classification models are susceptible to adversarial examples, perturbed versions of the original text indiscernible by humans which get misclassified by the model. Recent works in NLP use rule-based synonym replacement strategies to generate adversarial examples. These strategies can lead to out-of-context and unnaturally complex token replacements, which are easily identifiable by humans. We present BAE, a black box attack for generating adversarial examples using contextual perturbations from a BERT masked language model. BAE replaces and inserts tokens in the original text by masking a portion of the text and leveraging the BERT-MLM to generate alternatives for the masked tokens. Through automatic and human evaluations, we show that BAE performs a stronger attack, in addition to generating adversarial examples with improved grammaticality and semantic coherence as compared to prior work.
NOTE: Video may display a random order of authors. Correct author list is at the top of this page.

Connected Papers in EMNLP2020

Similar Papers

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack
Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li,
BERT-ATTACK: Adversarial Attack Against BERT Using BERT
Linyang Li, Ruotian Ma, Qipeng Guo, Xiangyang Xue, Xipeng Qiu,
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP
John Morris, Eli Lifland, Jin Yong Yoo, Jake Grigsby, Di Jin, Yanjun Qi,