More Bang for Your Buck: Natural Perturbation for Robust Question Answering

Daniel Khashabi, Tushar Khot, Ashish Sabharwal

Question Answering Short Paper

Zoom-1C: Nov 16 (16:00-17:00 UTC)

Abstract: Deep learning models for linguistic tasks require large training datasets, which are expensive to create. The traditional approach builds a dataset by repeating the process of writing one new instance from scratch. As an alternative, we propose first collecting a set of seed examples and then applying human-driven natural perturbations (as opposed to rule-based machine perturbations), which often change the gold label as well. Such perturbations are easier (and hence cheaper) to create than completely new examples. Further, they help address the issue that even models achieving human-level scores on NLP datasets are known to be considerably sensitive to small changes in input. To evaluate the idea, we consider a recent question-answering dataset (BOOLQ) and study our approach as a function of the perturbation cost ratio, that is, the relative cost of perturbing an existing question vs. creating a new one from scratch. We find that when natural perturbations are moderately cheaper to create (cost ratio under 60%), it is more effective to use them for training BOOLQ models: such models exhibit 9% higher robustness and 4.5% stronger generalization, while retaining performance on the original BOOLQ dataset.
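
To make the perturbation cost ratio concrete, the following minimal sketch compares how many training instances a fixed annotation budget buys under each strategy. The function name `dataset_sizes`, its parameters, and the numbers in the usage line are hypothetical and chosen for illustration; they are not taken from the paper. The sketch assumes the budget is measured in units of the cost of one new question and that seed questions cost the same as any other new question.

```python
from fractions import Fraction
from typing import Tuple


def dataset_sizes(budget: int, cost_ratio: float, n_seed: int) -> Tuple[int, int]:
    """Compare dataset sizes purchasable with the same budget (illustrative only).

    budget:     total budget, in units of the cost of one new question.
    cost_ratio: cost of one perturbation divided by the cost of one new question.
    n_seed:     number of seed questions written from scratch before perturbing.

    Returns (size with new questions only, size with seeds plus perturbations).
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    if not 0 <= n_seed <= budget:
        raise ValueError("n_seed must be between 0 and budget")

    # Convert the float ratio to an exact fraction (0.6 becomes 3/5) so that
    # the floor division below is not affected by floating-point rounding.
    ratio = Fraction(cost_ratio).limit_denominator(1000)

    # Strategy A: spend the whole budget on new questions (one unit each).
    n_new_only = budget

    # Strategy B: pay for the seeds, then spend the remainder on perturbations.
    n_perturbations = (budget - n_seed) // ratio
    return n_new_only, n_seed + n_perturbations


print(dataset_sizes(budget=1000, cost_ratio=0.6, n_seed=100))  # (1000, 1600)
```

At a hypothetical cost ratio of 0.6, the same budget yields 1600 instances instead of 1000. This arithmetic shows only the difference in dataset size; whether the larger perturbed set trains a better model is the empirical question the abstract reports on.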

Similar Papers

Training Question Answering Models From Synthetic Data
Raul Puri, Ryan Spring, Mohammad Shoeybi, Mostofa Patwary, Bryan Catanzaro
PathQG: Neural Question Generation from Facts
Siyuan Wang, Zhongyu Wei, Zhihao Fan, Zengfeng Huang, Weijian Sun, Qi Zhang, Xuanjing Huang
Textual Data Augmentation for Efficient Active Learning on Tiny Datasets
Husam Quteineh, Spyridon Samothrakis, Richard Sutcliffe
Unsupervised Adaptation of Question Answering Systems via Generative Self-training
Steven Rennie, Etienne Marcheret, Neil Mallinar, David Nahamoo, Vaibhava Goel