DAGA: Data Augmentation with a Generation Approach for Low-resource Tagging Tasks

Bosheng Ding, Linlin Liu, Lidong Bing, Canasai Kruengkrai, Thien Hai Nguyen, Shafiq Joty, Luo Si, Chunyan Miao

Syntax: Tagging, Chunking, and Parsing (Long Paper)

Gather-4I: Nov 18 (02:00-04:00 UTC)


Abstract: Data augmentation techniques have been widely used to improve machine learning performance as they facilitate generalization. In this work, we propose a novel augmentation method to generate high-quality synthetic data for low-resource tagging tasks with language models trained on the linearized labeled sentences. Our method is applicable to both supervised and semi-supervised settings. For the supervised settings, we conduct extensive experiments on named entity recognition (NER), part-of-speech (POS) tagging and end-to-end target-based sentiment analysis (E2E-TBSA) tasks. For the semi-supervised settings, we evaluate our method on the NER task under two conditions: unlabeled data only, and unlabeled data plus a knowledge base. The results show that our method can consistently outperform the baselines, particularly when the amount of given gold training data is small.
NOTE: The video may display the authors in a random order. The correct author list is at the top of this page.
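To make the linearization idea from the abstract concrete, below is a minimal sketch of how a labeled sentence could be flattened into a single token stream for language-model training. The tag-before-word convention, the dropping of O tags, and the linearize helper are illustrative assumptions, not necessarily the paper's exact scheme.

    # Hypothetical sketch: turn a tagged sentence into one token stream
    # so that a language model can be trained on it. The convention here
    # (tag token inserted before each non-O word, O tags omitted) is an
    # assumption for illustration.

    def linearize(words, tags):
        """Interleave tags with words: each non-O tag becomes a token
        placed immediately before the word it labels."""
        stream = []
        for word, tag in zip(words, tags):
            if tag != "O":
                stream.append(tag)  # e.g. "B-PER" joins the vocabulary
            stream.append(word)
        return stream

    words = ["John", "Smith", "lives", "in", "Paris"]
    tags  = ["B-PER", "I-PER", "O", "O", "B-LOC"]
    print(" ".join(linearize(words, tags)))
    # -> "B-PER John I-PER Smith lives in B-LOC Paris"

Under this scheme, a language model trained on such streams can generate new streams, and synthetic labeled sentences are recovered by inverting the interleaving: each tag token is attached to the word that follows it, and untagged words receive O.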

Connected Papers in EMNLP2020

Similar Papers

Multi-Stage Pre-training for Low-Resource Domain Adaptation
Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avi Sil, Todd Ward
Cold-start Active Learning through Self-supervised Language Modeling
Michelle Yuan, Hsuan-Tien Lin, Jordan Boyd-Graber
Augmented Natural Language for Generative Sequence Labeling
Ben Athiwaratkun, Cicero Nogueira dos Santos, Jason Krone, Bing Xiang