Pre-Training Transformers as Energy-Based Cloze Models
Kevin Clark, Minh-Thang Luong, Quoc Le, Christopher D. Manning
Machine Learning for NLP Short Paper
Abstract:
We introduce Electric, an energy-based cloze model for representation learning over text. Like BERT, it is a conditional generative model of tokens given their contexts. However, Electric does not use masking or output a full distribution over tokens that could occur in a context. Instead, it assigns a scalar energy score to each input token indicating how likely it is given its context. We train Electric using an algorithm based on noise-contrastive estimation and elucidate how this learning objective is closely related to the recently proposed ELECTRA pre-training method. Electric performs well when transferred to downstream tasks and is particularly effective at producing likelihood scores for text: it re-ranks speech recognition n-best lists better than language models and much faster than masked language models. Furthermore, it offers a clearer and more principled view of what ELECTRA learns during pre-training.
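As a rough illustration of the training objective described in the abstract, the sketch below shows a generic binary noise-contrastive estimation step for a model that assigns a scalar energy score to each token position. The class and function names (EnergyCloze, energy_head, nce_loss) are hypothetical, and the details (noise distribution, token-replacement scheme) are simplified relative to the paper; this is not the authors' implementation.

```python
# Minimal, simplified sketch of an NCE-style training step for an
# energy-based cloze model. Names and shapes are illustrative only.
import math
import torch
import torch.nn as nn

class EnergyCloze(nn.Module):
    def __init__(self, encoder: nn.Module, hidden_size: int):
        super().__init__()
        self.encoder = encoder                        # any transformer encoder (assumed)
        self.energy_head = nn.Linear(hidden_size, 1)  # scalar energy per position

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask)  # [batch, seq, hidden]
        return self.energy_head(hidden).squeeze(-1)       # [batch, seq] energies E(x)_t

def nce_loss(energies, log_q, is_noise, k):
    """Binary NCE loss: classify whether each position holds the original
    token or a replacement sampled from a noise distribution q.

    energies: E(x)_t from the model, shape [batch, seq]
    log_q:    log q(x_t | context) under the noise model, shape [batch, seq]
    is_noise: 1 where the token was replaced by a noise sample, else 0
    k:        number of noise samples per data sample
    """
    # With unnormalized model density exp(-E(x)_t), the posterior that a token
    # comes from the data rather than the noise distribution has logit
    #   -E(x)_t - log(k * q(x_t | context)).
    logits = -energies - (math.log(k) + log_q)
    targets = 1.0 - is_noise.float()  # 1 for original tokens, 0 for noise tokens
    return nn.functional.binary_cross_entropy_with_logits(logits, targets)
```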
Similar Papers
PALM: Pre-training an Autoencoding&Autoregressive Language Model for Context-conditioned Generation
Bin Bi, Chenliang Li, Chen Wu, Ming Yan, Wei Wang, Songfang Huang, Fei Huang, Luo Si

TNT: Text Normalization based Pre-training of Transformers for Content Moderation
Fei Tan, Yifan Hu, Changwei Hu, Keqian Li, Kevin Yen

Neural Mask Generator: Learning to Generate Adaptive Word Maskings for Language Model Adaptation
Minki Kang, Moonsu Han, Sung Ju Hwang

Multilingual Denoising Pre-training for Neural Machine Translation
Jiatao Gu, Yinhan Liu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer
