Multilingual Denoising Pre-training for Neural Machine Translation

Jiatao Gu; Yinhan Liu; Naman Goyal; Xian Li; Sergey Edunov; Marjan Ghazvininejad; Mike Lewis; Luke Zettlemoyer

Multilingual Denoising Pre-training for Neural Machine Translation

Jiatao Gu, Yinhan Liu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, Luke Zettlemoyer

Abstract Connected Papers Add to Favorites

Machine Translation and Multilinguality Tacl Paper

Zoom-3C: Nov 17, Zoom-3C: Nov 17 (00:00-01:00 UTC) [Join Zoom Meeting]

You can open the pre-recorded video in a separate window.

Abstract: This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is the first method for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show it enables transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.

NOTE: Video may display a random order of authors. Correct author list is at the top of this page.

Connected Papers in EMNLP2020