Long-Short Term Masking Transformer: A Simple but Effective Baseline for Document-level Neural Machine Translation

Pei Zhang, Boxing Chen, Niyu Ge, Kai Fan

Machine Translation and Multilinguality Short Paper

Gather-1A: Nov 17 (02:00-04:00 UTC)


Abstract: Many document-level neural machine translation (NMT) systems have explored the utility of context-aware architectures, usually at the cost of an increasing number of parameters and greater computational complexity. However, little attention has been paid to the baseline model. In this paper, we extensively study the pros and cons of the standard transformer in document-level translation and find that its auto-regressive property brings both the advantage of consistency and the disadvantage of error accumulation. We therefore propose a surprisingly simple long-short term masking self-attention on top of the standard transformer, which both captures long-range dependencies effectively and reduces the propagation of errors. We evaluate our approach on two publicly available document-level datasets, achieving strong BLEU results and capturing discourse phenomena.
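To make the idea of a long-short term attention mask concrete, here is a minimal, hedged sketch in NumPy. It is not the authors' implementation; the helper names (build_long_short_mask, masked_self_attention) and the choice to split attention into a within-sentence ("short-term") part and a cross-sentence ("long-term") part are illustrative assumptions about how such a mask could be wired into standard scaled dot-product self-attention.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_long_short_mask(sent_ids):
    """Build two boolean attention masks over a token sequence.

    sent_ids[i] is the index of the sentence that token i belongs to
    (hypothetical sentence-boundary information supplied by the data loader).
    Tokens in the same sentence attend to each other in the "short-term" mask;
    tokens in other sentences (plus the token itself) form the "long-term" mask.
    """
    sent_ids = np.asarray(sent_ids)
    same_sentence = sent_ids[:, None] == sent_ids[None, :]
    short_mask = same_sentence                                   # within-sentence attention
    long_mask = ~same_sentence | np.eye(len(sent_ids), dtype=bool)  # cross-sentence + self
    return short_mask, long_mask

def masked_self_attention(q, k, v, mask):
    """Scaled dot-product self-attention with positions outside `mask` blocked."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(mask, scores, -1e9)                        # block disallowed positions
    return softmax(scores, axis=-1) @ v

# Toy example: a 6-token "document" made of two 3-token sentences.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))                                      # token representations
short_mask, long_mask = build_long_short_mask([0, 0, 0, 1, 1, 1])
short_out = masked_self_attention(x, x, x, short_mask)           # local, within-sentence view
long_out = masked_self_attention(x, x, x, long_mask)             # contextual, cross-sentence view
print(short_out.shape, long_out.shape)                           # (6, 8) (6, 8)
```

In a sketch like this, the two masked views could be assigned to different attention heads, so that long-range context is captured without letting errors from distant tokens dominate the local representation; the exact assignment used in the paper is not specified here.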


Similar Papers

Distilling Multiple Domains for Neural Machine Translation
Anna Currey, Prashant Mathur, Georgiana Dinu
ETC: Encoding Long and Structured Inputs in Transformers
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
COMET: A Neural Framework for MT Evaluation
Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie
Uncertainty-Aware Semantic Augmentation for Neural Machine Translation
Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo