TeaForN: Teacher-Forcing with N-grams
Sebastian Goodman, Nan Ding, Radu Soricut
Language Generation Long Paper
Abstract:
Sequence generation models trained with teacher-forcing suffer from issues related to exposure bias and lack of differentiability across timesteps. Our proposed method, Teacher-Forcing with N-grams (TeaForN), addresses both these problems directly, through the use of a stack of N decoders trained to decode along a secondary time axis that allows model-parameter updates based on N prediction steps. TeaForN can be used with a wide class of decoder architectures and requires minimal modifications to a standard teacher-forcing setup. Empirically, we show that TeaForN boosts generation quality on one Machine Translation benchmark, WMT 2014 English-French, and two News Summarization benchmarks, CNN/Dailymail and Gigaword.
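To make the abstract's description more concrete, below is a minimal sketch (not the authors' released code) of the training idea as described: decoding is repeated N times along a secondary axis, each pass consuming the previous pass's soft predictions rather than gold tokens, and the loss aggregates all N prediction offsets. The names `decoder`, `embed`, `output_proj`, and `teaforn_loss`, as well as the choice to share one decoder's parameters across the stack, are illustrative assumptions rather than details taken from the paper.

```python
import torch
import torch.nn.functional as F

def teaforn_loss(decoder, embed, output_proj, src_enc, tgt_ids, n=2):
    """Sketch of a Teacher-Forcing-with-N-grams style loss.

    decoder(inputs_emb, src_enc) -> hidden states [B, T, D]  (hypothetical interface)
    embed: target-token embedding matrix [V, D]
    output_proj: maps hidden states to vocabulary logits [B, T, V]
    tgt_ids: gold target token ids [B, T]
    """
    # Pass 1 uses standard teacher-forced inputs (gold token embeddings).
    inputs_emb = embed[tgt_ids]
    total_loss = 0.0
    for step in range(n):
        hidden = decoder(inputs_emb, src_enc)   # [B, T, D]
        logits = output_proj(hidden)            # [B, T, V]
        # At pass `step`, position t is asked to predict the token
        # (step + 1) positions ahead of the original gold input.
        shift = step + 1
        pred = logits[:, : tgt_ids.size(1) - shift]
        gold = tgt_ids[:, shift:]
        total_loss = total_loss + F.cross_entropy(
            pred.reshape(-1, pred.size(-1)), gold.reshape(-1)
        )
        # The next pass consumes a differentiable soft embedding of the
        # current predictions instead of the gold tokens, which is what
        # lets gradients flow across the N prediction steps.
        inputs_emb = torch.softmax(logits, dim=-1) @ embed
    return total_loss / n
```

With n=1 this reduces to ordinary teacher-forcing; larger n exposes the model to its own (soft) predictions during training, which is the intuition behind the exposure-bias and differentiability claims in the abstract.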