Exploiting Sentence Order in Document Alignment
Brian Thompson, Philipp Koehn
Machine Translation and Multilinguality Short Paper
Abstract:
We present a simple document alignment method that incorporates sentence order information in both candidate generation and candidate re-scoring. Our method yields a 61% relative reduction in error compared to the best previously published result on the WMT16 document alignment shared task. It also improves downstream MT performance on web-scraped Sinhala–English documents from ParaCrawl, outperforming the document alignment method used in the most recent ParaCrawl release. Finally, it outperforms a comparable-corpora method that uses the same multilingual embeddings, demonstrating that exploiting sentence order is beneficial even when the end goal is sentence-level bitext.
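As a reading aid for the "61% relative reduction in error" figure, the following minimal sketch shows how a relative error reduction is computed in general. The function name and the two error values are hypothetical and chosen only to make the arithmetic concrete; they are not results reported in the paper.

```python
def relative_error_reduction(baseline_error: float, new_error: float) -> float:
    """Return the fraction of the baseline error that the new system removes."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates for illustration only; NOT taken from the paper.
baseline_error = 0.100  # assumed error rate of a previous system
new_error = 0.039       # assumed error rate of a new system

reduction = relative_error_reduction(baseline_error, new_error)
print(f"Relative error reduction: {reduction:.0%}")
```

A relative reduction is measured against the baseline's error, not in absolute percentage points, so the same relative figure can correspond to a small absolute change when the baseline error is already low.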
Connected Papers in EMNLP2020
Similar Papers
Improving Word Sense Disambiguation with Translations
Yixing Luan, Bradley Hauer, Lili Mou, Grzegorz Kondrak
