Multilevel Text Alignment with Cross-Document Attention

Xuhui Zhou, Nikolaos Pappas, Noah A. Smith

Machine Learning for NLP Long Paper

Gather-3C: Nov 17 (18:00-20:00 UTC)


Abstract: Text alignment finds application in tasks such as citation recommendation and plagiarism detection. Existing alignment methods operate at a single, predefined level and cannot learn to align texts at, for example, both the sentence and the document level. We propose a new learning approach that equips previously established hierarchical attention encoders for representing documents with a cross-document attention component, enabling structural comparisons across different levels (document-to-document and sentence-to-document). Our component is weakly supervised from document pairs and can align at multiple levels. Our evaluation on predicting document-to-document and sentence-to-document relationships for citation recommendation and plagiarism detection shows that our approach outperforms previously established hierarchical attention encoders, based on recurrent and transformer contextualization, that are unaware of structural correspondence between documents.
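The idea of cross-document attention enabling sentence-to-document and document-to-document comparison can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: each document is already encoded as a list of sentence vectors (in the paper these would come from a hierarchical encoder); attention is plain scaled dot-product with no learned parameters; a sentence-to-document score is the cosine similarity between a source sentence and its attention-weighted summary of the target document; and the document-to-document score is simply the mean of those sentence scores. The function names `cross_document_attention`, `sentence_to_document_scores`, and `document_to_document_score` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u: Vector, v: Vector) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)


def cross_document_attention(
    source: List[Vector], target: List[Vector]
) -> Tuple[List[List[float]], List[Vector]]:
    """Each source sentence attends over all target sentences.

    Assumption: scaled dot-product attention with no learned weights.
    Returns the attention weights and one context vector per source sentence.
    """
    if not source or not target:
        raise ValueError("both documents need at least one sentence")
    d = len(source[0])
    weights, contexts = [], []
    for s in source:
        w = softmax([dot(s, t) / math.sqrt(d) for t in target])
        # Context = attention-weighted average of the target sentence vectors.
        c = [sum(w_j * t[k] for w_j, t in zip(w, target)) for k in range(d)]
        weights.append(w)
        contexts.append(c)
    return weights, contexts


def sentence_to_document_scores(source: List[Vector], target: List[Vector]) -> List[float]:
    """Hypothetical score: how well each source sentence matches its attended context."""
    _, contexts = cross_document_attention(source, target)
    return [cosine(s, c) for s, c in zip(source, contexts)]


def document_to_document_score(source: List[Vector], target: List[Vector]) -> float:
    """Hypothetical aggregation: mean of the sentence-to-document scores."""
    scores = sentence_to_document_scores(source, target)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    doc_a = [[1.0, 0.0], [0.0, 1.0]]  # two toy sentence vectors
    doc_b = [[1.0, 0.0], [1.0, 0.0]]
    print(sentence_to_document_scores(doc_a, doc_b))  # [1.0, 0.0]
    print(document_to_document_score(doc_a, doc_b))  # 0.5
```

In this toy example the first sentence of `doc_a` matches the target document exactly and scores 1.0, while the second, orthogonal sentence scores 0.0, so the pair as a whole scores 0.5. Because a single set of attention weights yields both per-sentence and per-document scores, the same component can answer questions at either level; in a trained model, the weak supervision from document pairs would presumably shape the sentence vectors (and any projections) so that these weights expose which parts of the two documents correspond.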

Connected Papers in EMNLP2020

Similar Papers

Substance over Style: Document-Level Targeted Content Transfer
Allison Hegel, Sudha Rao, Asli Celikyilmaz, Bill Dolan
Accurate Word Alignment Induction from Neural Machine Translation
Yun Chen, Yang Liu, Guanhua Chen, Xin Jiang, Qun Liu
Text Segmentation by Cross Segment Attention
Michal Lukasik, Boris Dadachev, Kishore Papineni, Gonçalo Simões