Sparse Optimization for Unsupervised Extractive Summarization of Long Documents with the Frank-Wolfe Algorithm
Alicia Tsai, Laurent El Ghaoui
SustaiNLP: Workshop on Simple and Efficient Natural Language Processing (Workshop Paper)
Abstract:
We address the problem of unsupervised extractive document summarization, especially for long documents. We model the unsupervised problem as a sparse auto-regression problem and approximate the resulting combinatorial problem with a convex, norm-constrained one, which we solve using a dedicated Frank-Wolfe algorithm. To generate a summary with k sentences, the algorithm needs to execute only approximately k iterations, making it very efficient for long documents. We evaluate our approach against two other unsupervised methods using both standard lexical ROUGE scores and semantic (embedding-based) ones. Our method achieves better results on both datasets and works especially well when combined with embeddings for highly paraphrased summaries.
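The link between the iteration count and the summary length can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes a simplified stand-in problem: sentence vectors are the columns of a matrix `A`, the document is represented by their mean `b` (an assumption made here for illustration), and the goal is to minimize `0.5 * ||A x - b||^2` subject to an L1-norm constraint `||x||_1 <= tau`. The names `frank_wolfe_l1`, `summarize`, and `tau` are hypothetical. Because every Frank-Wolfe step over an L1 ball moves toward a vertex with a single nonzero coordinate, k steps can select at most k sentences.

```python
import numpy as np


def frank_wolfe_l1(A, b, tau=1.0, k=3):
    """Run k Frank-Wolfe steps on 0.5 * ||A @ x - b||^2 subject to ||x||_1 <= tau.

    Simplified stand-in problem (an assumption), not the paper's exact objective.
    """
    n = A.shape[1]
    x = np.zeros(n)  # feasible start with an empty support
    for _ in range(k):
        grad = A.T @ (A @ x - b)
        # Linear minimization over the L1 ball: the minimizer is a signed,
        # scaled basis vector at the coordinate with the largest |gradient|.
        i = int(np.argmax(np.abs(grad)))
        s = np.zeros(n)
        s[i] = -tau * np.sign(grad[i])
        d = s - x
        Ad = A @ d
        denom = float(Ad @ Ad)
        if denom == 0.0:
            break  # the direction does not change the objective
        # Exact line search for a quadratic objective, clipped to [0, 1]
        # so the iterate stays inside the L1 ball.
        gamma = float(np.clip(-(grad @ d) / denom, 0.0, 1.0))
        x = x + gamma * d
    return x


def summarize(A, k):
    """Return indices of the selected sentences (at most k of them)."""
    b = A.mean(axis=1)  # assumption: the mean sentence vector stands for the document
    x = frank_wolfe_l1(A, b, tau=1.0, k=k)
    return np.flatnonzero(x)
```

Calling `summarize(A, k)` on a `d x n` matrix of sentence vectors returns the indices of at most `k` columns. Each iteration costs two matrix-vector products, which is why a small number of iterations stays cheap even when `n` is large.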