Grammatical Error Correction in Low Error Density Domains: A New Benchmark and Analyses

Simon Flachs, Ophélie Lacroix, Helen Yannakoudakis, Marek Rei, Anders Søgaard

NLP Applications Long Paper

Gather-5C: Nov 18 (18:00-20:00 UTC)


Abstract: Evaluation of grammatical error correction (GEC) systems has primarily focused on essays written by non-native learners of English, which, however, represent only part of the full spectrum of GEC applications. We aim to broaden the target domain of GEC and release CWEB, a new benchmark for GEC consisting of website text generated by English speakers of varying levels of proficiency. Website data is a common and important domain that contains far fewer grammatical errors than learner essays, which we show presents a challenge to state-of-the-art GEC systems. We demonstrate that one factor behind this is the inability of systems to rely on a strong internal language model in low error density domains. We hope this work will facilitate the development of open-domain GEC models that generalize to different topics and genres.
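The abstract contrasts learner essays with website text of low error density. As a rough illustration only, the sketch below assumes error density can be approximated as the number of contiguous token-level edits per source token, computed with Python's standard-library `difflib`. Both this measure and the `error_density` helper are hypothetical and not necessarily how the paper quantifies density.

```python
from difflib import SequenceMatcher


def error_density(source: str, corrected: str) -> float:
    """Hypothetical measure: contiguous edit spans per source token.

    Tokens are split on whitespace. Each non-"equal" opcode returned by
    SequenceMatcher (replace, delete, or insert) counts as one edit.
    """
    src = source.split()
    tgt = corrected.split()
    if not src:
        return 0.0
    matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = sum(1 for tag, *_ in matcher.get_opcodes() if tag != "equal")
    return edits / len(src)


if __name__ == "__main__":
    # One substitution in a 7-token sentence: a single edit span.
    print(error_density("She go to school every day .",
                        "She goes to school every day ."))
    # An already-correct sentence has no edits at all.
    print(error_density("The site is up .", "The site is up ."))
```

Averaged over a corpus, a measure like this would be expected to come out lower for mostly well-formed web text than for learner essays, which is the contrast the abstract draws.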

Connected Papers in EMNLP2020

Similar Papers

BLiMP: The Benchmark of Linguistic Minimal Pairs for English
Alex Warstadt, Alicia Parrish, Haokun Liu, Anhad Mohananey, Wei Peng, Sheng-Fu Wang, Samuel Bowman
Investigating representations of verb bias in neural language models
Robert Hawkins, Takateru Yamakoshi, Thomas Griffiths, Adele Goldberg
Does the Objective Matter? Comparing Training Objectives for Pronoun Resolution
Yordan Yordanov, Oana-Maria Camburu, Vid Kocijan, Thomas Lukasiewicz
Word Frequency Does Not Predict Grammatical Knowledge in Language Models
Charles Yu, Ryan Sie, Nicolas Tedeschi, Leon Bergen