Evaluating and Characterizing Human Rationales

Samuel Carton, Anirudh Rathore, Chenhao Tan

Interpretability and Analysis of Models for NLP (Long Paper)

Zoom-16B: Nov 19 (00:00-01:00 UTC)


Abstract: Two main approaches for evaluating the quality of machine-generated rationales are: 1) using human rationales as a gold standard; and 2) automated metrics based on how rationales affect model behavior. An open question, however, is how human rationales fare with these automatic metrics. Analyzing a variety of datasets and models, we find that human rationales do not necessarily perform well on these metrics. To unpack this finding, we propose improved metrics to account for model-dependent baseline performance. We then propose two methods to further characterize rationale quality, one based on model retraining and one on using "fidelity curves" to reveal properties such as irrelevance and redundancy. Our work leads to actionable suggestions for evaluating and characterizing rationales.
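To make the idea of a model-behavior-based metric concrete, the sketch below illustrates a sufficiency-style fidelity score: how much the model's confidence in its original prediction drops when only the rationale tokens are kept. This is a minimal illustration, not the paper's exact formulation; the names `predict_proba`, `rationale_mask`, and `mask_token` are hypothetical placeholders for whatever classifier and rationale encoding one uses.

```python
from typing import Callable, List, Sequence


def sufficiency(
    predict_proba: Callable[[List[str]], Sequence[float]],
    tokens: List[str],
    rationale_mask: List[bool],
    mask_token: str = "[MASK]",
) -> float:
    """Fidelity-style sufficiency: drop in the predicted class's probability
    when everything outside the rationale is masked out.

    A smaller drop suggests the rationale alone is (nearly) sufficient to
    reproduce the model's original prediction. Illustrative only; the paper's
    metrics and baselines may differ.
    """
    # Prediction on the full input.
    full_probs = predict_proba(tokens)
    pred_class = max(range(len(full_probs)), key=lambda i: full_probs[i])

    # Keep rationale tokens, mask the rest.
    reduced = [t if keep else mask_token for t, keep in zip(tokens, rationale_mask)]
    reduced_probs = predict_proba(reduced)

    return full_probs[pred_class] - reduced_probs[pred_class]
```

Sweeping the fraction of rationale tokens revealed (or occluded) and plotting this score at each step yields a curve of the kind the abstract refers to as a "fidelity curve"; flat regions can indicate redundant or irrelevant rationale tokens.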
