Workshop on Insights from Negative Results in NLP

Anna Rogers, Joao Sedoc and Anna Rumshisky

Live Session 1: Nov 19 (15:00-00:00 UTC)
Let's take a break from chasing leaderboards! Your negative results might save others time, or poke holes in things we take for granted.

Time (UTC) Event Hosts
Nov 19, (15:00-15:15 UTC)

Opening remarks

Anna Rogers
Nov 19, (15:15-16:00 UTC)

Invited talk: Rada Mihalcea: The ups and downs of word embeddings
Zoom, RocketChat
Word embeddings have largely been a “success story” in our field. They have enabled progress in numerous language processing applications, and have facilitated the application of large-scale language analyses in other domains, such as social sciences and humanities. While less talked about, word embeddings also have many shortcomings – instability, lack of transparency, biases, and more. In this talk, I will review the “ups” and “downs” of word embeddings, discuss tradeoffs, and chart potential future research directions to address some of the downsides of these word representations.

Anna Rumshisky
Nov 19, (16:00-16:15 UTC)

Q&A with Rada Mihalcea
Zoom, RocketChat

Anna Rumshisky
Nov 19, (16:15-16:45 UTC)

Thematic session: representation learning
Zoom, RocketChat
• Embedding Structured Dictionary Entries (Steven Wilson, Walid Magdy, Barbara McGillivray and Gareth Tyson) Paper
• Can Knowledge Graph Embeddings Tell Us What Fact-checked Claims Are About? (Valentina Beretta, Sébastien Harispe, Katarina Boland, Luke Lo Seen, Konstantin Todorov and Andon Tchechmedjiev) Paper
• Layout-Aware Text Representations Harm Clustering Documents by Type (Catherine Finegan-Dollak and Ashish Verma) Paper

Anna Rogers
Nov 19, (16:45-17:15 UTC)

Thematic session: dialogue
Zoom, RocketChat
• On Task-Level Dialogue Composition of Generative Transformer Model (Prasanna Parthasarathi, Sharan Narang and Arvind Neelakantan) Paper
• HINT3: Raising the bar for Intent Detection in the Wild (Gaurav Arora, Chirag Jain, Manas Chaturvedi and Krupal Modi) Paper
• Effects of Naturalistic Variation in Goal-Oriented Dialog (Jatin Ganhotra, Robert Moore, Sachindra Joshi and Kahini Wadhawan) Paper

Joao Sedoc
Nov 19, (17:15-18:00 UTC)

Social break / meal time. Gather.town (room N)

n/a
Nov 19, (18:00-18:45 UTC)

Invited talk: Byron Wallace: Negative results yield interesting questions, or: a bunch of stuff that didn’t work
Zoom, RocketChat
I will discuss recent projects in which ideas did not pan out as expected, but where these initial negative results led to (arguably) more interesting questions. My hope is that these case studies of negative results — which ultimately led to work we viewed as compelling enough to warrant write-up — will foster discussion about when “negative” results are nonetheless interesting, and about the kinds of questions we ask in empirical NLP research.

Anna Rumshisky
Nov 19, (18:45-19:00 UTC)

Q&A with Byron Wallace
Zoom, RocketChat

Anna Rumshisky
Nov 19, (19:00-19:30 UTC)

Thematic session: question answering
Zoom, RocketChat
• Do Transformers Dream of Inference, or Can Pretrained Generative Models Learn Implicit Inferential Rules? (Zhengzhong Liang and Mihai Surdeanu) Paper
• What do we expect from Multiple-choice QA Systems? (Krunal Shah, Nitish Gupta and Dan Roth) Paper
• Q. Can Knowledge Graphs be used to Answer Boolean Questions? A. It’s complicated! (Daria Dzendzik, Carl Vogel and Jennifer Foster) Paper

Anna Rogers
Nov 19, (19:30-20:00 UTC)

Thematic session: natural language inference
Zoom, RocketChat
• Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data (William Huang, Haokun Liu and Samuel R. Bowman) Paper
• The Extraordinary Failure of Complement Coercion Crowdsourcing (Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg and Reut Tsarfaty) Paper
• Poison Attacks against Text Datasets with Conditional Adversarially Regularized Autoencoder (Alvin Chan, Yi Tay, Yew-Soon Ong and Aston Zhang) Paper

Anna Rogers
Nov 19, (20:00-20:30 UTC)

Thematic session: lessons learned the hard way
Zoom, RocketChat
• Evaluating the Effectiveness of Efficient Neural Architecture Search for Sentence-Pair Tasks (Ansel MacLaughlin, Jwala Dhamala, Anoop Kumar, Sriram Venkatapathy, Ragav Venkatesan and Rahul Gupta) Paper
• NMF Ensembles? Not for Text Summarization! (Alka Khurana and Vasudha Bhatnagar) Paper
• If You Build Your Own NER Scorer, Non-replicable Results Will Come (Constantine Lignos and Marjan Kamyab) Paper

Joao Sedoc
Nov 19, (20:30-21:00 UTC)

Social break / meal time. Gather.town (room N)

n/a
Nov 19, (21:00-22:00 UTC)

Interactive Orals
• Domain adaptation challenges of BERT in tokenization and sub-word representations of Out-of-Vocabulary words (Anmol Nayak, Hariprasad Timmapathini, Karthikeyan Ponnalagu and Vijendran Gopalan Venkoparao) Paper
• How Far Can We Go with Data Selection? A Case Study on Semantic Sequence Tagging Tasks (Samuel Louvan and Bernardo Magnini) Paper
• Which Matters Most? Comparing the Impact of Concept and Document Relationships in Topic Models (Silvia Terragni, Debora Nozza, Elisabetta Fersini and Enza Messina) Paper
• How Effectively Can Machines Defend Against Machine-Generated Fake News? An Empirical Study (Meghana Moorthy Bhat and Srinivasan Parthasarathy) Paper
• Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification (Ashwin Geet D’Sa, Irina Illina, Dominique Fohr and Dietrich Klakow) Paper
• An Analysis of Capsule Networks for Part of Speech Tagging in High- and Low-resource Scenarios (Andrew Zupon, Faiz Rafique and Mihai Surdeanu) Paper
• How Effective is Task-Agnostic Data Augmentation for Pretrained Transformers? (Shayne Longpre, Yu Wang and Christopher DuBois) Paper
• WER we are and WER we think we are (Piotr Szymański, Piotr Żelasko, Mikolaj Morzy, Adrian Szymczak, Marzena Żyła-Hoppe, Joanna Banaszczak, Lukasz Augustyniak, Jan Mizgajski and Yishay Carmiel) Paper

n/a
Nov 19, (22:00-22:45 UTC)

The frustrations of leaderboardism. Panel discussion with Kawin Ethayarajh, Jesse Dodge and Rachael Tatman
Zoom, RocketChat
Leaderboards do more than drive progress in NLP: the bias towards publishing positive, and particularly state-of-the-art, results implicitly encourages the development of highly specialized and brittle systems. If the reported success cannot be reproduced, or does not generalize well, the main result is much frustration for the developers who pick up academic papers in search of something that would actually work.

Anna Rogers
Nov 19, (22:45-23:00 UTC)

Q&A with the panelists
Zoom, RocketChat

Anna Rogers
Nov 19, (23:00-23:15 UTC)

Closing remarks Zoom

Anna Rogers
Nov 19, (23:15-00:00 UTC)

Virtual happy hour Gather.town (room N)

Joao Sedoc

Pre-recorded Plenary Talks